Comments
djsumdog666514hWhat models do you plan on running? I've found all the local coding models that can run on a 3080 Ti to be pretty terrible. Are the larger local models better than Claude 4 or GPT?
I find the chatbots utterly annoying and worthless. I hate how their bullshit is now crammed into search results on DDG/Google.
What would you use local LLMs for?
jonathands35913h@djsumdog mostly I want to try the coding models, but it would be general lab stuff really.
A friend showed me Qwen Coder running on a 24 GB MacBook Pro and I was impressed with the result.
My main hangup with going Nvidia is that the setup would cost more for less LLM-usable RAM: for the price of a single 32 GB 5090 I can buy an M4 Pro Mac mini with 64 GB of unified RAM (at least here in the jungle).
I understand it's not a replacement for Claude/Codex, though.
jonathands35913h@afaIk two things, mostly: having the ability to play and learn locally is a great plus, since it's a (very needed) upgrade anyway,
and depending on what I'm doing, cloud services get incredibly expensive for me due to exchange rates. Yes, the computer would be expensive, but it's there no matter what.
Still, this is a good question; maybe I should test some 30B models on Groq to gauge the cost before making any purchase.
I'll probably wait till the M5 gets released, though.
retoor70713h@djsumdog yeah, sure, it's fine that you have issues with it, but blame the developers (or the companies they work for), not the technology. The technology is freaking amazing in the right hands. I hope you at least see that.
retoor70713hIf AI is the real goal, it is an interesting bet.
Because the AI you are used to is absolutely not the AI you get on a local machine. Local AIs on ordinary graphics cards can answer a question for sure, but real automation takes more than that. Also, such a local machine costs several thousand euros; do you have any idea how much cloud AI you could buy for that? I am a major spender on AI, sometimes because I left a loop running in the background or was just inefficient, so the costs I made are more on me than on the supplier. As for a light model like Qwen: maybe it wins according to some benchmark chart, but I actually doubt it would even beat gpt-4o-mini or gpt-4.1-nano. I'm not kidding. Benchmarks say absolutely nothing in the AI world, especially not about automating with it, if that is the goal.
Regarding your remark about Claude/Codex: that's not just the model. There is a very good algorithm behind it in pure code. At this point I have beaten Roo Code with my own build system using a light model.
retoor70713hBut what makes AI slow is, as you know, the context length. If the context length doesn't matter or stays small, it can be a good choice I guess, but it will take a decade before it becomes a good financial choice compared to the cloud-based solutions. I personally moved everyone to Ollama and still ended up vouching for the cloud solutions, even financially. But be aware that there is a difference between OpenAI and OpenRouter, for example. Against OpenAI I blast concurrent requests using a gather(), meaning I regularly do 20 requests at once. Grok is fine with it, OpenAI is fine with it. But on any other provider you will be rate limited, no exceptions. And Groq, the one most famous for their inference speed, is literally the worst at delivering on that promise: fast inference, but no concurrency. What is the point, ffs.
But please explain what you actually want to do with the AI. Your subject is very important to discuss.
Automation normally takes a small context.
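For reference, a minimal sketch of the kind of concurrent blasting described above, assuming the official openai Python client, an OPENAI_API_KEY in the environment, and a placeholder model name and prompts:
```
import asyncio
from openai import AsyncOpenAI  # pip install openai

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def ask(prompt: str) -> str:
    # One chat completion per prompt; gpt-4o-mini is just an example model.
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = [f"Summarize item {i} in one line." for i in range(20)]
    # Fire all 20 requests at once; a provider without real concurrency
    # will rate-limit or serialize these.
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    for answer in answers:
        print(answer)

asyncio.run(main())
```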
retoor70713h@afaIk uhm, owning something yourself? I mean, someone is making a profit on it, so in theory you should be able to do it yourself better and cheaper. In the case of AI that is very much not true at the moment. But yeah, if I find a way to drop the cloud I would do it immediately. I also don't do AWS or whatever; I configure my dedicated servers myself and never have unexpected costs or other weird shit. It's mine and that's comfortable.
So many reasons not to choose the cloud, but at this moment we're tied into it.
On the subject, btw: models are getting lighter and lighter. One of my favourites for automation is Gemma3:12b. That's amazing; a year ago that was impossible with such a model.
But if you code well and automate certain stuff yourself, you don't really need heavy models.
Currently playing with grok-code-fast-1 and that is by far the best bang for the buck I have ever seen. No Opus for me. The new one from Claude, Sonnet 4.5, is also amazing. Since then, no Opus.
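A minimal sketch of driving a light local model like that for automation through Ollama's REST API; the model tag and prompt are placeholders and assume the model has already been pulled locally:
```
import requests  # pip install requests

# Ollama listens on port 11434 by default; this assumes
# `ollama pull gemma3:12b` has already been run.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:12b",
        "prompt": "Classify this ticket as bug, feature or question: 'App crashes on login.'",
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["response"])
```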
retoor70713h@jonathands In general, if you have nothing better to waste money on, just buy the computer. Why not; I can't look into your wallet. I personally work on an X270, work with AI 24/7, and I've spent around 800 this year. I would not say that is much. I just ran:
```
Total tokens: 327761
Total AI call time: 691.89s
Total elapsed time: 706.76s
```
Fetched pricing data from Perplexity:
If all tokens were output tokens: 327,761 tokens × ($1.50 / 1,000,000) = $0.49
If all tokens were input tokens: 327,761 tokens × ($0.20 / 1,000,000) = $0.07
Typical blended scenario, using the industry-standard 3:1 input/output ratio:
Input: 245,821 tokens × ($0.20 / 1,000,000) = $0.05
Output: 81,940 tokens × ($1.50 / 1,000,000) = $0.12
Total: $0.17
So, to give you an idea: I spent literally that much time on full-time AI calling and it cost me 17 cents in the blended scenario (in/out balance) and around $0.50 in the worst case.
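The same blended-cost arithmetic as a quick sketch, assuming the $0.20 / $1.50 per-million-token prices quoted above; real prices vary per model and provider:
```
# Rough cost estimate for a token count, using the per-million prices above.
INPUT_PRICE = 0.20 / 1_000_000   # dollars per input token (assumed)
OUTPUT_PRICE = 1.50 / 1_000_000  # dollars per output token (assumed)

def blended_cost(total_tokens: int, input_ratio: float = 0.75) -> float:
    """Estimate cost with a 3:1 input/output split (input_ratio = 0.75)."""
    input_tokens = total_tokens * input_ratio
    output_tokens = total_tokens * (1 - input_ratio)
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

total = 327_761
print(f"worst case (all output): ${total * OUTPUT_PRICE:.2f}")  # ~$0.49
print(f"best case (all input):   ${total * INPUT_PRICE:.2f}")   # ~$0.07
print(f"blended 3:1 estimate:    ${blended_cost(total):.2f}")    # ~$0.17
```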
12bitfloat1073612h@jonathands A Mac mini with an M4 Pro and 64 GB is $2500; an RTX 5090 is ~$2200.
I'd personally keep your current PC and upgrade just the GPU.
Then you have better support too! Most serious stuff is CUDA-only (e.g. if you want to train some weird model or whatever).
jonathands35912h@retoor Right now I'm thinking mostly of using it for coding with tools like SST/OpenCode and Cline, with medium/smaller models like Qwen Coder 30B, Devstral, etc.
Maybe also using local models as a testbed for future LLM-enabled applications before going online;
imagine a local AI-enabled development machine.
Also there are the economics/politics of it: while the hardware is (obscenely) expensive out here, it's something you own.
While I don't think I'll get rid of APIs/subscriptions, I don't have any hope they will become cheaper, so it's nice to have options.
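For reference, a minimal sketch of how a locally served model can stand in for a cloud API; this assumes an Ollama server on its default port exposing its OpenAI-compatible endpoint, and the model tag is a placeholder for whatever has been pulled locally:
```
from openai import OpenAI  # pip install openai

# Point the regular OpenAI client at the local server instead of the cloud.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # placeholder tag; use your local coding model
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
)
print(resp.choices[0].message.content)
```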
jonathands35912h@12bitfloat also, for some reason the M4 is slightly cheaper than the RTX 5090 specifically here.
I have to stick to a local vendor due to warranty (super important for me) and taxes (which are many, and probably why the RTX is so expensive here).
jonathands35912h@12bitfloat upgrading is not an option, my PC is just too old; I'd have to replace the whole thing.
retoor70711h@afaIk there is a chance that I buy it again, but recently I've also seen a System76 machine that could be the only correct replacement. I've been looking at different machines for months and I just don't like them. But this one seems very interesting: https://system76.com/laptops/...
The X270, I have to admit that by now I notice the screen is a bit old, the drive is a bit small and I don't want to mess with that myself anymore, and so on. I mainly live in the terminal and do a lot on a server because I use vim anyway, and for just running terminals it's a monster beast :P There's not enough upside to upgrading for now. But this one: https://system76.com/laptops/.... What a cutie.
Because of my low-end laptop, I decided to build my own Roo Code, and my version is actually better than Roo Code itself now. It works completely in the terminal. It builds software and runs a Playwright browser for you to show that it works. Sick, huh?
retoor70710hA working MVP of Reddit built by my system costs around 60 cents and is covered with fucking Playwright tests. It is extremely efficiently made. My older versions often cost 60 cents per minute :P I think I've discovered a golden hammer, also for continuous improvement of the system itself. I don't work anymore, and I don't want to. But this system is like, wow; it would be sad if I don't make it commercial :P But I have all the time in the world and will just keep working on it and improving it. These days I do not believe newcomers still enter the business. So why even bother. Fucking marketing world.
BordedDev249210h@djsumdog Yes, models improve exponentially beyond that small size. You can run the 480B qwen-coder model if you have enough VRAM/RAM (24+ GB VRAM plus 100+ GB RAM) and it can write pretty good code; for the isspam challenge it even produced a well-performing implementation. But cost-wise it can be a lot cheaper to just rent a cloud GPU for the 10 seconds it needs to generate (if you can fit it in the cloud GPU's VRAM) than to buy the hardware outright, depending on usage of course.
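A back-of-the-envelope sketch of that rent-vs-buy tradeoff; every number below is a hypothetical placeholder, not a quoted price:
```
# All numbers are hypothetical placeholders; plug in real local prices.
gpu_purchase_price = 2200.0   # buying the card outright, in dollars
rental_rate_per_hour = 2.0    # cloud GPU rental, dollars per hour
seconds_per_generation = 10   # time the model actually runs per request

cost_per_generation = rental_rate_per_hour / 3600 * seconds_per_generation
breakeven_generations = gpu_purchase_price / cost_per_generation

print(f"rental cost per generation: ${cost_per_generation:.4f}")
print(f"generations before buying beats renting: {breakeven_generations:,.0f}")
```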
@jonathands I just run LLMs locally, mostly because I have so little time to experiment and I have the resources (3090 + 128 GB RAM), which often means I let it run while I'm doing other things. I'm tempted by the same thing as you, but I know it's kind of a waste of money because of that limited time. I have used RunPod before, but for the big AIs you're not going to be allowed to run them yourself. I'm mainly playing around with STT (Voxtral) and TTS.
AFAIK @retoor uses both OpenRouter and Groq.
I'm honestly tempted to buy an M4 Apple Silicon machine, mainly for its ability to run local LLM models with unified RAM.
Overall I think they are too expensive for what they offer, but being able to play around with LLMs without shelling out RTX 5090 kinds of money is tipping the balance.
I wonder what Apple people's experiences have been?