Comments
djsumdog666514hWhat models do you plan on running? I've found all the local coding models that can run on a 3080 Ti to be pretty terrible. Are the larger local models better than Claude 4 or GPT?
I find the chatbots utterly annoying and worthless. I hate how their bullshit is now crammed into search results on DDG/Google.
What would you use local LLMs for?
jonathands35913h@djsumdog mostly I want to try the coding models, but it would be general lab stuff really.
A friend showed me Qwen Coder running on a 24 GB MacBook Pro and I was impressed with the result.
My main hangup with going Nvidia is that the setup would cost more for less LLM-usable RAM: for the price of a single 32 GB 5090 I can buy an M4 Pro Mac mini with 64 GB of unified RAM (at least here in the jungle).
I understand it's not a replacement for Claude/Codex, though.
jonathands35913h@afaIk two things, mostly: having the ability to play and learn locally is a great plus, since it's a (very needed) upgrade anyway,
and depending on what I'm doing, cloud services get incredibly expensive for me due to exchange rates. Yes, the computer would be expensive, but it's there no matter what.
Still, this is a good question; maybe I should test some 30B models on Groq to gauge the cost before making any purchase.
I'll probably wait till the M5 gets released, though.
retoor70713h@djsumdog yeah, sure, it's fine that you have issues with it, but blame the developers (or the companies they work for), not the technology. The technology is freaking amazing in the right hands. I hope you at least see that.
retoor70713hIf AI is the real goal, it is an interesting bet.
Because the AI you are used to is absolutely not the AI you get on a local machine. Local AIs on ordinary graphics cards can answer a question for sure, but real automation takes more than that. Also, such a local machine costs several thousand euros; do you have any idea how much cloud AI you could buy for that? I am a major spender on AI, sometimes because I left a loop running in the background or was just inefficient, so the costs I made are more on me than on the supplier. As for a light model like Qwen: maybe it wins according to some benchmark chart, but I actually doubt it would even beat gpt-4o-mini or gpt-4.1-nano. I'm not kidding. Benchmarks say absolutely nothing in the AI world, especially not about automating with it, if that is the goal.
Regarding your remark about Claude/Codex: that's not just the model. There is a very good algorithm behind it in pure code. At this point I have beaten Roo Code with my own build system using a light model.
retoor70713hBut what makes AI slow is, as you know, the context length. If the context length doesn't matter or stays small, it can be a good choice I guess, but it will take a decade before it becomes a good financial choice compared to the cloud-based solutions. I personally moved everyone to Ollama and still ended up vouching for the cloud solutions, even financially. But be aware that there is a difference between OpenAI and OpenRouter, for example. Against OpenAI I blast concurrent requests using a gather(), meaning I regularly do 20 requests at once. Grok is fine with it, OpenAI is fine with it. But on any other provider you will be rate limited, no exceptions. And Groq, the one most famous for their inference speed, is literally the worst at delivering on that promise: fast inference, but no concurrency. What is the point, ffs.
But please explain what you actually want to do with the AI. Your subject is very important to discuss.
Automation normally takes a small context.
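For reference, a minimal sketch of the kind of concurrent blasting described above, assuming the official openai Python client, an OPENAI_API_KEY in the environment, and a placeholder model name and prompts:
```
import asyncio
from openai import AsyncOpenAI  # pip install openai

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def ask(prompt: str) -> str:
    # One chat completion per prompt; gpt-4o-mini is just an example model.
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = [f"Summarize item {i} in one line." for i in range(20)]
    # Fire all 20 requests at once; a provider without real concurrency
    # will rate-limit or serialize these.
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    for answer in answers:
        print(answer)

asyncio.run(main())
```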
retoor70713h@afaIk uhm, owning something yourself? I mean, someone is making a profit on it, so in theory you should be able to do it yourself better and cheaper. In the case of AI that is very much not true at the moment. But yeah, if I find a way to drop the cloud I would do it immediately. I also don't do AWS or whatever; I configure my dedicated servers myself and never have unexpected costs or other weird shit. It's mine and that's comfortable.
So many reasons not to choose the cloud, but at this moment we're tied into it.
On the subject, btw: models are getting lighter and lighter. One of my favourites for automation is Gemma3:12b. That's amazing; a year ago that was impossible with such a model.
But if you code well and automate certain stuff yourself, you don't really need heavy models.
Currently playing with grok-code-fast-1 and that is by far the best bang for the buck I have ever seen. No Opus for me. The new one from Claude, Sonnet 4.5, is also amazing. Since then, no Opus.
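A minimal sketch of driving a light local model like that for automation through Ollama's REST API; the model tag and prompt are placeholders and assume the model has already been pulled locally:
```
import requests  # pip install requests

# Ollama listens on port 11434 by default; this assumes
# `ollama pull gemma3:12b` has already been run.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:12b",
        "prompt": "Classify this ticket as bug, feature or question: 'App crashes on login.'",
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["response"])
```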
retoor70713h@jonathands In general, if you have nothing better to waste money on, just buy the computer. Why not; I can't look into your wallet. I personally work on an X270, work with AI 24/7, and I've spent around 800 this year. I would not say that is much. I just ran:
```
Total tokens: 327761
Total AI call time: 691.89s
Total elapsed time: 706.76s
```
Fetched pricing data from Perplexity:
If all tokens were output tokens: 327,761 tokens × ($1.50 / 1,000,000) = $0.49
If all tokens were input tokens: 327,761 tokens × ($0.20 / 1,000,000) = $0.07
Typical blended scenario, using the industry-standard 3:1 input/output ratio:
Input: 245,821 tokens × ($0.20 / 1,000,000) = $0.05
Output: 81,940 tokens × ($1.50 / 1,000,000) = $0.12
Total: $0.17
So, to give you an idea: I spent literally that much time on full-time AI calling and it cost me 17 cents in the blended scenario (in/out balance) and around $0.50 in the worst case.
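The same blended-cost arithmetic as a quick sketch, assuming the $0.20 / $1.50 per-million-token prices quoted above; real prices vary per model and provider:
```
# Rough cost estimate for a token count, using the per-million prices above.
INPUT_PRICE = 0.20 / 1_000_000   # dollars per input token (assumed)
OUTPUT_PRICE = 1.50 / 1_000_000  # dollars per output token (assumed)

def blended_cost(total_tokens: int, input_ratio: float = 0.75) -> float:
    """Estimate cost with a 3:1 input/output split (input_ratio = 0.75)."""
    input_tokens = total_tokens * input_ratio
    output_tokens = total_tokens * (1 - input_ratio)
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

total = 327_761
print(f"worst case (all output): ${total * OUTPUT_PRICE:.2f}")  # ~$0.49
print(f"best case (all input):   ${total * INPUT_PRICE:.2f}")   # ~$0.07
print(f"blended 3:1 estimate:    ${blended_cost(total):.2f}")    # ~$0.17
```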
12bitfloat1073612h@jonathands A Mac mini with an M4 Pro and 64 GB is $2500; an RTX 5090 is ~$2200.
I'd personally keep your current PC and upgrade just the GPU.
Then you have better support too! Most serious stuff is CUDA-only (e.g. if you want to train some weird model or whatever).
jonathands35912h@retoor Right now I'm thinking mostly of using it for coding with tools like SST/OpenCode and Cline, with medium/smaller models like Qwen Coder 30B, Devstral, etc.
Maybe also using local models as a testbed for future LLM-enabled applications before going online;
imagine a local AI-enabled development machine.
Also there are the economics/politics of it: while the hardware is (obscenely) expensive out here, it's something you own.
While I don't think I'll get rid of APIs/subscriptions, I don't have any hope they will become cheaper, so it's nice to have options.
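For reference, a minimal sketch of how a locally served model can stand in for a cloud API; this assumes an Ollama server on its default port exposing its OpenAI-compatible endpoint, and the model tag is a placeholder for whatever has been pulled locally:
```
from openai import OpenAI  # pip install openai

# Point the regular OpenAI client at the local server instead of the cloud.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # placeholder tag; use your local coding model
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
)
print(resp.choices[0].message.content)
```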
jonathands35912h@12bitfloat also, for some reason the M4 is slightly cheaper than the RTX 5090 specifically here.
I have to stick to a local vendor due to warranty (super important for me) and taxes (which are many, and probably why the RTX is so expensive here).
jonathands35912h@12bitfloat upgrading is not an option, my PC is just too old; I'd have to replace the whole thing.
retoor70711h@afaIk there is a chance that I buy it again, but recently I've also seen a System76 machine that could be the only correct replacement. I've been looking at different machines for months and I just don't like them. But this one seems very interesting: https://system76.com/laptops/...
The X270, I have to admit that by now I notice the screen is a bit old, the drive is a bit small and I don't want to mess with that myself anymore, and so on. I mainly live in the terminal and do a lot on a server because I use vim anyway, and for just running terminals it's a monster beast :P There's not enough upside to upgrading for now. But this one: https://system76.com/laptops/.... What a cutie.
Because of my low-end laptop, I decided to build my own Roo Code, and my version is actually better than Roo Code itself now. It works completely in the terminal. It builds software and runs a Playwright browser for you to show that it works. Sick, huh?
retoor70710hA working MVP of Reddit built by my system costs around 60 cents and is covered with fucking Playwright tests. It is extremely efficiently made. My older versions often cost 60 cents per minute :P I think I've discovered a golden hammer, also for continuous improvement of the system itself. I don't work anymore, and I don't want to. But this system is like, wow; it would be sad if I don't make it commercial :P But I have all the time in the world and will just keep working on it and improving it. These days I do not believe newcomers still enter the business. So why even bother. Fucking marketing world.
BordedDev249210h@djsumdog Yes, models improve exponentially beyond that small size. You can run the 480B qwen-coder model if you have enough VRAM/RAM (24+ GB VRAM plus 100+ GB RAM) and it can write pretty good code; for the isspam challenge it even produced a well-performing implementation. But cost-wise it can be a lot cheaper to just rent a cloud GPU for the 10 seconds it needs to generate (if you can fit it in the cloud GPU's VRAM) than to buy the hardware outright, depending on usage of course.
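A back-of-the-envelope sketch of that rent-vs-buy tradeoff; every number below is a hypothetical placeholder, not a quoted price:
```
# All numbers are hypothetical placeholders; plug in real local prices.
gpu_purchase_price = 2200.0   # buying the card outright, in dollars
rental_rate_per_hour = 2.0    # cloud GPU rental, dollars per hour
seconds_per_generation = 10   # time the model actually runs per request

cost_per_generation = rental_rate_per_hour / 3600 * seconds_per_generation
breakeven_generations = gpu_purchase_price / cost_per_generation

print(f"rental cost per generation: ${cost_per_generation:.4f}")
print(f"generations before buying beats renting: {breakeven_generations:,.0f}")
```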
@jonathands I just run LLMs locally, mostly because I have so little time to experiment and I have the resources (3090 + 128 GB RAM), which often means I let it run while I'm doing other things. I'm tempted by the same thing as you, but I know it's kind of a waste of money because of that limited time. I have used RunPod before, but for the big AIs you're not going to be allowed to run them yourself. I'm mainly playing around with STT (Voxtral) and TTS.
AFAIK @retoor uses both OpenRouter and Groq.
I'm honestly tempted to buy an M4 Apple Silicon machine, mainly for its ability to run local LLM models with unified RAM.
Overall I think they are too expensive for what they offer, but being able to play around with LLMs without shelling out RTX 5090 kinds of money is tipping the balance.
I wonder what Apple people's experiences have been?