fwiw, because of the relatively few activated params, offloading to system RAM is quite feasible. you can see an endless number of people doing this on r/localllama with qwen3.6 35a3b
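A rough back-of-envelope for why this works, as a sketch only. It reads "35a3b" as 35B total params with 3B active per token; the 4.5 bits per weight and the 40 GB/s RAM bandwidth are assumptions for illustration, not measurements from anyone's machine.

```python
# Back-of-envelope numbers for running a mixture-of-experts model from system RAM.
# Every constant below is an assumption for illustration, not a measurement.

TOTAL_PARAMS = 35e9       # total parameters (reading "35a3b" as 35B total)
ACTIVE_PARAMS = 3e9       # parameters activated per token (the "a3b" part)
BITS_PER_WEIGHT = 4.5     # assumed average for a 4-bit quant, including scales
RAM_BANDWIDTH_GBPS = 40   # assumed dual-channel DDR4 bandwidth in GB/s


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Size of the weights in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def max_tokens_per_second(active_gb: float, bandwidth_gbps: float) -> float:
    """Upper bound on decode speed if each token reads the active weights once."""
    return bandwidth_gbps / active_gb


model_gb = weights_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
active_gb = weights_gb(ACTIVE_PARAMS, BITS_PER_WEIGHT)
ceiling = max_tokens_per_second(active_gb, RAM_BANDWIDTH_GBPS)

print(f"whole model:    {model_gb:.1f} GB")
print(f"read per token: {active_gb:.2f} GB")
print(f"decode ceiling: {ceiling:.0f} tok/s")
```

The point is that the whole model has to fit somewhere, but only the active slice is read per token, so memory bandwidth hurts far less than it would for a dense model of the same size. Real throughput will be lower than this ceiling.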
I... downloaded a 4-bit quantized GGUF of the model, used llama.cpp to run it, and pointed OpenCode at that. My machine is an 8-core Gen1 Ryzen 7, 32 GiB of DDR4, (I think) 4 GiB of VRAM on the graphics.
i'm more surprised that more people don’t treat their computer as disposable anyway.
that it could just be wiped at any moment and it wouldn’t matter. shit happens, could be stolen, broken, whatever. the computer should be able to be thrown out the window and continue to live life.
to be clear, i don’t think constantly upgrading and throwing hardware away is good, but the machine being wiped at any moment shouldn’t be a concern
i grew up wiping my machine every year anyway, so i guess it’s just a habit
i think it's about drawing a line between your "personal computer" and a software development machine. any digital-native is going to accumulate programs, configurations, and other bits and pieces that aren't trivial to migrate to a new machine.
Programs, configs and "other bits" are the trivial parts that no one should care about. It takes about 5min to go from fresh install to near-fully-configured.
Even the hardware itself doesn't matter that much, in the end it's all provided by your employer.
Leaking session tokens or secrets, on the other hand...
imo being digital native means that migrating to any machine should be basically trivial. working with the flow of the machines rather than customizing and ricing them because you're a cool computer person or whatever
i just want my computer to work. any config i have on my machine can be rebuilt by just doing the work i need to do.
my primary work machine was stolen last year so i was forced to go through this quite literally with a new machine rather than hypothetically or by my own will
Wondering similar. It certainly can run beyond 30 seconds but at some point I believe the output should degrade
Plus you could do actual batch inference instead. Or if you must carry forward the context you could still do it linearly, but the mem usage shouldn’t just explode
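A minimal sketch of the "carry context forward linearly" idea, under stated assumptions: `run_chunked` and `fake_model` are hypothetical names, the chunks are plain strings standing in for whatever the real input is, and the model call is a stub. The only thing it demonstrates is that the carried context is capped, so memory stays flat no matter how long the input runs.

```python
from collections import deque
from typing import Callable, Iterable, List


def run_chunked(
    chunks: Iterable[str],
    model: Callable[[str, str], str],
    max_context_words: int = 50,
) -> List[str]:
    """Process chunks in order, carrying only a bounded tail of prior output."""
    # deque with maxlen drops the oldest words automatically once it is full.
    context = deque(maxlen=max_context_words)
    outputs = []
    for chunk in chunks:
        text = model(chunk, " ".join(context))
        outputs.append(text)
        context.extend(text.split())
    return outputs


# Stand-in for a real model call (hypothetical): uppercases the chunk and
# records how many context words it was handed.
seen_context_sizes = []


def fake_model(chunk: str, context: str) -> str:
    seen_context_sizes.append(len(context.split()))
    return chunk.upper()


result = run_chunked(
    ["one two three", "four five", "six seven eight nine"],
    fake_model,
    max_context_words=4,
)
print(result)
print(seen_context_sizes)
```

Each step does a constant amount of work on a constant-size context, so total cost grows linearly with input length instead of piling up.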
I just don't have the bandwidth to run another project, maintaining Handy is hard enough on its own, especially for free!
I didn't just dismiss for no reason, I am a human! I have needs and I can't just sleeplessly stay in front of the computer putting out code. If I had more time I would, but alas.
Someone could easily vibe code an iOS version in a few hours. I could do the same but I do not have time to support it.
I don’t think it’s about literally shrinking the models via quantization, but rather training smaller/more efficient models from scratch
Smaller models have gotten much more powerful over the last 2 years. Qwen 3.5 is one example of this. The cost/compute requirements of running the same level of intelligence are going down
I have said for a while that we need a sort of big-little-big model situation.
The inputs are parsed with a large LLM. This gets passed on to a smaller hyper specific model. That outputs to a large LLM to make it readable.
Essentially you can blend two model types. Probabilistic Input > Deterministic function > Probabilistic Output. Have multiple little deterministic models that are chosen for specific tasks. Now all of this is VERY easy to say, and VERY difficult to do.
But if it could be done, it would basically shrink all the models needed. Don't need a huge input/output model if it is more of an interpreter.
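To make the shape concrete, here is a toy sketch of that big-little-big flow. Everything in it is hypothetical: `parse_input` and `render_output` are trivial stand-ins for the two large models, and `LITTLE_MODELS` is a made-up registry of deterministic functions. It shows the wiring only, not how you would actually build the hard parts.

```python
from typing import Callable, Dict, Tuple

# Registry of small deterministic "little" functions, keyed by task name.
LITTLE_MODELS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def parse_input(text: str) -> Tuple[str, float, float]:
    """Stand-in for the large input model: map free text to (task, a, b)."""
    words = text.lower().split()
    # Keep tokens that look like plain non-negative numbers, e.g. "6" or "2.5".
    numbers = [float(w) for w in words if w.replace(".", "", 1).isdigit()]
    task = "multiply" if "times" in words else "add"
    return task, numbers[0], numbers[1]


def render_output(task: str, value: float) -> str:
    """Stand-in for the large output model: make the result readable."""
    return f"The result of the {task} step is {value:g}."


def big_little_big(text: str) -> str:
    task, a, b = parse_input(text)      # probabilistic input (stubbed)
    value = LITTLE_MODELS[task](a, b)   # deterministic function
    return render_output(task, value)   # probabilistic output (stubbed)


print(big_little_big("what is 6 times 7"))
```

The middle step is the only part that has to be exactly right, and it is plain code, so it can be tested like any other function. The two ends only translate.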
There are no practically useful small models, including Qwen 3.5. Yes, the small models of today are a lot more interesting than the small models of 2 years ago, but they remain broadly incoherent beyond demos and tinkering.
I don't think you can make that case for 35b and up, including the 27B dense model. A hypothetical Mac Studio with 512 GB and an M5 Ultra would be able to run the full Qwen 3.5 397B model at a decent speed, which is more like 12 months behind the current SoTA.
A lot of people got a bad first impression about the 3.5 models for a few different reasons. Llama.cpp wasn't able to run them optimally, tool calling was broken, the sampling parameters weren't documented completely, and some poor-quality quants got released. Now that these have all been addressed, they are serious models capable of doing serious business on reasonably-accessible hardware.
Yes, but bigger models are still more capable. Models shrinking (iso-performance) just means that people will train and use more capable models with a longer context.
I really don't get this comment section. You get a MacBook and you have a perfectly usable machine that will run all the mainstream software you ask of it, plus natively compiled, well-supported developer tooling, no VM required. The best argument for Chromebooks is that you can throw away ChromeOS and install Linux or use Linux in a VM. These are not even close to the same.
I think folks want to hate Apple more than they want to admit that Chromebooks kinda suck.
That wasn’t the point. You’re a person who runs arch, that means most likely your requirements for a computer are VERY different than the target for this Mac. There’s always some other computer you can buy, but most people will just buy the Mac
> You’re a person who runs arch, that means most likely your requirements for a computer are VERY different than the target for this Mac
I do software development, video + image editing, writing and gaming. My requirements are it runs well, I can depend on it and I don't mind if it has a fan.
I only replied because the OP's comment made it seem like it's difficult to find a good laptop in the $600 range. If macOS is optional you can get quite decent specs.