Hacker Newsnew | past | comments | ask | show | jobs | submit | sipjca's commentslogin

fwiw because of the relatively few activated params offloading to system RAM is quite feasible, you can see the endless amount of people doing this on r/localllama with qwen3.6 35a3b

I ran Gemma4 26B A4B on an 8yo PC with a fucking GTX and it did rather well.

Well, that's pretty impressive. Care to share your setup to do that? How much DDR3/DDR4 do you have, too?

I... downloaded a 4-bit quantized GGUF of the model, used llama.cpp to run it, and pointed OpenCode at that. My machine is an 8-core Gen1 Ryzen 7, 32 GiB of DDR4, (I think) 4 GiB of VRAM on the graphics.

im more surprised that more people don’t treat their computer as disposable anyway.

that it could just be wiped at any moment and it wouldn’t matter. shit happens, could be stolen, broken, whatever. the computer should be able to be thrown out the window and continue to live life.

to be clear, i don’t think upgrading and disposable in this way is good, but it being wiped at any moment shouldn’t be a concern

i grew up wiping my machine every year anyway, so i guess it’s just a habit

is the computer that sacred?


Computers are disposable, secrets is what we’re talking about. Rotating passwords and tokens is a major PITA on the best of days.

fair enough, i guess minimizing that surface area is important to begin with

i think it's about drawing a line between your "personal computer" and a software development machine. any digital-native is going to accumulate programs, configurations, and other bits and pieces that aren't trivial to migrate to a new machine.

Programs, configs and "other bits" are the trivial parts that no one should care about. It takes about 5min to go from fresh install to near-fully-configured.

Even the hardware itself doesn't matter that much, in the end it's all provided by your employer.

Leaking session tokens or secrets, on the other hand...


imo being digital native means that migrating to any machine should be basically trivial. working with the flow of the machines rather than customizing and ricing them because your a cool computer person or whatever

i just want my computer to work. any config i have on my machine can be rebuilt by just doing the work i need to do.

my primary work machine was stolen last year so i was forced to go through this quite literally with a new machine rather than hypothetically or by my own will


Sounds like a case for NixOS

inference code is effectively trivial to port at this time

everyone understands cuda well enough anyway


thats incredible


Wondering similar. It certainly can run beyond 30 seconds but at some point I believe the output should degrade

Plus you could do actual batch inference instead. Or if you must carry forward the context you could still do it linearly, but the mem usage shouldn’t just explode


I just don't have the bandwidth to run another project, maintaining Handy is hard enough on it's own, especially for free!

I didn't just dismiss for no reason, I am a human! I have needs and I can't just sleeplessly stay in front of the computer putting out code. If I had more time I would, but alas.

Someone could easily vibe code an iOS version in a few hours. I could do the same but I do not have time to support it.


Thank you for your work, I highly appreciate it!


Thank you!!


I don’t think it’s about literally shrinking the models via quantization, but rather training smaller/more efficient models from scratch

Smaller models have gotten much more powerful the last 2 years. Qwen 3.5 is one example of this. The cost/compute requirements of running the same level intelligence is going down


I have said for a while that we need a sort of big-little-big model situation.

The inputs are parsed with a large LLM. This gets passed on to a smaller hyper specific model. That outputs to a large LLM to make it readable.

Essentially you can blend two model type. Probabilistic Input > Deterministic function > Probabilistic Output. Have multiple little determainistic models that are choose for specific tasks. Now all of this is VERY easy to say, and VERY difficult to do.

But if it could be done, it would basically shrink all the models needed. Don't need a huge input/output model if it is more of an interpreter.


There are no practically useful small models, including Qwen 3.5. Yes, the small models of today are a lot more interesting than the small models of 2 years ago, but they remain broadly incoherent beyond demos and tinkering.


I don't think you can make that case for 35b and up, including the 27B dense model. A hypothetical Mac Studio with 512 GB and an M5 Ultra would be able to run the full Qwen 3.5 397B model at a decent speed, which is more like 12 months behind the current SoTA.

A lot of people got a bad first impression about the 3.5 models for a few different reasons. Llama.cpp wasn't able to run them optimally, tool calling was broken, the sampling parameters weren't documented completely, and some poor-quality quants got released. Now that these have all been addressed, they are serious models capable of doing serious business on reasonably-accessible hardware.


Yes, but bigger models are still more capable. Models shrinking (iso-performance) just means that people will train and use more capable models with a longer context.


Of course they are! Both are important and will be around and used for different reasons


This is an argument, but it’s also fundamentally comparing a computer that works out of the box to one that doesn’t.


I really don't get this comment section. You get a Macbook then you have a perfectly usable machine which will run all the mainstream software you ask of it, and then you get natively compiled well supported developer tooling, no VM required. The best argument for Chromebooks is that you can throw away ChromeOS and install Linux or use Linux in a VM. These are not even close to the same.

I think folks want to hate Apple more than they want to admit that Chromebooks kinda suck.


Yes


You literally just compared a laptop running arch to a mac. You’re not the target audience lmao


The laptop ships with Windows 11 but its parts are compatible with Linux too.

That is an important note though, the price includes a valid Windows 11 license.


That wasn’t the point. You’re a person who runs arch, that means most likely your requirements for a computer are VERY different than the target for this Mac. There’s always some other computer you can buy, but most people will just buy the Mac


> You’re a person who runs arch, that means most likely your requirements for a computer are VERY different than the target for this Mac

I do software development, video + image editing, writing and gaming. My requirements are it runs well, I can depend on it and I don't mind if it has a fan.

I only replied because the OP's comment made it seem like it's difficult to find a good laptop in the $600 range. If macOS is optional you can get quite decent specs.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: