Hacker News | txtsd's comments

So I can use this in claude code with `ollama run claude`?


Thank you, I had no idea ollama was so shady! I will start using llama.cpp directly.

More like `ollama launch claude --model qwen3.6:latest`

Also, check your context size: Ollama defaults to 4K if you have <24 GB of VRAM, and you need 64K minimum if you want claude to be able to at least lift a finger.
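If you want to raise it, there are a few ways (a sketch; the exact knob names below are what recent Ollama versions use, so double-check against your install):

    # Server-wide default via environment variable
    OLLAMA_CONTEXT_LENGTH=65536 ollama serve

    # Per-session, inside the ollama run REPL
    /set parameter num_ctx 65536

    # Or bake it into a derived model via a Modelfile
    # (model tag here is just the one from this thread)
    FROM qwen3.6:latest
    PARAMETER num_ctx 65536

Note that a 64K context also eats noticeably more VRAM for the KV cache, which is presumably why the default is so low in the first place.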


If you're on a Mac, use the MLX backend versions, which are considerably faster than the GGML-based versions (including llama.cpp), and you don't need to fiddle with the context size. The models are `qwen3.6:35b-a3b-nvfp4`, `qwen3.6:35b-a3b-mxfp8`, and `qwen3.6:35b-a3b-mlx-bf16`.

I was comparing various models on an M5 Pro with 48GB RAM, MLX vs GGUF, and found that MLX models have a higher time to first token (sometimes by an order of magnitude) while tokens/sec and memory usage are the same as GGUF.

Gemma 3 27B q4:

* MLX: 16.7 t/s, 1220ms ttft

* GGUF: 16.4 t/s, 760ms ttft

Gemma 4 31B q8:

* MLX: 8.3 t/s, 25000ms ttft

* GGUF: 8.4 t/s, 1140ms ttft

Gemma 4 A4B q8:

* MLX: 52 t/s, 1790ms ttft

* GGUF: 51 t/s, 380ms ttft

All comparisons were done in LM Studio, with the latest versions of everything.
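If anyone wants to reproduce this kind of comparison outside LM Studio, both numbers reduce to a small calculation over streamed token arrival times. A minimal sketch (the function name and the convention of passing raw timestamps are my own; any streaming client that records per-token arrival times will do):

```python
def stream_stats(t_request, token_times):
    """Compute (ttft_ms, tokens_per_sec) from streaming timestamps.

    t_request:   wall-clock time (seconds) when the request was sent.
    token_times: wall-clock arrival time (seconds) of each generated token.
    """
    # Time to first token: request send -> first token, in milliseconds.
    ttft_ms = (token_times[0] - t_request) * 1000.0

    # Decode speed is conventionally measured over the generation phase
    # only, i.e. tokens after the first, divided by the time span from
    # first token to last token.
    gen_seconds = token_times[-1] - token_times[0]
    if gen_seconds <= 0:
        return ttft_ms, float("inf")
    return ttft_ms, (len(token_times) - 1) / gen_seconds
```

For example, tokens arriving at 0.5s, 1.0s, 1.5s, and 2.0s after a request at t=0 give a 500ms ttft and 2 tokens/sec. Against LM Studio specifically, you'd collect the timestamps from its OpenAI-compatible streaming endpoint (it serves one locally by default).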


I only have 16GB VRAM, and my system uses ~4GB from that. What are my options? I got this one: `Qwen3.6-35B-A3B-UD-IQ2_XXS.gguf`

My system has 16 GB VRAM / 32 GB RAM, and ollama runs qwen3.6:latest at a decent speed just fine. The 35b model is an MoE, so I guess the whole model is offloaded.

have you found a model that does this with usable speeds on an M2/M3?

On an M4 MBP, ollama's qwen3.5:35b-a3b-coding-nvfp4 runs incredibly fast when in the claude/codex harness. M2/M3 should be similar.

It's incomparably faster than any other model (i.e. it's actually usable without cope). Caching makes a huge difference.


I share your concerns. kind is not the ideal solution. All my communities are on discord though, so it is the practical solution until a replacement for discord completely does away with the need to use it in the first place.


Voice will be implemented once all the text features are up to my standards. You can check it out then. No pressure right now.


Eventually!


Thank you for your input.

Everything you listed will be worked on after initial release. It's in the works.

Nitro features will be for Nitro users only, except for things that non-Nitro users can see anyway.


I haven't pushed the code yet. I was waiting until the quality was sufficient per my standards. It should be pushed within 2 hours from now.

EDIT: Code is live.


Are you not giving it enough information to work with? All of these issues you and the parent comment mentioned can be worked around by telling it HOW to do things.


The whole shtick of LLMs is that they can do stuff without being told explicitly. Not sure why people are blamed for using them based on that expectation...


Yes, it can. So can I. But neither of us will write the code exactly the way nitpicky PR reviewer #2 demands it be written unless he makes his preferences clear somewhere. Even at a nitpick-hellhole like Google that's mostly codified into a massive number of readability rules, which can be found and followed in theory. Elsewhere, most reviewer preferences are just individual quirks that you have to pick up on over time, and that's the kind of stuff that neither new employees nor Claude will ever possibly be able to get right in a one-shot manner.


Sure, but that is not what the OP talks about.


There is an unconstrained number of ways it can write code and still not be how I want it. Sometimes it's easier to write the correction against the code that is already generated since now you at least have a reference to something there than describing code that doesn't yet exist. I don't think it's solvable in general until they have the neuralink skill that senses my approval as it materializes each token and autocorrects to the golden path based on whether I'm making a happy or frowny face.


Stop thinking like a programmer and start thinking like a business person. Invest time and energy in thinking about WHAT you want; let the LLM worry about the HOW.


The thing is that the HOW of today becomes the context of someone else's tomorrow session, that person may not be as knowledgeable about that particular part of the codebase (and the domain), their LLM will base its own solution on today's unchecked output and will, inevitably, stray a little bit further from the optimum. So far I haven't seen any mechanism and workflow that would consistently push in the opposite direction.


>let the LLM worry about the HOW.

You mean, let the LLM hallucinate about the HOW...


You can tell it how to do things, but sometimes it still goes off on its own. I have some variant of "do not deviate from the plan", and yet sometimes, if you watch while it's coding, it will go "ah, this is too hard as per the plan, let me take this shortcut" or "this previous test fails, but it's not an issue with the code I just wrote, so let's just 'fix' the test".

For simple scripts and simple self-contained problems, fully agentic YOLO mode mostly works, but as soon as it's an existing codebase, or the plans get more complex, I find I have to handhold claude a lot more, and if I leave it to its own devices I find things later. I have also found that if I have it update the plan with what it did AND afterwards review the plan, it will still find deviations in the codebase.

Like the other day, I had in the plan to refactor something due to data model changes, specifying very clearly that this was an intentional breaking change (greenfield project under development), and it left behind all the existing code to preserve backwards compatibility. It actually went through many code contortions to make that happen, so much so that I had to redo the whole thing.

Sometimes it does feel like Anthropic turns the intelligence up or down (I always run Opus with high reasoning), but sometimes it seems it's just the nature of things: it is not deterministic, and sometimes it will just go off and do what it thinks is best whether or not you prompt it not to (if you ask it later why it did that, it will apologize with some variation of "well, it made sense at the time").


Technically that's true, but unless you literally write every single line of code, the LLM will find a way to smuggle in some weirdness. Usually it isn't that bad, but it definitely requires quite a lot of attention.


There is a point where telling it how to do stuff is comparable/more effort to just doing it yourself.


While I can't argue against the risk, all these third party clients continue to be allowed to exist.

Ripcord and discordo were my favorites amongst the alternatives.

kind is the missing Qt FOSS alternative.


Aside from the risk, I don't understand why you would put your time into it. They can change their stance any minute and your work just goes down the drain. I guess you could do it for learning and experience? idk...

> kind is the missing Qt FOSS alternative.

Nah. What's missing is an alternative to discord itself, with enough pull behind it.


I can't argue with that. kind is the practical solution, not the ideal one.


Instead of making it work with Discord, build it around XMPP with as many of the same niceties that Discord offers, and get yourself a working alternative on an open, powerful protocol, fully FOSS.

People want to get away from closed, centralised applications. The sentiment is the strongest it's ever been. But they don't have compelling and working enough alternatives to do so.


>While I can't argue against the risk, all these third party clients continue to be allowed to exist.

Reddit allowed it for a while too until they smelt sweet IPO money.

All it takes is some revenue generating idea that the third party client doesn't support and it's curtains.


Thanks, I'm using Qt Widgets to make this though.

And I made a website for it shortly after I made this post:

https://kind.ihavea.quest


Thank you! I've bookmarked your website!

