Yes, that is absolutely the case. One of the most popular handgun lines, the Glock series, does not have a safety switch that must be toggled before firing.
If someone performs a negligent discharge, they are responsible, not Glock. The gun does have other safety mechanisms to prevent accidental discharges not resulting from a trigger pull.
Just wanted to say I remember seeing that comment; it left such an impression that I still recall it 7 years later.
Thanks for the reminder, going to bookmark it this time.
Epyc Genoa CPU/mobo + 700 GB of DDR5 RAM.
The model is a MoE, so you don't need to fit it all into VRAM: a single 3090/5090 can hold the activated weights, with the remaining weights kept in DDR5 RAM. See their deployment guide for reference: https://github.com/kvcache-ai/ktransformers/blob/main/doc/en...
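Not the ktransformers path specifically (their guide covers that), but here's a minimal sketch of the same GPU+RAM split using llama-cpp-python; the model file and layer count are placeholders, and note this is a coarser per-layer split rather than ktransformers' expert-level offload:

    # Hypothetical GGUF path and layer count; tune n_gpu_layers so the
    # hot weights fit in VRAM while the bulk of the MoE stays in DDR5.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen3-moe-q4.gguf",  # placeholder file name
        n_gpu_layers=30,                 # layers kept on the 3090/5090
        n_ctx=32768,
    )
    out = llm("Explain MoE offloading in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])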
We're using llama.cpp. We use all kinds of models other than Qwen3, and vLLM startup when switching models is prohibitively slow (several times slower than llama.cpp, which already takes about 5 seconds).
From what I understand, vLLM is best when there's a single model pinned to the GPU and many concurrent users (4, 8, etc.). But with just a single 32 GB GPU you have to switch models pretty often, and you can't fit more than 2 concurrent users anyway without considerably sacrificing context length (4 users = just 16k context, 8 users = 8k context), so I don't think vLLM is worth it for us yet. Once we have several cards, we may switch to vLLM.
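For what it's worth, that users-vs-context tradeoff is just a fixed KV-cache budget divided among users. A toy illustration in Python (the 64k total is inferred from the 4-user/16k and 8-user/8k figures above, not measured):

    TOTAL_CTX = 64_000  # assumed total KV-cache capacity on the 32 GB card

    for users in (1, 2, 4, 8):
        print(f"{users} users -> {TOTAL_CTX // users:,} tokens of context each")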
Do you think you could expand on that? How do you imagine an ideal workflow going? Would there be a sea of tags you could wade through, or just an '/all' view, or only items you specifically subscribe to plus your connections' subscriptions, ranked by some algorithm? Would items 'fall off' or out of view after X time, or after X amount of browsing a differently weighted topic?
I ask because, to be honest, you're a big inspiration for me, and you inadvertently led me to add an LLM-curated RSS feed reader as a planned feature to a project I'm working on. (I saw https://github.com/karpathy/LLM101n when I was getting interested in LLMs, and then your project inspired me to start trying to build something like the Primer from The Diamond Age.)
Where that leads: I see an RSS feed reader plus curation via self-described or identified interests as a 'core' piece of information gathering for the 'future' individual, and I've had it on the to-do list as a feature-add for my project.
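To make that concrete, here's a toy version of the curation loop I have in mind; feedparser is a real library, but the scoring function is a keyword-overlap stand-in for where an LLM relevance call would go, and the URL and interests are invented:

    import feedparser

    INTERESTS = {"llm", "rss", "education"}  # self-described interests

    def score(entry):
        # Stand-in for an LLM relevance judgment: count interest words
        # appearing in the item title.
        title = entry.get("title", "")
        return len(set(title.lower().split()) & INTERESTS)

    feed = feedparser.parse("https://example.com/feed.xml")  # placeholder URL
    for entry in sorted(feed.entries, key=score, reverse=True)[:10]:
        print(score(entry), entry.get("title", ""))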
I first built a POC in Gradio and am now rebuilding it as a FastAPI app. The media processing endpoints work, but I'm still tweaking media ingestion to allow for syncing to clients (the idea is to allow for a client-first design).
The GitHub repo doesn't show any of the recent changes yet, but if you check back in 2-3 weeks, I think I'll have the API version pushed to the main branch.
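In the meantime, here's roughly the shape of the client-first sync I'm going for; all route names and models below are hypothetical sketches, not the actual project code:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class MediaItem(BaseModel):
        id: str
        updated_at: float  # unix timestamp of the last change

    ITEMS: list[MediaItem] = []  # stand-in for the real datastore

    @app.get("/sync/media")
    def sync_media(since: float = 0.0) -> list[MediaItem]:
        # Clients pass their last sync cursor and pull only the deltas,
        # keeping a local-first copy authoritative between syncs.
        return [item for item in ITEMS if item.updated_at > since]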