Yes, that is absolutely the case. One of the most popular handgun lines, the Glock series, does not have a safety switch that must be toggled before firing.
If someone performs a negligent discharge, they are responsible, not Glock. The gun does have other safety mechanisms to prevent accidental discharges not resulting from a trigger pull.
Just wanted to say I remember seeing that comment; it left such an impression that I still recall it 7 years later.
Thanks for the reminder, going to bookmark it this time.
Epyc Genoa CPU/mobo + 700 GB of DDR5 RAM.
The model is a MoE, so you don't need to fit it all into VRAM: a single 3090/5090 can hold the activated weights, with the remaining weights kept in DDR5 RAM. See their deployment guide for reference: https://github.com/kvcache-ai/ktransformers/blob/main/doc/en...
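Not the ktransformers path specifically (their guide covers that), but here's a minimal sketch of the same GPU+RAM split using llama-cpp-python; the model file and layer count are placeholders, and note this is a coarser per-layer split rather than ktransformers' expert-level offload:

    # Hypothetical GGUF path and layer count; tune n_gpu_layers so the
    # hot weights fit in VRAM while the bulk of the MoE stays in DDR5.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen3-moe-q4.gguf",  # placeholder file name
        n_gpu_layers=30,                 # layers kept on the 3090/5090
        n_ctx=32768,
    )
    out = llm("Explain MoE offloading in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])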
We're using llama.cpp. We use all kinds of models other than Qwen3, and vLLM startup when switching models is prohibitively slow (several times slower than llama.cpp, which already takes about 5 seconds).
From what I understand, vLLM is best when there's a single model pinned to the GPU and many concurrent users (4, 8, etc.). But with just a single 32 GB GPU you have to switch models pretty often, and you can't fit more than 2 concurrent users anyway without considerably sacrificing context length (4 users = just 16k context, 8 users = 8k context), so I don't think vLLM is worth it for us yet. Once we have several cards, we may switch to vLLM.
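For what it's worth, that users-vs-context tradeoff is just a fixed KV-cache budget divided among users. A toy illustration in Python (the 64k total is inferred from the 4-user/16k and 8-user/8k figures above, not measured):

    TOTAL_CTX = 64_000  # assumed total KV-cache capacity on the 32 GB card

    for users in (1, 2, 4, 8):
        print(f"{users} users -> {TOTAL_CTX // users:,} tokens of context each")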
Do you think you could expand on that? How do you imagine an ideal workflow going? Would there be a sea of tags you could wade through, or just an '/all' view, or only items you specifically subscribe to plus your connections' subscriptions, ranked by some algorithm? Would items 'fall off' or out of view after X time, or after X amount of browsing a differently weighted topic?
I ask because, to be honest, you're a big inspiration for me, and you inadvertently led me to add an LLM-curated RSS feed reader as a planned feature to a project I'm working on. (I saw https://github.com/karpathy/LLM101n when I was getting interested in LLMs, and then your project inspired me to start trying to build something like the Primer from The Diamond Age.)
Where that leads: I see an RSS feed reader plus curation via self-described or identified interests as a 'core' piece of information gathering for the 'future' individual, and I've had it on the to-do list as a feature-add for my project.
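To make that concrete, here's a toy version of the curation loop I have in mind; feedparser is a real library, but the scoring function is a keyword-overlap stand-in for where an LLM relevance call would go, and the URL and interests are invented:

    import feedparser

    INTERESTS = {"llm", "rss", "education"}  # self-described interests

    def score(entry):
        # Stand-in for an LLM relevance judgment: count interest words
        # appearing in the item title.
        title = entry.get("title", "")
        return len(set(title.lower().split()) & INTERESTS)

    feed = feedparser.parse("https://example.com/feed.xml")  # placeholder URL
    for entry in sorted(feed.entries, key=score, reverse=True)[:10]:
        print(score(entry), entry.get("title", ""))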
I first built a POC in Gradio and am now rebuilding it as a FastAPI app. The media processing endpoints work, but I'm still tweaking media ingestion to allow for syncing to clients (the idea is to allow for a client-first design).
The GitHub repo doesn't show any of the recent changes yet, but if you check back in 2-3 weeks, I think I'll have the API version pushed to the main branch.
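In the meantime, here's roughly the shape of the client-first sync I'm going for; all route names and models below are hypothetical sketches, not the actual project code:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class MediaItem(BaseModel):
        id: str
        updated_at: float  # unix timestamp of the last change

    ITEMS: list[MediaItem] = []  # stand-in for the real datastore

    @app.get("/sync/media")
    def sync_media(since: float = 0.0) -> list[MediaItem]:
        # Clients pass their last sync cursor and pull only the deltas,
        # keeping a local-first copy authoritative between syncs.
        return [item for item in ITEMS if item.updated_at > since]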