Is that even possible with medical-grade wrist devices? Apple Watches can perform it only during sleep, which makes sense. It seems like a difficult problem to solve without a chest strap, or without just measuring during sleep.
The only other alternative I can think of is a screenless strap (some companies like Polar and Whoop make those) around the bicep, as it’s relatively close to the shoulder and chest areas, which gently move with our breath.
Garmin measures "photoplethysmography-derived respiration" (using the optical HR sensor). Error rates are under 1 breath per minute during sleep or at rest, but rise during exercise, up to about 4 breaths per minute above the lactate threshold.
Impedance pneumography is more consistently accurate, but requires a chest (not bicep) strap.
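To make the PPG-derived approach a bit more concrete: breathing shows up in the optical pulse signal as a slow baseline wander (plus amplitude/rate modulation), so you can recover a rate by isolating the ~0.1-0.5 Hz band. A rough illustrative sketch with synthetic data, not Garmin's actual algorithm:

    import numpy as np
    from scipy.signal import butter, filtfilt

    def respiration_rate_from_ppg(ppg, fs):
        # Breathing sits at roughly 0.1-0.5 Hz (6-30 breaths/min),
        # well below the ~1-3 Hz cardiac pulse.
        b, a = butter(2, [0.1, 0.5], btype="band", fs=fs)
        resp = filtfilt(b, a, ppg)
        # Dominant frequency in that band = estimated breathing rate.
        spectrum = np.abs(np.fft.rfft(resp))
        freqs = np.fft.rfftfreq(len(resp), d=1.0 / fs)
        band = (freqs >= 0.1) & (freqs <= 0.5)
        return freqs[band][np.argmax(spectrum[band])] * 60.0  # breaths/min

    # Synthetic signal: 1.2 Hz pulse plus a 0.25 Hz respiratory baseline wander.
    fs = 25.0
    t = np.arange(0, 60, 1.0 / fs)
    ppg = 0.3 * np.sin(2 * np.pi * 0.25 * t) + np.sin(2 * np.pi * 1.2 * t)
    print(respiration_rate_from_ppg(ppg, fs))  # ~15 breaths/min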
I fear this is only the start of it. At least 3-4 more constellations will probably be launched in the near future (Russia, China, EU).
Their obvious dual-use nature makes them tempting, and a military target if a large conflict takes place in the near future. I hope their lower orbit will help any space junk burn up fast.
Add a black umbrella to each satellite: when they pass through the critical region where they are visible in the night sky while still being sunlit, pop the brollies up. We will fly them in the shade!
You could paint them black but they’d probably get quite hot.
Won't the shade then reflect the light instead? It's nighttime, so from the Earth-based observer's point of view the sunlight is coming up from below, and the shade would need to be pointed down in order to shade the satellite.
If you blow up a satellite, half of it will end up going slower and half will go faster. The slower bits will probably burn up nicely, but the faster bits will just elevate their orbit.
I doubt they will elevate their orbit by enough to be a problem. Some bits will come down in hours, some in a year; even in the worst case where it takes out everything in low Earth orbit, in 5 years everything will be clear and we can start over. Higher orbits are the real worry - even the debris that gets slowed down mostly stays up for centuries - but higher orbits are mostly a lot higher.
"LEO" is a big place, those satellites collided ~1.5x higher than e.g. the maximum Starlink altitude and the debris lifetime relationship is not a linear one.
Yup! Smaller quants will fit within 24GB but they might sacrifice context length.
I’m excited to try out the MLX version to see whether 32GB of memory on a Pro M-series Mac can get acceptable tok/s with longer context. Some MLX versions are already up on Hugging Face.
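If anyone else wants to try those conversions, the mlx-lm side is only a few lines (the repo id below is a placeholder - substitute whichever MLX conversion you actually want to test):

    # pip install mlx-lm   (Apple Silicon only)
    from mlx_lm import load, generate

    # Placeholder repo id - swap in the actual mlx-community upload.
    MODEL_REPO = "mlx-community/REPLACE-WITH-ACTUAL-MLX-CONVERSION"

    model, tokenizer = load(MODEL_REPO)
    prompt = "Explain how KV-cache size grows with context length, in two sentences."
    print(generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True))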
I have a Mac Mini M4 Pro with 64GB of memory at 273GB/s bandwidth, and it's borderline with 3.5-27B. I assume this one is the same. I don't know a ton, but I think it's the memory bandwidth that limits it. It's similar on a DGX Spark I have access to (almost the same memory bandwidth).
It's been a while since I tried it, but I think I was getting around 12-15 tokens per second, and that feels slow when you're used to the big commercial models. Whenever I actually want to do stuff with the open source models, I always find myself falling back to OpenRouter.
I tried Intel/Qwen3.6-35B-A3B-int4-AutoRound on a DGX Spark a couple of days ago and that felt usable speed-wise. I don't know about quality, but that's like running a 3B parameter model. 27B is a lot slower.
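A rough sanity check on the bandwidth theory (back-of-the-envelope, not a benchmark): a dense model streams essentially all of its weights from memory for every generated token, so bandwidth caps decode speed, while an A3B-style MoE only has to read its active parameters.

    # Decode-speed ceiling ~= memory bandwidth / bytes read per token.
    bandwidth_gb_s = 273        # M4 Pro / DGX Spark class (from above)
    dense_27b_gb = 16           # ~27B dense params at ~4-5 bits/weight (assumption)
    moe_a3b_gb = 2              # ~3B active params at ~4-5 bits/weight (assumption)

    print(bandwidth_gb_s / dense_27b_gb)  # ~17 tok/s ceiling -> observed 12-15
    print(bandwidth_gb_s / moe_a3b_gb)    # ~135 tok/s ceiling -> feels usable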
I'm not sure if I "get" the local AI stuff everyone is selling. I love the idea of it, but what's the point of 128GB of shared memory on a DGX Spark if I can only run a 20-30GB model before the slow speed makes it unusable?
Tbf the Spark's usefulness isn’t for inference IMO. Its memory bandwidth is too low for that.
But on the other hand, running Qwen 3.5 122B A10B locally on it using ~110GB of memory and getting 50 tok/s generation and quite excellent prefill… I couldn’t do that on many other machines at this price point.
For me this has been awesome for learning CUDA, fine-tuning models (until I get one close to what I want, then it’s off to an H100 cluster or similar), and a bit of inference on the side.
Friendly reminder: wait a couple of weeks before judging the "final" quality of these free models. Many of them suffer from hidden bugs when connected to an inference backend, or from bad configs that slow them down. The dev community usually takes a week or two to find the most glaring issues. Some of them may require patches to tools like llama.cpp, and some require users to avoid specific default options.
Gemma 4 had some issues that were ironed out within a week or two. This model is likely no different. Take initial impressions with a grain of salt.
This is probably less likely with this model, as it’s almost certainly a further RL-trained continuation of 3.5 27B. The bugs with this architecture were worked out when that dropped.
The bugs come from the downstream implementations and quantizations (which inherit bugs in the tools).
Expect to update your tools and redownload the quants multiple times over 2-4 weeks. There is a mad rush to be first to release quants and first to submit PRs to the popular tools, but the output is often not tested much before uploading.
If you experiment with these on launch week, you are the tester. :)
For at least a year now, it has been clear that data quality and fine-tuning are the main sources of improvement for mid-tier models. Size != quality for specialized, narrow use cases such as coding.
It’s not a surprise that models are leapfrogging each other when the engineers are able to incorporate better code examples and reasoning traces, which in turn bring higher quality outputs.
If all you're looking at is benchmarks, that might be true, but those are way too easy to game. Try using this model alongside Opus for some work in Rust/C++ and it'll be night and day. You really can't compare a model with trillions of parameters to a 27B one.
I often do need in-depth general knowledge in my coding model, so that I don't have to explain domain-specific logic to it every time and so that it has some sense of good UX.
Shot in the dark, but has your actual stove changed? When did you last change the stones? Is the air circulation worse?
If your skin feels hot, my guess would be that the steaming effect is being disrupted by the water evaporating faster than before; the air circulation also affects how the skin feels (that’s why a certain seating position can make the sauna unbearable). You could also try just turning it on at the lowest setting and seeing if anything changes. Maybe the stones have gotten so old that the old heat settings have sneakily turned unbearable.
I wonder if they will finally let you use past chats without having to turn on the data sharing, since it’s possible to store chat context on disk. (No chance).
It wasn't even the local-ness so much. Even if they stored it remotely it would be okay, like ChatGPT or Claude, but unlike the others, for a long time the only way to let it store history on their servers was to also allow them to train on it. I haven't checked whether that has changed.
Cool implementation. Never occurred to me Jellyfin could serve as a streaming platform on its own! I’ll probably find the answer after sending this reply (may be helpful to others), but does it come with this functionality out of the box, or is any plug-in needed?
How probable is it that this kind of method can be patched or obfuscated further? I assume that since the HLS stream is always at the core, it’s a matter of just finding alternative ways to dig through it.
Any quirks this implementation has wrt. things like quality or additional delay? Thanks! I’d like to find out whether your method could be used to make some sort of snippet that could be sent to a VLC instance running on a TV or streaming device.
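For the VLC part, I was picturing something along these lines - if the method ultimately yields an HLS playlist URL, a VLC instance started with its web interface enabled (--extraintf http --http-password ...) can be told to play it remotely. Rough sketch; the host, password, and stream URL are placeholders:

    import requests

    VLC_HOST = "192.168.1.50"       # the TV / box running VLC (placeholder)
    VLC_PASSWORD = "secret"         # whatever was set via --http-password
    STREAM_URL = "http://example.local/stream/master.m3u8"  # hypothetical HLS URL

    # VLC's built-in web interface queues and plays an arbitrary MRL via in_play.
    r = requests.get(
        f"http://{VLC_HOST}:8080/requests/status.xml",
        params={"command": "in_play", "input": STREAM_URL},
        auth=("", VLC_PASSWORD),    # username is empty; only the password matters
        timeout=5,
    )
    r.raise_for_status()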
Adjacent, but back when Plex supported plugins all your plugin had to do was eventually give it a link to a video / stream and bam, you could watch the content on any device. I built a fairly popular plugin around the idea of deduplicating tv / movie listings and letting people watch now or direct download.
The wait is finally over. One or two iterations, and I’ll be happy to say that language models are more than fulfilling my most common needs when self-hosting. Thanks to the Gemma team!
Strongly agree. Gemma3:27b and Qwen3-vl:30b-a3b are among my favorite local LLMs and handle the vast majority of translation, classification, and categorization work that I throw at them.
I'm using the default llama-server that is part of Gerganov's llama.cpp, running on a headless machine with a 16GB NVIDIA GPU, but Ollama is a bit easier to ease into since they have a preset model library.
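For anyone scripting against llama-server locally: it exposes an OpenAI-compatible endpoint (default port 8080), so the translation/classification-type jobs mentioned above are a single HTTP call. Generic sketch, not tied to any particular model:

    import requests

    # llama-server answers on whatever model it was started with.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [{"role": "user",
                          "content": "Classify as bug/feature/question: 'App crashes on login.'"}],
            "temperature": 0.2,
            "max_tokens": 64,
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])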
I would be inclined to agree with this except that my "most common needs" keeps expanding and increasing in difficulty each year. In 2023 and 2024, most of my needs were asking models simple questions and getting a response. They were a drop-in replacement for Stack Overflow. I think the best open source models today that I can run on my laptop serve that need.
Now that coding agents are a thing, my frame of reference has shifted, and I now consider a model that can drive one my most common need. And unfortunately open models today cannot do that reliably. They might, like you said, be able to in a year or two, but by then the cloud models will have a new capability that I will come to regard as a basic necessity for doing software development.
All that said this looks like a great release and I'm looking forward to playing around with it.
Not OP, but one example is that recent VL models are more than sufficient for analyzing your local photo albums/images and creating metadata/descriptions/captions to help better organize your library.
The easiest way to get started is probably to use something like Ollama with the `qwen3-vl:8b` 4-bit quantized model [1].
It's a good balance between accuracy and memory, though in my experience, it's slower than older model architectures such as Llava. Just be aware Qwen-VL tends to be a bit verbose [2], and you can’t really control that reliably with token limits - it'll just cut off abruptly. You can ask it to be more concise but it can be hit or miss.
What I often end up doing (and I admit it's a bit ridiculous) is letting Qwen-VL generate its full detailed output, then passing that to a different LLM to summarize.
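FWIW that two-stage caption-then-summarize flow is only a couple of calls against Ollama's local API. A minimal sketch, reusing the model tags already mentioned in this thread (adjust to whatever you actually have pulled):

    import base64
    import requests

    OLLAMA = "http://localhost:11434/api/generate"

    def ask(model, prompt, images=None):
        payload = {"model": model, "prompt": prompt, "stream": False}
        if images:
            payload["images"] = images   # base64-encoded image bytes
        return requests.post(OLLAMA, json=payload, timeout=300).json()["response"]

    with open("photo.jpg", "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()

    # Stage 1: let the VL model produce its full, verbose description.
    detailed = ask("qwen3-vl:8b", "Describe this photo in detail.", images=[img_b64])

    # Stage 2: have a text model compress it into a short caption plus tags.
    print(ask("gemma3:27b", f"Condense into one caption and five tags:\n{detailed}"))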
For me, receipt scanning and tagging documents and parts of speech in my personal notes. It's a lot of manual labour and I'd like to automate it if possible.
I use local models for autocomplete in simple coding tasks, CLI autocomplete, formatting, a Grammarly replacement, translation (it/de/fr -> en), OCR, simple web research, dataset tagging, file sorting, email sorting, validating configs, creating boilerplate for well-known tools, and much more - basically anything I would have used the old OpenAI mini models for.