Is that even possible with medical-grade wrist devices? Apple Watches can perform it only during sleep, which makes sense. It seems like a difficult problem to solve without a chest strap, or without just measuring during sleep.
The only other alternative I can think of is a screenless strap (some companies like Polar and Whoop make those) around the bicep, as it’s relatively close to the shoulder and chest areas, which gently move with our breath.
Garmin measures "photoplethysmography-derived respiration" (using the optical HR sensor). Error rates are under 1 breath per minute during sleep or at rest, but rise during exercise, up to about 4 breaths per minute above the lactate threshold.
Impedance pneumography is more consistently accurate, but requires a chest (not bicep) strap.
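To make the PPG-derived approach a bit more concrete: breathing shows up in the optical pulse signal as a slow baseline wander (plus amplitude/rate modulation), so you can recover a rate by isolating the ~0.1-0.5 Hz band. A rough illustrative sketch with synthetic data, not Garmin's actual algorithm:

    import numpy as np
    from scipy.signal import butter, filtfilt

    def respiration_rate_from_ppg(ppg, fs):
        # Breathing sits at roughly 0.1-0.5 Hz (6-30 breaths/min),
        # well below the ~1-3 Hz cardiac pulse.
        b, a = butter(2, [0.1, 0.5], btype="band", fs=fs)
        resp = filtfilt(b, a, ppg)
        # Dominant frequency in that band = estimated breathing rate.
        spectrum = np.abs(np.fft.rfft(resp))
        freqs = np.fft.rfftfreq(len(resp), d=1.0 / fs)
        band = (freqs >= 0.1) & (freqs <= 0.5)
        return freqs[band][np.argmax(spectrum[band])] * 60.0  # breaths/min

    # Synthetic signal: 1.2 Hz pulse plus a 0.25 Hz respiratory baseline wander.
    fs = 25.0
    t = np.arange(0, 60, 1.0 / fs)
    ppg = 0.3 * np.sin(2 * np.pi * 0.25 * t) + np.sin(2 * np.pi * 1.2 * t)
    print(respiration_rate_from_ppg(ppg, fs))  # ~15 breaths/min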
I fear this is only the start of it. At least 3-4 more constellations will probably be launched in the near future (Russia, China, EU).
Their obvious dual-use nature makes them tempting, and a military target if a large conflict takes place in the near future. I hope their lower orbit will help any space junk burn up fast.
Add a black umbrella to each satellite: when they pass through the critical region where they are visible in the night sky while still being sunlit, pop the brollies up. We will fly them in the shade!
You could paint them black but they’d probably get quite hot.
Won't the shade then reflect the light instead? It's nighttime, so from the Earth-based observer's point of view the sunlight is coming up from below, and the shade would need to be pointed down in order to shade the satellite.
If you blow up a satellite, half of it will end up going slower and half will go faster. The slower bits will probably burn up nicely, but the faster bits will just elevate their orbit.
I doubt they will elevate their orbit by enough to be a problem. Some bits will come down in hours, some in a year; even in the worst case where it takes out everything in low Earth orbit, in 5 years everything will be clear and we can start over. Higher orbits are the real worry - even the debris that gets slowed down mostly stays up for centuries - but higher orbits are mostly a lot higher.
"LEO" is a big place, those satellites collided ~1.5x higher than e.g. the maximum Starlink altitude and the debris lifetime relationship is not a linear one.
Yup! Smaller quants will fit within 24GB but they might sacrifice context length.
I’m excited to try out the MLX version to see whether 32GB of memory on a Pro M-series Mac can get acceptable tok/s with longer context. Some MLX versions are already up on Hugging Face.
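If anyone else wants to try those conversions, the mlx-lm side is only a few lines (the repo id below is a placeholder - substitute whichever MLX conversion you actually want to test):

    # pip install mlx-lm   (Apple Silicon only)
    from mlx_lm import load, generate

    # Placeholder repo id - swap in the actual mlx-community upload.
    MODEL_REPO = "mlx-community/REPLACE-WITH-ACTUAL-MLX-CONVERSION"

    model, tokenizer = load(MODEL_REPO)
    prompt = "Explain how KV-cache size grows with context length, in two sentences."
    print(generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True))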
I have a Mac Mini M4 Pro with 64GB of memory at 273GB/s bandwidth, and it's borderline with 3.5-27B. I assume this one is the same. I don't know a ton, but I think it's the memory bandwidth that limits it. It's similar on a DGX Spark I have access to (almost the same memory bandwidth).
It's been a while since I tried it, but I think I was getting around 12-15 tokens per second, and that feels slow when you're used to the big commercial models. Whenever I actually want to do stuff with the open source models, I always find myself falling back to OpenRouter.
I tried Intel/Qwen3.6-35B-A3B-int4-AutoRound on a DGX Spark a couple of days ago and that felt usable speed-wise. I don't know about quality, but that's like running a 3B parameter model. 27B is a lot slower.
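A rough sanity check on the bandwidth theory (back-of-the-envelope, not a benchmark): a dense model streams essentially all of its weights from memory for every generated token, so bandwidth caps decode speed, while an A3B-style MoE only has to read its active parameters.

    # Decode-speed ceiling ~= memory bandwidth / bytes read per token.
    bandwidth_gb_s = 273        # M4 Pro / DGX Spark class (from above)
    dense_27b_gb = 16           # ~27B dense params at ~4-5 bits/weight (assumption)
    moe_a3b_gb = 2              # ~3B active params at ~4-5 bits/weight (assumption)

    print(bandwidth_gb_s / dense_27b_gb)  # ~17 tok/s ceiling -> observed 12-15
    print(bandwidth_gb_s / moe_a3b_gb)    # ~135 tok/s ceiling -> feels usable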
I'm not sure if I "get" the local AI stuff everyone is selling. I love the idea of it, but what's the point of 128GB of shared memory on a DGX Spark if I can only run a 20-30GB model before the slow speed makes it unusable?
Tbf the Spark's usefulness isn’t for inference IMO. Its memory bandwidth is too low for that.
But on the other hand, running Qwen 3.5 122B A10B locally on it using ~110GB of memory and getting 50 tok/s generation and quite excellent prefill… I couldn’t do that on many other machines at this price point.
For me this has been awesome for learning CUDA, fine-tuning models (until I get one close to what I want, then it’s off to an H100 cluster or similar), and a bit of inference on the side.
Friendly reminder: wait a couple of weeks before judging the "final" quality of these free models. Many of them suffer from hidden bugs when connected to an inference backend, or from bad configs that slow them down. The dev community usually takes a week or two to find the most glaring issues. Some of them may require patches to tools like llama.cpp, and some require users to avoid specific default options.
Gemma 4 had some issues that were ironed out within a week or two. This model is likely no different. Take initial impressions with a grain of salt.
This is probably less likely with this model, as it’s almost certainly a further RL-trained continuation of 3.5 27B. The bugs with this architecture were worked out when that dropped.
The bugs come from the downstream implementations and quantizations (which inherit bugs in the tools).
Expect to update your tools and redownload the quants multiple times over 2-4 weeks. There is a mad rush to be first to release quants and first to submit PRs to the popular tools, but the output is often not tested much before uploading.
If you experiment with these on launch week, you are the tester. :)
For at least a year now, it has been clear that data quality and fine-tuning are the main sources of improvement for mid-tier models. Size != quality for specialized, narrow use cases such as coding.
It’s not a surprise that models are leapfrogging each other when the engineers are able to incorporate better code examples and reasoning traces, which in turn bring higher quality outputs.
If all you're looking at is benchmarks, that might be true, but those are way too easy to game. Try using this model alongside Opus for some work in Rust/C++ and it'll be night and day. You really can't compare a model with trillions of parameters to a 27B one.
I often do need in-depth general knowledge in my coding model, so that I don't have to explain domain-specific logic to it every time and so that it has some sense of good UX.
Shot in the dark, but has your actual stove changed? When did you last change the stones? Is the air circulation worse?
If your skin feels hot, my guess would be that the steaming effect is being disrupted by the water evaporating faster than before; the air circulation also affects how the skin feels (that’s why a certain seating position can make the sauna unbearable). You could also try just turning it on at the lowest setting and seeing if anything changes. Maybe the stones have gotten so old that the old heat settings have sneakily turned unbearable.
I wonder if they will finally let you use past chats without having to turn on the data sharing, since it’s possible to store chat context on disk. (No chance).
It wasn't even the local-ness so much. Even if they stored it remotely it would be okay, like ChatGPT or Claude, but unlike the others, for a long time the only way to let it store history on their servers was to also allow them to train on it. I haven't checked whether that has changed.
Cool implementation. Never occurred to me Jellyfin could serve as a streaming platform on its own! I’ll probably find the answer after sending this reply (may be helpful to others), but does it come with this functionality out of the box, or is any plug-in needed?
How probable is it that this kind of method can be patched or obfuscated further? I assume that since the HLS stream is always at the core, it’s a matter of just finding alternative ways to dig through it.
Any quirks this implementation has wrt. things like quality or additional delay? Thanks! I’d like to find out whether your method could be used to make some sort of snippet that could be sent to a VLC instance running on a TV or streaming device.
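For the VLC part, I was picturing something along these lines - if the method ultimately yields an HLS playlist URL, a VLC instance started with its web interface enabled (--extraintf http --http-password ...) can be told to play it remotely. Rough sketch; the host, password, and stream URL are placeholders:

    import requests

    VLC_HOST = "192.168.1.50"       # the TV / box running VLC (placeholder)
    VLC_PASSWORD = "secret"         # whatever was set via --http-password
    STREAM_URL = "http://example.local/stream/master.m3u8"  # hypothetical HLS URL

    # VLC's built-in web interface queues and plays an arbitrary MRL via in_play.
    r = requests.get(
        f"http://{VLC_HOST}:8080/requests/status.xml",
        params={"command": "in_play", "input": STREAM_URL},
        auth=("", VLC_PASSWORD),    # username is empty; only the password matters
        timeout=5,
    )
    r.raise_for_status()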
Adjacent, but back when Plex supported plugins all your plugin had to do was eventually give it a link to a video / stream and bam, you could watch the content on any device. I built a fairly popular plugin around the idea of deduplicating tv / movie listings and letting people watch now or direct download.
The wait is finally over. One or two iterations, and I’ll be happy to say that language models are more than fulfilling my most common needs when self-hosting. Thanks to the Gemma team!
Strongly agree. Gemma3:27b and Qwen3-vl:30b-a3b are among my favorite local LLMs and handle the vast majority of translation, classification, and categorization work that I throw at them.
I'm using the default llama-server that is part of Gerganov's llama.cpp, running on a headless machine with a 16GB NVIDIA GPU, but Ollama is a bit easier to ease into since they have a preset model library.
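For anyone scripting against llama-server locally: it exposes an OpenAI-compatible endpoint (default port 8080), so the translation/classification-type jobs mentioned above are a single HTTP call. Generic sketch, not tied to any particular model:

    import requests

    # llama-server answers on whatever model it was started with.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [{"role": "user",
                          "content": "Classify as bug/feature/question: 'App crashes on login.'"}],
            "temperature": 0.2,
            "max_tokens": 64,
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])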
I would be inclined to agree with this except that my "most common needs" keeps expanding and increasing in difficulty each year. In 2023 and 2024, most of my needs were asking models simple questions and getting a response. They were a drop-in replacement for Stack Overflow. I think the best open source models today that I can run on my laptop serve that need.
Now that coding agents are a thing, my frame of reference has shifted, and I now consider a model that can drive one my most common need. And unfortunately open models today cannot do that reliably. They might, like you said, be able to in a year or two, but by then the cloud models will have a new capability that I will come to regard as a basic necessity for doing software development.
All that said this looks like a great release and I'm looking forward to playing around with it.
Not OP, but one example is that recent VL models are more than sufficient for analyzing your local photo albums/images and creating metadata/descriptions/captions to help better organize your library.
The easiest way to get started is probably to use something like Ollama with the `qwen3-vl:8b` 4-bit quantized model [1].
It's a good balance between accuracy and memory, though in my experience, it's slower than older model architectures such as Llava. Just be aware Qwen-VL tends to be a bit verbose [2], and you can’t really control that reliably with token limits - it'll just cut off abruptly. You can ask it to be more concise but it can be hit or miss.
What I often end up doing (and I admit it's a bit ridiculous) is letting Qwen-VL generate its full detailed output, then passing that to a different LLM to summarize.
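FWIW that two-stage caption-then-summarize flow is only a couple of calls against Ollama's local API. A minimal sketch, reusing the model tags already mentioned in this thread (adjust to whatever you actually have pulled):

    import base64
    import requests

    OLLAMA = "http://localhost:11434/api/generate"

    def ask(model, prompt, images=None):
        payload = {"model": model, "prompt": prompt, "stream": False}
        if images:
            payload["images"] = images   # base64-encoded image bytes
        return requests.post(OLLAMA, json=payload, timeout=300).json()["response"]

    with open("photo.jpg", "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()

    # Stage 1: let the VL model produce its full, verbose description.
    detailed = ask("qwen3-vl:8b", "Describe this photo in detail.", images=[img_b64])

    # Stage 2: have a text model compress it into a short caption plus tags.
    print(ask("gemma3:27b", f"Condense into one caption and five tags:\n{detailed}"))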
For me, receipt scanning and tagging documents and parts of speech in my personal notes. It's a lot of manual labour and I'd like to automate it if possible.
I use local models for autocomplete in simple coding tasks, CLI autocomplete, formatting, a Grammarly replacement, translation (it/de/fr -> en), OCR, simple web research, dataset tagging, file sorting, email sorting, validating configs, creating boilerplate for well-known tools, and much more - basically anything I would have used the old OpenAI mini models for.