sanxiyn · 2026-01-27T06:53:38 1769496818

Not really. Due to combinatorial explosion, some paths are hard to hit randomly in this kind of source code. I would have preferred that, after 2M random battles, the reference implementation had 99% code coverage rather than a 99% pass rate.

I don't know anything about Pokemon, but I briefly looked at the code. "weather" seemed like a self contained thing I could potentially understand. Looking at https://github.com/vjeux/pokemon-showdown-rs/blob/master/src...

> NOTE: ignoringAbility() and abilityState.ending not fully implemented

So it is almost certain that, even with a 99.96% pass rate, the random battles never hit a battle with a weather-suppressing Pokemon whose ability is ignored. A code-coverage-driven testing loop would have found and fixed this one easily.
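To make the argument concrete, here is a minimal sketch in Python. It is a toy and not code from the linked repository: `resolve_weather`, the branch names, and the probabilities are all made up for illustration. It shows why a branch that needs two rare conditions at once can stay unhit after many random battles, and how a coverage report turns that into a targeted test.

```python
import random


def chance_of_hitting(p_per_battle: float, battles: int) -> float:
    """Probability that at least one of `battles` independent random battles
    reaches a path that a single battle hits with probability `p_per_battle`."""
    return 1.0 - (1.0 - p_per_battle) ** battles


def resolve_weather(weather_suppressed: bool, ability_ignored: bool, hits: set) -> str:
    """Hypothetical stand-in for a branchy engine function.
    Records which branch ran in `hits` (a poor man's coverage tracker)."""
    if weather_suppressed and ability_ignored:
        hits.add("suppressed+ignored")
        return "weather active"
    if weather_suppressed:
        hits.add("suppressed")
        return "weather suppressed"
    hits.add("normal")
    return "weather active"


def random_campaign(battles: int, p_suppress: float, p_ignore: float, seed: int = 0) -> set:
    """Run `battles` random cases and return the set of branches reached."""
    rng = random.Random(seed)
    hits = set()
    for _ in range(battles):
        resolve_weather(rng.random() < p_suppress, rng.random() < p_ignore, hits)
    return hits


ALL_BRANCHES = {"normal", "suppressed", "suppressed+ignored"}

# Assumed rates, chosen only to make the rare branch rare.
hits = random_campaign(2000, p_suppress=1e-3, p_ignore=1e-3)
print("missed by random testing:", ALL_BRANCHES - hits)

# Coverage-driven step: write one targeted case per branch the random run missed.
targeted = {
    "normal": (False, False),
    "suppressed": (True, False),
    "suppressed+ignored": (True, True),
}
for branch in ALL_BRANCHES - hits:
    resolve_weather(*targeted[branch], hits)
print("covered after targeted cases:", hits == ALL_BRANCHES)
```

The pass rate only says that the battles that ran agreed with the reference. The set difference says which code never ran at all, which is the part random testing cannot tell you.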

Herring · 2026-01-27T15:35:07 1769528107

Good catch. I should really look at the code before commenting on it.

sanxiyn · 2026-01-15T01:50:17 1768441817

Yes, but in principle it isn't that different from running on Trainium or Inferentia (it's a matter of degree), and plenty of non-AI organizations adopted Trainium/Inferentia.

sanxiyn · 2026-01-15T01:44:39 1768441479

What's new is HN discovered it. It wasn't posted in September 2025.

tedk-42 · 2026-01-15T04:13:38 1768450418

100%

People forget this is also a place of discussion and the comment section is usually peak value as opposed to the article itself.

sanxiyn · 2026-01-15T01:36:46 1768441006

Looking at their blog, they in fact ran gpt-oss-120b: https://furiosa.ai/blog/serving-gpt-oss-120b-at-5-8-ms-tpot-...

I think the Llama 3 focus mostly reflects demand. It may be hard to believe, but many people aren't even aware gpt-oss exists.

reactordev · 2026-01-15T02:07:52 1768442872

Many are aware, just can’t offload it onto their hardware.

The 8B models are easier to run on an RTX, so you can compare against local inference. What llama does on an RTX 5080 at 40t/s, Furiosa should do at 40,000t/s or whatever… it’s an easy way to have a flat comparison across all the different hardware llama.cpp runs on.

nl · 2026-01-15T02:22:02 1768443722

> we demonstrated running gpt-oss-120b on two RNGD chips [snip] at 5.8 ms per output token

That's 86 tokens/second/chip

By comparison, an H100 will do 2390 tokens/second/GPU [1]

Am I comparing the wrong things somehow?

[1] https://inferencemax.semianalysis.com/

sanxiyn · 2026-01-15T02:59:14 1768445954

I think you are comparing latency with throughput. You can't take the inverse of latency to get throughput because concurrency is unknown. But then, the RNGD result is probably with concurrency=1.
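A rough sketch of the arithmetic, in Python. The 5.8 ms and two-chip figures come from the quoted blog line; the function names and the concurrency values are hypothetical. The assumption that per-token latency stays flat as concurrency grows is also mine and is optimistic, since latency usually rises under load.

```python
def single_stream_tokens_per_second(tpot_ms: float) -> float:
    """Tokens per second seen by one request, given time per output token in ms."""
    return 1000.0 / tpot_ms


def aggregate_tokens_per_second(tpot_ms: float, concurrency: int) -> float:
    """Total tokens per second when `concurrency` requests decode in parallel,
    each at the given per-token latency (assumes latency does not degrade)."""
    return concurrency * single_stream_tokens_per_second(tpot_ms)


tpot_ms = 5.8  # from the quoted line: 5.8 ms per output token
chips = 2      # "two RNGD chips"

per_stream = single_stream_tokens_per_second(tpot_ms)
per_chip = per_stream / chips
print(f"concurrency=1: {per_stream:.1f} tok/s total, {per_chip:.1f} tok/s/chip")

# Hypothetical concurrency levels, only to show how the two metrics diverge.
for concurrency in (1, 8, 64):
    total = aggregate_tokens_per_second(tpot_ms, concurrency)
    print(f"concurrency={concurrency}: {total / chips:.1f} tok/s/chip")
```

So 86 tokens/second/chip is the inverse of latency for a single stream. A throughput figure like the H100 one only becomes comparable once the concurrency behind it is known.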

binary132 · 2026-01-15T02:28:13 1768444093

I thought they were saying it was more efficient, as in tokens per watt. I didn’t see a direct comparison on that metric but maybe I didn’t look well enough.

nl · 2026-01-15T02:50:47 1768445447

Probably. Companies sell on efficiency when they know they lose on performance.

tormeh · 2026-01-15T07:28:03 1768462083

If you have an efficient chip you can just have more of them and come out ahead. This isn't a CPU where single core performance is all that important.

fennecfoxy · 2026-01-15T10:14:35 1768472075

Only if the price is right...

avereveard · 2026-01-15T08:50:17 1768467017

Eh if there's a human on the other side single stream performance is going to matter to them.

binary132 · 2026-01-15T05:31:20 1768455080

Right, but datacenters also very much operate on electrical cost so it’s not without merit.

zmmmmm · 2026-01-15T02:13:23 1768443203

Now I'm interested ...

It still kind of makes the point that you are stuck with a very limited range of models that they are hand implementing. But at least it's a model I would actually use. Give me that in a box I can put in a standard data center with normal power supply and I'm definitely interested.

But I want to know the cost :-)

sanxiyn · 2025-12-24T23:58:33 1766620713

In my opinion, Groq's technical decisions are unsound in a normal world. But being HBM-free may have some merit in a world where HBM supply is constrained.

sanxiyn · 2025-12-17T00:30:34 1765931434

Ruby also used to use Bison, but these days it uses its own https://github.com/ruby/lrama.

sanxiyn · 2025-11-21T08:06:06 1763712366

Yeah. There are other fully open models like Hugging Face SmolLM but they are not common.

sanxiyn · 2025-11-18T06:25:35 1763447135

With cargo --offline, Rust has better than average support for offline builds.

sanxiyn · 2025-11-18T05:52:45 1763445165

An interesting bit of history: for a long time Rust maintained first-party support for Windows XP, after other parts of the ecosystem had generally given up. This was because Firefox needed it.

https://github.com/rust-lang/compiler-team/issues/378 (major change proposal to drop Windows XP support) notes this history and links to other relevant pages.

sanxiyn · 2025-11-07T07:12:33 1762499553

If you look at the actual code, it runs ping -c 5. I agree ping without options doesn't terminate.