Hacker News | jug's comments

"If you don't cannibalize yourself, someone else will." - Steve Jobs

Looks like an ongoing theme and a very poor benchmark. Not at all the claims I expected.

It's also very surprising to me, this whole deal where humans instantly started taking AI answers at face value, as sources standing on their own legs, delegating their own thinking to a third party that isn't even a human but an algorithm.

It's like they're just... Fine?

AI became their god over a few months and it's... Fine?

I thought I knew humanity pretty well, and I'm rarely surprised by large-scale human behavior these days as I'm hitting 50 myself, but this took me by surprise.


While this is the oldest source for it, note that the 86-DOS v0.1-C binaries are even earlier than this v1.00 source (v0.34 has also been found), and they can be downloaded and used in an emulator. :-)

https://arstechnica.com/gadgets/2024/01/the-oldest-known-ver...


This is a risk, although fortunately this model isn't tied to Chinese hosting. But it's indeed something to consider if you're using DeepSeek.com directly.


I found this thought-provoking and just had to see how the new Gemini 3.5 Flash reasoned about it (I find it fun to go meta on modern AI like this), and I'm happy that I did! It was also an opportunity to trial this recent model.

https://g.co/gemini/share/065ffa89698e


I think that's what the Omniscience Index is for:

https://artificialanalysis.ai/evaluations/omniscience#aa-omn...

It rewards correct answers, penalizes hallucinations, and gives no reward for refusing to answer.
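
A rough sketch of how a score like that behaves. The +1 / -1 / 0 weights, the scaling to a -100..100 range, and the name `knowledge_index` are my own assumptions for illustration, not the benchmark's published definition:

```python
from collections import Counter

def knowledge_index(outcomes):
    """Score a list of per-question outcomes.

    Each outcome is 'correct', 'incorrect' (a hallucination), or
    'abstain' (the model declined to answer). Assumed weights:
    correct = +1, incorrect = -1, abstain = 0, scaled to -100..100.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)
    return 100.0 * (counts["correct"] - counts["incorrect"]) / len(outcomes)

# A model that always answers: 6 right, 4 hallucinated.
always_answers = ["correct"] * 6 + ["incorrect"] * 4

# A model that knows less but abstains when unsure: 5 right, 1 wrong, 4 refusals.
abstains = ["correct"] * 5 + ["incorrect"] * 1 + ["abstain"] * 4

print(knowledge_index(always_answers))
print(knowledge_index(abstains))
```

Under these assumed weights the second model scores higher despite fewer correct answers, since refusing costs nothing while a wrong answer cancels out a right one.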

It's interesting just how poorly some popular Chinese models fare in this regard, like GLM 5.1 or DeepSeek 4 Pro.

Gemini 3.x has truly remarkable knowledge given how it leads in this benchmark despite being (quite a bit) more prone to hallucinate than Claude Opus.


They have now been released on e.g. Hugging Face with the model suffix "-assistant".


Shouldn't one use e.g. a Wolfram Alpha MCP endpoint for math in AI? From what I've seen on even premium non-quantized models, I would never ever trust the innate ability of an LLM to calculate.


Prices are also expected to drop significantly in H2 as they move to Huawei Ascend 950 super nodes.

Yes, even compared to this low price point.

As before, the headline news with DeepSeek isn't the benchmarks themselves, but that they're competitive there while being so cheap that it's gut-churning for the Western AI industry.

