
I think the prevailing narrative ATM is that DeepSeek's innovation happened in isolation and that they surpassed OpenAI, even though in the paper they give a lot of credit to Llama for their techniques. The idea that they used o1's outputs for their distillation further shows that models like o1 are necessary.

All of this should have been clear anyway from the start, but that's the Internet for you.



The idea that they used o1's outputs for their distillation further shows that models like o1 are necessary.

Hmm, I think the narrative of the rise of LLMs is that once the output of humans has been distilled by the model, the human isn't necessary.

As far as I know, DeepSeek adds only a little to the transformer architecture, while o1/o3 added a special "reasoning component" - if DeepSeek is as good as o1/o3, even when trained on data taken from it, then it seems the reasoning component isn't needed.


> I think the narrative of the rise of LLMs is that once the output of humans has been distilled by the model

Distillation is a term of art in AI and it is fundamentally incorrect to talk about distilling human-created data. Only an AI model can be distilled.

https://en.m.wikipedia.org/wiki/Knowledge_distillation#Metho...
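
To make the distinction concrete: distillation in the term-of-art sense trains the student against the teacher's full output distribution (its logits), which you can only get by running the teacher model yourself. A minimal PyTorch sketch of the Hinton-style loss, with toy tensors standing in for real forward passes:

    import torch
    import torch.nn.functional as F

    def distill_loss(student_logits, teacher_logits, T=2.0):
        # Match the teacher's temperature-softened distribution,
        # not just its sampled text (Hinton et al., 2015).
        s = F.log_softmax(student_logits / T, dim=-1)
        t = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(s, t, reduction="batchmean") * (T * T)

    teacher_logits = torch.randn(8, 1000)                      # frozen teacher pass
    student_logits = torch.randn(8, 1000, requires_grad=True)  # student pass
    distill_loss(student_logits, teacher_logits).backward()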


Meh,

It seems clear that the term can be used informally to denote the boiling down of human knowledge; indeed, it was used that way before AI appeared in the popular imagination.


In the context in which you said it, it matters a lot.

>> The idea that they used o1's outputs for their distillation further shows that models like o1 are necessary.

> Hmm, I think the narrative of the rise of LLMs is that once the output of humans has been distilled by the model, the human isn't necessary.

If DeepSeek was produced through distillation (term of art) of o1, then the cost of producing DeepSeek is strictly higher than the cost of producing o1, and that cost can't be avoided.

Continuing this argument: if the premise is true, then DeepSeek can't be significantly improved without first producing a very expensive hypothetical o1-next model from which to distill better knowledge.

That is the argument that is being made. Please avoid shallow dismissals.

Edit: just to be clear, I doubt that DeepSeek was produced via distillation (term of art) of o1, since that would require access to o1's weights. It may have used some of o1's outputs to fine-tune the model, which still would mean that the cost of training DeepSeek is strictly higher than training o1.
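
To illustrate the difference: fine-tuning on another model's outputs needs no access to its weights or logits at all, only its sampled text. A toy sketch, with random tensors standing in for a student forward pass over API-collected completions:

    import torch
    import torch.nn.functional as F

    def sft_loss(student_logits, teacher_token_ids):
        # Ordinary next-token cross-entropy on the teacher's *text*;
        # the student never sees the teacher's weights or distribution.
        vocab = student_logits.size(-1)
        return F.cross_entropy(student_logits.view(-1, vocab),
                               teacher_token_ids.view(-1))

    teacher_token_ids = torch.randint(0, 1000, (4, 16))  # tokens from API output
    student_logits = torch.randn(4, 16, 1000, requires_grad=True)
    sft_loss(student_logits, teacher_token_ids).backward()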


just to be clear, I doubt that DeepSeek was produced via distillation

Yeah, your technical point is kind of ridiculous here: in all my uses of distillation (and in the comment I quoted), distillation is used in the informal sense, and there's no allegation that DeepSeek was in possession of OpenAI's model weights, which is what's needed for your "distillation (term of art)".


I’m not sure why folks don’t speculate that China is able to obtain copies of OpenAI's weights.

Seems reasonable they would be investing heavily in placing state assets within OpenAI so they can copy the models.


Because it feeds conspiracy theories and because there's no evidence for it? Also, let's talk DeepSeek in particular, not "China".

Looking back on the article, it is indeed using "distillation" in the special/"term of art" sense, but not using it correctly. I.e., it's not actually speculating that DeepSeek obtained OpenAI's weights and distilled them down, but rather that it used OpenAI's answers/output as a starting point (which is a different method, with its own term of art).


Some info that may be missing:

- v2/v3 (not r1) seem to be cloned from o1/4o output, and perform worse (this is what the oft-repeated ~5 million USD figure covers)

- r1 is specifically a reasoning step (using RL) _on top of_ v2/v3 and performs similarly to o1 (the cost of this is _not reported anywhere_)

- In the o1 blog post, they specifically say they use RL to add reasoning to LLMs: https://openai.com/index/learning-to-reason-with-llms/
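
For a rough picture of what that RL step looks like mechanically, here is a toy sketch of the group-relative advantage described in the R1 paper's GRPO recipe, with 0/1 rewards standing in for its rule-based correctness checks (this is an illustration, not DeepSeek's actual code):

    import torch

    def group_advantages(rewards):
        # Normalize each completion's reward against the other
        # completions sampled for the same prompt.
        mean = rewards.mean(dim=-1, keepdim=True)
        std = rewards.std(dim=-1, keepdim=True)
        return (rewards - mean) / (std + 1e-6)

    rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])    # 1 prompt, 4 samples
    logprobs = torch.randn(1, 4, requires_grad=True)  # sum log p per sample
    loss = -(group_advantages(rewards) * logprobs).mean()  # PG surrogate
    loss.backward()  # gradient pushes up the correct samples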


The R1-Zero paper shows how many training steps the RL took, and it's not many. The cost of the RL is likely a small fraction of the cost of the foundational model.


> the prevailing narrative ATM is that DeepSeek's own innovation was done in isolation and they surpassed OpenAI

I did not think this, nor did I think it was what others assumed. The narrative, I thought, was that there is little point in paying OpenAI for LLM usage when a similar or better version can be made and used for a fraction of the cost (whether it's built on the back of existing LLM research doesn't factor in).


Yes, well, the narrative that rocked the stock market is different. It's looking at what DeepSeek did and assuming they may have a competitive advantage in this space and could outperform OpenAI at their own game.

If the narrative is actually that DeepSeek can only reach whatever heights OpenAI has already gotten to with some new tricks, then markets will probably refocus on OpenAI's innovations and price things accordingly, even if the initial cost is huge. It also means OpenAI probably needs a better moat to protect its interests.

I'm not sure where the reality is exactly, but market reactions so far have basically followed that initial narrative and now the rebuttal.


The idea that someone can easily replicate an OpenAI model based simply on OpenAI outputs is, I’d argue, immeasurably worse for OpenAI’s valuation than the idea that someone happened to come up with a few innovations that leapfrogged OpenAI.

The latter could be a one-time thing, and/or OpenAI could still use their financial might to leverage those innovations and get even better with them.

However, the former destroys their business model and no amount of intelligence and innovation from OpenAI protects them from being copied at a fraction of the cost.


> Yes, well the narrative that rocked the stock market is different.

How do you know this?

> If the narrative is actually that DeepSeek can only reach whatever heights OpenAI has already gotten to with some new tricks, then markets will probably refocus on OpenAI's innovations and price things accordingly

Why? If every innovation OpenAI is trying to keep as secret sauce becomes commoditized quickly and cheaply, then why would markets care about any innovations they have? They will be unable to monetize them.


Couldn't OpenAI just put in their license that training off OpenAI output is not allowed? With shibboleths or API logs, this could be verifiable.
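
One way the verification might work (everything here, from the canary format to the threshold, is a hypothetical illustration, not anything OpenAI is known to do): seed rare canary strings into a fraction of API responses, log them, and later check whether a suspect model regurgitates them above chance:

    import hashlib

    def make_canary(session_id):
        # A string vanishingly unlikely to occur in natural text,
        # appended to some API responses and kept in the logs.
        return "canary-" + hashlib.sha256(session_id.encode()).hexdigest()[:16]

    def audit(suspect_generate, logged_sessions, threshold=0.01):
        # suspect_generate(prompt) -> str is the model under audit.
        # Regurgitating logged canaries well above chance suggests
        # it was trained on the provider's outputs.
        hits = sum(make_canary(sid) in suspect_generate(prompt)
                   for sid, prompt in logged_sessions)
        return hits / max(len(logged_sessions), 1) > threshold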


Why would it matter when DeepSeek, being Chinese, is not going to abide by such rules or be forced to, and will release their model as open weights so anyone anywhere can host it?

Also, scraping most of the websites they scrape is not allowed either, and they do it anyway.


If they can make the US and Europe block the use of DeepSeek and derivatives, they would be able to protect most of their market.


There were different narratives for different people. When I heard about r1, my first response was to dig into their paper and its references to figure out how they did it.


> I did not think this, nor did I think this was what others assumed.

That's what I thought and assumed. This is the narrative that's been running through all the major news outlets.

It didn't even occur to me that DeepSeek could have been training their models using the output of other models until reading this article.


Fwiw I assumed they were using o1 to train. But it doesn’t matter: the big story here is that massive compute resources are unlikely to be as important in the future as we thought. It cuts the legs off Stargate etc. just as it’s announced. The CCP must be highly entertained by the timeline.


That's only the case if you don't need to use the output of a much more expensive model.


> shows that models like o1 are necessary.

But HOW they are necessary is the change. They went from building blocks to stepping stones. From a business standpoint that's very damaging to OAI and other players.



