Called it from day 0: impossible to reach that performance with $5M; they had to have distilled OpenAI (or some other leading foundation model).
Got downvoted to oblivion by people who hadn't yet been told what to think by the MSM. Now it's on the FT and everywhere. Good; what matters is that the truth comes out eventually.
I don't take sides, and I think what DeepSeek did is fair play. However, what I do find harmful about this is: what incentive would company A have to spend billions training a new frontier model if all of it could then be reproduced by company B at a fraction of the cost?
>The San Francisco-based ChatGPT maker told the Financial Times it had seen some evidence of “distillation”, which it suspects to be from DeepSeek.
Given that many people have been using ChatGPT to distill their fine-tunes for a few years now, how can they be sure it was specifically DeepSeek? There's, say, glaive.ai, whose entire business model is selling you synthetic datasets, probably generated with ChatGPT as well.
I agree that the evidence is weak, and even if they had stronger evidence, they couldn't really do anything about it.
To me, it's just very likely they distilled GPT-4 (see the sketch after these points for what that would look like), because:
1) Again, you just cannot get that performance at that cost. And no, what they describe in the paper is not enough to explain a 1,000x decrease in cost.
2) Very often, DeepSeek tells you it's ChatGPT or OpenAI; it's actually quite easy to get it to do that. Some say that's related to "the background radiation on the post-AI internet". I'm not a fentanyl consumer so, unfortunately, I think that argument is trash.
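For concreteness, here's a minimal sketch of what black-box distillation over an API looks like; the model name, seed prompts, and file name are placeholders, not anything DeepSeek is confirmed to have done:

    # Minimal sketch of black-box distillation: sample completions from a
    # teacher model over its API and save them as fine-tuning data for a
    # student. Model name, prompts, and file name are all illustrative.
    import json
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    seed_prompts = ["Explain backpropagation in one paragraph."]  # hypothetical seeds

    with open("teacher_data.jsonl", "w") as f:
        for prompt in seed_prompts:
            resp = client.chat.completions.create(
                model="gpt-4",  # the alleged teacher; placeholder
                messages=[{"role": "user", "content": prompt}],
            )
            record = {"prompt": prompt, "completion": resp.choices[0].message.content}
            f.write(json.dumps(record) + "\n")
    # The student model is then fine-tuned on teacher_data.jsonl.

At scale, this is roughly the kind of pipeline synthetic-dataset vendors are thought to run.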
If it's just a distillation of GPT-4, wouldn't we expect it to be of worse quality than o1? But I've seen countless examples of DeepSeek-R1 solving math problems that o1 cannot.
>Very often, DeepSeek tells you it's ChatGPT or OpenAI; it's actually quite easy to get it to do that. Some say that's related to "the background radiation on the post-AI internet". I'm not a fentanyl consumer so, unfortunately, I think that argument is trash.
The exact same thing happened with Llama. Sometimes it also claimed to be Google Assistant or Amazon Alexa.
Are you sure you checked R1 and not V3? By default, R1 is disabled in their UI.
Prompt: Find an English word that contains 4 'S' letters and 3 'T' letters.
DeepSeek-R1: stethoscopists (correct, thought for 207 seconds)
ChatGPT-o1: substantialists (correct, thought for 188 seconds)
ChatGPT-4o: statistics (wrong, even with "let's think step by step")
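The counts are easy to verify mechanically; a quick check (counting case-insensitively):

    # Verify the answers above: the word must contain exactly 4 'S' and 3 'T'.
    def check(word):
        w = word.lower()
        return w.count("s") == 4 and w.count("t") == 3

    for word in ["stethoscopists", "substantialists", "statistics"]:
        print(word, check(word))
    # stethoscopists True
    # substantialists True
    # statistics False  (only 3 'S')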
In almost every example I've tried, it's on par with o1 and better than 4o.
>substantially wrong on benchmarks like ARC which is designed with this in mind.
Wasn't it revealed that OpenAI trained their model on that benchmark specifically, and had access to the entire dataset?
The identity issue is not evidence at all. It is the easiest thing to clean from the data: if you were actually distilling GPT-4, removing those samples would be the first thing you'd do.
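To illustrate how trivial that cleaning step would be, a sketch (the file and field names follow the hypothetical pipeline above):

    # Sketch of the cleaning step: drop any synthetic sample whose completion
    # mentions the teacher's identity. File and field names are illustrative.
    import json
    import re

    IDENTITY = re.compile(r"\b(ChatGPT|OpenAI|GPT-4)\b", re.IGNORECASE)

    with open("teacher_data.jsonl") as src, open("cleaned.jsonl", "w") as dst:
        for line in src:
            record = json.loads(line)
            if not IDENTITY.search(record["completion"]):
                dst.write(line)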
It is predicting the next token; are we really taking it at its word and assuming the model knows what it is saying?