blackeyeblitzar | 11 months ago | on: The Illustrated DeepSeek-R1
How does such a distillation work in theory? They don’t have weights from OpenAI’s models, and can only call their APIs, right? So how can they actually build off of it?
moralestapia | 11 months ago
Like RLHF, but the HF part is GPT-4 instead.
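To make that concrete, here is a minimal sketch of how black-box distillation could work when you only have API access to the teacher: sample the teacher's completions for a set of prompts, then fine-tune the student on those (prompt, completion) pairs with ordinary supervised fine-tuning. The model names, prompts, and libraries (openai, datasets, trl) below are placeholder assumptions, not DeepSeek's actual pipeline:

    # Sketch of black-box distillation (assumed workflow, not DeepSeek's
    # actual pipeline): the teacher is reachable only through its API, so its
    # text outputs, not its weights or logits, become the training targets.
    import json
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set
    prompts = [
        "Explain step by step why the sky is blue.",
        "What is 17 * 24? Show your reasoning.",
    ]

    # Step 1: sample completions from the teacher and save (prompt, answer) pairs.
    records = []
    for p in prompts:
        resp = client.chat.completions.create(
            model="gpt-4",  # placeholder teacher model name
            messages=[{"role": "user", "content": p}],
        )
        records.append({"text": p + "\n\n" + resp.choices[0].message.content})

    with open("distill_data.jsonl", "w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")

    # Step 2: ordinary supervised fine-tuning of the student on that data.
    # (Exact SFTTrainer arguments vary a bit across trl versions.)
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("json", data_files="distill_data.jsonl", split="train")
    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-0.5B",  # placeholder student checkpoint
        train_dataset=dataset,
        args=SFTConfig(output_dir="student-distilled", max_steps=10),
    )
    trainer.train()

Only the teacher's text outputs are needed, never its weights or logits, which is why API access alone is enough to build on top of it.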
KarraAI | 11 months ago
How do you ensure the student model learns robust generalizations rather than just surface-level mimicry?
moralestapia | 11 months ago
No idea, as I don't work on that, but my guess would be that the higher the 'n' (the number of distillation examples), the closer model A gets to model B.