Perhaps I've missed something, but where will the endless supply of training data for future improvements come from?
If these models are going to be trained on their own outputs (and the outputs of other models), then it's not so much a "flywheel" as it is a Perpetual Motion Machine.
Perplexity has a dubious idea based on harvesting user chats -> making the service better -> getting more user prompts. I am quite unconvinced that user prompts and stored chats will materially improve an LLM that has already been trained on a trillion high-quality tokens.
The second idea being kicked around is that synthetic data will be a new fountain of youth for training data, and that it will also fix these models' reasoning abilities.