
The problem with "better intelligence" is that OpenAI is running out of human training data to pillage. Training AI on the output of AI smooths over the data distribution, so all the AIs wind up producing same-y output. So OpenAI stopped scraping text back in 2021 or so, because that's when the open web turned into an ocean of AI piss. I've heard rumors that they've started harvesting closed captions out of YouTube videos to make up the data shortfall, but that seems like just a way to stave off the inevitable[0].
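
Here's a toy sketch of that smoothing effect (arbitrary numbers, obviously not anyone's real pipeline): each "generation" is fit only to samples of the previous generation's output, so the rare stuff falls out of the distribution and never comes back.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = 10_000
    probs = 1.0 / np.arange(1, vocab + 1)   # Zipf-ish "real" data: a long tail of rare tokens
    probs /= probs.sum()

    for gen in range(1, 11):
        sample = rng.choice(vocab, size=50_000, p=probs)  # the model "publishes" output
        counts = np.bincount(sample, minlength=vocab)
        probs = counts / counts.sum()                     # the next model is fit to that output
        print(f"gen {gen:2d}: tokens still in the distribution = {np.count_nonzero(counts)}")

The count only goes down: anything that doesn't make it into one generation's output has zero probability in every later one.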

Multimodal is another way to stave off the inevitable, because these AI companies are already training multiple models on different piles of data. If you have to train a text model and an image model anyway, why split your training data in half when you could train one combined model on the combined dataset?

[0] For starters, most YouTube videos aren't manually captioned, so you'd be feeding GPT the output of Google's auto-captioning model, and it would start learning the artifacts of whatever that model can't handle.



>harvesting closed captions out of YouTube videos

I'd bet a lot of YouTubers are using LLMs to write and/or edit content. So we pass that through a human presenter, introduce some errors in the form of transcription, then feed the output back in as part of a training corpus ... we plateaued real quick.

It seems like it's hard to get past the level of human intelligence for which there's a large enough corpus of training data (or trainers)?

Anyone know of any papers on breaking this limit to push machine learning models to super-human intelligence levels?


If a model is at average human intelligence in pretty much everything, is that super-human or not? Simply put, we as individuals aren't average at everything: we each have a few things we're good at and a great many things we're not, and the "average" only shows up in broad population trends. That's why most of us in the modern age spend so much time specializing in whatever we work in. Which points to the likely next source of data: a Manna-style (the story) data-collection program where companies hoover up everything they can about their above-average employees, until most models are well above the human average in most categories.


>[0] For starters, most YouTube videos aren't manually captioned, so you're feeding GPT the output of Google's autocaptioning model, so it's going to start learning artifacts of what that model can't process.

Whisper models are better than anything Google has. In fact, the higher-quality Whisper models are better than humans when it comes to transcribing speech with proper punctuation.
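
For reference, here's a minimal sketch of what that looks like with the open-source openai-whisper package (the checkpoint name and file path are just placeholders):

    import whisper  # pip install openai-whisper

    model = whisper.load_model("medium")          # bigger checkpoints transcribe noticeably better
    result = model.transcribe("video_audio.mp3")  # placeholder audio file
    print(result["text"])                         # punctuated, cased transcript

Even the smaller checkpoints produce full punctuation and casing, which is exactly what the auto-captions lack.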


Why do you think they’re using Google auto-captioning?

I would expect they're using their own speech-to-text, which is still a model, but much better quality and potentially customizable to better suit their needs.


At some point, algorithms for reasoning and long-term planning will be figured out. Data won’t be the holy grail forever, and neither will asymptotically approaching human performance in all domains.



