
Yes, you're right about the compute. Let me try to make my point differently: GPT-3 and GPT-4 were models that, when released, represented the best OpenAI could do, while GPT-3.5 was an intentionally smaller model (smaller than they could have trained). I'm seeing it as GPT-3.5 = GPT-4-70b. So to estimate when the next "best we can do" model might be released, we should look at the gap between the releases of GPT-3 and GPT-4, not between GPT-4-70b and GPT-4. That's my understanding, anyway.


GPT-4 only started training roughly around the time of, or after, the release of GPT-3.5, so I'm not sure where you're getting "intentionally smaller".


Ah, I misremembered GPT-3.5 as being released around the time of ChatGPT.


Oh, you remembered correctly; those are the same thing.

Actually, I was wrong about when GPT-4 started training; the time I gave was roughly when they finished.



