Parts of the demo were quite choppy (latency?) so this definitely feels rushed in response to Google I/O.
Other than that, looks good. The desktop app is great, but I didn't see any mention of being able to use your own API key, so open source projects might still be needed.
The biggest thing is bringing GPT-4 to free users; that is an interesting move. Depending on what the limits are, I might cancel my subscription.
Seems like it was picking up on the audience reaction and stopping to listen.
To me the more troubling thing was the apparent hallucination (saying it sees the equation before he wrote it, commenting on an outfit when the camera was down, describing a table instead of his expression), but that might have just been latency awkwardness. Overall, the fast response is extremely impressive, as is the new emotional dimension of the voice.
Aha, I think I saw the trick for the live demo: every time they used the "video feed", they did prompt the model specifically by saying:
- "What are you seeing now"
- "I'm showing this to you now"
etc.
The one time he didn't prime the model to take a snapshot this way was when the model saw the "table" (an old snapshot, since the phone was lying on the table/pointed at it), so that might be the reason.
Yeah, the way the app currently works is that ChatGPT-4o only sees up to the moment of your last comment.
For example, I tried asking ChatGPT-4o to commentate a soccer game, but I got pretty bad hallucinations, as the model couldn’t see any new video come in after my instruction.
So when using ChatGPT-4o you’ll have to point the camera first and then ask your question - it won’t work to first ask the question and then point the camera.
(I was able to play with the model early because I work at OpenAI.)
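If you're experimenting via the API instead, the same principle applies: the model only sees frames you attach before sending. A rough sketch using the public chat completions image input (capture_frame is a hypothetical placeholder for your own camera code):

```python
# Sketch of the "point the camera first, then ask" ordering.
# Uses the official openai Python client; capture_frame() is a
# made-up helper standing in for your own camera capture.
import base64
from openai import OpenAI

client = OpenAI()

def ask_about_frame(jpeg_bytes: bytes, question: str) -> str:
    b64 = base64.b64encode(jpeg_bytes).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

# The snapshot is taken *before* the question is sent; anything that
# happens on camera afterwards is invisible to the model.
frame = capture_frame()  # hypothetical camera helper
print(ask_about_frame(frame, "What are you seeing now?"))
```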
Commenting on the outfit was very weird indeed. Greg Brockman's demo includes some outfit related questions (https://twitter.com/gdb/status/1790071008499544518). It does seem very impressive though, even if they polished it on some specific tasks. I am looking forward to showing my desktop and asking questions.
Regarding the limits, I recently found that I was hitting limits very quickly on GPT-4 on my ChatGPT Plus plan.
I’m pretty sure that wasn’t always the case - it feels like somewhere along the line the allowed usage was reduced, unless I’m imagining it. It wouldn’t be such a big deal if there were more visibility into my current usage compared to my total “allowance”.
I ended up upgrading to ChatGPT Team which has a minimum of 2x users (I now use both accounts) but I resented having to do this - especially being forced to pay for two users just to meet their arbitrary minimum.
I feel like I should not be hitting limits on the ChatGPT Plus paid plan at all based on my usage patterns.
I haven’t hit any limits on the Team plan yet.
I hope they continue to improve the paid plans and become a bit more transparent about usage limits/caps. I really do not mind paying for this (incredible) tech, but the way it’s being sold currently is not quite right and feels like paid users get a bit of a raw deal in some cases.
I have API access but just haven’t found an open source client that I like using as much as the native ChatGPT apps yet.
I use GPT via the API in Emacs, and it's wonderful. gptel is the package.
Although API access through Groq to Llama 3 (8B and 70B) is so much faster that I cannot stand how slow GPT is anymore. It is slooow; still a very capable model, but only marginally better than open source alternatives.
Have you tried groq.com? Because I don't think gpt-4o is "incredibly" fast. I've been frustrated at how slow gpt-4-turbo has been lately, and gpt-4o just seems to be "acceptably" fast now, which is a big improvement, but still, not groq-level.
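For what it's worth, raw speed is easy to ballpark by timing the stream yourself. A rough sketch (chunk counts only approximate tokens, and results vary a lot with load):

```python
# Time the streamed chunks from a completion and report chunks/second.
# Uses the official openai Python client; treat the number as a
# rough proxy for tokens/second, not an exact measurement.
import time
from openai import OpenAI

client = OpenAI()

start = time.time()
chunks = 0
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1
elapsed = time.time() - start
print(f"{chunks} chunks in {elapsed:.2f}s (~{chunks / elapsed:.1f} chunks/s)")
```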
Yes, of course, probably sometime in the coming days. Some people mention it already works in the playground.
I was wondering why OpenAI didn't release a smaller but faster model. 175 billion parameters works well, but speed is sometimes crucial; a 20B-parameter model could compute roughly 10x faster.
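Back-of-envelope: for a dense transformer, decode cost is roughly 2N FLOPs per generated token, so the speedup ratio falls straight out. A sketch (this ignores memory bandwidth, batching, and architecture details, so treat it as a rough upper bound):

```python
# Rough decode cost for dense models: ~2 * N FLOPs per generated token.
def flops_per_token(params: float) -> float:
    return 2 * params

big, small = 175e9, 20e9
ratio = flops_per_token(big) / flops_per_token(small)
print(f"~{ratio:.1f}x fewer FLOPs per token for the 20B model")  # ~8.8x
```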
I went through the exact same situation this past week. I didn't send more than 30 (token-heavy) messages within a few hours, and it blocked me for an hour, if I remember correctly - as a paying user.
They need to fade the audio or add some vocal cue when it's being interrupted. It makes it sound like it's losing connection. What'll be really impressive is when it intentionally starts interrupting you.
> Parts of the demo were quite choppy (latency?) so this definitely feels rushed in response to Google I/O.
It just cuts the audio feed when it detects sound, rather than having the AI decide when it should stop speaking, so yes, that part is rough. A full AI conversation would detect the natural pauses where you give it room to speak, or notice when you try to take over by interrupting; here it was just a dumb script that shuts the audio off whenever it hears sound.
But it is still very impressive in every other respect; that voice is really good.
Edit: If anyone from OpenAI reads this, at least fade out the voice quickly instead of chopping it. Hard-cutting audio doesn't sound good at all, and it made many people experience this presentation as extremely buggy.
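Even a ~50 ms linear ramp would be enough to avoid the click. A minimal sketch, assuming 16-bit PCM samples in a numpy array:

```python
import numpy as np

def fade_out(samples: np.ndarray, sample_rate: int, ms: float = 50.0) -> np.ndarray:
    """Apply a linear fade to the tail of the buffer instead of a hard cut."""
    n = min(len(samples), int(sample_rate * ms / 1000))
    ramp = np.linspace(1.0, 0.0, n)
    out = samples.astype(np.float64)
    out[-n:] *= ramp
    return out.astype(samples.dtype)

# Apply to whatever audio remains queued the moment an interruption
# is detected, then stop playback after the ramp finishes.
```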