
Seems like it was picking up on the audience reaction and stopping to listen.

To me the more troubling thing was the apparent hallucination (saying it sees the equation before he wrote it, commenting on an outfit when the camera was down, describing a table instead of his expression), but that might have just been latency awkwardness. Overall, the fast response is extremely impressive, as is the new emotional dimension of the voice.



Aha, I think I spotted the trick in the live demo: every time they used the "video feed", they explicitly prompted the model by saying:

- "What are you seeing now"

- "I'm showing this to you now"

etc.

The one time he didn't prime the model to take a snapshot this way was when the model saw the "table" (an old snapshot, since the phone was lying on the table / pointed at it), so that might be the reason.


Yeah, the way the app currently works is that GPT-4o only sees frames up to the moment of your last comment.

For example, I tried asking GPT-4o to commentate a soccer game, but I got pretty bad hallucinations, as the model couldn't see any new video come in after my instruction.

So when using GPT-4o you'll have to point the camera first and then ask your question; it won't work to first ask the question and then point the camera.
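To make the "point first, then ask" behavior concrete, here's a toy simulation of it (the class and method names are hypothetical, purely for illustration; the real client isn't public):

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotVisionChat:
    """Toy model of the described behavior: the model's visual context
    is frozen at the moment of your last message. (Hypothetical names,
    not the actual app API.)"""
    frames: list = field(default_factory=list)

    def camera_frame(self, frame: str) -> None:
        # New video frames stream in continuously.
        self.frames.append(frame)

    def ask(self, question: str) -> list:
        # Context is captured at question time; frames that arrive
        # afterwards are never seen by the model.
        return list(self.frames)


chat = SnapshotVisionChat()
chat.camera_frame("table")      # phone still lying on the table
context = chat.ask("What do you see?")
chat.camera_frame("equation")   # written *after* the question
print(context)                  # ['table'] - the later frame never made it in
```

This matches the "table" hallucination above: the question arrived while the stale frame was the last thing captured.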

(I was able to play with the model early because I work at OpenAI.)


thanks


Commenting on the outfit was very weird indeed. Greg Brockman's demo includes some outfit related questions (https://twitter.com/gdb/status/1790071008499544518). It does seem very impressive though, even if they polished it on some specific tasks. I am looking forward to showing my desktop and asking questions.



