Hacker News | fragilerock's comments

My layperson interpretation of this particular error was that the AI model probably came up with the initial recipe response in full, but when the audio of that response was cut off because the user interrupted it, the model wasn't given any context about where it was interrupted, so it didn't understand that the user hadn't heard the first part of the recipe.

I assume the responses from that point onwards didn't take the video input into account, and the model just assumed the user had completed the first step based on the conversation history. I don't know how these 'live' AI sessions work, but based on the existing OpenAI/Gemini live AI chat products it seems to me that most of the time the model will immediately comment on the video when the 'live' chat starts, but for the rest of the conversation it works using TTS+STT unless the user asks the AI to consider the visual input.

I guess if you have enough experience with these live AI sessions you can probably see why it's going wrong and steer it back in the right direction with more explicit instructions, but that wouldn't look very slick in a developer keynote. I think in reality this feature could still be pretty useful as long as you aren't expecting it to be as smooth as talking to a real person.


That feels plausible to me.

You can trigger this type of issue by interrupting ChatGPT and then reading the transcript.

The model doesn’t know you interrupted it, so it continued on the assumption that the user had heard the steps.
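To make the guess above concrete, here is a minimal sketch of what "telling the model where it was interrupted" could look like. Everything in it is an assumption for illustration: the `Turn` class, the helper names, the `[interrupted by user]` marker, the word-count approximation of playback position, and the example recipe text are all made up, and this is not how any particular live-voice product is known to work.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def heard_portion(full_text: str, words_played: int) -> str:
    """Keep only the words whose audio finished playing before the interruption."""
    words = full_text.split()
    return " ".join(words[:words_played])


def record_interrupted_turn(history: list, full_text: str, words_played: int) -> list:
    """Store the assistant turn as the user actually heard it, plus an explicit marker.

    Hypothetical approach: without this step the history would contain the full
    generated text, and the model would assume the user heard all of it.
    """
    heard = heard_portion(full_text, words_played)
    history.append(Turn("assistant", f"{heard} [interrupted by user]".strip()))
    return history


# Made-up example: the model generated two steps, but only four words were spoken.
generated = "First chop the onions. Then add the garlic and stir."
history = [Turn("user", "How do I start?")]
record_interrupted_turn(history, generated, words_played=4)
print(history[-1].text)  # First chop the onions. [interrupted by user]
```

If the history kept the full generated text instead, the next reply would be conditioned on steps the user never heard, which matches the behaviour described above.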


If someone said "The earth is round and anybody who says it isn't doesn't know what they are talking about" would you still challenge their intellectual honesty in this way?


I would question their debate ability.


Then how come in face-to-face interactions people generally communicate using speech rather than text?

Clearly there's a disadvantage to using text in that situation, and I think it's that it almost always takes longer to express thoughts/intents using text. ISTM a sufficiently advanced computer voice interface would have the same advantage.


People communicate with their friends more over text than in person.

Am I really having to explain basic stuff like this? Lmao.


Because it allows people to communicate when they're not in close physical proximity. Would you rather go out to dinner with friends and just speak to each other or sit there and type your conversation out in a WhatsApp group chat?

It's a convenience/necessity thing, pure and simple.


There are benefits to be had when interacting with REAL people in person.

Zero benefit to interacting by voice with an AI. Pure and simple.

Nobody cares about an agent when they are the principal - this is not remotely the same as interfacing with a human that is valued much higher.


I said I was talking about face-to-face (or 'in person' as you put it) communication. You're absolutely right that over long distances people prefer to communicate by text, but in person people prefer to communicate by speech, so that's exactly my point: there are at least some contexts in which people prefer speech.

I guess I could also follow suit and return your weird toxic/patronising insult here too since you clearly didn't understand my original comment, but perhaps it would be nicer if we didn't do that?


That's funny, the way I interpreted this sentence is that usage was already high in older, male, and high-income countries so most of the new users are coming from outside these demographics. Which, ironically, is the exact opposite of what you're saying.


That’s funny, you miscomprehended English.


You read "Users are younger, increasingly female, global, and adoption is growing fastest in lower-income countries" and gathered that "Young moms with no money in poor countries use this product the most". Do I really need to spell out the fact that you completely failed to understand basic English here?


the restaurant spends resources (both physical and human) cooking and serving you the meal, likewise for the barber. a better example would be showing up late for a cinema showing so that you deliberately avoid watching the adverts and trailers... which i would guess most people would agree is morally fine?


The more direct cinema example would be sneaking into the theater when there were empty seats (so you did not deny anyone else access to the movie). Is that morally fine? You watched the movie, but the creator doesn't get paid.

