Hacker News | lysecret's comments

Oh, I didn’t know about the visual bounding boxes. This is super cool!

Quick question: are you talking about this feature?

https://docs.cloud.google.com/vertex-ai/generative-ai/docs/b...

Because it’s just using structured responses, it should be doable with Gemini 3? (We are using Gemini 3 for some document processing, and its visual understanding is just incredible.)


No, I'm talking about the image segmentation feature: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...

But the bounding box stuff might work well enough in Gemini 3 to handle this case as well.
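
For anyone trying the bounding box route, here is a minimal sketch of the post-processing step. It assumes the model is prompted to return a JSON list where each item has a `box_2d` of `[ymin, xmin, ymax, xmax]` normalized to 0-1000 (the convention the Gemini image understanding docs describe) plus a `label`. The function name and the sample response are hypothetical, and no API call is made here.

```python
import json


def to_pixel_boxes(response_text, width, height):
    """Convert normalized boxes into pixel (left, top, right, bottom) tuples.

    Assumes each item looks like {"box_2d": [ymin, xmin, ymax, xmax], "label": ...}
    with coordinates normalized to the range 0-1000.
    """
    boxes = []
    for item in json.loads(response_text):
        ymin, xmin, ymax, xmax = item["box_2d"]
        boxes.append({
            "label": item.get("label", ""),
            # Scale x values by image width and y values by image height.
            "box": (
                round(xmin * width / 1000),
                round(ymin * height / 1000),
                round(xmax * width / 1000),
                round(ymax * height / 1000),
            ),
        })
    return boxes


# Hypothetical model output for a 1200x800 scan.
sample = '[{"box_2d": [100, 250, 500, 750], "label": "invoice total"}]'
print(to_pixel_boxes(sample, width=1200, height=800))
```

Note the y-first ordering: it is easy to swap the axes when passing the result to a drawing library that expects (left, top, right, bottom).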


Hmm, so that post also links back to segmentation done by structured outputs? (Though here without even enforcing the structure.)

https://ai.google.dev/gemini-api/docs/image-understanding#se...


It's not supported by Gemini 3: https://ai.google.dev/gemini-api/docs/gemini-3#migrating_fro...

> Image segmentation: Image segmentation capabilities (returning pixel-level masks for objects) are not supported in Gemini 3 Pro or Gemini 3 Flash. For workloads requiring native image segmentation, we recommend continuing to utilize Gemini 2.5 Flash with thinking turned off or Gemini Robotics-ER 1.5.


It’s funny how every podcaster/public AI figure is so certain text as a UI will go away, and yet it’s not going anywhere.


A few days ago I was trying to unsubscribe from a service (notably an AI 3D modeling tool that I was curious about).

I spent 5 minutes trying to find a way to unsubscribe and couldn't. Finally, I found it buried in the plan page as one of those low-contrast ellipses on the plan card.

Instead of unsubscribing me or taking me to a form, it opened a conversation with an AI chatbot with a preconfigured "unsubscribe" prompt. I have never felt angrier at a UI: I had to waste even more time talking to a robot before it would render the unsubscribe button in the chat.

Why would we bring the most hated feature of automated phone calls to apps? As a frontend engineer I am horrified by these trends.


If anything, text use has probably increased during my lifetime. People used to talk; now they sit and text on smartphones.


There might be some confusion about the transition to what some call the post-literate era: an era where text is not the primary medium. That’s not necessarily bad, because you get the advantages of other media (oral and visual), but it is something to keep in mind.


I'm a bit skeptical that a post-literate era is happening. I gather it appears in some sci-fi, but I don't see much sign of it in reality. I mean, here we are on a text-only site. If anything, we seem to be heading for a 100% literate society. Literacy graphs here: https://ourworldindata.org/grapher/cross-country-literacy-ra...


I don’t think the post-literate era means that text will disappear. I think it’s just not going to be dominant anymore, but I also have my reservations, since I do prefer the text medium.


3 Flash is also insanely good; it even slightly outperforms 3 Pro for me.


Super excited about this. I'm generally satisfied with pyright, but so was I with conda before uv, or black before ruff.


When I was debugging through F# code, they definitely had that.


I see this. A hot take from my side: as someone who is bought into GCP, I quite like being able to put everything on the same billing account and handle it easily through service accounts.


2.5 Pro is already excellent at this.


Now compare on free cash flow


Yep, fully agreed. The main thing is to break apart the systems so that any retries don’t lead to issues like the ones you mentioned.

I do still think there is a sufficient amount of boilerplate to potentially justify some engine like this.


Cursor has this too

