Very cool! I don't have this pain point currently, but I can absolutely see the utility. I like the built-in demo tool (although it sadly means you have no need for DemoTime lol).
The demo.camelqa page needs some styling; I would invest a few minutes here. Maybe a loading spinner too if you're expecting 15-second latency.
Technically, is this doing clever things with markup, or literally just feeding the image into a multimodal LLM and getting function calls in response?
Thanks for the feedback! We'll add some styling to the demo page. We're processing the image with an object detection model and a classification model, and also using accessibility element data to get a better understanding of what is interactive on the screen.
GPT-4V is great for reasoning about what is on the screen. However, it struggles with precision. For example, when it decides to tap an icon, it can't specify the coordinates to tap. That's where the object detection and accessibility elements help: they let us precisely locate interactive elements.
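To make the division of labor concrete, here's a rough sketch of the idea (not our actual code; the element fields, the IoU threshold, and the match-by-label step are simplifying assumptions): the detector and accessibility data give us labeled bounding boxes, the LLM only has to name which element to act on, and we resolve that to exact tap coordinates.

```python
# Rough sketch only -- field names, the IoU threshold, and the label-matching
# step are simplifying assumptions, not the production pipeline.
from dataclasses import dataclass

@dataclass
class ScreenElement:
    label: str                               # e.g. "Settings icon"
    box: tuple[float, float, float, float]   # (x1, y1, x2, y2) in screen pixels
    source: str                              # "detector" or "accessibility"

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_elements(detected, accessible, threshold=0.5):
    """Prefer accessibility elements (they carry semantic labels) and keep
    only the detector boxes that don't overlap any accessibility element."""
    merged = list(accessible)
    merged += [d for d in detected
               if all(iou(d.box, a.box) < threshold for a in accessible)]
    return merged

def tap_coordinates(llm_choice, elements):
    """The LLM names *which* element to tap; the element's bounding box
    supplies the precise (x, y) to send to the device."""
    for el in elements:
        if el.label.lower() == llm_choice.lower():
            x1, y1, x2, y2 = el.box
            return int((x1 + x2) / 2), int((y1 + y2) / 2)
    raise ValueError(f"No on-screen element labeled {llm_choice!r}")

# e.g. GPT-4V decides "tap the Settings icon":
#   x, y = tap_coordinates("Settings icon", merge_elements(detected, accessible))
```

The point is that the LLM handles the "what to do" reasoning while the detector and accessibility data handle the "where exactly" precision.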
Thanks! Yes, we experimented with that! I think that because GPT sees images in patches, it has a hard time with absolute positioning, but that's just a guess.