Yeah I thought the same and was immediately disappointed that you could only step a tiny bit forwards.
BUT, you can turn around and see something that I presume was entirely generated. So I don't think it is just doing some clever tricks to make the photo look 3D; it is also "infilling" what is behind the camera. That is kinda cool.
I'd love to see this improved so I can walk around some more though, to see what is down those alleys etc.
I’m sure it’ll improve. I imagine the further you move from the input image, the more the model has to make up. Gen-AI video models are limited to a few seconds. In 3D you’re constrained to a volume.