If he were wearing colored gloves for hand pose detection, these videos would be suitable for a Vision-Language-Action (VLA) model training set. Robotics companies are making and labeling many such videos.
Probably a good thing he's not doing that - it forces each competitor to come up with its own method of training. Otherwise, they could all just train to his specific videos (kind of like how video card manufacturers optimize for the benchmark tests when competing against other cards).
I've recently been misled by ChatGPT a lot as well. I think it's the router. I'm on the free plan so I assume they're just being tight with the GPU cycles.
There's a lot more than money at stake.