
Look up VLA (vision-language-action) models; they essentially plug the guts of a language model into a transformer that also handles vision and joint motion. They get trained on "episodes", i.e. videos from the point of view of a robot doing a task. After training, you can give the model instructions like "pick up the red ball and put it into the green cup". Really cool stuff.
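
To give a feel for how that works at inference time, here's a rough Python sketch of the observe-instruct-act loop. The VLAPolicy, camera, and robot objects are hypothetical placeholders for illustration, not any particular library's API:

    # Sketch of a VLA control loop; all interfaces below are hypothetical.
    import numpy as np

    class VLAPolicy:
        """Stand-in for a pretrained vision-language-action model."""
        def predict_action(self, image: np.ndarray, instruction: str) -> np.ndarray:
            # A real VLA tokenizes the instruction, encodes the camera frame,
            # and decodes a low-level action (e.g. end-effector deltas plus
            # gripper open/close) from the same transformer.
            raise NotImplementedError

    def run_episode(policy, camera, robot, instruction, max_steps=200):
        """Closed-loop control: observe, ask the model for an action, act."""
        for _ in range(max_steps):
            image = camera.read()                    # robot's point-of-view frame
            action = policy.predict_action(image, instruction)
            robot.apply_action(action)               # e.g. joint/gripper command
            if robot.task_done():
                break

    # run_episode(VLAPolicy(), camera, robot,
    #             "pick up the red ball and put it into the green cup")

During training the same model sees recorded episodes (frames paired with the actions the demonstrator took), so at test time a new instruction just conditions the action decoding.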


