I was also just in the market for a small experiment robot. I got the hiwonder armpi-fpv. Avoid it, the actuators are pretty bad - they're very 'grindy', the robot jitters like crazy when it moves. Any such problems with the lekivi?
Hmm, I've never used the hiwonder servos so I'm not sure how they compare with the feetech/waveshare STS type, but these have been surprisingly good overall. There is still considerable backlash, which accumulates along the arm into 1-2cm of gripper translational error, but the control is really stable. I don't think there's any jitter at all. They are a bit loud when moving at max speed, but there's also an STS3250 brushless variant that's stronger and really quiet. Expensive though.
I haven't tested the Lekivi specifically, but I've used lots of SO-ARMs and a custom-built lekivi-like robot. I think some people have had some issues with the rear omni wheel when moving forward, but I haven't seen that myself.
For the fine-motor commands: or, the model can write the code to generate them on the fly. It seems to work, in my very limited experiment.
As for memory: my approach is to give the robot a python repl and, basically, a file system - the LLM can write modules, poke at the robot via interactive python, etc.
Basically, the LLM becomes a robot programmer, writing code in real-time.
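To make the shape of that concrete, here's a minimal sketch of the "REPL plus file system" memory setup described above. Everything here is an assumption for illustration: the names `run_snippet`, `write_module`, and the `llm_workspace` directory are hypothetical, not from the actual project.

```python
import pathlib

# A persistent Python namespace (the REPL) plus a scratch directory the
# LLM can write modules into (the "memory"). Illustrative sketch only.
workspace = pathlib.Path("llm_workspace")
workspace.mkdir(exist_ok=True)
state: dict = {}  # survives between LLM turns, like a live REPL session

def run_snippet(snippet: str) -> None:
    """Execute one LLM-generated snippet, keeping variables across calls."""
    exec(snippet, state)

def write_module(name: str, source: str) -> None:
    """Let the LLM persist reusable code as a module on disk."""
    (workspace / f"{name}.py").write_text(source)

run_snippet("x = 2 + 3")
run_snippet("y = x * 10")      # sees x from the previous turn
write_module("skills", "def wave():\n    pass\n")
print(state["y"])  # → 50
```

The point of the persistent namespace is that each LLM turn builds on the last, exactly like a human poking at hardware from an interactive session.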
This seems perfect to hook up to my 'LLMs can control robots over MCP' system. The idea is that LLMs are great at writing code, so let's lean in to that. I'll give it a try! I just got a bigger robot, we'll see how it does...
Really unfortunate that I forgot which YT video I saw about this just two weeks ago.
It was about Google's PaLM-E evolution and progress. It basically uses two models: one controls the robot, the other is an LLM, and they're combined in some attention layer.
That video is pretty good, thanks for finding it. I'm basically betting that an earlier, abandoned approach described in the video, "Code as Policies", will beat everything else. It requires no training data, and generalizes instantly to all robots.
Remember how all those years ago (oh wait, a few months ago), that big npm hack attack was found by someone thinking their ssh connection was half a second too slow? Hope this ain't like that!
I'm connecting LLMs directly to robots, to see how well they can perform robot things by directly controlling motors and sampling the camera/sensors. Initial results are encouraging!
Isn’t this showing that LLMs can write code to control robots, not that they can actually directly control them? If I’m reading the hand tracking example right, the LLM is not actually in the control loop. Is this wrong?
Yeah, the mechanism by which the LLMs control the robot is by writing code. I suppose they could also issue direct joint sequences, but they're already so good at writing code that it made sense to lean on that. If they 'wanted' to, they could still write code containing an explicit joint sequence calculated in-context - that just seems more difficult.
So they can go 'slow', by taking a camera image, controlling the robot, repeating. Or they can write code that runs closer to the robot in a loop, either way. I thought the latter was somehow more impressive, and that's what you see in the hand-tracking example.
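The two modes above can be sketched roughly as follows. Note the stubs (`FakeCamera`, `FakeRobot`, the policy string) are invented for illustration and are not the project's real API; the point is only the structural difference between the LLM being inside the loop versus writing the loop.

```python
class FakeCamera:
    def capture(self):
        return [1, 2]              # stand-in for image features

class FakeRobot:
    def __init__(self):
        self.history = []
    def set_joints(self, cmd):
        self.history.append(cmd)

def slow_mode(decide, camera, robot, steps):
    # LLM in the loop: image -> LLM call -> joint command, repeated.
    # Each step costs a full model round-trip (seconds).
    for _ in range(steps):
        robot.set_joints(decide(camera.capture()))

def fast_mode(policy_src, camera, robot, steps):
    # LLM out of the loop: it wrote `policy` once up front; the
    # generated code then runs at sensor rate with no model calls.
    ns = {}
    exec(policy_src, ns)
    policy = ns["policy"]
    for _ in range(steps):
        robot.set_joints(policy(camera.capture()))

robot = FakeRobot()
fast_mode("def policy(img): return [2 * v for v in img]",
          FakeCamera(), robot, steps=3)
print(robot.history[0])  # → [2, 4]
```

The hand-tracking example corresponds to `fast_mode`: the interesting part is that the policy source itself came from the model.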
Hm, the latter is certainly more practical, but feels less interesting to me because we already know that LLMs can write code. To me, exploring the limits of the LLM-in-the-loop approach would be more interesting. Cool project regardless, thanks for sharing!
Here's an idea for how to do that: treat frontier AI as a sort of 'common carrier'. The only business that frontier AI labs are allowed to conduct is selling raw tokens - no UI. Thus, 'claude code' would have to come from some other company. This would segment the AI industry, and, maybe, prevent a single entity (or small number of entities) from capturing all value.
Sounds promising honestly. One of the scariest parts of the big AI labs is all of the exclusive training data they get through their UIs. (It’s unclear whether distillation is a feasible way to close the gap).
If there were another party involved, that would (hopefully) diversify power that (potentially) comes with those streams of data.
It’s a bit ironic that the USA has mostly abandoned interoperability after being one of the pioneers with the American manufacturing method. [0]
Yet another problem with MCP: every LLM harness that does support it at all supports it poorly and with bugs.
The MCP spec allows MCP servers to send back images to clients (base64-encoded, some json schema). However:
1) codex truncates MCP responses, so it will never receive images at all. This bug has been in existence forever.
2) Claude Code CLI will not pass those resulting images through its multi-modal visual understanding. Worse, if asked to describe said images, it will hallucinate a description out of whole cloth.
3) No LLM harness can deal with you bouncing your local MCP server. All require you to restart the harness. None allow reconnection to the MCP server.
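For reference, the image payload in question is small and simple - per the MCP spec, a tool result carries a content array whose image items have a `type`, base64 `data`, and a `mimeType`. Here's a hand-built sketch of that shape (a real server would use an MCP SDK rather than building the dict by hand):

```python
import base64
import json

# Stand-in bytes; a real server would send an actual captured image.
png_bytes = b"\x89PNG\r\n\x1a\n"

# The image content item shape defined by the MCP spec for tool results.
content_item = {
    "type": "image",
    "data": base64.b64encode(png_bytes).decode("ascii"),
    "mimeType": "image/png",
}
tool_result = {"content": [content_item], "isError": False}

print(json.dumps(tool_result)[:30])
```

Given how little there is to parse here, it's striking that neither harness handles it end to end.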
I assure you there are many other similar bugs, whose presence makes me think that the LLM companies really don't like MCP, and are buggily deprecating it.
I've got one of these! Mine is called 'roboflex' (github.com/flexrobotics). It's c++/python, not rust. But similarly born out of frustration with ros. Writing your own robotics middleware seems to be a rite of passage. Just like 'writing your own game engine'. Nothing wrong with that - ros is powerful but has legit problems, and we need alternatives.
Although tbh, these days I'm questioning the utility. If I'm the one writing the robot code, then I care a lot about the ergonomics of the libraries or frameworks. But if LLMs are writing it, do I really care? That's a genuine, not rhetorical question. I suppose ergonomics still matter (and maybe matter even more) if I'm the one that has to check all the LLM code....
Take a look at github.com/dimensionalos/dimos. We are a team building not just a replacement for ROS, but one that can be easily vibe-coded, and one that's compatible with ROS and containers.
Always looking for testers and feedback if you want to influence the design/API.
A few years ago there were actually two companies trying to manufacture "zblan optical fiber" (which has better light transmission than normal optical fiber) in orbit: Made In Space, and FOMS. Both of their websites are tombstones now, afaik. The former was also attempting 3d printing in space, and was bought by Redwire.
Fascinating tech, but seemed to go nowhere.
There are now several 'manufacturing in space platform' companies, like Varda. It's not enough to just be a platform. There needs to be an actual killer app.
Borges: Selected Non-fictions. Think his fictions are good? His non-fictions, imho, are even better. You can read three sentences and feel like you just listened to a symphony - you get that constant Borges wit, erudition, mystery. The English translations are SO good. Are they even better in Spanish?