High-Dexterity Experiments:
What are the limits of open-source algorithms on high-dexterity tasks? Our tentative answer: we have not yet found a task that a teleoperator can do with our robot but that pi0.5, trained on datasets augmented with human interventions, cannot accomplish at least some fraction of the time. An open question, however, is how small an object pi0.5 can 'see' at its 224x224-pixel image input resolution. For some tasks, the ACT algorithm also works well.
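To get a feel for the resolution question, here is a back-of-the-envelope calculation. It is a minimal sketch under stated assumptions: the camera frame is resized with its aspect ratio preserved (longer side scaled to 224, shorter side padded), and the vision encoder splits the 224x224 image into 14-pixel patches, as PaliGemma's SigLIP encoder does. The 640x480 camera and the 20-pixel-wide object are hypothetical numbers, and the real preprocessing depends on the training config.

```python
# Assumption: 14x14-pixel ViT patches (PaliGemma's SigLIP encoder at 224x224).
PATCH_PX = 14


def model_input_extent(object_px: float, src_w: int, src_h: int, target: int = 224) -> float:
    """Width in pixels an object keeps after resizing the frame to target x target.

    Assumes an aspect-preserving resize: the longer side is scaled to `target`
    and the shorter side is padded, so both axes share one scale factor.
    """
    return object_px * target / max(src_w, src_h)


def patches_spanned(object_px: float, src_w: int, src_h: int, target: int = 224) -> float:
    """How many encoder patches the object spans along one axis."""
    return model_input_extent(object_px, src_w, src_h, target) / PATCH_PX


# Hypothetical 640x480 camera and an object 20 px wide in the raw frame.
print(model_input_extent(20, 640, 480))  # 7.0
print(patches_spanned(20, 640, 480))     # 0.5
```

Under these assumptions, a 20-pixel-wide object shrinks to 7 pixels and covers only half of one patch, which suggests why fine details may be hard for the policy to resolve.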
Policy Improvement using Human Interventions and DAgger:
In our experience, imitation learning for high-dexterity tasks hits a wall quickly, and neither larger datasets nor longer training seems to help. One solution is to combine demonstrations with reinforcement learning, as in Physical Intelligence's newer approaches, but that code isn't open source. Instead, we had a human intervene at failure points during policy rollouts and then retrained the policy on the augmented dataset, DAgger-style. Even one or two iterations were surprisingly effective, and no code modifications were needed!
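The intervention loop can be illustrated with a toy 1-D example. This is a minimal sketch of human-gated DAgger (in the spirit of HG-DAgger), not our training pipeline: a scripted `expert_action` stands in for the human teleoperator, a nearest-neighbour regressor stands in for pi0.5, and "retraining" simply refits on the aggregated dataset. All names, dynamics, and thresholds are hypothetical.

```python
def expert_action(x: float) -> float:
    # Hypothetical stand-in for the human teleoperator: drive the state toward 0.
    return -0.5 * x


class NearestNeighbourPolicy:
    """Toy stand-in for the learned policy."""

    def __init__(self) -> None:
        self.data: list[tuple[float, float]] = []

    def fit(self, data: list[tuple[float, float]]) -> None:
        # "Retraining" = refitting on the full aggregated dataset.
        self.data = list(data)

    def act(self, x: float) -> float:
        _, action = min(self.data, key=lambda pair: abs(pair[0] - x))
        return action


def rollout_with_interventions(policy, x0: float, steps: int = 20, threshold: float = 0.2):
    """Roll out the policy; the human takes over whenever it deviates too far.

    Returns the (state, human_action) pairs recorded during takeovers.
    """
    x = x0
    corrections = []
    for _ in range(steps):
        a_policy = policy.act(x)
        a_human = expert_action(x)
        if abs(a_policy - a_human) > threshold:
            corrections.append((x, a_human))  # record the correction
            a = a_human                       # and execute it
        else:
            a = a_policy
        x = x + a  # simple integrator dynamics
    return corrections


def mean_error(policy, states) -> float:
    return sum(abs(policy.act(x) - expert_action(x)) for x in states) / len(states)


# Initial demonstrations only cover states in [-1, 1].
demos = [(i / 10, expert_action(i / 10)) for i in range(-10, 11)]
policy = NearestNeighbourPolicy()
policy.fit(demos)

probes = [2.0, 3.0, 4.0, 5.0]  # out-of-distribution start states
initial_error = mean_error(policy, probes)

dataset = list(demos)
for iteration in range(2):  # "one or two iterations"
    for x0 in probes:
        dataset += rollout_with_interventions(policy, x0)
    policy.fit(dataset)  # aggregate and retrain, DAgger-style
    print(f"iter {iteration}: {len(dataset)} samples, "
          f"probe error {mean_error(policy, probes):.3f}")

final_error = mean_error(policy, probes)
```

The key property is that corrections are collected in the states the policy itself visits, which is exactly the data the original demonstrations lack.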
High-Level Control and Sub-Task Learning:
pi0.5 was designed for high-level (HL) control of low-level sub-tasks, but the open-source release does not expose training the HL controller with sub-task outputs as feedback. As an alternative, we tested Gemini Robotics-ER 1.5 as the HL controller. It works well for our task, though it required some prompt engineering, and the latency of a web API call may limit real-time use. We also tackled another HL control issue: how do you train a policy whose sub-tasks can be invoked in any order yet still chain together naturally? Along the way, we discovered some config and file tweaks needed to get pi0.5 to learn from sub-task prompts.
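A minimal sketch of the HL/low-level split, assuming the planner is queried once per sub-task rather than once per control step, which keeps web API latency out of the low-level control loop. In practice `plan_next` would wrap the VLM call (e.g. to Gemini Robotics-ER 1.5) and `run_subtask` would run the sub-task-prompted policy until it finishes. Both are stubbed here rather than calling any real API, and the sub-task names are hypothetical.

```python
from typing import Callable

# Hypothetical sub-task vocabulary; "done" tells the loop to stop.
SUBTASKS = ("pick up the cube", "place the cube in the bin", "done")


def run_hierarchy(
    plan_next: Callable[[list[str]], str],
    run_subtask: Callable[[str], bool],
    max_calls: int = 10,
) -> list[str]:
    """Ask the high-level planner for one sub-task at a time and execute it."""
    history: list[str] = []
    for _ in range(max_calls):
        prompt = plan_next(history)  # one (slow) planner call per sub-task
        if prompt not in SUBTASKS:
            raise ValueError(f"planner returned an unknown sub-task: {prompt!r}")
        if prompt == "done":
            break
        ok = run_subtask(prompt)  # low-level policy conditioned on the prompt
        history.append(prompt if ok else f"FAILED: {prompt}")
    return history


# Stub planner: choose the first sub-task not yet completed successfully.
def scripted_planner(history: list[str]) -> str:
    remaining = [s for s in SUBTASKS[:-1] if s not in history]
    return remaining[0] if remaining else "done"


# Stub low-level policy whose first attempt fails, to show re-planning.
attempts = {"n": 0}


def flaky_subtask(prompt: str) -> bool:
    attempts["n"] += 1
    return attempts["n"] > 1


log = run_hierarchy(scripted_planner, flaky_subtask)
print(log)
# ['FAILED: pick up the cube', 'pick up the cube', 'place the cube in the bin']
```

Because the planner sees the outcome history, a failed sub-task is simply re-issued, so sub-tasks can be invoked in whatever order the planner chooses.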
Sim to Real +:
With ACT, sim-to-real required precisely matching the simulated and real environments, and even then it didn't work very well. pi0, on the other hand, generalized well across environments, both sim-to-real and sim-to-sim, even with only LoRA fine-tuning! We take this as evidence that large-scale robot pre-training really helps with out-of-distribution robustness. We think this is also why, in the video, the robot recovers after drifting off-trajectory on its way to picking up the cube.
GR00T Experiments:
Our first test of the latest NVIDIA Isaac GR00T N1.7 VLA model on the Trossen Stationary AI did not go smoothly! The robot learned the policy, but it shook so violently that it looked like it might destroy itself. Low-pass filtering the actions with an exponential moving average (EMA), using a longer action chunk, and clipping the actions smoothed it out.
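A minimal sketch of this kind of smoothing, assuming joint-position actions streamed one step at a time: clip each action to the joint limits, apply an EMA low-pass filter, then cap the per-step change. `alpha`, `max_delta`, and the joint limits are hypothetical tuning values, and this is a generic post-processing step rather than part of GR00T's own API.

```python
class ActionSmoother:
    """EMA low-pass filter plus clipping for streamed joint-position actions.

    alpha, max_delta and the joint limits are hypothetical tuning values.
    """

    def __init__(self, alpha: float, max_delta: float, lower: list[float], upper: list[float]):
        self.alpha = alpha          # 0 < alpha <= 1; smaller = smoother but laggier
        self.max_delta = max_delta  # largest allowed change per control step
        self.lower = lower
        self.upper = upper
        self.prev: list[float] | None = None

    def __call__(self, action: list[float]) -> list[float]:
        # 1) Clip to the joint limits.
        a = [min(max(v, lo), hi) for v, lo, hi in zip(action, self.lower, self.upper)]
        if self.prev is None:
            self.prev = a
            return list(a)
        # 2) EMA low-pass: y_t = alpha * x_t + (1 - alpha) * y_{t-1}.
        ema = [self.alpha * v + (1 - self.alpha) * p for v, p in zip(a, self.prev)]
        # 3) Rate clipping: move toward the EMA by at most max_delta per step.
        #    The result lies between prev and ema, so it stays within the limits.
        out = [p + min(max(e - p, -self.max_delta), self.max_delta)
               for e, p in zip(ema, self.prev)]
        self.prev = out
        return out


# Usage: a single joint whose raw commands oscillate violently between +/-0.5.
smoother = ActionSmoother(alpha=0.3, max_delta=0.05, lower=[-1.0], upper=[1.0])
raw = [[0.5 if t % 2 == 0 else -0.5] for t in range(20)]
smoothed = [smoother(a) for a in raw]
```

Chunking is not shown; one plausible reason a longer action chunk helps is that the policy re-plans less often, leaving fewer jumps at chunk boundaries for the filter to absorb.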