ANR Robot

🤖 All experiments use the Trossen Stationary AI Robot

High Dexterity Experiments:

What are the limits of open-source algorithms on high-dexterity tasks? Our tentative answer: we have not yet found a task that a teleoperator can do with our robot but that pi0.5, trained on datasets augmented with human interventions, cannot accomplish at least some fraction of the time. How small an object pi0.5 can 'see' at its 224x224 video input resolution, however, remains an open question. For some tasks, the ACT algorithm works nicely too.

pi0.5 policy after 2 iterations of policy improvement, see below.

pi0.5 policy after 1 iteration of policy improvement, see below.

ACT policy.
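To get a rough feel for the resolution question raised above, the arithmetic below estimates how many pixels an object spans in a 224-pixel-wide image. It is only a back-of-the-envelope sketch: the 560 mm field-of-view width and the object sizes are made-up illustrative numbers, not measurements from our setup, and it assumes the camera frame is simply scaled to fill the 224 pixels with no padding or cropping.

```python
def pixels_across(object_mm: float, view_width_mm: float, image_px: int = 224) -> float:
    """Approximate number of pixels an object spans along the image width."""
    return object_mm * image_px / view_width_mm


def smallest_object_mm(min_pixels: float, view_width_mm: float, image_px: int = 224) -> float:
    """Smallest object width that still covers `min_pixels` pixels."""
    return min_pixels * view_width_mm / image_px


# Hypothetical numbers: a camera whose view is 560 mm wide at the table.
VIEW_WIDTH_MM = 560.0

print(pixels_across(10.0, VIEW_WIDTH_MM))      # a 10 mm object spans 4.0 pixels
print(smallest_object_mm(8.0, VIEW_WIDTH_MM))  # 8 pixels of coverage needs 20.0 mm
```

Under these assumptions, a wrist camera that sits closer to the object (a narrower view width) buys more pixels per millimetre than an overhead camera does.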

Policy Improvement using Human Interventions and DAgger:

In our experience, imitation learning for high-dexterity tasks hits a wall pretty fast, and neither larger datasets nor longer training seems to help. One solution is to combine demonstrations with reinforcement learning, as in the newer Physical Intelligence approaches, but that code isn't open source. However, having a human intervene at failure points during policy rollouts (DAgger-style) and then resuming training on the augmented dataset was surprisingly effective, even with just one or two iterations and with no algorithm mods!

pi0.5 policy after 1 iteration of policy improvement.

Not all results are this good, but they are still much better than pre-DAgger!
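The intervention loop described in this section can be sketched on a toy problem. This is a minimal illustration, not our training code: the 1-D "robot", the nearest-neighbour `train` function, the scripted `expert` standing in for the human teleoperator, and the `tol` threshold that decides when the human steps in are all assumptions made up for the example. With a real policy, `train` would mean resuming fine-tuning on the augmented dataset.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (observation, action)
Policy = Callable[[float], float]


def train(dataset: List[Sample]) -> Policy:
    """Toy 'training': a nearest-neighbour lookup over the dataset."""
    data = list(dataset)
    return lambda obs: min(data, key=lambda s: abs(s[0] - obs))[1]


def expert(obs: float) -> float:
    """Stand-in for the human: move halfway toward the goal at 0."""
    return -0.5 * obs


def rollout(policy: Policy, start: float, horizon: int, tol: float = 0.1) -> List[Sample]:
    """Run the policy; where it is about to fail, the human takes over.

    Returns only the (observation, human action) pairs from interventions.
    """
    corrections: List[Sample] = []
    obs = start
    for _ in range(horizon):
        action = policy(obs)
        human_action = expert(obs)
        if abs(action - human_action) > tol:  # toy proxy for "failure point"
            action = human_action             # the human's action is executed
            corrections.append((obs, action))
        obs = obs + action                    # toy dynamics
    return corrections


def improve(dataset: List[Sample], start: float, horizon: int, iterations: int):
    """Alternate rollouts with interventions and retraining on the grown dataset."""
    dataset = list(dataset)
    policy = train(dataset)
    for _ in range(iterations):
        dataset += rollout(policy, start, horizon)
        policy = train(dataset)
    return policy, dataset


# Demonstrations only cover states at or below 1.0; rollouts start at 2.0.
demos = [(x, expert(x)) for x in (1.0, 0.5, 0.25, 0.125, 0.0625)]
policy, augmented = improve(demos, start=2.0, horizon=5, iterations=2)
print(len(demos), len(augmented), len(rollout(policy, 2.0, 5)))
```

In this toy, one intervention at the unseen start state is enough for the retrained policy to finish without help, which mirrors the idea that corrections collected exactly where the policy fails are worth more than additional demonstrations of states it already handles.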

High-level Control and Sub-task Learning:

pi0.5 was designed for high-level (HL) control of low-level sub-tasks, but training the HL controller using sub-task outputs as feedback isn't exposed in the open-source release. As an alternative, we tested Gemini Robotics ER-1.5 as the HL controller. It works nicely for our task, though it did require some prompt engineering, and web latency may limit real-time use. We also addressed another HL control issue: how do you train a policy whose sub-tasks can be invoked in any order but still need to chain naturally? Along the way, we discovered some config/file tweaks needed to get pi0.5 to learn from sub-task prompts.

A single pi0.5 policy learned multiple sub-tasks, such as 'pick up pink cube...'

Gemini Robotics ER-1.5 HL controller chooses the next sub-task toward the goal 'put all cubes in bucket'
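The division of labour between the HL controller and the sub-task policy can be sketched as a simple loop. Everything here is a hypothetical stand-in: `stub_planner` replaces the actual Gemini Robotics ER-1.5 query (no real API call is shown), `ToyWorld` replaces the robot and the pi0.5 policy, and the sub-task strings are illustrative placeholders rather than the prompts we trained on. The one design point it shows is that the planner's free-form reply is normalized and checked against the fixed list of prompts the low-level policy was trained on.

```python
from typing import Callable, Dict, List

SUBTASKS = ["pick up pink cube", "pick up blue cube", "put cube in bucket"]
DONE = "done"


def parse_reply(reply: str, allowed: List[str]) -> str:
    """Normalize a free-form reply to an allowed sub-task prompt or DONE."""
    text = reply.strip().strip("'\".").lower()
    if text == DONE or text in allowed:
        return text
    raise ValueError(f"planner returned an unknown sub-task: {reply!r}")


class ToyWorld:
    """Stand-in for the robot plus the low-level policy."""

    def __init__(self) -> None:
        self.on_table = ["pink", "blue"]
        self.holding = None
        self.in_bucket: List[str] = []

    def observe(self) -> Dict:
        return {"on_table": list(self.on_table), "holding": self.holding}

    def execute(self, subtask: str) -> None:
        if subtask.startswith("pick up "):
            color = subtask.split()[2]
            self.on_table.remove(color)
            self.holding = color
        elif subtask == "put cube in bucket":
            self.in_bucket.append(self.holding)
            self.holding = None


def stub_planner(goal: str, scene: Dict, allowed: List[str]) -> str:
    """Hypothetical replacement for the vision-language model query."""
    if scene["holding"] is not None:
        return "Put cube in bucket."  # deliberately messy formatting
    if scene["on_table"]:
        return f"pick up {scene['on_table'][0]} cube"
    return "DONE"


def run_high_level(goal: str, planner: Callable, world: ToyWorld,
                   allowed: List[str], max_steps: int = 20) -> List[str]:
    """Ask the planner for the next sub-task until it reports completion."""
    history: List[str] = []
    for _ in range(max_steps):
        subtask = parse_reply(planner(goal, world.observe(), allowed), allowed)
        if subtask == DONE:
            break
        world.execute(subtask)
        history.append(subtask)
    return history


world = ToyWorld()
print(run_high_level("put all cubes in bucket", stub_planner, world, SUBTASKS))
```

Because the planner is consulted once per sub-task rather than once per control step, the web latency mentioned above is paid only at sub-task boundaries in a loop shaped like this one.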

Sim to Real +:

With ACT, sim-to-real required precise matching of simulated and real environments, and even then it didn't work very well. pi0, on the other hand, even with LoRA fine-tuning, generalized well across different environments, both sim-to-real and sim-to-sim! We believe this is evidence that large-scale robot pre-training really helps with out-of-distribution robustness, and we think it is why the robot in the video recovers after drifting off-trajectory on its way to picking up the cube.

pi0 LoRA fine-tuned policy from SIMULATED dataset containing only red cubes.

GR00T Experiments:

A first test of the latest NVIDIA Isaac GR00T N1.7 VLA model on the Trossen Stationary AI did not go smoothly! The robot learned the policy, but it shook so violently that it looked like it might destroy itself. EMA low-pass filtering, a longer action chunk, and some action clipping smoothed the motion out.
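For readers unfamiliar with the smoothing steps, here is a minimal sketch of an exponential moving average (EMA) filter combined with clipping, applied to each action before it is sent to the robot. The class name, the `alpha` and `max_step` values, and the choice to clip the per-step change (rather than the absolute joint value) are assumptions for illustration and not the settings we used; the longer action chunk is a model-side setting and is not shown.

```python
from typing import List, Optional, Sequence


class ActionSmoother:
    """EMA low-pass filter followed by a limit on the per-step change."""

    def __init__(self, alpha: float, max_step: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha        # smaller alpha = heavier smoothing, more lag
        self.max_step = max_step  # largest allowed change per control step
        self._prev: Optional[List[float]] = None

    def __call__(self, action: Sequence[float]) -> List[float]:
        if self._prev is None:
            # First action passes through and initializes the filter state.
            self._prev = list(action)
            return list(self._prev)
        out = []
        for prev, raw in zip(self._prev, action):
            filtered = self.alpha * raw + (1.0 - self.alpha) * prev
            delta = max(-self.max_step, min(self.max_step, filtered - prev))
            out.append(prev + delta)
        self._prev = out
        return list(out)


# A jittery single-joint command that flips between -1 and +1 every step.
smoother = ActionSmoother(alpha=0.3, max_step=0.05)
raw_commands = [[1.0 if i % 2 == 0 else -1.0] for i in range(10)]
smoothed = [smoother(cmd)[0] for cmd in raw_commands]
largest_jump = max(abs(b - a) for a, b in zip(smoothed, smoothed[1:]))
print(largest_jump <= 0.05 + 1e-9)
```

The trade-off is responsiveness: heavier filtering and tighter clipping suppress shaking but also delay fast, intentional motions, so both values need tuning on the real robot.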