ANR Robot

We are a small R&D lab aiming to build autonomous robots capable of human-level dexterity, planning, and understanding. A secondary goal is to explore just how far a small company, research group, or hobbyist can go starting with open-source software and off-the-shelf hardware.
🤖 All experiments use the Trossen Stationary AI Robot

High Dexterity Experiments:

What are the limits of open-source algorithms on high-dexterity tasks? Our tentative answer: we have not yet found a task that a teleoperator can do with our robot but that pi0.5 — using datasets augmented with human interventions — cannot accomplish at least some fraction of the time. How small an object pi0.5 can 'see' with its 224x224 video input resolution, however, is an open question. For some tasks, the ACT algorithm works nicely too.
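A quick estimate shows why the resolution question matters. The sketch below is only an estimate under stated assumptions: a pinhole camera and an aspect-preserving resize with padding down to a 224x224 input. The camera resolution, field of view, object size, and distance in the example are made-up numbers, not measurements from our setup.

```python
"""Back-of-the-envelope: how many model-input pixels does an object span?"""
import math


def pixels_across(object_width_m: float, distance_m: float, hfov_deg: float,
                  cam_w: int, cam_h: int, model_side: int = 224) -> float:
    # Pinhole model: width of the scene visible at this distance.
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    # Width of the object in native camera pixels.
    px_camera = object_width_m / scene_width_m * cam_w
    # Assumed aspect-preserving resize: the longer side is scaled to model_side.
    return px_camera * model_side / max(cam_w, cam_h)


# Hypothetical numbers: a 1 cm object, 0.5 m away, 90-degree HFOV, 640x480 camera.
px = pixels_across(0.01, 0.5, 90.0, 640, 480)
print(f"{px:.2f} px")  # 2.24 px
```

Under these assumptions, the object covers only about two pixels of the model input, so most of its detail is gone before the policy ever sees it.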

pi0.5 policy after 2 iterations of policy improvement, see below.
pi0.5 policy after 1 iteration of policy improvement, see below.
ACT policy.
ACT policy.

Policy Improvement using Human Interventions and DAgger:

In our experience, imitation learning for high-dexterity tasks hits a wall pretty fast, and neither larger datasets nor longer training seems to help. One solution is to combine demonstrations with reinforcement learning, as in the newer Physical Intelligence approaches, but that code isn't open source. However, a simpler recipe was surprisingly effective and needed no code modifications: have a human intervene at failure points during policy rollouts, then retrain the policy on the augmented dataset, DAgger-style. This worked even with just one or two iterations!
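To make the loop concrete, here is a toy, self-contained sketch of intervention-based data aggregation, a variant of DAgger in which only the steps where the expert takes over get labeled. Everything in it is hypothetical: a 1-D state stands in for the robot, a scripted `expert` function stands in for the human teleoperator, and a nearest-neighbour lookup (`fit`) stands in for training pi0.5. The `tolerance` test for a "failure point" is also an assumption; on the real robot, the operator decides when to take over.

```python
"""Toy sketch of DAgger-style policy improvement from human interventions.

Hypothetical stand-ins: 1-D dynamics for the robot, expert() for the
teleoperator, and a nearest-neighbour lookup for policy training.
"""
from typing import Callable, List, Tuple

Sample = Tuple[float, float]          # (observation, action)
Policy = Callable[[float], float]


def expert(x: float) -> float:
    """Stand-in for the human: move halfway toward the goal at x = 0."""
    return -0.5 * x


def step(x: float, action: float) -> float:
    """Toy dynamics: the action is added directly to the state."""
    return x + action


def fit(dataset: List[Sample]) -> Policy:
    """Stand-in for training: 1-nearest-neighbour over stored observations."""
    frozen = list(dataset)

    def policy(x: float) -> float:
        _, action = min(frozen, key=lambda s: abs(s[0] - x))
        return action

    return policy


def demo(start: float, steps: int = 6) -> List[Sample]:
    """Collect one expert demonstration (a teleoperated episode)."""
    x, samples = start, []
    for _ in range(steps):
        a = expert(x)
        samples.append((x, a))
        x = step(x, a)
    return samples


def rollout_with_interventions(policy: Policy, start: float, steps: int = 6,
                               tolerance: float = 0.05) -> List[Sample]:
    """Run the policy; where it deviates from the expert, the expert takes over
    for that step and the corrective (observation, action) pair is recorded."""
    x, corrections = start, []
    for _ in range(steps):
        a_policy, a_expert = policy(x), expert(x)
        if abs(a_policy - a_expert) > tolerance:   # hypothetical failure test
            corrections.append((x, a_expert))
            x = step(x, a_expert)
        else:
            x = step(x, a_policy)
    return corrections


dataset = demo(start=1.0)        # demonstrations only cover x > 0
policy = fit(dataset)
interventions_per_iteration = []
for _ in range(2):               # two rounds of policy improvement
    corrections = rollout_with_interventions(policy, start=-1.0)
    interventions_per_iteration.append(len(corrections))
    dataset.extend(corrections)  # aggregate, DAgger-style
    policy = fit(dataset)        # retrain on demos + corrections

print(interventions_per_iteration)  # [5, 0]
```

The policy starts in a region its demonstrations never covered, so the first rollout needs five interventions; after those corrections are aggregated and the policy is refit, the second rollout needs none.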

pi0.5 policy after 1 iteration of policy improvement.
Not all results are this good, but they are still much better than the pre-DAgger policy!

High-level Control and Sub-task Learning:

pi0.5 was designed for high-level (HL) control of low-level sub-tasks, but training the HL controller using sub-task outputs as feedback isn't exposed in the open-source release. As an alternative, we tested Gemini Robotics ER-1.5 as the HL controller. It works well for our task, though it required some prompt engineering, and web latency may limit real-time use. We also addressed another HL control issue: how do you train a policy whose sub-tasks can be invoked in any order but still chain naturally? Along the way, we discovered some config/file tweaks needed to get pi0.5 to learn from sub-task prompts.
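A rough sketch of such an HL loop follows, with the vision-language model replaced by a scripted stand-in so that it runs offline. The sub-task strings, `build_prompt`, and the retry-on-malformed-reply rule are our own illustrative assumptions, not the prompt or code we used. In a real setup, `planner` would call the HL model (with camera images), and `execute` would run the low-level pi0.5 policy with the chosen sub-task as its language prompt. Accepting only an exact menu entry is one simple way to keep free-form model output from reaching the robot.

```python
"""Sketch of a high-level (HL) controller loop choosing sub-tasks for a
low-level policy. The planner here is a scripted stand-in for a VLM call."""
from typing import Callable, List, Optional

# Hypothetical sub-task menu; each entry doubles as the low-level policy's prompt.
SUBTASKS = ["pick up the red cube", "pick up the blue cube",
            "put the cube in the bucket"]
DONE = "DONE"


def build_prompt(goal: str, menu: List[str], history: List[str]) -> str:
    """Text prompt for the HL model: goal, progress so far, and allowed replies."""
    lines = [f"Goal: {goal}",
             "Completed sub-tasks: " + (", ".join(history) or "none"),
             "Reply with exactly one line from this list:"]
    return "\n".join(lines + menu + [DONE])


def parse_choice(reply: str, menu: List[str]) -> Optional[str]:
    """Accept only an exact menu entry (or DONE); anything else is rejected."""
    choice = reply.strip()
    return choice if choice in menu or choice == DONE else None


def run_task(goal: str, planner: Callable[[str], str],
             execute: Callable[[str], None], menu: List[str] = SUBTASKS,
             max_calls: int = 10) -> List[str]:
    history: List[str] = []
    for _ in range(max_calls):
        choice = parse_choice(planner(build_prompt(goal, menu, history)), menu)
        if choice is None:
            continue          # malformed reply: ask again instead of acting on it
        if choice == DONE:
            break
        execute(choice)       # e.g. run the low-level policy with this prompt
        history.append(choice)
    return history


# Scripted replies standing in for the HL model; the first one is malformed.
replies = iter(["Next: pick up the red cube", "pick up the red cube",
                "put the cube in the bucket", "pick up the blue cube",
                "put the cube in the bucket", DONE])
executed: List[str] = []
completed = run_task("put all cubes in bucket",
                     planner=lambda prompt: next(replies),
                     execute=executed.append)
print(completed)
```

Because every sub-task is addressed by its own prompt, the HL controller can invoke them in any order; whether they chain smoothly then depends on how the low-level policy was trained.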

A single pi0.5 policy learned multiple sub-tasks, such as 'pick up pink cube...'
Gemini Robotics ER-1.5 HL controller chooses the next sub-task toward the goal 'put all cubes in bucket'

Sim to Real +:

With ACT, sim-to-real required precise matching of the simulated and real environments, and even then it didn't work very well. pi0, on the other hand, generalized well across environments, both sim-to-real and sim-to-sim, even with only LoRA fine-tuning! We believe this is evidence that large-scale robot pre-training really helps with out-of-distribution robustness. We think this is also why, in the video, the robot recovers after drifting off-trajectory on its way to pick up the cube.
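For readers unfamiliar with LoRA: instead of updating a pretrained weight matrix W, LoRA freezes it and trains a low-rank update, so the layer computes W x + (alpha / r) B A x, with A initialized randomly and B initialized to zero. The pure-Python sketch below is illustrative only and is not the implementation used in any particular library; the layer size, rank, and alpha are arbitrary choices.

```python
"""Minimal LoRA-adapted linear layer in pure Python (illustrative sizes)."""
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                                   # frozen, pretrained
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]  # r x d_in, random
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]          # d_out x r, zeros
        self.scale = alpha / rank

    def __call__(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


d = 64
rng = random.Random(1)
W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
layer = LoRALinear(W, rank=4, alpha=8.0)
x = [rng.uniform(-1, 1) for _ in range(d)]

# Because B starts at zero, the adapted layer initially matches the base layer.
print(layer(x) == matvec(W, x))                                  # True
print(layer.trainable_params(), "trainable vs", d * d, "frozen")  # 512 trainable vs 4096 frozen
```

Only the small A and B matrices are trained, so the pretrained weights, and whatever robustness they carry, are left untouched.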

pi0 policy LoRA fine-tuned on a SIMULATED dataset containing only red cubes.

GR00T Experiments:

A first test of the latest NVIDIA Isaac GR00T N1.7 VLA model on the Trossen Stationary AI did not go smoothly! The robot learned the task, but it shook so violently that it looked like it might destroy itself. Exponential moving average (EMA) low-pass filtering, a longer action chunk, and some action clipping smoothed it out.
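The filtering and clipping steps can be sketched as a small post-processing stage between the policy and the robot. This is our own minimal version, not the exact filter used on the robot: `alpha` (the EMA weight on the newest command), `max_step` (a per-step rate limit), and the joint limits are hypothetical values. Whether to clip per-step changes, absolute positions, or both is a design choice. The longer action chunk is a separate change and is not shown here.

```python
"""Sketch: EMA low-pass filter plus clipping for joint-position commands."""
from typing import List, Optional, Sequence


class ActionSmoother:
    def __init__(self, alpha: float, max_step: float,
                 low: Sequence[float], high: Sequence[float]):
        self.alpha = alpha          # weight on the newest command (0 < alpha <= 1)
        self.max_step = max_step    # max change per control step, per joint
        self.low, self.high = list(low), list(high)
        self.prev: Optional[List[float]] = None   # last command actually sent

    def __call__(self, action: Sequence[float]) -> List[float]:
        if self.prev is None:
            out = list(action)
        else:
            out = []
            for a, p in zip(action, self.prev):
                ema = self.alpha * a + (1.0 - self.alpha) * p  # low-pass
                delta = min(max(ema - p, -self.max_step), self.max_step)
                out.append(p + delta)                           # rate limit
        # Clip to (hypothetical) absolute joint limits.
        out = [min(max(v, lo), hi) for v, lo, hi in zip(out, self.low, self.high)]
        self.prev = out
        return out

    def reset(self) -> None:
        """Call between episodes so the filter does not carry over old state."""
        self.prev = None


smoother = ActionSmoother(alpha=0.5, max_step=0.25, low=[-1.0], high=[1.0])
jittery = [[0.0], [1.0], [-1.0], [2.0]]   # violent back-and-forth commands
print([smoother(a) for a in jittery])     # [[0.0], [0.25], [0.0], [0.25]]
```

Jumps of up to 3 units in the raw commands become moves of at most 0.25 per step. The price is added lag, which is one reason to keep `alpha` as high as the hardware tolerates.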

Looking good now! Note that the original dataset only contains red cubes!
Additional Research

Hierarchical Planning

Recursive Tree Planner: builds a stack of 5 blocks in Box2D
Recursive Tree Planner: solves Lunar Lander in Box2D (problem from Gymnasium)
Recursive Tree Planner: solves Inverted Pendulum in MuJoCo (problem from Levy et al.)
Recursive Tree Planner: solves 3-Level Four Rooms (problem from Levy et al.)
Paper: Pure Planning to Pure Policies and In Between with a Recursive Tree Planner