Notes & Optimizations
Code for all projects below is in our github.com/anredlich forks. For additional openpi coding details see Openpi implementation details. For additional GR00T coding details, see GR00T implementation details.
Last updated: 05/2026
Robot
- WARNING: Do NOT let pets or small children near the leader arms: they can swing and swoop down violently, especially if you play with the arm joint_characteristics. Almost learned this the hard way.
- sticky gripper: The right arm gripper was a bit sticky (it feels like static friction) and would over-shoot. Improved this by adjusting the embedded arm joint_characteristics variable, friction_viscous_coef, for the gripper (joint 6) from 202.61772... to 25.0. See the Trossen documentation for how to do this.
lerobot
- version warning: There was a dataset version error which prevented lerobot simulation testing and dataset visualization for older aloha and pusht datasets. Converted this to a warning.
- missing type in config: Fixed a model writing error in train.py: the checkpoint config.json file was missing the "type: act" or "type: diffusion" line, so the model could not be read, e.g. by eval.py. Solved this by adding the line type: str = "act" to configuration_act.py and the line type: str = "diffusion" to configuration_diffusion.py.
- max joint angle change: For real robot rollouts, we found that setting the robot.max_relative_target to 0.05-0.1 radians makes a huge difference in whether a learned policy succeeds. This argument clips the maximum joint angle change in one step, thereby reducing jerky motions which seem to take the robot out of the learning distribution and often lead to failure.
- multi-task dataset workflow: To build a real robot multi-task dataset, we have lerobot randomly choose from a set of prompts and use log_say to voice the prompt. Then we use teleoperation to enact the task corresponding to that prompt. See control_robot.py->record(...) and control_utils.py->record_episode(...) in our lerobot for more details.
- building a dataset with sub-tasks: Sub-tasks, such as 'pick up the spoon', which are part of a high level task, such as 'clean up the kitchen', must flow naturally into one another. Hence, to build a dataset of sub-tasks, we first record episodes of full tasks, and then use our dataset_splitter.py to split each full task episode into a number of sub-task episodes. Each sub-task is also labeled with a sub-task prompt. For each sub-task, dataset_splitter.py takes as input a range of full task frames/events, and also a prompt. We use visualize_dataset.py to look at the robot video to determine a frame range for each sub-task. The splitter tool, along with dataset_merger.py, is in the develop branch of our lerobot.
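As a rough illustration of the max joint angle change note above, the per-step clipping can be sketched in a few lines of NumPy. This is not the lerobot implementation: the function name clip_relative_target and the example joint values are hypothetical, and only the 0.05 radian limit comes from the range quoted above.

```python
import numpy as np


def clip_relative_target(goal, present, max_relative_target):
    """Limit each joint's commanded change to +/- max_relative_target radians.

    Hypothetical helper for illustration; lerobot's own code may differ.
    """
    goal = np.asarray(goal, dtype=float)
    present = np.asarray(present, dtype=float)
    # Clip the per-joint step, then add it back to the current position.
    delta = np.clip(goal - present, -max_relative_target, max_relative_target)
    return present + delta


# Made-up joint positions (radians) for three joints.
present = np.array([0.00, 0.50, -0.20])
goal = np.array([0.30, 0.52, -0.40])

safe_goal = clip_relative_target(goal, present, max_relative_target=0.05)
print(safe_goal)
```

Joints whose requested change is already within the limit pass through unchanged; only the large jumps are shortened, which is what removes the jerky motions.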
openpi (pi0 / pi0.5)
- gpus: The RTX 5090 GPU on our local Ubuntu computer has sufficient memory, 32 GB VRAM, for all inference models, and for LoRA fine-tuning, but it does not have enough memory for a full fine-tune. For full fine-tune we rent H100s or better gpus with at least 80GB of VRAM.
- training time: On our local Ubuntu computer with an Nvidia RTX 5090 GPU, a typical LoRA fine-tune of 20K steps takes about 16 hours. A typical full fine-tune of 40K steps on an H100 takes about 30 hours.
- normalization statistics: We use compute_norm_stats.py to create a norm_stats.json file for each of our datasets! This is especially important since the default pi0_base trossen norm_stats.json seems to be for the older Aloha robot which has different joints.
- LoRA: If LoRA fine-tuning is used, it seems to be necessary to use the same LoRA model definition for both train.py and serve_policy.py, although it is possible we are misunderstanding this. For example, if model=pi0_config.Pi0Config(paligemma_variant="gemma_2b_lora") is used in TrainConfig() (in config.py) for training, we find it necessary to have serve_policy.py use the same TrainConfig() for policy rollouts.
- joint_flip_mask: In aloha_policy.py, some joint angles are multiplied by -1 to make joint directions consistent with those expected by the pi0 policy. For the original Aloha robot, this required the shoulder and elbow joints (numbers 1 and 2) to be flipped. For the newer Trossen AI Stationary we believe that the shoulder still needs flipping, but the elbow does not. Also, we do not use the gripper transform in aloha_policy.py, just joint_flip_mask. See adapt_trossen_to_pi in our version of aloha_policy.py, and see more below.
- image resize: The pi0 model expects 224x224 images. To resize, openpi uses image_tools.resize_with_pad, so in the trossen_ai example main.py file we replaced cv2.resize, which does not pad, with the image_tools version; see more below.
- sim to real joint calibration: As discussed in the calibration section, below, the real and simulated robot joints are just slightly out of alignment. Hence, for sim to real to work well, we adjusted actions for the right robot arm's joints 1 and 2 using the above calibration: action[7+1]-=0.025 and action[7+2]+=0.025. For base joint 0, we also needed to adjust action[7+0]=1.05*(action[7+0]+0.01). For the base, this angular shift compensates for the shift in location of the sim vs real bases. On the other hand, the small multiplier is a mystery, but works.
- sim to real home pose: It is important to use the same initial pose for the real and simulated robots. We achieve this using the home_pose variable we added to the (deprecated) TrossenAIStationaryRobotConfig. The value needed for our sim to real experiments is home_pose=[0, np.pi/12, np.pi/12, 0, 0, 0, 0.044].
- multiple task prompts: To train a dataset with individual task prompts for each episode, such as in the dataset trossen_ai_stationary_pick_and_place_07, below, a couple of additional lines are needed in TrainConfig, as shown in Training Details, where the line base_config= ..., and the line "prompt": ... need to be uncommented.
- high-level control: Although pi0.5 can train a high-level prompt to produce the appropriate low-level prompt for sub-task control, this option does not seem to be available in openpi. Instead we use Gemini Robotics ER-1.5 to control our pi0.5 model by feeding pi0.5 sub-task prompts. See High-level Control and Sub-task Learning for more details.
- single arm tasks: In datasets which use only one of the Trossen AI Stationary arms, the state and action standard deviations for the unused arm can be very small or zero in norm_stats.json. This causes the normalized states and actions to blow up leading to huge losses during training and extreme sensitivity to noise/vibration during robot rollouts. pi0 divides by the standard deviation to normalize, so our solution is to replace the left arm standard deviations in the norm_stats.json file with 0.01. pi05, on the other hand, divides by (q99-q01) where q01 and q99 are quantiles, hence we set q01=mean-0.5 and q99=mean+0.5 for the left arm actions and states.
- image crop: The image transform used by openpi takes the 640x480 RealSense camera image and transforms it using 'resize with padding' to 224x224, which is the size expected by PaliGemma. Unfortunately, this pads the images with black space, and hence wastes image tokens. We added CenterCropImages in transforms.py and added a few lines to the TrainConfig in config.py. CenterCropImages performs a square crop so there is no padding. Also, by cropping only the image center, it effectively zooms in for higher resolution.
- policy improvement using human interventions: This DAgger-style approach has produced the most improvement in performance among the approaches tried here, such as enlarging the imitation learning dataset.
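The sim to real joint calibration note above amounts to a small per-joint correction on the right arm. A minimal sketch follows, assuming a 14-dimensional action with the right arm starting at index 7 (implied by the action[7+j] indexing in the note); the function name calibrate_sim_to_real is hypothetical, and the offsets are the ones we found for our own robot.

```python
import numpy as np

RIGHT_ARM_OFFSET = 7  # assumption: right arm joints start at index 7


def calibrate_sim_to_real(action):
    """Apply the sim-to-real joint corrections quoted in the note above.

    Hypothetical helper; returns a corrected copy and leaves the input untouched.
    """
    out = np.array(action, dtype=float, copy=True)
    # Base joint 0: angular shift plus the small (unexplained) multiplier.
    out[RIGHT_ARM_OFFSET + 0] = 1.05 * (out[RIGHT_ARM_OFFSET + 0] + 0.01)
    # Joints 1 and 2: constant offsets from the calibration.
    out[RIGHT_ARM_OFFSET + 1] -= 0.025
    out[RIGHT_ARM_OFFSET + 2] += 0.025
    return out


# Assumed 14-dim action layout (7 values per arm), all zeros for the demo.
action = np.zeros(14)
calibrated = calibrate_sim_to_real(action)
print(calibrated[RIGHT_ARM_OFFSET:RIGHT_ARM_OFFSET + 3])
```

These numbers are specific to our setup and should be re-measured for any other robot or simulation.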
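The single arm tasks fix above can be sketched as a small patch applied to the loaded statistics. This assumes the statistics are a dict with "state" and "actions" entries, each holding "mean", "std", "q01" and "q99" lists, and that the unused left arm occupies indices 0 to 6. Both are assumptions to check against your own norm_stats.json, and the helper name patch_unused_arm is hypothetical. For brevity the sketch applies both the pi0 (std) and pi05 (quantile) fixes at once.

```python
import copy

LEFT_ARM = range(0, 7)  # assumption: left arm occupies the first 7 dimensions


def patch_unused_arm(norm_stats, joints=LEFT_ARM, std_floor=0.01, half_width=0.5):
    """Return a copy of the stats with safe values for an unused arm.

    Hypothetical helper. Sets std to std_floor (pi0-style normalization) and
    q01/q99 to mean -/+ half_width (pi05-style quantile normalization).
    """
    patched = copy.deepcopy(norm_stats)
    for key in ("state", "actions"):
        stats = patched[key]
        for j in joints:
            stats["std"][j] = std_floor
            if "q01" in stats and "q99" in stats:
                stats["q01"][j] = stats["mean"][j] - half_width
                stats["q99"][j] = stats["mean"][j] + half_width
    return patched


# Made-up statistics: the left arm never moves, so its std is zero.
example = {
    key: {
        "mean": [0.1] * 14,
        "std": [0.0] * 7 + [0.2] * 7,
        "q01": [0.1] * 14,
        "q99": [0.1] * 14,
    }
    for key in ("state", "actions")
}

patched = patch_unused_arm(example)
print(patched["state"]["std"][:8])
```

The right arm entries are left alone, so the active arm is still normalized with its measured statistics.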
Isaac GR00T
We added a trossen_ai example file and made a few other updates to our Isaac GR00T N1.7 fork.
- gpus: The RTX 5090 GPU on our local Ubuntu computer has sufficient memory, 32 GB VRAM, for inference models, but not for training, which seems to require greater than 40GB or so. Therefore we rent H100s or better gpus with at least 80GB of VRAM.
- training time: A rented H100 completes a 30K step training run in about 5 hours, but checkpoint I/O can slow that down quite a bit.
- config and modality files: To train a GR00T model on a lerobot dataset for the Trossen AI Stationary robot, it was necessary to add a trossen_ai_config.py file which tells GR00T what robot data formats to expect. We put this file in the trossen_ai example folder in our fork. In addition, it is necessary to add a modality.json file inside the meta directory of each lerobot dataset. To see ours, look at this transfer-cube dataset.
- robot shaking: A GR00T N1.7 policy trained on a 40mm transfer-cube dataset shook the robot arms violently. Adding EMA low-pass action filtering at inference time helped quite a bit but was insufficient. We also limited how large a 1-step action change is allowed using the robot control variable max_relative_target. Finally, we increased the action chunk length during training and inference from 16 to 32. All three of these changes together were needed to fully smooth out the policy.
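For readers unfamiliar with EMA low-pass action filtering as mentioned in the robot shaking note, here is a minimal sketch of the general technique. It is not the code in our fork: the class name EmaActionFilter and the smoothing factor alpha=0.3 are illustrative assumptions.

```python
import numpy as np


class EmaActionFilter:
    """Exponential moving average over successive action vectors.

    Hypothetical helper. Smaller alpha means heavier smoothing (and more lag).
    """

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._state = None

    def __call__(self, action):
        action = np.asarray(action, dtype=float)
        if self._state is None:
            # First action passes through unchanged.
            self._state = action.copy()
        else:
            self._state = self.alpha * action + (1.0 - self.alpha) * self._state
        return self._state.copy()


# A made-up spike in an otherwise constant two-joint action stream.
ema = EmaActionFilter(alpha=0.3)
noisy = [np.array([0.0, 0.0]), np.array([1.0, -1.0]), np.array([0.0, 0.0])]
smoothed = [ema(a) for a in noisy]
print(smoothed)
```

The filter trades responsiveness for smoothness, which may be why it helped but was not sufficient on its own.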
Have a fix that isn't on this list? Or one of these doesn't match your experience? We'd be glad to hear from you — get in touch.