Implementation Details: openpi

Training Details

Training configs are in src/openpi/training/config.py in the openpi repository.

LoRA fine-tuning config: Below is the training configuration used to learn the simulated robot transfer-cube task, available on huggingface. This config is for LoRA fine-tuning. For full fine-tuning, use model=pi0_config.Pi0Config() and remove the freeze_filter and ema_decay lines.

TrainConfig(
    name="pi0_aloha_sim_trossen_ai_mem_finetune_v2",
    model=pi0_config.Pi0Config(paligemma_variant="gemma_2b_lora", action_expert_variant="gemma_300m_lora"),
    data=LeRobotAlohaDataConfig(
        repo_id="ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_13",
        #base_config=DataConfig(prompt_from_task=True), #add for individual task prompts
        default_prompt="Transfer cube",
        use_delta_joint_actions=False,
        adapt_to_pi=False,
        adapt_trossen_to_pi=True,  # see "Trossen openpi fork vs ours" below
        repack_transforms=_transforms.Group(
            inputs=[
                _transforms.RepackTransform(
                    {
                        "images": {
                            "cam_high": "observation.images.cam_high",
                            "cam_low": "observation.images.cam_low",
                            "cam_left_wrist": "observation.images.cam_left_wrist",
                            "cam_right_wrist": "observation.images.cam_right_wrist",
                        },
                        "state": "observation.state",
                        "actions": "action",
                        #"prompt": "prompt", #add for individual task prompts
                    }
                )
            ]
        ),
    ),
    weight_loader=weight_loaders.CheckpointWeightLoader("gs://openpi-assets/checkpoints/pi0_base/params"),
    num_train_steps=20_000,
    freeze_filter=pi0_config.Pi0Config(
        paligemma_variant="gemma_2b_lora", action_expert_variant="gemma_300m_lora"
    ).get_freeze_filter(),
    # Turn off EMA for LoRA finetuning.
    ema_decay=None,
),

pi0.5 full fine-tuning, multi-task config: Below is a training configuration for full fine-tuning of the pi0.5 model. It is designed to train multiple sub-tasks, each with its own prompt. To train with multiple prompts, the lines base_config=... and "prompt": "prompt", were added. The dataset must also list its prompts/tasks in meta/episodes.jsonl and meta/tasks.jsonl.
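Before training with prompt_from_task=True, it can help to confirm that every episode carries a task string that also appears in the task list. The sketch below is a hypothetical helper, check_task_prompts, not part of openpi or LeRobot. It assumes the LeRobot v2.x metadata layout: meta/tasks.jsonl with one {"task_index", "task"} object per line and meta/episodes.jsonl with one {"episode_index", "tasks", ...} object per line.

```python
import json
from pathlib import Path


def check_task_prompts(meta_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every episode's tasks
    are present in tasks.jsonl.

    Assumes (hypothetically) the LeRobot v2.x jsonl layout described above.
    """

    def read_jsonl(path: Path) -> list[dict]:
        with path.open() as f:
            return [json.loads(line) for line in f if line.strip()]

    # Set of all prompt strings declared in tasks.jsonl.
    known_tasks = {row["task"] for row in read_jsonl(meta_dir / "tasks.jsonl")}

    problems = []
    for ep in read_jsonl(meta_dir / "episodes.jsonl"):
        tasks = ep.get("tasks", [])
        if not tasks:
            problems.append(f"episode {ep['episode_index']}: no task/prompt")
        for task in tasks:
            if task not in known_tasks:
                problems.append(
                    f"episode {ep['episode_index']}: task {task!r} missing from tasks.jsonl"
                )
    return problems
```

Point it at the meta/ directory of your local copy of the dataset; a non-empty result suggests fixing the dataset before computing norm stats or training.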

We also add an optional image crop transform, CenterCropImages, which is defined in transforms.py in our openpi fork. It crops the image before the resize-with-padding that openpi applies, which avoids the black padding bars that would otherwise waste image tokens. The policy_metadata entry tells main.py in the trossen_ai example which crop is being used.

TrainConfig(
    name="pi05_aloha_sim_trossen_ai_full_finetune_v6",  # renamed to avoid collision
    model=pi0_config.Pi0Config(pi05=True),
    data=LeRobotAlohaDataConfig(
        repo_id="ANRedlich/trossen_ai_stationary_pick_and_place_09",
        base_config=DataConfig(prompt_from_task=True), #needed for multi-prompt
        default_prompt="pick and place",
        use_delta_joint_actions=False,
        adapt_to_pi=False,
        adapt_trossen_to_pi=True,
        repack_transforms=_transforms.Group(
            inputs=[
                _transforms.RepackTransform(
                    {
                        "images": {
                            "cam_high": "observation.images.cam_high",
                            "cam_low": "observation.images.cam_low",
                            "cam_left_wrist": "observation.images.cam_left_wrist",
                            "cam_right_wrist": "observation.images.cam_right_wrist",
                        },
                        "state": "observation.state",
                        "actions": "action",
                        "prompt": "prompt", #needed for multi-prompt
                    }
                ),
                # Crop base camera to square (eliminates black bars, no zoom since 480 = min dim)
                _transforms.CenterCropImages(
                    camera_name_patterns=["cam_high"],
                    crop_size=480,
                ),
                # Crop wrist cameras for ~1.8× zoom
                _transforms.CenterCropImages(
                    camera_name_patterns=["wrist"],
                    crop_size=480,
                ),
            ]
        ),
    ),
    weight_loader=weight_loaders.CheckpointWeightLoader("gs://openpi-assets/checkpoints/pi05_base/params"),
    num_train_steps=40_000,
    save_interval=5000,
    policy_metadata={
        "crop_cameras": {
            "cam_high": 480,
            "wrist": 480,
        }
    },
),
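To see why the crop saves image tokens, here is a minimal numpy sketch. It assumes 640 x 480 (width x height) camera frames, a 224 x 224 model input, and a letterbox resize that preserves aspect ratio and pads the remainder with zeros. center_crop and padding_fraction are hypothetical helpers written for this illustration; they are not the CenterCropImages implementation.

```python
import numpy as np


def center_crop(image_hwc: np.ndarray, crop_size: int) -> np.ndarray:
    """Square center crop (illustrative stand-in, not CenterCropImages itself)."""
    h, w = image_hwc.shape[:2]
    size = min(crop_size, h, w)
    top = (h - size) // 2
    left = (w - size) // 2
    return image_hwc[top:top + size, left:left + size]


def padding_fraction(h: int, w: int, target: int = 224) -> float:
    """Fraction of a target x target letterboxed image that is padding.

    Assumes the longer side is scaled to `target` (integer arithmetic) and
    the shorter side is padded with zeros.
    """
    longest = max(h, w)
    resized_h = h * target // longest
    resized_w = w * target // longest
    return 1.0 - (resized_h * resized_w) / (target * target)


frame = np.zeros((480, 640, 3), dtype=np.uint8)
cropped = center_crop(frame, 480)
print(cropped.shape, padding_fraction(480, 640), padding_fraction(*cropped.shape[:2]))
```

Under these assumptions, a 640 x 480 frame letterboxed to 224 x 224 becomes 224 x 168 with 56 rows of padding, so 25% of the input is black. If the vision encoder uses 14 x 14 pixel patches (a 16 x 16 grid, 256 tokens per image), that is 4 patch rows, or 64 tokens, per camera. After a 480 x 480 center crop the padding is zero.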

Normalization statistics: Before training on a dataset, you must first create a statistics file, norm_stats.json, for that dataset. To do so, run compute_norm_stats.py with a config name whose repo_id points to that dataset. For example,
```
uv run scripts/compute_norm_stats.py --config-name=pi0_aloha_sim_trossen_ai_mem_finetune_v2
```
will create the stats file

openpi/assets/pi0_aloha_sim_trossen_ai_mem_finetune_v2/ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_13/norm_stats.json

for the dataset ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_13.

It is very important for train.py to use the norm_stats.json for the dataset it is training on. It is equally important for serve_policy.py, below, to use the same norm_stats.json file that was used to train the model it is serving.
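Why the stats must match can be shown with a small numpy sketch. It assumes simple per-dimension z-score normalization; openpi's own transforms may use a different scheme (for example quantile-based), but the failure mode is the same: a model trained on actions normalized with one dataset's statistics produces outputs that only make sense when unnormalized with those same statistics. compute_stats, normalize and unnormalize are illustrative helpers, not openpi functions.

```python
import numpy as np


def compute_stats(x: np.ndarray) -> dict[str, np.ndarray]:
    # Per-dimension statistics over all frames of a (num_frames, dim) array.
    return {"mean": x.mean(axis=0), "std": x.std(axis=0)}


def normalize(x: np.ndarray, stats: dict[str, np.ndarray], eps: float = 1e-6) -> np.ndarray:
    return (x - stats["mean"]) / (stats["std"] + eps)


def unnormalize(x: np.ndarray, stats: dict[str, np.ndarray], eps: float = 1e-6) -> np.ndarray:
    return x * (stats["std"] + eps) + stats["mean"]


rng = np.random.default_rng(0)
actions_a = rng.normal(0.0, 0.1, size=(1000, 14))  # dataset the model was trained on
actions_b = rng.normal(0.5, 0.3, size=(1000, 14))  # some other dataset
stats_a, stats_b = compute_stats(actions_a), compute_stats(actions_b)

model_output = normalize(actions_a, stats_a)  # pretend the model predicts these perfectly
good = unnormalize(model_output, stats_a)     # matching stats: original actions recovered
bad = unnormalize(model_output, stats_b)      # mismatched stats: shifted and rescaled
print(np.abs(good - actions_a).max(), np.abs(bad - actions_a).mean())
```

Unnormalizing with the wrong stats raises no error; it silently shifts and rescales every joint command, which is why a mismatched norm_stats.json can look like a badly trained model.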

Run training: Run train.py with the appropriate config name, for example,
```
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi0_aloha_sim_trossen_ai_mem_finetune_v2 --exp-name=trossen_ai_stationary_x1
```
For this example, the final model checkpoint will be placed in

openpi/checkpoints/pi0_aloha_sim_trossen_ai_mem_finetune_v2/trossen_ai_stationary_x1/19999/
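The step number in this path follows from num_train_steps. The sketch below lists the step directories one would expect, assuming (as we understand openpi's train.py) that steps are numbered from 0, that a checkpoint is saved whenever step % save_interval == 0 (excluding the starting step), and that the final step, num_train_steps - 1, is always saved. saved_steps is a hypothetical helper, and a keep_period or similar setting may delete some intermediate checkpoints from disk.

```python
def saved_steps(num_train_steps: int, save_interval: int, start_step: int = 0) -> list[int]:
    """Steps that would get a checkpoint directory under the assumed save rule."""
    return [
        step
        for step in range(start_step, num_train_steps)
        # Periodic saves (not at the starting step) plus the final step.
        if (step % save_interval == 0 and step > start_step) or step == num_train_steps - 1
    ]


print(saved_steps(40_000, 5000))
```

So the 20_000-step LoRA run ends in .../19999/, and the pi0.5 config above (40_000 steps, save_interval=5000) would end in .../39999/.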

As mentioned above, the policy must use the correct norm_stats file. If the served policy instead insists on loading the default norm_stats file rather than the dataset norm_stats calculated above, it might be necessary to make a small change to create_trained_policy(...) in src/openpi/policies/policy_config.py, which is what loads a trained checkpoint for serving:
```
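# in create_trained_policy(...):
    ...
    if data_config.norm_stats is None:  # added
        # No norm stats from the config's assets: fall back to the copy saved with the checkpoint.
        norm_stats = _checkpoints.load_norm_stats(checkpoint_dir / "assets", data_config.asset_id)
    else:  # added
        # Use the norm stats loaded through the config (e.g. the dataset's norm_stats.json).
        norm_stats = data_config.norm_stats  # added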
```


Evaluation Details

In serve_policy.py, add a new environment mode and point its default checkpoint at the trained model:

```
class EnvMode(enum.Enum):
    ...
    # add this line:
    ALOHA_SIM_TROSSEN_AI_FINETUNE = "aloha_sim_trossen_ai_finetune"
    ...

DEFAULT_CHECKPOINT: dict[EnvMode, Checkpoint] = {
    ...
    # add:
    EnvMode.ALOHA_SIM_TROSSEN_AI_FINETUNE: Checkpoint(
        config="pi0_aloha_sim_trossen_ai_mem_finetune_v2",
        dir="./checkpoints/pi0_aloha_sim_trossen_ai_mem_finetune_v2/trossen_ai_stationary_x1/19999",
    ),
}
```
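This registration works because the --env value names an EnvMode member, and that member is the key into DEFAULT_CHECKPOINT. The following is a minimal, self-contained mirror of that lookup; the Checkpoint dataclass and the resolve function here are simplified stand-ins written for illustration, not the code in serve_policy.py.

```python
import dataclasses
import enum


class EnvMode(enum.Enum):
    # Simplified mirror of the enum in serve_policy.py (illustration only).
    ALOHA_SIM_TROSSEN_AI_FINETUNE = "aloha_sim_trossen_ai_finetune"


@dataclasses.dataclass(frozen=True)
class Checkpoint:
    config: str  # TrainConfig name in training/config.py
    dir: str     # checkpoint step directory to load


DEFAULT_CHECKPOINT: dict[EnvMode, Checkpoint] = {
    EnvMode.ALOHA_SIM_TROSSEN_AI_FINETUNE: Checkpoint(
        config="pi0_aloha_sim_trossen_ai_mem_finetune_v2",
        dir="./checkpoints/pi0_aloha_sim_trossen_ai_mem_finetune_v2/trossen_ai_stationary_x1/19999",
    ),
}


def resolve(env_name: str) -> Checkpoint:
    # Look up the enum member by name, then its default checkpoint.
    # Raises KeyError if either the member or the dict entry is missing.
    return DEFAULT_CHECKPOINT[EnvMode[env_name]]


print(resolve("ALOHA_SIM_TROSSEN_AI_FINETUNE"))
```

If the enum member is added but the DEFAULT_CHECKPOINT entry is missing (or vice versa), the lookup fails, so both additions are needed.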

Run the policy server, making sure that it uses the norm_stats computed above for the specific dataset and the checkpoint registered above. (Note: --no-sync was needed to keep uv from re-installing the default gym-aloha in place of ours.)

```
uv run --no-sync scripts/serve_policy.py --env ALOHA_SIM_TROSSEN_AI_FINETUNE
```

In a second terminal, run the real robot control example (see Sim to real joint calibration, below, for --adjust_for_sim_to_real):

```
MUJOCO_GL=egl uv run python examples/trossen_ai/main.py --adjust_for_sim_to_real=True
```


Trossen openpi fork vs ours

To get the same training results as us, and to use our pi0 models with the Trossen fork of openpi (recommended), the following small modifications are required. We are not certain that these modifications are correct, and we continue to experiment with them. For more discussion of the reasons for them, see Notes & Optimizations.

Joint flip mask: We could be wrong, but we currently flip the shoulder (joint 1 of each arm), not the elbow, and we do not transform the gripper:

in training/config.py:
    TrainConfig(
        ...
        data=LeRobotAlohaDataConfig(
            adapt_to_pi=False,
            ...
becomes:
    TrainConfig(
        ...
        data=LeRobotAlohaDataConfig(
            adapt_to_pi=False,
            adapt_trossen_to_pi=True,
            ...

also, in policies/aloha_policy.py:
    class AlohaInputs(transforms.DataTransformFn):
        adapt_trossen_to_pi: bool = False  # added by us
            ...
    def _joint_flip_mask_trossen() -> np.ndarray:
        """Joints 1, both left and right, get flipped by -1"""
        return np.array([1, -1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1])
            ...
    def _decode_state(state: np.ndarray, *, adapt_to_pi: bool = False, adapt_trossen_to_pi: bool = False) -> np.ndarray:
            ...
        elif adapt_trossen_to_pi:
            state = _joint_flip_mask_trossen() * state
            ...
    def _encode_actions(actions: np.ndarray, *, adapt_to_pi: bool = False, adapt_trossen_to_pi: bool = False) -> np.ndarray:
            ...
        elif adapt_trossen_to_pi:
            actions = _joint_flip_mask_trossen() * actions
            ...
    def _encode_actions_inv(actions: np.ndarray, *, adapt_to_pi: bool = False, adapt_trossen_to_pi: bool = False) -> np.ndarray:
            ...
        elif adapt_trossen_to_pi:
            actions = _joint_flip_mask_trossen() * actions
            ...
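_encode_actions and _encode_actions_inv can use the same mask because multiplying by +1 or -1 twice is the identity. Below is a standalone numpy check of this, using the same mask values as above and assuming the usual 14-dimensional layout in which indices 1 and 8 are joint 1 (the shoulder) of the left and right arms. trossen_to_pi and pi_to_trossen are hypothetical names for illustration.

```python
import numpy as np


def joint_flip_mask_trossen() -> np.ndarray:
    # Same values as _joint_flip_mask_trossen: flip index 1 (left arm joint 1)
    # and index 8 (right arm joint 1); all other entries, including grippers, unchanged.
    return np.array([1, -1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1])


def trossen_to_pi(x: np.ndarray) -> np.ndarray:
    # Broadcasting lets this work on one 14-dim vector or a (horizon, 14) action chunk.
    return joint_flip_mask_trossen() * x


def pi_to_trossen(x: np.ndarray) -> np.ndarray:
    # The mask is its own inverse, so the same multiplication undoes the flip.
    return joint_flip_mask_trossen() * x


chunk = np.random.default_rng(1).uniform(-1, 1, size=(50, 14))
print(np.allclose(pi_to_trossen(trossen_to_pi(chunk)), chunk))
```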

Image resize: We believe (but could be wrong) that image resizing during training uses resize_with_pad. Also, our datasets and robot images (with our older lerobot code) are already RGB, so no BGR-to-RGB conversion is needed. We therefore made a few modifications to main.py in the trossen_ai example:

# Transform and resize images from all cameras
    for cam in cameras:
        image_hwc = observation_dict[cam]
        # convert BGR to RGB
        image_resized = cv2.resize(image_hwc, (224, 224))
        image_rgb = cv2.cvtColor(image_resized, cv2.COLOR_BGR2RGB)
        image_chw = np.transpose(image_rgb, (2, 0, 1))
        observation_dict[cam] = image_chw

becomes:

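    # requires: from openpi_client import image_tools
    for cam in cameras:
        image_hwc = observation_dict[cam].numpy()  # already RGB, so no color conversion
        # Letterbox to 224x224 as in training, then make sure the result is uint8.
        image_resized = image_tools.convert_to_uint8(image_tools.resize_with_pad(image_hwc, 224, 224))
        image_chw = np.transpose(image_resized, (2, 0, 1))  # HWC -> CHW
        observation_dict[cam] = image_chw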
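The difference between the two versions is geometric: cv2.resize(image, (224, 224)) stretches a 640 x 480 frame to a square, while resize_with_pad keeps the aspect ratio and pads with black. The numpy sketch below shows only that geometry, using nearest-neighbour sampling for simplicity; openpi_client's image_tools.resize_with_pad may interpolate differently, so this is not a drop-in replacement. resize_with_pad_nearest is a hypothetical name.

```python
import numpy as np


def resize_with_pad_nearest(image_hwc: np.ndarray, height: int, width: int) -> np.ndarray:
    """Aspect-preserving nearest-neighbour resize followed by centred zero padding."""
    h, w = image_hwc.shape[:2]
    scale = min(height / h, width / w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))

    # Nearest-neighbour source indices (integer arithmetic keeps them in range).
    rows = (np.arange(new_h) * h) // new_h
    cols = (np.arange(new_w) * w) // new_w
    resized = image_hwc[rows][:, cols]

    # Paste the resized image into the centre of a black canvas.
    out = np.zeros((height, width) + image_hwc.shape[2:], dtype=image_hwc.dtype)
    top = (height - new_h) // 2
    left = (width - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out


frame = np.full((480, 640, 3), 200, dtype=np.uint8)
letterboxed = resize_with_pad_nearest(frame, 224, 224)
chw = np.transpose(letterboxed, (2, 0, 1))  # HWC -> CHW, as in main.py
print(letterboxed.shape, chw.shape)
```

For a 640 x 480 frame this yields a 224 x 168 image with 28 black rows above and below; the final transpose is the same HWC to CHW step used in main.py.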

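Sim to real joint calibration: To align the simulated and real robots, we offset (and, for one joint, rescale) three joints of each action before executing it: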

in examples/trossen_ai/main.py, in run_episode(...):

    self.execute_action(a_t)

becomes:

    if self.adjust_for_sim_to_real:
        a_t = a_t.copy()
        a_t[7] = 1.05 * (a_t[7] + 0.01)
        a_t[8] = a_t[8] - 0.025
        a_t[9] = a_t[9] + 0.025
    self.execute_action(a_t)
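The same adjustment can be written as a standalone function, which makes it easier to test. This is a sketch: adjust_for_sim_to_real as a free function is our illustration, and the comment about which joints indices 7, 8 and 9 refer to assumes the standard 14-dimensional layout [left arm 6 joints, left gripper, right arm 6 joints, right gripper].

```python
import numpy as np


def adjust_for_sim_to_real(a_t: np.ndarray) -> np.ndarray:
    """Return a calibrated copy of a single 14-dim action; the input is not modified."""
    # Assumed layout: indices 7, 8, 9 are the first three joints of the right arm.
    a_t = np.array(a_t, dtype=float, copy=True)
    a_t[7] = 1.05 * (a_t[7] + 0.01)  # offset, then scale
    a_t[8] = a_t[8] - 0.025          # constant offset
    a_t[9] = a_t[9] + 0.025          # constant offset
    return a_t


print(adjust_for_sim_to_real(np.zeros(14))[7:10])
```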

Sim to real home pose: The sim default home position in our dataset and in our gym-aloha differs from the default lerobot staged_positions:

in packages/lerobot_robot_trossen/src/lerobot_robot_trossen/config_widowxai_follower.py

    staged_positions: list[float] = field(
        default_factory=lambda: [0, np.pi / 3, np.pi / 6, np.pi / 5, 0, 0, 0]
    )

is replaced by a home_pose set in examples/trossen_ai/main.py:

    robot_config = TrossenAIStationaryRobotConfig(
        max_relative_target,
        home_pose=[0, np.pi/12, np.pi/12, 0, 0, 0, 0.044]
    )
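Converting both poses to degrees makes the difference easier to see. This small numpy comparison assumes the first six entries are joint angles in radians and that the last entry of home_pose (0.044) is a gripper opening rather than an angle.

```python
import numpy as np

# Radians; the 7th entry is assumed to be the gripper, so only the first six are compared.
staged_positions = np.array([0, np.pi / 3, np.pi / 6, np.pi / 5, 0, 0, 0])
home_pose = np.array([0, np.pi / 12, np.pi / 12, 0, 0, 0, 0.044])

staged_deg = np.degrees(staged_positions[:6])
home_deg = np.degrees(home_pose[:6])
print(np.round(staged_deg, 1), np.round(home_deg, 1), np.round(staged_deg - home_deg, 1))
```

Under that assumption, joints 1 and 2 start 45 and 15 degrees smaller in the sim home pose, and joint 3 starts at 0 instead of 36 degrees.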


Using pi0 models from huggingface

openpi does not download or load huggingface models directly, but they are easy to use in openpi. The steps below use trossen_ai_stationary_sim_pi013 as an example.

Download the model to a local directory:

```
huggingface-cli download ANRedlich/trossen_ai_stationary_sim_pi013 --local-dir ~/openpi/openpi/checkpoints/hf_checkpoint
```

Add an assets entry to the TrainConfig in training/config.py so that the norm stats are read from the downloaded checkpoint:

TrainConfig(
    name="pi0_aloha_sim_trossen_ai_mem_finetune_v2",
    model=pi0_config.Pi0Config(...),
    data=LeRobotAlohaDataConfig(
        repo_id="ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_13",
        assets=AssetsConfig(  # note: only use this to override the default assets location
            assets_dir="./checkpoints/hf_checkpoint/assets",
            asset_id="ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_13",
        ),
        ...
    ),
    ...
),

In serve_policy.py, point dir to the downloaded model:

EnvMode.ALOHA_SIM_TROSSEN_AI_FINETUNE: Checkpoint(
    config="pi0_aloha_sim_trossen_ai_mem_finetune_v2",
    dir="./checkpoints/hf_checkpoint"
),
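Since both the AssetsConfig and the serve_policy.py entry point into the downloaded directory, a quick layout check can catch a wrong --local-dir before the server starts. The sketch assumes the downloaded checkpoint follows the openpi layout used above, with a params/ directory and assets/<asset_id>/norm_stats.json; check_hf_checkpoint is a hypothetical helper.

```python
from pathlib import Path


def check_hf_checkpoint(checkpoint_dir: Path, asset_id: str) -> list[str]:
    """Return the expected paths that are missing; an empty list means the
    assumed openpi checkpoint layout is present."""
    expected = [
        checkpoint_dir / "params",
        # asset_id contains a "/", so this resolves to assets/<user>/<dataset>/norm_stats.json
        checkpoint_dir / "assets" / asset_id / "norm_stats.json",
    ]
    return [str(p) for p in expected if not p.exists()]
```

For example, check_hf_checkpoint(Path("./checkpoints/hf_checkpoint"), "ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_13") should return an empty list when run from the openpi directory after the download.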
