diff --git a/README.md b/README.md
index 9b847484d..703e64881 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@
-Hot News: New tutorial: Getting starting with Real-World Robots
+Hot new tutorial: Getting started with real-world robots
We just dropped an in-depth tutorial on how to build your own robot!
Teach it new skills by showing it a few moves with just a laptop.
Then watch your homemade robot act autonomously 🤯
+For more info, see our thread on X or our tutorial page.
-State-of-the-art Machine Learning for real-world robotics
+LeRobot: State-of-the-art AI for real-world robotics
---
@@ -265,13 +267,20 @@ checkpoints
│   └── training_state.pth  # optimizer/scheduler/rng state and training step
```
+To resume training from a checkpoint, you can add these to the `train.py` python command:
+```bash
+  hydra.run.dir=your/original/experiment/dir resume=true
+```
+
+This loads the pretrained model as well as the optimizer and scheduler states, and continues training. For more information, please see our tutorial on training resumption [here](https://github.com/huggingface/lerobot/blob/main/examples/5_resume_training.md).
+
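For example, assuming a hypothetical original run directory under `outputs/train/`, the full command could look like this sketch:

```bash
python lerobot/scripts/train.py \
    hydra.run.dir=outputs/train/2024-08-14/13-35-12_pusht_diffusion \
    resume=true
```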
To use wandb for logging training and evaluation curves, make sure you've run `wandb login` as a one-time setup step. Then, when running the training command above, enable WandB in the configuration by adding:
```bash
wandb.enable=true
```
-A link to the wandb logs for the run will also show up in yellow in your terminal. Here is an example of what they look like in your browser:
+A link to the wandb logs for the run will also show up in yellow in your terminal. Here is an example of what they look like in your browser. Please also check [here](https://github.com/huggingface/lerobot/blob/main/examples/4_train_policy_with_script.md#typical-logs-and-metrics) for an explanation of some commonly used metrics in the logs.
diff --git a/examples/4_train_policy_with_script.md b/examples/4_train_policy_with_script.md
index db9840a79..05c865dc3 100644
--- a/examples/4_train_policy_with_script.md
+++ b/examples/4_train_policy_with_script.md
@@ -170,6 +170,36 @@ python lerobot/scripts/train.py --config-dir outputs/train/my_experiment/checkpo
Note that you may still use the regular syntax for config parameter overrides (eg: by adding `training.offline_steps=200000`).
+## Typical logs and metrics
+
+When you start the training process, you will first see your full configuration printed in the terminal. Check it to make sure you configured it correctly and that it is not overridden by other files. The final configuration will also be saved with the checkpoint.
+
+After that, you will see a training log like this one:
+
+```
+INFO 2024-08-14 13:35:12 ts/train.py:192 step:0 smpl:64 ep:1 epch:0.00 loss:1.112 grdn:15.387 lr:2.0e-07 updt_s:1.738 data_s:4.774
+```
+
+or an evaluation log like this one:
+
+```
+INFO 2024-08-14 13:38:45 ts/train.py:226 step:100 smpl:6K ep:52 epch:0.25 ∑rwrd:20.693 success:0.0% eval_s:120.266
+```
+
+These logs are also saved in wandb if `wandb.enable` is set to `true`. Here are the meanings of some abbreviations:
+
+- `smpl`: number of samples seen during training.
+- `ep`: number of episodes seen during training. An episode contains multiple samples from one complete manipulation task.
+- `epch`: number of times all unique samples have been seen (epochs).
+- `grdn`: gradient norm.
+- `∑rwrd`: sum of rewards within each evaluation episode, averaged over the evaluation episodes.
+- `success`: average success rate of the evaluation episodes. Reward and success usually differ, except in sparse-reward settings where reward=1 only when the task is completed successfully.
+- `eval_s`: time to evaluate the policy in the environment, in seconds.
+- `updt_s`: time to update the network parameters, in seconds.
+- `data_s`: time to load a batch of data, in seconds.
+
+Some metrics are useful for initial performance profiling. For example, if GPU utilization is low (check it with `nvidia-smi`) and `data_s` is frequently high, you may need to increase the batch size or the number of dataloading workers to speed up dataloading. We also recommend the [pytorch profiler](https://github.com/huggingface/lerobot?tab=readme-ov-file#improve-your-code-with-profiling) for detailed performance probing.
+
---
So far we've seen how to train Diffusion Policy for PushT and ACT for ALOHA. Now, what if we want to train ACT for PushT? Well, there are aspects of the ACT configuration that are specific to the ALOHA environments, and these happen to be incompatible with PushT. Therefore, trying to run the following will almost certainly raise an exception of sorts (eg: feature dimension mismatch):
diff --git a/examples/7_get_started_with_real_robot.md b/examples/7_get_started_with_real_robot.md
index 3a4b59bb9..f738ec29a 100644
--- a/examples/7_get_started_with_real_robot.md
+++ b/examples/7_get_started_with_real_robot.md
@@ -13,9 +13,9 @@ By following these steps, you'll be able to replicate tasks like picking up a Le
Although this tutorial is general and can be easily adapted to various types of robots by changing the configuration, it is specifically based on the [Koch v1.1](https://github.com/jess-moss/koch-v1-1), an affordable robot. The Koch v1.1 consists of a leader arm and a follower arm, each with 6 motors. It can work with one or several cameras to record the scene, which serve as visual sensors for the robot.
-During the data collection phase, you'll control the follower arm by moving the leader arm, a process known as "teleoperation." This technique is used to collect robot trajectories. Afterward, you'll train a neural network to imitate these trajectories and deploy the network to enable your robot to operate autonomously.
+During the data collection phase, you will control the follower arm by moving the leader arm. This process is known as "teleoperation," and it is used to collect robot trajectories. Afterward, you'll train a neural network to imitate these trajectories and deploy the network to enable your robot to operate autonomously.
-If you encounter any issues at any step of the tutorial, feel free to seek help on [Discord](https://discord.com/invite/s3KuuzsPFb).
+If you encounter any issues at any step of the tutorial, feel free to seek help on [Discord](https://discord.com/invite/s3KuuzsPFb), or iterate with us on the tutorial by opening issues or pull requests. Thanks!
## 1. Order and Assemble your Koch v1.1
@@ -25,9 +25,9 @@ Follow the sourcing and assembling instructions provided on the [Koch v1.1 Githu
-For a visual walkthrough of the assembly process, you can refer to [this video tutorial](https://youtu.be/5mdxvMlxoos).
+For a visual walkthrough of the assembly process, you can refer to [this video tutorial](https://youtu.be/8nQIg9BwwTk).
-## 2. Configure motors, Calibrate arms, Teleoperate your Koch v1.1
+## 2. Configure motors, calibrate arms, teleoperate your Koch v1.1
First, install the additional dependencies required for Koch v1.1 by running one of the following commands.
@@ -49,7 +49,7 @@ Finally, connect both arms to your computer via USB. Note that the USB doesn't p
*Copy pasting python code*
-In the upcoming sections, you'll learn about our classes and functions by running some python code, in an interactive session, or by copy-pasting it in a python file. If it's your first time using the tutorial, we highly recommend going through these steps to get deeper intuition about how things work. Once you're more familiar, you can streamline the process by directly running the teleoperate script (which is detailed further in the tutorial):
+In the upcoming sections, you'll learn about our classes and functions by running some python code, in an interactive session, or by copy-pasting it in a python file. If this is your first time using the tutorial, we highly recommend going through these steps to get deeper intuition about how things work. Once you're more familiar, you can streamline the process by directly running the teleoperate script (which is detailed further in the tutorial):
```bash
python lerobot/scripts/control_robot.py teleoperate \
--robot-path lerobot/configs/robot/koch.yaml \
@@ -57,7 +57,7 @@ python lerobot/scripts/control_robot.py teleoperate \
```
It will automatically:
-1. Detect and help you correct any motor configurations issue.
+1. Detect and help you correct any motor configuration issues.
2. Identify any missing calibrations and initiate the calibration procedure.
3. Connect the robot and start teleoperation.
@@ -67,7 +67,7 @@ You can use the [`DynamixelMotorsBus`](../lerobot/common/robot_devices/motors/dy
**Instantiate the DynamixelMotorsBus**
-To begin, you'll need to create two instances of the [`DynamixelMotorsBus`](../lerobot/common/robot_devices/motors/dynamixel.py), one for each arm, using their corresponding USB ports (e.g. `DynamixelMotorsBus(port="/dev/tty.usbmodem575E0031751"`).
+To begin, create two instances of the [`DynamixelMotorsBus`](../lerobot/common/robot_devices/motors/dynamixel.py), one for each arm, using their corresponding USB ports (e.g. `DynamixelMotorsBus(port="/dev/tty.usbmodem575E0031751")`).
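As a minimal sketch (the port below is a placeholder, and the motor names and models are assumed to match the stock [`koch.yaml`](../lerobot/configs/robot/koch.yaml); adapt them to your hardware), instantiating the leader arm could look like this, with the follower arm built the same way from its own port and motor models:

```python
from lerobot.common.robot_devices.motors.dynamixel import DynamixelMotorsBus

# Placeholder port: replace it with the one found by the utility script below.
leader_arm = DynamixelMotorsBus(
    port="/dev/tty.usbmodem575E0031751",
    motors={
        # name: (index, model) -- assumed to mirror lerobot/configs/robot/koch.yaml
        "shoulder_pan": (1, "xl330-m077"),
        "shoulder_lift": (2, "xl330-m077"),
        "elbow_flex": (3, "xl330-m077"),
        "wrist_flex": (4, "xl330-m077"),
        "wrist_roll": (5, "xl330-m077"),
        "gripper": (6, "xl330-m077"),
    },
)
```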
To find the correct ports for each arm, run the utility script twice:
```bash
@@ -144,7 +144,7 @@ follower_arm = DynamixelMotorsBus(
*Updating the YAML Configuration File*
-Then, update the port values in the YAML configuration file for the Koch robot at [`lerobot/configs/robot/koch.yaml`](../lerobot/configs/robot/koch.yaml) with the ports you've identified:
+Next, update the port values in the YAML configuration file for the Koch robot at [`lerobot/configs/robot/koch.yaml`](../lerobot/configs/robot/koch.yaml) with the ports you've identified:
```yaml
[...]
leader_arms:
@@ -192,7 +192,7 @@ When you connect the leader arm for the first time, you might see an output simi
Read failed due to communication error on port /dev/tty.usbmodem575E0032081 for group_key ID_shoulder_pan_shoulder_lift_elbow_flex_wrist_flex_wrist_roll_gripper: [TxRxResult] There is no status packet!
/!\ A configuration issue has been detected with your motors:
-If it's the first time that you use these motors, press enter to configure your motors... but before verify that all the cables are connected the proper way. If you find an issue, before making a modification, kill the python process, unplug the power cord to not damage the motors, rewire correctly, then plug the power again and relaunch the script.
+If this is the first time you are using these motors, press enter to configure your motors... but before verify that all the cables are connected the proper way. If you find an issue, before making a modification, kill the python process, unplug the power cord to not damage the motors, rewire correctly, then plug the power again and relaunch the script.
Motor indices detected: {9600: [1]}
@@ -327,37 +327,15 @@ When you connect your robot for the first time, the [`KochRobot`](../lerobot/com
Here are the positions you'll move the follower arm to:
-
- 1. Zero position
-
- 2. Rotated position
-
- 3. Rest position
-
And here are the corresponding positions for the leader arm:
-
- 1. Zero position
-
- 2. Rotated position
-
- 3. Rest position
-
You can watch a [video tutorial of the calibration procedure](https://youtu.be/8drnU9uRY24) for more details.
@@ -774,7 +752,7 @@ Before trying `record`, if you want to push your dataset to the hub, make sure y
```bash
huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential
```
-Also, store your Hugging Face repositery name in a variable (e.g. `cadene` or `lerobot`). For instance, run this to use your Hugging Face user name as repositery:
+Also, store your Hugging Face repository name in a variable (e.g. `cadene` or `lerobot`). For instance, run this to use your Hugging Face username as the repository name:
```bash
HF_USER=$(huggingface-cli whoami | head -n 1)
echo $HF_USER
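# For illustration, build a dataset repository id from it (hypothetical dataset name):
DATASET_NAME=koch_test
echo "${HF_USER}/${DATASET_NAME}"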
diff --git a/lerobot/__init__.py b/lerobot/__init__.py
index 8e06435c3..65998e8b3 100644
--- a/lerobot/__init__.py
+++ b/lerobot/__init__.py
@@ -129,6 +129,53 @@ available_real_world_datasets = [
"lerobot/unitreeh1_rearrange_objects",
"lerobot/unitreeh1_two_robot_greeting",
"lerobot/unitreeh1_warehouse",
+ "lerobot/nyu_rot_dataset",
+ "lerobot/utokyo_saytap",
+ "lerobot/imperialcollege_sawyer_wrist_cam",
+ "lerobot/utokyo_xarm_bimanual",
+ "lerobot/tokyo_u_lsmo",
+ "lerobot/utokyo_pr2_opening_fridge",
+ "lerobot/cmu_franka_exploration_dataset",
+ "lerobot/cmu_stretch",
+ "lerobot/asu_table_top",
+ "lerobot/utokyo_pr2_tabletop_manipulation",
+ "lerobot/utokyo_xarm_pick_and_place",
+ "lerobot/ucsd_kitchen_dataset",
+ "lerobot/austin_buds_dataset",
+ "lerobot/dlr_sara_grid_clamp",
+ "lerobot/conq_hose_manipulation",
+ "lerobot/columbia_cairlab_pusht_real",
+ "lerobot/dlr_sara_pour",
+ "lerobot/dlr_edan_shared_control",
+ "lerobot/ucsd_pick_and_place_dataset",
+ "lerobot/berkeley_cable_routing",
+ "lerobot/nyu_franka_play_dataset",
+ "lerobot/austin_sirius_dataset",
+ "lerobot/cmu_play_fusion",
+ "lerobot/berkeley_gnm_sac_son",
+ "lerobot/nyu_door_opening_surprising_effectiveness",
+ "lerobot/berkeley_fanuc_manipulation",
+ "lerobot/jaco_play",
+ "lerobot/viola",
+ "lerobot/kaist_nonprehensile",
+ "lerobot/berkeley_mvp",
+ "lerobot/uiuc_d3field",
+ "lerobot/berkeley_gnm_recon",
+ "lerobot/austin_sailor_dataset",
+ "lerobot/utaustin_mutex",
+ "lerobot/roboturk",
+ "lerobot/stanford_hydra_dataset",
+ "lerobot/berkeley_autolab_ur5",
+ "lerobot/stanford_robocook",
+ "lerobot/toto",
+ "lerobot/fmb",
+ "lerobot/droid_100",
+ "lerobot/berkeley_rpt",
+ "lerobot/stanford_kuka_multimodal_dataset",
+ "lerobot/iamlab_cmu_pickup_insert",
+ "lerobot/taco_play",
+ "lerobot/berkeley_gnm_cory_hall",
+ "lerobot/usc_cloth_sim",
]
available_datasets = list(
diff --git a/lerobot/common/datasets/compute_stats.py b/lerobot/common/datasets/compute_stats.py
index a69bc5734..208284465 100644
--- a/lerobot/common/datasets/compute_stats.py
+++ b/lerobot/common/datasets/compute_stats.py
@@ -40,6 +40,10 @@ def get_stats_einops_patterns(dataset, num_workers=0):
stats_patterns = {}
for key, feats_type in dataset.features.items():
+ # NOTE: skip language_instruction embedding in stats computation
+ if key == "language_instruction":
+ continue
+
# sanity check that tensors are not float64
assert batch[key].dtype != torch.float64
diff --git a/lerobot/common/datasets/push_dataset_to_hub/_download_raw.py b/lerobot/common/datasets/push_dataset_to_hub/_download_raw.py
index 36474dd1f..edeaf0933 100644
--- a/lerobot/common/datasets/push_dataset_to_hub/_download_raw.py
+++ b/lerobot/common/datasets/push_dataset_to_hub/_download_raw.py
@@ -60,8 +60,8 @@ AVAILABLE_RAW_REPO_IDS = {
"lerobot-raw/aloha_static_vinh_cup_left_raw": "aloha_hdf5",
"lerobot-raw/aloha_static_vinh_cup_raw": "aloha_hdf5",
"lerobot-raw/aloha_static_ziploc_slide_raw": "aloha_hdf5",
- "lerobot-raw/pusht_raw": "pusht_zarr",
"lerobot-raw/umi_cup_in_the_wild_raw": "umi_zarr",
+ "lerobot-raw/pusht_raw": "pusht_zarr",
"lerobot-raw/unitreeh1_fold_clothes_raw": "aloha_hdf5",
"lerobot-raw/unitreeh1_rearrange_objects_raw": "aloha_hdf5",
"lerobot-raw/unitreeh1_two_robot_greeting_raw": "aloha_hdf5",
@@ -70,6 +70,74 @@ AVAILABLE_RAW_REPO_IDS = {
"lerobot-raw/xarm_lift_medium_replay_raw": "xarm_pkl",
"lerobot-raw/xarm_push_medium_raw": "xarm_pkl",
"lerobot-raw/xarm_push_medium_replay_raw": "xarm_pkl",
+ "lerobot-raw/fractal20220817_data_raw": "openx_rlds.fractal20220817_data",
+ "lerobot-raw/kuka_raw": "openx_rlds.kuka",
+ "lerobot-raw/bridge_openx_raw": "openx_rlds.bridge_openx",
+ "lerobot-raw/taco_play_raw": "openx_rlds.taco_play",
+ "lerobot-raw/jaco_play_raw": "openx_rlds.jaco_play",
+ "lerobot-raw/berkeley_cable_routing_raw": "openx_rlds.berkeley_cable_routing",
+ "lerobot-raw/roboturk_raw": "openx_rlds.roboturk",
+ "lerobot-raw/nyu_door_opening_surprising_effectiveness_raw": "openx_rlds.nyu_door_opening_surprising_effectiveness",
+ "lerobot-raw/viola_raw": "openx_rlds.viola",
+ "lerobot-raw/berkeley_autolab_ur5_raw": "openx_rlds.berkeley_autolab_ur5",
+ "lerobot-raw/toto_raw": "openx_rlds.toto",
+ "lerobot-raw/language_table_raw": "openx_rlds.language_table",
+ "lerobot-raw/columbia_cairlab_pusht_real_raw": "openx_rlds.columbia_cairlab_pusht_real",
+ "lerobot-raw/stanford_kuka_multimodal_dataset_raw": "openx_rlds.stanford_kuka_multimodal_dataset",
+ "lerobot-raw/nyu_rot_dataset_raw": "openx_rlds.nyu_rot_dataset",
+ "lerobot-raw/io_ai_tech_raw": "openx_rlds.io_ai_tech",
+ "lerobot-raw/stanford_hydra_dataset_raw": "openx_rlds.stanford_hydra_dataset",
+ "lerobot-raw/austin_buds_dataset_raw": "openx_rlds.austin_buds_dataset",
+ "lerobot-raw/nyu_franka_play_dataset_raw": "openx_rlds.nyu_franka_play_dataset",
+ "lerobot-raw/maniskill_dataset_raw": "openx_rlds.maniskill_dataset",
+ "lerobot-raw/furniture_bench_dataset_raw": "openx_rlds.furniture_bench_dataset",
+ "lerobot-raw/cmu_franka_exploration_dataset_raw": "openx_rlds.cmu_franka_exploration_dataset",
+ "lerobot-raw/ucsd_kitchen_dataset_raw": "openx_rlds.ucsd_kitchen_dataset",
+ "lerobot-raw/ucsd_pick_and_place_dataset_raw": "openx_rlds.ucsd_pick_and_place_dataset",
+ "lerobot-raw/spoc_raw": "openx_rlds.spoc",
+ "lerobot-raw/austin_sailor_dataset_raw": "openx_rlds.austin_sailor_dataset",
+ "lerobot-raw/austin_sirius_dataset_raw": "openx_rlds.austin_sirius_dataset",
+ "lerobot-raw/bc_z_raw": "openx_rlds.bc_z",
+ "lerobot-raw/utokyo_pr2_opening_fridge_raw": "openx_rlds.utokyo_pr2_opening_fridge",
+ "lerobot-raw/utokyo_pr2_tabletop_manipulation_raw": "openx_rlds.utokyo_pr2_tabletop_manipulation",
+ "lerobot-raw/utokyo_xarm_pick_and_place_raw": "openx_rlds.utokyo_xarm_pick_and_place",
+ "lerobot-raw/utokyo_xarm_bimanual_raw": "openx_rlds.utokyo_xarm_bimanual",
+ "lerobot-raw/utokyo_saytap_raw": "openx_rlds.utokyo_saytap",
+ "lerobot-raw/robo_net_raw": "openx_rlds.robo_net",
+ "lerobot-raw/robo_set_raw": "openx_rlds.robo_set",
+ "lerobot-raw/berkeley_mvp_raw": "openx_rlds.berkeley_mvp",
+ "lerobot-raw/berkeley_rpt_raw": "openx_rlds.berkeley_rpt",
+ "lerobot-raw/kaist_nonprehensile_raw": "openx_rlds.kaist_nonprehensile",
+ "lerobot-raw/stanford_mask_vit_raw": "openx_rlds.stanford_mask_vit",
+ "lerobot-raw/tokyo_u_lsmo_raw": "openx_rlds.tokyo_u_lsmo",
+ "lerobot-raw/dlr_sara_pour_raw": "openx_rlds.dlr_sara_pour",
+ "lerobot-raw/dlr_sara_grid_clamp_raw": "openx_rlds.dlr_sara_grid_clamp",
+ "lerobot-raw/dlr_edan_shared_control_raw": "openx_rlds.dlr_edan_shared_control",
+ "lerobot-raw/asu_table_top_raw": "openx_rlds.asu_table_top",
+ "lerobot-raw/stanford_robocook_raw": "openx_rlds.stanford_robocook",
+ "lerobot-raw/imperialcollege_sawyer_wrist_cam_raw": "openx_rlds.imperialcollege_sawyer_wrist_cam",
+ "lerobot-raw/iamlab_cmu_pickup_insert_raw": "openx_rlds.iamlab_cmu_pickup_insert",
+ "lerobot-raw/uiuc_d3field_raw": "openx_rlds.uiuc_d3field",
+ "lerobot-raw/utaustin_mutex_raw": "openx_rlds.utaustin_mutex",
+ "lerobot-raw/berkeley_fanuc_manipulation_raw": "openx_rlds.berkeley_fanuc_manipulation",
+ "lerobot-raw/cmu_playing_with_food_raw": "openx_rlds.cmu_playing_with_food",
+ "lerobot-raw/cmu_play_fusion_raw": "openx_rlds.cmu_play_fusion",
+ "lerobot-raw/cmu_stretch_raw": "openx_rlds.cmu_stretch",
+ "lerobot-raw/berkeley_gnm_recon_raw": "openx_rlds.berkeley_gnm_recon",
+ "lerobot-raw/berkeley_gnm_cory_hall_raw": "openx_rlds.berkeley_gnm_cory_hall",
+ "lerobot-raw/berkeley_gnm_sac_son_raw": "openx_rlds.berkeley_gnm_sac_son",
+ "lerobot-raw/droid_raw": "openx_rlds.droid",
+ "lerobot-raw/droid_100_raw": "openx_rlds.droid100",
+ "lerobot-raw/fmb_raw": "openx_rlds.fmb",
+ "lerobot-raw/dobbe_raw": "openx_rlds.dobbe",
+ "lerobot-raw/usc_cloth_sim_raw": "openx_rlds.usc_cloth_sim",
+ "lerobot-raw/plex_robosuite_raw": "openx_rlds.plex_robosuite",
+ "lerobot-raw/conq_hose_manipulation_raw": "openx_rlds.conq_hose_manipulation",
+ "lerobot-raw/vima_raw": "openx_rlds.vima",
+ "lerobot-raw/robot_vqa_raw": "openx_rlds.robot_vqa",
+ "lerobot-raw/mimic_play_raw": "openx_rlds.mimic_play",
+ "lerobot-raw/tidybot_raw": "openx_rlds.tidybot",
+ "lerobot-raw/eth_agent_affordances_raw": "openx_rlds.eth_agent_affordances",
}
@@ -110,7 +178,7 @@ def download_all_raw_datasets(data_dir: Path | None = None):
def main():
parser = argparse.ArgumentParser(
description=f"""A script to download raw datasets from Hugging Face hub to a local directory. Here is a
- non exhaustive list of available repositories to use in `--repo-id`: {AVAILABLE_RAW_REPO_IDS}""",
+ non exhaustive list of available repositories to use in `--repo-id`: {list(AVAILABLE_RAW_REPO_IDS.keys())}""",
)
parser.add_argument(
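The help-text change above prints only the repository ids (the dict keys) instead of the whole mapping from repo id to raw format. A toy illustration of the difference:

```python
# Toy dict mirroring two entries of AVAILABLE_RAW_REPO_IDS.
available_raw_repo_ids = {
    "lerobot-raw/pusht_raw": "pusht_zarr",
    "lerobot-raw/umi_cup_in_the_wild_raw": "umi_zarr",
}

print(f"{available_raw_repo_ids}")
# {'lerobot-raw/pusht_raw': 'pusht_zarr', 'lerobot-raw/umi_cup_in_the_wild_raw': 'umi_zarr'}

print(f"{list(available_raw_repo_ids.keys())}")
# ['lerobot-raw/pusht_raw', 'lerobot-raw/umi_cup_in_the_wild_raw']
```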
diff --git a/lerobot/common/datasets/push_dataset_to_hub/openx/configs.yaml b/lerobot/common/datasets/push_dataset_to_hub/openx/configs.yaml
new file mode 100644
index 000000000..9bbc46499
--- /dev/null
+++ b/lerobot/common/datasets/push_dataset_to_hub/openx/configs.yaml
@@ -0,0 +1,640 @@
+OPENX_DATASET_CONFIGS:
+ fractal20220817_data:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - base_pose_tool_reached
+ - gripper_closed
+ fps: 3
+
+ kuka:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - clip_function_input/base_pose_tool_reached
+ - gripper_closed
+ fps: 10
+
+ bridge_openx:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - EEF_state
+ - gripper_state
+ fps: 5
+
+ taco_play:
+ image_obs_keys:
+ - rgb_static
+ - rgb_gripper
+ depth_obs_keys:
+ - depth_static
+ - depth_gripper
+ state_obs_keys:
+ - state_eef
+ - state_gripper
+ fps: 15
+
+ jaco_play:
+ image_obs_keys:
+ - image
+ - image_wrist
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state_eef
+ - state_gripper
+ fps: 10
+
+ berkeley_cable_routing:
+ image_obs_keys:
+ - image
+ - top_image
+ - wrist45_image
+ - wrist225_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - robot_state
+ fps: 10
+
+ roboturk:
+ image_obs_keys:
+ - front_rgb
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - null
+ fps: 10
+
+ nyu_door_opening_surprising_effectiveness:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - null
+ fps: 3
+
+ viola:
+ image_obs_keys:
+ - agentview_rgb
+ - eye_in_hand_rgb
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - joint_states
+ - gripper_states
+ fps: 20
+
+ berkeley_autolab_ur5:
+ image_obs_keys:
+ - image
+ - hand_image
+ depth_obs_keys:
+ - image_with_depth
+ state_obs_keys:
+ - state
+ fps: 5
+
+ toto:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 30
+
+ language_table:
+ image_obs_keys:
+ - rgb
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - effector_translation
+ fps: 10
+
+ columbia_cairlab_pusht_real:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - robot_state
+ fps: 10
+
+ stanford_kuka_multimodal_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - depth_image
+ state_obs_keys:
+ - ee_position
+ - ee_orientation
+ fps: 20
+
+ nyu_rot_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+ fps: 3
+
+ io_ai_tech:
+ image_obs_keys:
+ - image
+ - image_fisheye
+ - image_left_side
+ - image_right_side
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 3
+
+ stanford_hydra_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+ fps: 10
+
+ austin_buds_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 20
+
+ nyu_franka_play_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ - image_additional_view
+ depth_obs_keys:
+ - depth
+ - depth_additional_view
+ state_obs_keys:
+ - eef_state
+ fps: 3
+
+ maniskill_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - depth
+ - wrist_depth
+ state_obs_keys:
+ - tcp_pose
+ - gripper_state
+ fps: 20
+
+ furniture_bench_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 10
+
+ cmu_franka_exploration_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - highres_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - null
+ fps: 10
+
+ ucsd_kitchen_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - joint_state
+ fps: 2
+
+ ucsd_pick_and_place_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+ fps: 3
+
+ spoc:
+ image_obs_keys:
+ - image
+ - image_manipulation
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - null
+ fps: 3
+
+ austin_sailor_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 20
+
+ austin_sirius_dataset_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 20
+
+ bc_z:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - present/xyz
+ - present/axis_angle
+ - present/sensed_close
+ fps: 10
+
+ utokyo_pr2_opening_fridge_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+ fps: 10
+
+ utokyo_pr2_tabletop_manipulation_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+ fps: 10
+
+ utokyo_xarm_pick_and_place_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ - image2
+ - hand_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - end_effector_pose
+ fps: 10
+
+ utokyo_xarm_bimanual_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - pose_r
+ fps: 10
+
+ robo_net:
+ image_obs_keys:
+ - image
+ - image1
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+ fps: 1
+
+ robo_set:
+ image_obs_keys:
+ - image_left
+ - image_right
+ - image_wrist
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ - state_velocity
+ fps: 5
+
+ berkeley_mvp_converted_externally_to_rlds:
+ image_obs_keys:
+ - hand_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - gripper
+ - pose
+ - joint_pos
+ fps: 5
+
+ berkeley_rpt_converted_externally_to_rlds:
+ image_obs_keys:
+ - hand_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - joint_pos
+ - gripper
+ fps: 30
+
+ kaist_nonprehensile_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 10
+
+ stanford_mask_vit_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+
+ tokyo_u_lsmo_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+ fps: 10
+
+ dlr_sara_pour_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 10
+
+ dlr_sara_grid_clamp_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 10
+
+ dlr_edan_shared_control_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 5
+
+ asu_table_top_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+ fps: 12.5
+
+ stanford_robocook_converted_externally_to_rlds:
+ image_obs_keys:
+ - image_1
+ - image_2
+ depth_obs_keys:
+ - depth_1
+ - depth_2
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+ fps: 5
+
+ imperialcollege_sawyer_wrist_cam:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 10
+
+ iamlab_cmu_pickup_insert_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - joint_state
+ - gripper_state
+ fps: 20
+
+ uiuc_d3field:
+ image_obs_keys:
+ - image_1
+ - image_2
+ depth_obs_keys:
+ - depth_1
+ - depth_2
+ state_obs_keys:
+ - null
+ fps: 1
+
+ utaustin_mutex:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 20
+
+ berkeley_fanuc_manipulation:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - joint_state
+ - gripper_state
+ fps: 10
+
+ cmu_playing_with_food:
+ image_obs_keys:
+ - image
+ - finger_vision_1
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 10
+
+ cmu_play_fusion:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 5
+
+ cmu_stretch:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - eef_state
+ - gripper_state
+ fps: 10
+
+ berkeley_gnm_recon:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ - position
+ - yaw
+ fps: 3
+
+ berkeley_gnm_cory_hall:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ - position
+ - yaw
+ fps: 5
+
+ berkeley_gnm_sac_son:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ - position
+ - yaw
+ fps: 10
+
+ droid:
+ image_obs_keys:
+ - exterior_image_1_left
+ - exterior_image_2_left
+ - wrist_image_left
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - proprio
+ fps: 15
+
+ droid_100:
+ image_obs_keys:
+ - exterior_image_1_left
+ - exterior_image_2_left
+ - wrist_image_left
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - proprio
+ fps: 15
+
+ fmb:
+ image_obs_keys:
+ - image_side_1
+ - image_side_2
+ - image_wrist_1
+ - image_wrist_2
+ depth_obs_keys:
+ - image_side_1_depth
+ - image_side_2_depth
+ - image_wrist_1_depth
+ - image_wrist_2_depth
+ state_obs_keys:
+ - proprio
+ fps: 10
+
+ dobbe:
+ image_obs_keys:
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - proprio
+ fps: 3.75
+
+ usc_cloth_sim_converted_externally_to_rlds:
+ image_obs_keys:
+ - image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - null
+ fps: 10
+
+ plex_robosuite:
+ image_obs_keys:
+ - image
+ - wrist_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 20
+
+ conq_hose_manipulation:
+ image_obs_keys:
+ - frontleft_fisheye_image
+ - frontright_fisheye_image
+ - hand_color_image
+ depth_obs_keys:
+ - null
+ state_obs_keys:
+ - state
+ fps: 30
+
\ No newline at end of file
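For reference, here is a minimal sketch (a hypothetical snippet run from the repository root, not a helper shipped in this directory) of reading one entry from this file with PyYAML:

```python
from pathlib import Path

import yaml

config_path = Path("lerobot/common/datasets/push_dataset_to_hub/openx/configs.yaml")
with open(config_path) as f:
    openx_configs = yaml.safe_load(f)["OPENX_DATASET_CONFIGS"]

cfg = openx_configs["taco_play"]
print(cfg["image_obs_keys"])  # ['rgb_static', 'rgb_gripper']
print(cfg["depth_obs_keys"])  # ['depth_static', 'depth_gripper']
print(cfg["state_obs_keys"])  # ['state_eef', 'state_gripper']
print(cfg["fps"])             # 15
```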
diff --git a/lerobot/common/datasets/push_dataset_to_hub/openx/data_utils.py b/lerobot/common/datasets/push_dataset_to_hub/openx/data_utils.py
new file mode 100644
index 000000000..1582c67c2
--- /dev/null
+++ b/lerobot/common/datasets/push_dataset_to_hub/openx/data_utils.py
@@ -0,0 +1,106 @@
+#!/usr/bin/env python
+
+# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+NOTE(YL): Adapted from:
+ Octo: https://github.com/octo-models/octo/blob/main/octo/data/utils/data_utils.py
+
+data_utils.py
+
+Additional utils for data processing.
+"""
+
+from typing import Any, Dict, List
+
+import tensorflow as tf
+
+
+def binarize_gripper_actions(actions: tf.Tensor) -> tf.Tensor:
+ """
+ Converts gripper actions from continuous to binary values (0 and 1).
+
+ We exploit that fact that most of the time, the gripper is fully open (near 1.0) or fully closed (near 0.0). As it
+ transitions between the two, it sometimes passes through a few intermediate values. We relabel those intermediate
+ values based on the state that is reached _after_ those intermediate values.
+
+ In the edge case that the trajectory ends with an intermediate value, we give up on binarizing and relabel that
+ chunk of intermediate values as the last action in the trajectory.
+
+ The `scan_fn` implements the following logic:
+ new_actions = np.empty_like(actions)
+ carry = actions[-1]
+ for i in reversed(range(actions.shape[0])):
+ if in_between_mask[i]:
+ carry = carry
+ else:
+ carry = float(open_mask[i])
+ new_actions[i] = carry
+ """
+ open_mask, closed_mask = actions > 0.95, actions < 0.05
+ in_between_mask = tf.logical_not(tf.logical_or(open_mask, closed_mask))
+ is_open_float = tf.cast(open_mask, tf.float32)
+
+ def scan_fn(carry, i):
+ return tf.cond(in_between_mask[i], lambda: tf.cast(carry, tf.float32), lambda: is_open_float[i])
+
+ return tf.scan(scan_fn, tf.range(tf.shape(actions)[0]), actions[-1], reverse=True)
+
+
+def invert_gripper_actions(actions: tf.Tensor) -> tf.Tensor:
+ return 1 - actions
+
+
+def rel2abs_gripper_actions(actions: tf.Tensor) -> tf.Tensor:
+ """
+ Converts relative gripper actions (+1 for closing, -1 for opening) to absolute actions (0 = closed; 1 = open).
+
+ Assumes that the first relative gripper is not redundant (i.e. close when already closed)!
+ """
+ # Note =>> -1 for closing, 1 for opening, 0 for no change
+ opening_mask, closing_mask = actions < -0.1, actions > 0.1
+ thresholded_actions = tf.where(opening_mask, 1, tf.where(closing_mask, -1, 0))
+
+ def scan_fn(carry, i):
+ return tf.cond(thresholded_actions[i] == 0, lambda: carry, lambda: thresholded_actions[i])
+
+ # If no relative grasp, assumes open for whole trajectory
+ start = -1 * thresholded_actions[tf.argmax(thresholded_actions != 0, axis=0)]
+ start = tf.cond(start == 0, lambda: 1, lambda: start)
+
+ # Note =>> -1 for closed, 1 for open
+ new_actions = tf.scan(scan_fn, tf.range(tf.shape(actions)[0]), start)
+ new_actions = tf.cast(new_actions, tf.float32) / 2 + 0.5
+
+ return new_actions
+
+
+# === Bridge-V2 =>> Dataset-Specific Transform ===
+def relabel_bridge_actions(traj: Dict[str, Any]) -> Dict[str, Any]:
+ """Relabels actions to use reached proprioceptive state; discards last timestep (no-action)."""
+ movement_actions = traj["observation"]["state"][1:, :6] - traj["observation"]["state"][:-1, :6]
+ traj_truncated = tf.nest.map_structure(lambda x: x[:-1], traj)
+ traj_truncated["action"] = tf.concat([movement_actions, traj["action"][:-1, -1:]], axis=1)
+
+ return traj_truncated
+
+
+# === RLDS Dataset Initialization Utilities ===
+def pprint_data_mixture(dataset_kwargs_list: List[Dict[str, Any]], dataset_weights: List[int]) -> None:
+ print("\n######################################################################################")
+ print(f"# Loading the following {len(dataset_kwargs_list)} datasets (incl. sampling weight):{'': >24} #")
+ for dataset_kwargs, weight in zip(dataset_kwargs_list, dataset_weights, strict=False):
+ pad = 80 - len(dataset_kwargs["name"])
+ print(f"# {dataset_kwargs['name']}: {weight:=>{pad}f} #")
+ print("######################################################################################\n")
diff --git a/lerobot/common/datasets/push_dataset_to_hub/openx/droid_utils.py b/lerobot/common/datasets/push_dataset_to_hub/openx/droid_utils.py
new file mode 100644
index 000000000..22ac4d9e3
--- /dev/null
+++ b/lerobot/common/datasets/push_dataset_to_hub/openx/droid_utils.py
@@ -0,0 +1,200 @@
+#!/usr/bin/env python
+
+# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+NOTE(YL): Adapted from:
+ OpenVLA: https://github.com/openvla/openvla
+
+Episode transforms for DROID dataset.
+"""
+
+from typing import Any, Dict
+
+import tensorflow as tf
+import tensorflow_graphics.geometry.transformation as tfg
+
+
+def rmat_to_euler(rot_mat):
+ return tfg.euler.from_rotation_matrix(rot_mat)
+
+
+def euler_to_rmat(euler):
+ return tfg.rotation_matrix_3d.from_euler(euler)
+
+
+def invert_rmat(rot_mat):
+ return tfg.rotation_matrix_3d.inverse(rot_mat)
+
+
+def rotmat_to_rot6d(mat):
+ """
+ Converts rotation matrix to R6 rotation representation (first two rows in rotation matrix).
+ Args:
+ mat: rotation matrix
+
+ Returns: 6d vector (first two rows of rotation matrix)
+
+ """
+ r6 = mat[..., :2, :]
+ r6_0, r6_1 = r6[..., 0, :], r6[..., 1, :]
+ r6_flat = tf.concat([r6_0, r6_1], axis=-1)
+ return r6_flat
+
+
+def velocity_act_to_wrist_frame(velocity, wrist_in_robot_frame):
+ """
+ Translates velocity actions (translation + rotation) from base frame of the robot to wrist frame.
+ Args:
+ velocity: 6d velocity action (3 x translation, 3 x rotation)
+ wrist_in_robot_frame: 6d pose of the end-effector in robot base frame
+
+ Returns: 9d velocity action in robot wrist frame (3 x translation, 6 x rotation as R6)
+
+ """
+ r_frame = euler_to_rmat(wrist_in_robot_frame[:, 3:6])
+ r_frame_inv = invert_rmat(r_frame)
+
+ # world to wrist: dT_pi = R^-1 dT_rbt
+ vel_t = (r_frame_inv @ velocity[:, :3][..., None])[..., 0]
+
+ # world to wrist: dR_pi = R^-1 dR_rbt R
+ dr_ = euler_to_rmat(velocity[:, 3:6])
+ dr_ = r_frame_inv @ (dr_ @ r_frame)
+ dr_r6 = rotmat_to_rot6d(dr_)
+ return tf.concat([vel_t, dr_r6], axis=-1)
+
+
+def rand_swap_exterior_images(img1, img2):
+ """
+ Randomly swaps the two exterior images (for training with single exterior input).
+ """
+ return tf.cond(tf.random.uniform(shape=[]) > 0.5, lambda: (img1, img2), lambda: (img2, img1))
+
+
+def droid_baseact_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
+ """
+ DROID dataset transformation for actions expressed in *base* frame of the robot.
+ """
+ dt = trajectory["action_dict"]["cartesian_velocity"][:, :3]
+ dr_ = trajectory["action_dict"]["cartesian_velocity"][:, 3:6]
+
+ trajectory["action"] = tf.concat(
+ (
+ dt,
+ dr_,
+ 1 - trajectory["action_dict"]["gripper_position"],
+ ),
+ axis=-1,
+ )
+ trajectory["observation"]["exterior_image_1_left"], trajectory["observation"]["exterior_image_2_left"] = (
+ rand_swap_exterior_images(
+ trajectory["observation"]["exterior_image_1_left"],
+ trajectory["observation"]["exterior_image_2_left"],
+ )
+ )
+ trajectory["observation"]["proprio"] = tf.concat(
+ (
+ trajectory["observation"]["cartesian_position"],
+ trajectory["observation"]["gripper_position"],
+ ),
+ axis=-1,
+ )
+ return trajectory
+
+
+def droid_wristact_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
+ """
+ DROID dataset transformation for actions expressed in *wrist* frame of the robot.
+ """
+ wrist_act = velocity_act_to_wrist_frame(
+ trajectory["action_dict"]["cartesian_velocity"], trajectory["observation"]["cartesian_position"]
+ )
+ trajectory["action"] = tf.concat(
+ (
+ wrist_act,
+ trajectory["action_dict"]["gripper_position"],
+ ),
+ axis=-1,
+ )
+ trajectory["observation"]["exterior_image_1_left"], trajectory["observation"]["exterior_image_2_left"] = (
+ rand_swap_exterior_images(
+ trajectory["observation"]["exterior_image_1_left"],
+ trajectory["observation"]["exterior_image_2_left"],
+ )
+ )
+ trajectory["observation"]["proprio"] = tf.concat(
+ (
+ trajectory["observation"]["cartesian_position"],
+ trajectory["observation"]["gripper_position"],
+ ),
+ axis=-1,
+ )
+ return trajectory
+
+
+def droid_finetuning_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
+ """
+ DROID dataset transformation for actions expressed in *base* frame of the robot.
+ """
+ dt = trajectory["action_dict"]["cartesian_velocity"][:, :3]
+ dr_ = trajectory["action_dict"]["cartesian_velocity"][:, 3:6]
+ trajectory["action"] = tf.concat(
+ (
+ dt,
+ dr_,
+ 1 - trajectory["action_dict"]["gripper_position"],
+ ),
+ axis=-1,
+ )
+ trajectory["observation"]["proprio"] = tf.concat(
+ (
+ trajectory["observation"]["cartesian_position"],
+ trajectory["observation"]["gripper_position"],
+ ),
+ axis=-1,
+ )
+ return trajectory
+
+
+def zero_action_filter(traj: Dict) -> bool:
+ """
+ Filters transitions whose actions are all-0 (only relative actions, no gripper action).
+ Note: this filter is applied *after* action normalization, so need to compare to "normalized 0".
+ """
+ droid_q01 = tf.convert_to_tensor(
+ [
+ -0.7776297926902771,
+ -0.5803514122962952,
+ -0.5795090794563293,
+ -0.6464047729969025,
+ -0.7041108310222626,
+ -0.8895104378461838,
+ ]
+ )
+ droid_q99 = tf.convert_to_tensor(
+ [
+ 0.7597932070493698,
+ 0.5726242214441299,
+ 0.7351000607013702,
+ 0.6705610305070877,
+ 0.6464948207139969,
+ 0.8897542208433151,
+ ]
+ )
+ droid_norm_0_act = (
+ 2 * (tf.zeros_like(traj["action"][:, :6]) - droid_q01) / (droid_q99 - droid_q01 + 1e-8) - 1
+ )
+
+ return tf.reduce_any(tf.math.abs(traj["action"][:, :6] - droid_norm_0_act) > 1e-5)
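A tiny sanity check of `rotmat_to_rot6d` on the identity rotation (this module also requires `tensorflow-graphics` to import):

```python
import tensorflow as tf

from lerobot.common.datasets.push_dataset_to_hub.openx.droid_utils import rotmat_to_rot6d

# The R6 representation is just the first two rows of the rotation matrix, flattened.
print(rotmat_to_rot6d(tf.eye(3)).numpy())  # [1. 0. 0. 0. 1. 0.]
```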
diff --git a/lerobot/common/datasets/push_dataset_to_hub/openx/transforms.py b/lerobot/common/datasets/push_dataset_to_hub/openx/transforms.py
new file mode 100644
index 000000000..a0c1e30f6
--- /dev/null
+++ b/lerobot/common/datasets/push_dataset_to_hub/openx/transforms.py
@@ -0,0 +1,859 @@
+#!/usr/bin/env python
+
+# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+NOTE(YL): Adapted from:
+ OpenVLA: https://github.com/openvla/openvla
+ Octo: https://github.com/octo-models/octo
+
+transforms.py
+
+Defines a registry of per-dataset standardization transforms for each dataset in Open-X Embodiment.
+
+Transforms adopt the following structure:
+ Input: Dictionary of *batched* features (i.e., has leading time dimension)
+ Output: Dictionary `step` =>> {
+ "observation": {