Compare commits


4 Commits

Author SHA1 Message Date
Remi Cadene
b65247feee Add mobile and neck 2024-06-06 14:01:14 +00:00
Remi Cadene
5e85a2c50b Add reachy2 dataset, policy, env 2024-06-04 12:31:59 +00:00
Remi Cadene
a56626cf9c Add custom visualize_dataset.py 2024-06-03 15:47:12 +00:00
Remi Cadene
44ba4ed566 Fix aloha (WIP: do not train in sim) 2024-06-03 14:47:06 +00:00
39 changed files with 978 additions and 2412 deletions

View File

@@ -127,21 +127,13 @@ wandb login
Check out [example 1](./examples/1_load_lerobot_dataset.py) that illustrates how to use our dataset class, which automatically downloads data from the Hugging Face hub.
You can also locally visualize episodes from a dataset on the hub by executing our script from the command line:
You can also locally visualize episodes from a dataset by executing our script from the command line:
```bash
python lerobot/scripts/visualize_dataset.py \
--repo-id lerobot/pusht \
--episode-index 0
```
or from a dataset in a local folder with the root `DATA_DIR` environment variable
```bash
DATA_DIR='./my_local_data_dir' python lerobot/scripts/visualize_dataset.py \
--repo-id lerobot/pusht \
--episode-index 0
```
It will open `rerun.io` and display the camera streams, robot states and actions, like this:
https://github-production-user-asset-6210df.s3.amazonaws.com/4681518/328035972-fd46b787-b532-47e2-bb6f-fd536a55a7ed.mov?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240505%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240505T172924Z&X-Amz-Expires=300&X-Amz-Signature=d680b26c532eeaf80740f08af3320d22ad0b8a4e4da1bcc4f33142c15b509eda&X-Amz-SignedHeaders=host&actor_id=24889239&key_id=0&repo_id=748713144
@@ -149,51 +141,6 @@ https://github-production-user-asset-6210df.s3.amazonaws.com/4681518/328035972-f
Our script can also visualize datasets stored on a remote server. See `python lerobot/scripts/visualize_dataset.py --help` for more instructions.
### The `LeRobotDataset` format
A dataset in `LeRobotDataset` format is very simple to use. It can be loaded from a repository on the Hugging Face hub or from a local folder with e.g. `dataset = LeRobotDataset("lerobot/aloha_static_coffee")` and can be indexed into like any Hugging Face or PyTorch dataset. For instance, `dataset[0]` will retrieve a single sample of observations and actions as PyTorch tensors, ready to be fed to a model.
A specificity of `LeRobotDataset` is that it can retrieve several frames for one sample query. By setting `delta_timestamps` to a list of relative timestamps, e.g. `delta_timestamps = {"observation.image": [-1, -0.5, -0.2, 0]}`, one can retrieve, for each query, 4 images: one from 1 second before the current time step, two from 0.5 and 0.2 seconds before, and one at the current time step (0 seconds). See example [1_load_lerobot_dataset.py](examples/1_load_lerobot_dataset.py) for more details on `delta_timestamps`.
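As a sketch of this retrieval logic in plain Python (the `fps`, the query time, and the nearest-frame rounding below are illustrative assumptions, not the library's exact implementation):

```python
# Hypothetical sketch of how delta_timestamps maps to frame indices,
# assuming a dataset recorded at a fixed fps (here 10 Hz).
fps = 10
delta_timestamps = {"observation.image": [-1.0, -0.5, -0.2, 0.0]}

def frames_for_query(query_ts, deltas, fps):
    """Return the frame indices closest to query_ts + delta for each delta."""
    return [round((query_ts + d) * fps) for d in deltas]

# Querying at t = 2.0 s retrieves 4 images: one 1 s before, two 0.5 s and
# 0.2 s before, and one at the current frame.
indices = frames_for_query(2.0, delta_timestamps["observation.image"], fps)
print(indices)  # [10, 15, 18, 20]
```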
Under the hood, the `LeRobotDataset` format uses several ways to serialize data, which can be useful to understand if you plan to work more closely with this format. We tried to make a flexible yet simple dataset format that covers most types of features present in reinforcement learning and robotics, in simulation and in the real world, with a focus on cameras and robot states.
Here are the important details and internal structure organization of a typical `LeRobotDataset` instantiated with `dataset = LeRobotDataset("lerobot/aloha_static_coffee")`. The exact features will change from dataset to dataset but not the main aspects:
```
dataset attributes:
├ hf_dataset: a Hugging Face dataset (backed by Arrow/parquet). Typical features example:
│ ├ observation.images.cam_high: VideoFrame
│ │ VideoFrame = {'path': path to a mp4 video, 'timestamp': float32 timestamp in the video}
│ ├ observation.state: List of float32: positions of the arm joints (for instance)
│ ... (more observations)
│ ├ action: List of float32
│ ├ episode_index: int64: index of the episode for this sample
│ ├ frame_index: int64: index of the frame for this sample in the episode; starts at 0 for each episode
│ ├ timestamp: float32: timestamp in the episode
│ ├ next.done: bool: indicates the end of an episode; True for the last frame in each episode
│ └ index: int64: general index in the whole dataset
├ episode_data_index: contains 2 tensors with the start and end indices of each episode
│ ├ from: 1D int64 tensor of the first frame index for each episode; shape (num_episodes,); starts at 0
│ └ to: 1D int64 tensor of the last frame index for each episode; shape (num_episodes,)
├ stats: a dictionary of statistics (max, mean, min, std) for each feature in the dataset, for instance
│ ├ observation.images.cam_high: {'max': tensor with same number of dimensions (e.g. `(c, 1, 1)` for images, `(c,)` for states), etc.}
│ ...
├ info: a dictionary of metadata on the dataset
│ ├ fps: float - frames per second the dataset is recorded/synchronized to
│ └ video: bool - indicates if frames are encoded in mp4 video files to save space or stored as png files
├ videos_dir: path to where the mp4 videos or png images are stored/accessed
└ camera_keys: List of string: the keys to access camera features in the item returned by the dataset (e.g. `["observation.images.cam_high", ...]`)
```
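The role of `episode_data_index` can be sketched in plain Python: it delimits episodes inside the flat `hf_dataset`. The index values below are made up for illustration, and `to` is assumed exclusive here (whether it is inclusive or exclusive should be checked against the actual dataset):

```python
# Illustrative sketch: episode boundaries inside the flat dataset index.
episode_data_index = {
    "from": [0, 100, 250],   # first frame index of each episode
    "to":   [100, 250, 400], # end index of each episode (assumed exclusive)
}

def episode_frames(episode_index):
    """Return the global dataset indices belonging to one episode."""
    start = episode_data_index["from"][episode_index]
    end = episode_data_index["to"][episode_index]
    return range(start, end)

print(list(episode_frames(1))[:3])  # [100, 101, 102]
```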
A `LeRobotDataset` is serialized using several widespread file formats for each of its parts, namely:
- hf_dataset stored using the Hugging Face datasets library's serialization to parquet
- videos stored in mp4 format to save space, or as png files
- episode_data_index saved using the `safetensors` tensor serialization format
- stats saved using the `safetensors` tensor serialization format
- info saved using JSON
Datasets can be uploaded/downloaded from the Hugging Face hub seamlessly. To work on a local dataset, you can set the `DATA_DIR` environment variable to your root dataset folder, as illustrated in the above section on dataset visualization.
### Evaluate a pretrained policy
Check out [example 2](./examples/2_evaluate_pretrained_policy.py) that illustrates how to download a pretrained policy from the Hugging Face hub and run an evaluation in its corresponding environment.

View File

@@ -28,7 +28,7 @@ training:
online_steps_between_rollouts: 1
delta_timestamps:
action: "[i / ${fps} for i in range(${policy.chunk_size})]"
action: "[i / ${fps} for i in range(1, ${policy.chunk_size} + 1)]"
eval:
n_episodes: 50
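The config change above shifts the action horizon by one step: the predicted chunk now starts at the frame after the current one instead of including it. A quick sketch (the `fps` and `chunk_size` values are examples, not taken from the config):

```python
# Old vs. new action delta_timestamps, as resolved from the Hydra expressions.
fps, chunk_size = 10, 5

old = [i / fps for i in range(chunk_size)]          # includes t = 0
new = [i / fps for i in range(1, chunk_size + 1)]   # starts at t = +1/fps

print(old)  # [0.0, 0.1, 0.2, 0.3, 0.4]
print(new)  # [0.1, 0.2, 0.3, 0.4, 0.5]
```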

View File

@@ -1,89 +0,0 @@
# Using `lerobot` on a real-world arm
In this example, we'll be using `lerobot` on a real-world arm to:
- record a dataset in the `lerobot` format
- (soon) train a policy on it
- (soon) run the policy in the real-world
## Which robotic arm to use
In this example, we're using the [open-source low-cost arm from Alexander Koch](https://github.com/AlexanderKoch-Koch/low_cost_robot) in the specific setup of:
- having 6 servos per arm, i.e. using the elbow-to-wrist extension
- adding two cameras around it, one on top and one in the front
- having a teleoperation arm as well (build the leader and follower arms from A. Koch's repo, both with elbow-to-wrist extensions)
I'm using these cameras (but the setup should not be sensitive to the exact cameras you're using):
- C922 Pro Stream Webcam
- Intel(R) RealSense D455 (using only the RGB input)
In general, this example should be easily extendable to any type of arm using Dynamixel servos with at least one camera, by changing a couple of configuration values in the gym env.
## Install the example
Follow these steps:
- install `lerobot`
- install the Dynamixel SDK: `pip install dynamixel-sdk`
## Usage
### 0 - record examples
Run the `record_training_data.py` example, selecting the duration and number of episodes you want to record, e.g.
```bash
DATA_DIR='./data' python record_training_data.py \
--repo-id=thomwolf/blue_red_sort \
--num-episodes=50 \
--num-frames=400
```
TODO:
- various length episodes
- being able to drop episodes
- checking uploading to the hub
### 1 - visualize the dataset
Use the standard dataset visualization script pointing it to the right folder:
```bash
DATA_DIR='./data' python ../../lerobot/scripts/visualize_dataset.py \
--repo-id thomwolf/blue_red_sort \
--episode-index 0
```
### 2 - Train a policy
From the example directory, let's run this command to train a model using ACT:
```bash
DATA_DIR='./data' python ../../lerobot/scripts/train.py \
device=cuda \
hydra.searchpath=[file://./train_config/] \
hydra.run.dir=./outputs/train/blue_red_sort \
dataset_repo_id=thomwolf/blue_red_sort \
env=gym_real_world \
policy=act_real_world \
wandb.enable=false
```
### 3 - Evaluate the policy in the real world
From the example directory, let's run this command to evaluate our policy.
The configuration for running the policy is stored in the model checkpoint.
You can override parameters as follows:
```bash
python run_policy.py \
-p ./outputs/train/blue_red_sort/checkpoints/last/pretrained_model/ \
env.episode_length=1000
```
## Convert an hdf5 dataset recorded with the original ACT repo
You can convert a dataset from the raw HDF5 format used in https://github.com/tonyzhaozh/act with the following command:
```bash
python ./lerobot/scripts/push_dataset_to_hub.py
```

View File

@@ -1,840 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"from safetensors.torch import load_file, save_file\n",
"from pprint import pprint"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"original_ckpt_path = \"/home/thomwolf/Documents/Github/ACT/checkpoints/blue_red_sort/policy_last.ckpt\"\n",
"converted_ckpt_path = \"/home/thomwolf/Documents/Github/ACT/checkpoints/blue_red_sort/model.safetensors\"\n",
"\n",
"comparison_main_path = \"/home/thomwolf/Documents/Github/lerobot/examples/real_robot_example/outputs/train/blue_red_debug_no_masking/checkpoints/last/pretrained_model/\"\n",
"comparison_safetensor_path = comparison_main_path + \"model.safetensors\"\n",
"comparison_config_json_path = comparison_main_path + \"config.json\"\n",
"comparison_config_yaml_path = comparison_main_path + \"config.yaml\""
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"a = torch.load(original_ckpt_path)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"b = load_file(comparison_safetensor_path)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['model.action_head.bias',\n",
" 'model.action_head.weight',\n",
" 'model.backbone.bn1.bias',\n",
" 'model.backbone.bn1.running_mean',\n",
" 'model.backbone.bn1.running_var',\n",
" 'model.backbone.bn1.weight',\n",
" 'model.backbone.conv1.weight',\n",
" 'model.backbone.layer1.0.bn1.bias',\n",
" 'model.backbone.layer1.0.bn1.running_mean',\n",
" 'model.backbone.layer1.0.bn1.running_var',\n",
" 'model.backbone.layer1.0.bn1.weight',\n",
" 'model.backbone.layer1.0.bn2.bias',\n",
" 'model.backbone.layer1.0.bn2.running_mean',\n",
" 'model.backbone.layer1.0.bn2.running_var',\n",
" 'model.backbone.layer1.0.bn2.weight',\n",
" 'model.backbone.layer1.0.conv1.weight',\n",
" 'model.backbone.layer1.0.conv2.weight',\n",
" 'model.backbone.layer1.1.bn1.bias',\n",
" 'model.backbone.layer1.1.bn1.running_mean',\n",
" 'model.backbone.layer1.1.bn1.running_var',\n",
" 'model.backbone.layer1.1.bn1.weight',\n",
" 'model.backbone.layer1.1.bn2.bias',\n",
" 'model.backbone.layer1.1.bn2.running_mean',\n",
" 'model.backbone.layer1.1.bn2.running_var',\n",
" 'model.backbone.layer1.1.bn2.weight',\n",
" 'model.backbone.layer1.1.conv1.weight',\n",
" 'model.backbone.layer1.1.conv2.weight',\n",
" 'model.backbone.layer2.0.bn1.bias',\n",
" 'model.backbone.layer2.0.bn1.running_mean',\n",
" 'model.backbone.layer2.0.bn1.running_var',\n",
" 'model.backbone.layer2.0.bn1.weight',\n",
" 'model.backbone.layer2.0.bn2.bias',\n",
" 'model.backbone.layer2.0.bn2.running_mean',\n",
" 'model.backbone.layer2.0.bn2.running_var',\n",
" 'model.backbone.layer2.0.bn2.weight',\n",
" 'model.backbone.layer2.0.conv1.weight',\n",
" 'model.backbone.layer2.0.conv2.weight',\n",
" 'model.backbone.layer2.0.downsample.0.weight',\n",
" 'model.backbone.layer2.0.downsample.1.bias',\n",
" 'model.backbone.layer2.0.downsample.1.running_mean',\n",
" 'model.backbone.layer2.0.downsample.1.running_var',\n",
" 'model.backbone.layer2.0.downsample.1.weight',\n",
" 'model.backbone.layer2.1.bn1.bias',\n",
" 'model.backbone.layer2.1.bn1.running_mean',\n",
" 'model.backbone.layer2.1.bn1.running_var',\n",
" 'model.backbone.layer2.1.bn1.weight',\n",
" 'model.backbone.layer2.1.bn2.bias',\n",
" 'model.backbone.layer2.1.bn2.running_mean',\n",
" 'model.backbone.layer2.1.bn2.running_var',\n",
" 'model.backbone.layer2.1.bn2.weight',\n",
" 'model.backbone.layer2.1.conv1.weight',\n",
" 'model.backbone.layer2.1.conv2.weight',\n",
" 'model.backbone.layer3.0.bn1.bias',\n",
" 'model.backbone.layer3.0.bn1.running_mean',\n",
" 'model.backbone.layer3.0.bn1.running_var',\n",
" 'model.backbone.layer3.0.bn1.weight',\n",
" 'model.backbone.layer3.0.bn2.bias',\n",
" 'model.backbone.layer3.0.bn2.running_mean',\n",
" 'model.backbone.layer3.0.bn2.running_var',\n",
" 'model.backbone.layer3.0.bn2.weight',\n",
" 'model.backbone.layer3.0.conv1.weight',\n",
" 'model.backbone.layer3.0.conv2.weight',\n",
" 'model.backbone.layer3.0.downsample.0.weight',\n",
" 'model.backbone.layer3.0.downsample.1.bias',\n",
" 'model.backbone.layer3.0.downsample.1.running_mean',\n",
" 'model.backbone.layer3.0.downsample.1.running_var',\n",
" 'model.backbone.layer3.0.downsample.1.weight',\n",
" 'model.backbone.layer3.1.bn1.bias',\n",
" 'model.backbone.layer3.1.bn1.running_mean',\n",
" 'model.backbone.layer3.1.bn1.running_var',\n",
" 'model.backbone.layer3.1.bn1.weight',\n",
" 'model.backbone.layer3.1.bn2.bias',\n",
" 'model.backbone.layer3.1.bn2.running_mean',\n",
" 'model.backbone.layer3.1.bn2.running_var',\n",
" 'model.backbone.layer3.1.bn2.weight',\n",
" 'model.backbone.layer3.1.conv1.weight',\n",
" 'model.backbone.layer3.1.conv2.weight',\n",
" 'model.backbone.layer4.0.bn1.bias',\n",
" 'model.backbone.layer4.0.bn1.running_mean',\n",
" 'model.backbone.layer4.0.bn1.running_var',\n",
" 'model.backbone.layer4.0.bn1.weight',\n",
" 'model.backbone.layer4.0.bn2.bias',\n",
" 'model.backbone.layer4.0.bn2.running_mean',\n",
" 'model.backbone.layer4.0.bn2.running_var',\n",
" 'model.backbone.layer4.0.bn2.weight',\n",
" 'model.backbone.layer4.0.conv1.weight',\n",
" 'model.backbone.layer4.0.conv2.weight',\n",
" 'model.backbone.layer4.0.downsample.0.weight',\n",
" 'model.backbone.layer4.0.downsample.1.bias',\n",
" 'model.backbone.layer4.0.downsample.1.running_mean',\n",
" 'model.backbone.layer4.0.downsample.1.running_var',\n",
" 'model.backbone.layer4.0.downsample.1.weight',\n",
" 'model.backbone.layer4.1.bn1.bias',\n",
" 'model.backbone.layer4.1.bn1.running_mean',\n",
" 'model.backbone.layer4.1.bn1.running_var',\n",
" 'model.backbone.layer4.1.bn1.weight',\n",
" 'model.backbone.layer4.1.bn2.bias',\n",
" 'model.backbone.layer4.1.bn2.running_mean',\n",
" 'model.backbone.layer4.1.bn2.running_var',\n",
" 'model.backbone.layer4.1.bn2.weight',\n",
" 'model.backbone.layer4.1.conv1.weight',\n",
" 'model.backbone.layer4.1.conv2.weight',\n",
" 'model.decoder.layers.0.linear1.bias',\n",
" 'model.decoder.layers.0.linear1.weight',\n",
" 'model.decoder.layers.0.linear2.bias',\n",
" 'model.decoder.layers.0.linear2.weight',\n",
" 'model.decoder.layers.0.multihead_attn.in_proj_bias',\n",
" 'model.decoder.layers.0.multihead_attn.in_proj_weight',\n",
" 'model.decoder.layers.0.multihead_attn.out_proj.bias',\n",
" 'model.decoder.layers.0.multihead_attn.out_proj.weight',\n",
" 'model.decoder.layers.0.norm1.bias',\n",
" 'model.decoder.layers.0.norm1.weight',\n",
" 'model.decoder.layers.0.norm2.bias',\n",
" 'model.decoder.layers.0.norm2.weight',\n",
" 'model.decoder.layers.0.norm3.bias',\n",
" 'model.decoder.layers.0.norm3.weight',\n",
" 'model.decoder.layers.0.self_attn.in_proj_bias',\n",
" 'model.decoder.layers.0.self_attn.in_proj_weight',\n",
" 'model.decoder.layers.0.self_attn.out_proj.bias',\n",
" 'model.decoder.layers.0.self_attn.out_proj.weight',\n",
" 'model.decoder_pos_embed.weight',\n",
" 'model.encoder.layers.0.linear1.bias',\n",
" 'model.encoder.layers.0.linear1.weight',\n",
" 'model.encoder.layers.0.linear2.bias',\n",
" 'model.encoder.layers.0.linear2.weight',\n",
" 'model.encoder.layers.0.norm1.bias',\n",
" 'model.encoder.layers.0.norm1.weight',\n",
" 'model.encoder.layers.0.norm2.bias',\n",
" 'model.encoder.layers.0.norm2.weight',\n",
" 'model.encoder.layers.0.self_attn.in_proj_bias',\n",
" 'model.encoder.layers.0.self_attn.in_proj_weight',\n",
" 'model.encoder.layers.0.self_attn.out_proj.bias',\n",
" 'model.encoder.layers.0.self_attn.out_proj.weight',\n",
" 'model.encoder.layers.1.linear1.bias',\n",
" 'model.encoder.layers.1.linear1.weight',\n",
" 'model.encoder.layers.1.linear2.bias',\n",
" 'model.encoder.layers.1.linear2.weight',\n",
" 'model.encoder.layers.1.norm1.bias',\n",
" 'model.encoder.layers.1.norm1.weight',\n",
" 'model.encoder.layers.1.norm2.bias',\n",
" 'model.encoder.layers.1.norm2.weight',\n",
" 'model.encoder.layers.1.self_attn.in_proj_bias',\n",
" 'model.encoder.layers.1.self_attn.in_proj_weight',\n",
" 'model.encoder.layers.1.self_attn.out_proj.bias',\n",
" 'model.encoder.layers.1.self_attn.out_proj.weight',\n",
" 'model.encoder.layers.2.linear1.bias',\n",
" 'model.encoder.layers.2.linear1.weight',\n",
" 'model.encoder.layers.2.linear2.bias',\n",
" 'model.encoder.layers.2.linear2.weight',\n",
" 'model.encoder.layers.2.norm1.bias',\n",
" 'model.encoder.layers.2.norm1.weight',\n",
" 'model.encoder.layers.2.norm2.bias',\n",
" 'model.encoder.layers.2.norm2.weight',\n",
" 'model.encoder.layers.2.self_attn.in_proj_bias',\n",
" 'model.encoder.layers.2.self_attn.in_proj_weight',\n",
" 'model.encoder.layers.2.self_attn.out_proj.bias',\n",
" 'model.encoder.layers.2.self_attn.out_proj.weight',\n",
" 'model.encoder.layers.3.linear1.bias',\n",
" 'model.encoder.layers.3.linear1.weight',\n",
" 'model.encoder.layers.3.linear2.bias',\n",
" 'model.encoder.layers.3.linear2.weight',\n",
" 'model.encoder.layers.3.norm1.bias',\n",
" 'model.encoder.layers.3.norm1.weight',\n",
" 'model.encoder.layers.3.norm2.bias',\n",
" 'model.encoder.layers.3.norm2.weight',\n",
" 'model.encoder.layers.3.self_attn.in_proj_bias',\n",
" 'model.encoder.layers.3.self_attn.in_proj_weight',\n",
" 'model.encoder.layers.3.self_attn.out_proj.bias',\n",
" 'model.encoder.layers.3.self_attn.out_proj.weight',\n",
" 'model.encoder_img_feat_input_proj.bias',\n",
" 'model.encoder_img_feat_input_proj.weight',\n",
" 'model.encoder_latent_input_proj.bias',\n",
" 'model.encoder_latent_input_proj.weight',\n",
" 'model.encoder_robot_and_latent_pos_embed.weight',\n",
" 'model.encoder_robot_state_input_proj.bias',\n",
" 'model.encoder_robot_state_input_proj.weight',\n",
" 'model.vae_encoder.layers.0.linear1.bias',\n",
" 'model.vae_encoder.layers.0.linear1.weight',\n",
" 'model.vae_encoder.layers.0.linear2.bias',\n",
" 'model.vae_encoder.layers.0.linear2.weight',\n",
" 'model.vae_encoder.layers.0.norm1.bias',\n",
" 'model.vae_encoder.layers.0.norm1.weight',\n",
" 'model.vae_encoder.layers.0.norm2.bias',\n",
" 'model.vae_encoder.layers.0.norm2.weight',\n",
" 'model.vae_encoder.layers.0.self_attn.in_proj_bias',\n",
" 'model.vae_encoder.layers.0.self_attn.in_proj_weight',\n",
" 'model.vae_encoder.layers.0.self_attn.out_proj.bias',\n",
" 'model.vae_encoder.layers.0.self_attn.out_proj.weight',\n",
" 'model.vae_encoder.layers.1.linear1.bias',\n",
" 'model.vae_encoder.layers.1.linear1.weight',\n",
" 'model.vae_encoder.layers.1.linear2.bias',\n",
" 'model.vae_encoder.layers.1.linear2.weight',\n",
" 'model.vae_encoder.layers.1.norm1.bias',\n",
" 'model.vae_encoder.layers.1.norm1.weight',\n",
" 'model.vae_encoder.layers.1.norm2.bias',\n",
" 'model.vae_encoder.layers.1.norm2.weight',\n",
" 'model.vae_encoder.layers.1.self_attn.in_proj_bias',\n",
" 'model.vae_encoder.layers.1.self_attn.in_proj_weight',\n",
" 'model.vae_encoder.layers.1.self_attn.out_proj.bias',\n",
" 'model.vae_encoder.layers.1.self_attn.out_proj.weight',\n",
" 'model.vae_encoder.layers.2.linear1.bias',\n",
" 'model.vae_encoder.layers.2.linear1.weight',\n",
" 'model.vae_encoder.layers.2.linear2.bias',\n",
" 'model.vae_encoder.layers.2.linear2.weight',\n",
" 'model.vae_encoder.layers.2.norm1.bias',\n",
" 'model.vae_encoder.layers.2.norm1.weight',\n",
" 'model.vae_encoder.layers.2.norm2.bias',\n",
" 'model.vae_encoder.layers.2.norm2.weight',\n",
" 'model.vae_encoder.layers.2.self_attn.in_proj_bias',\n",
" 'model.vae_encoder.layers.2.self_attn.in_proj_weight',\n",
" 'model.vae_encoder.layers.2.self_attn.out_proj.bias',\n",
" 'model.vae_encoder.layers.2.self_attn.out_proj.weight',\n",
" 'model.vae_encoder.layers.3.linear1.bias',\n",
" 'model.vae_encoder.layers.3.linear1.weight',\n",
" 'model.vae_encoder.layers.3.linear2.bias',\n",
" 'model.vae_encoder.layers.3.linear2.weight',\n",
" 'model.vae_encoder.layers.3.norm1.bias',\n",
" 'model.vae_encoder.layers.3.norm1.weight',\n",
" 'model.vae_encoder.layers.3.norm2.bias',\n",
" 'model.vae_encoder.layers.3.norm2.weight',\n",
" 'model.vae_encoder.layers.3.self_attn.in_proj_bias',\n",
" 'model.vae_encoder.layers.3.self_attn.in_proj_weight',\n",
" 'model.vae_encoder.layers.3.self_attn.out_proj.bias',\n",
" 'model.vae_encoder.layers.3.self_attn.out_proj.weight',\n",
" 'model.vae_encoder_action_input_proj.bias',\n",
" 'model.vae_encoder_action_input_proj.weight',\n",
" 'model.vae_encoder_cls_embed.weight',\n",
" 'model.vae_encoder_latent_output_proj.bias',\n",
" 'model.vae_encoder_latent_output_proj.weight',\n",
" 'model.vae_encoder_pos_enc',\n",
" 'model.vae_encoder_robot_state_input_proj.bias',\n",
" 'model.vae_encoder_robot_state_input_proj.weight',\n",
" 'normalize_inputs.buffer_observation_images_front.mean',\n",
" 'normalize_inputs.buffer_observation_images_front.std',\n",
" 'normalize_inputs.buffer_observation_images_top.mean',\n",
" 'normalize_inputs.buffer_observation_images_top.std',\n",
" 'normalize_inputs.buffer_observation_state.mean',\n",
" 'normalize_inputs.buffer_observation_state.std',\n",
" 'normalize_targets.buffer_action.mean',\n",
" 'normalize_targets.buffer_action.std',\n",
" 'unnormalize_outputs.buffer_action.mean',\n",
" 'unnormalize_outputs.buffer_action.std']\n"
]
}
],
"source": [
"dest = list(b.keys())\n",
"pprint(dest)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['model.pos_table',\n",
" 'model.transformer.encoder.layers.0.self_attn.in_proj_weight',\n",
" 'model.transformer.encoder.layers.0.self_attn.in_proj_bias',\n",
" 'model.transformer.encoder.layers.0.self_attn.out_proj.weight',\n",
" 'model.transformer.encoder.layers.0.self_attn.out_proj.bias',\n",
" 'model.transformer.encoder.layers.0.linear1.weight',\n",
" 'model.transformer.encoder.layers.0.linear1.bias',\n",
" 'model.transformer.encoder.layers.0.linear2.weight',\n",
" 'model.transformer.encoder.layers.0.linear2.bias',\n",
" 'model.transformer.encoder.layers.0.norm1.weight',\n",
" 'model.transformer.encoder.layers.0.norm1.bias',\n",
" 'model.transformer.encoder.layers.0.norm2.weight',\n",
" 'model.transformer.encoder.layers.0.norm2.bias',\n",
" 'model.transformer.encoder.layers.1.self_attn.in_proj_weight',\n",
" 'model.transformer.encoder.layers.1.self_attn.in_proj_bias',\n",
" 'model.transformer.encoder.layers.1.self_attn.out_proj.weight',\n",
" 'model.transformer.encoder.layers.1.self_attn.out_proj.bias',\n",
" 'model.transformer.encoder.layers.1.linear1.weight',\n",
" 'model.transformer.encoder.layers.1.linear1.bias',\n",
" 'model.transformer.encoder.layers.1.linear2.weight',\n",
" 'model.transformer.encoder.layers.1.linear2.bias',\n",
" 'model.transformer.encoder.layers.1.norm1.weight',\n",
" 'model.transformer.encoder.layers.1.norm1.bias',\n",
" 'model.transformer.encoder.layers.1.norm2.weight',\n",
" 'model.transformer.encoder.layers.1.norm2.bias',\n",
" 'model.transformer.encoder.layers.2.self_attn.in_proj_weight',\n",
" 'model.transformer.encoder.layers.2.self_attn.in_proj_bias',\n",
" 'model.transformer.encoder.layers.2.self_attn.out_proj.weight',\n",
" 'model.transformer.encoder.layers.2.self_attn.out_proj.bias',\n",
" 'model.transformer.encoder.layers.2.linear1.weight',\n",
" 'model.transformer.encoder.layers.2.linear1.bias',\n",
" 'model.transformer.encoder.layers.2.linear2.weight',\n",
" 'model.transformer.encoder.layers.2.linear2.bias',\n",
" 'model.transformer.encoder.layers.2.norm1.weight',\n",
" 'model.transformer.encoder.layers.2.norm1.bias',\n",
" 'model.transformer.encoder.layers.2.norm2.weight',\n",
" 'model.transformer.encoder.layers.2.norm2.bias',\n",
" 'model.transformer.encoder.layers.3.self_attn.in_proj_weight',\n",
" 'model.transformer.encoder.layers.3.self_attn.in_proj_bias',\n",
" 'model.transformer.encoder.layers.3.self_attn.out_proj.weight',\n",
" 'model.transformer.encoder.layers.3.self_attn.out_proj.bias',\n",
" 'model.transformer.encoder.layers.3.linear1.weight',\n",
" 'model.transformer.encoder.layers.3.linear1.bias',\n",
" 'model.transformer.encoder.layers.3.linear2.weight',\n",
" 'model.transformer.encoder.layers.3.linear2.bias',\n",
" 'model.transformer.encoder.layers.3.norm1.weight',\n",
" 'model.transformer.encoder.layers.3.norm1.bias',\n",
" 'model.transformer.encoder.layers.3.norm2.weight',\n",
" 'model.transformer.encoder.layers.3.norm2.bias',\n",
" 'model.transformer.decoder.layers.0.self_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.0.self_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.0.self_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.0.self_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.0.multihead_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.0.multihead_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.0.multihead_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.0.multihead_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.0.linear1.weight',\n",
" 'model.transformer.decoder.layers.0.linear1.bias',\n",
" 'model.transformer.decoder.layers.0.linear2.weight',\n",
" 'model.transformer.decoder.layers.0.linear2.bias',\n",
" 'model.transformer.decoder.layers.0.norm1.weight',\n",
" 'model.transformer.decoder.layers.0.norm1.bias',\n",
" 'model.transformer.decoder.layers.0.norm2.weight',\n",
" 'model.transformer.decoder.layers.0.norm2.bias',\n",
" 'model.transformer.decoder.layers.0.norm3.weight',\n",
" 'model.transformer.decoder.layers.0.norm3.bias',\n",
" 'model.transformer.decoder.layers.1.self_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.1.self_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.1.self_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.1.self_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.1.multihead_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.1.multihead_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.1.multihead_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.1.multihead_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.1.linear1.weight',\n",
" 'model.transformer.decoder.layers.1.linear1.bias',\n",
" 'model.transformer.decoder.layers.1.linear2.weight',\n",
" 'model.transformer.decoder.layers.1.linear2.bias',\n",
" 'model.transformer.decoder.layers.1.norm1.weight',\n",
" 'model.transformer.decoder.layers.1.norm1.bias',\n",
" 'model.transformer.decoder.layers.1.norm2.weight',\n",
" 'model.transformer.decoder.layers.1.norm2.bias',\n",
" 'model.transformer.decoder.layers.1.norm3.weight',\n",
" 'model.transformer.decoder.layers.1.norm3.bias',\n",
" 'model.transformer.decoder.layers.2.self_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.2.self_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.2.self_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.2.self_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.2.multihead_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.2.multihead_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.2.multihead_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.2.multihead_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.2.linear1.weight',\n",
" 'model.transformer.decoder.layers.2.linear1.bias',\n",
" 'model.transformer.decoder.layers.2.linear2.weight',\n",
" 'model.transformer.decoder.layers.2.linear2.bias',\n",
" 'model.transformer.decoder.layers.2.norm1.weight',\n",
" 'model.transformer.decoder.layers.2.norm1.bias',\n",
" 'model.transformer.decoder.layers.2.norm2.weight',\n",
" 'model.transformer.decoder.layers.2.norm2.bias',\n",
" 'model.transformer.decoder.layers.2.norm3.weight',\n",
" 'model.transformer.decoder.layers.2.norm3.bias',\n",
" 'model.transformer.decoder.layers.3.self_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.3.self_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.3.self_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.3.self_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.3.multihead_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.3.multihead_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.3.multihead_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.3.multihead_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.3.linear1.weight',\n",
" 'model.transformer.decoder.layers.3.linear1.bias',\n",
" 'model.transformer.decoder.layers.3.linear2.weight',\n",
" 'model.transformer.decoder.layers.3.linear2.bias',\n",
" 'model.transformer.decoder.layers.3.norm1.weight',\n",
" 'model.transformer.decoder.layers.3.norm1.bias',\n",
" 'model.transformer.decoder.layers.3.norm2.weight',\n",
" 'model.transformer.decoder.layers.3.norm2.bias',\n",
" 'model.transformer.decoder.layers.3.norm3.weight',\n",
" 'model.transformer.decoder.layers.3.norm3.bias',\n",
" 'model.transformer.decoder.layers.4.self_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.4.self_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.4.self_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.4.self_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.4.multihead_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.4.multihead_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.4.multihead_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.4.multihead_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.4.linear1.weight',\n",
" 'model.transformer.decoder.layers.4.linear1.bias',\n",
" 'model.transformer.decoder.layers.4.linear2.weight',\n",
" 'model.transformer.decoder.layers.4.linear2.bias',\n",
" 'model.transformer.decoder.layers.4.norm1.weight',\n",
" 'model.transformer.decoder.layers.4.norm1.bias',\n",
" 'model.transformer.decoder.layers.4.norm2.weight',\n",
" 'model.transformer.decoder.layers.4.norm2.bias',\n",
" 'model.transformer.decoder.layers.4.norm3.weight',\n",
" 'model.transformer.decoder.layers.4.norm3.bias',\n",
" 'model.transformer.decoder.layers.5.self_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.5.self_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.5.self_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.5.self_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.5.multihead_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.5.multihead_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.5.multihead_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.5.multihead_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.5.linear1.weight',\n",
" 'model.transformer.decoder.layers.5.linear1.bias',\n",
" 'model.transformer.decoder.layers.5.linear2.weight',\n",
" 'model.transformer.decoder.layers.5.linear2.bias',\n",
" 'model.transformer.decoder.layers.5.norm1.weight',\n",
" 'model.transformer.decoder.layers.5.norm1.bias',\n",
" 'model.transformer.decoder.layers.5.norm2.weight',\n",
" 'model.transformer.decoder.layers.5.norm2.bias',\n",
" 'model.transformer.decoder.layers.5.norm3.weight',\n",
" 'model.transformer.decoder.layers.5.norm3.bias',\n",
" 'model.transformer.decoder.layers.6.self_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.6.self_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.6.self_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.6.self_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.6.multihead_attn.in_proj_weight',\n",
" 'model.transformer.decoder.layers.6.multihead_attn.in_proj_bias',\n",
" 'model.transformer.decoder.layers.6.multihead_attn.out_proj.weight',\n",
" 'model.transformer.decoder.layers.6.multihead_attn.out_proj.bias',\n",
" 'model.transformer.decoder.layers.6.linear1.weight',\n",
" 'model.transformer.decoder.layers.6.linear1.bias',\n",
" 'model.transformer.decoder.layers.6.linear2.weight',\n",
" 'model.transformer.decoder.layers.6.linear2.bias',\n",
" 'model.transformer.decoder.layers.6.norm1.weight',\n",
" 'model.transformer.decoder.layers.6.norm1.bias',\n",
" 'model.transformer.decoder.layers.6.norm2.weight',\n",
" 'model.transformer.decoder.layers.6.norm2.bias',\n",
" 'model.transformer.decoder.layers.6.norm3.weight',\n",
" 'model.transformer.decoder.layers.6.norm3.bias',\n",
" 'model.transformer.decoder.norm.weight',\n",
" 'model.transformer.decoder.norm.bias',\n",
" 'model.encoder.layers.0.self_attn.in_proj_weight',\n",
" 'model.encoder.layers.0.self_attn.in_proj_bias',\n",
" 'model.encoder.layers.0.self_attn.out_proj.weight',\n",
" 'model.encoder.layers.0.self_attn.out_proj.bias',\n",
" 'model.encoder.layers.0.linear1.weight',\n",
" 'model.encoder.layers.0.linear1.bias',\n",
" 'model.encoder.layers.0.linear2.weight',\n",
" 'model.encoder.layers.0.linear2.bias',\n",
" 'model.encoder.layers.0.norm1.weight',\n",
" 'model.encoder.layers.0.norm1.bias',\n",
" 'model.encoder.layers.0.norm2.weight',\n",
" 'model.encoder.layers.0.norm2.bias',\n",
" 'model.encoder.layers.1.self_attn.in_proj_weight',\n",
" 'model.encoder.layers.1.self_attn.in_proj_bias',\n",
" 'model.encoder.layers.1.self_attn.out_proj.weight',\n",
" 'model.encoder.layers.1.self_attn.out_proj.bias',\n",
" 'model.encoder.layers.1.linear1.weight',\n",
" 'model.encoder.layers.1.linear1.bias',\n",
" 'model.encoder.layers.1.linear2.weight',\n",
" 'model.encoder.layers.1.linear2.bias',\n",
" 'model.encoder.layers.1.norm1.weight',\n",
" 'model.encoder.layers.1.norm1.bias',\n",
" 'model.encoder.layers.1.norm2.weight',\n",
" 'model.encoder.layers.1.norm2.bias',\n",
" 'model.encoder.layers.2.self_attn.in_proj_weight',\n",
" 'model.encoder.layers.2.self_attn.in_proj_bias',\n",
" 'model.encoder.layers.2.self_attn.out_proj.weight',\n",
" 'model.encoder.layers.2.self_attn.out_proj.bias',\n",
" 'model.encoder.layers.2.linear1.weight',\n",
" 'model.encoder.layers.2.linear1.bias',\n",
" 'model.encoder.layers.2.linear2.weight',\n",
" 'model.encoder.layers.2.linear2.bias',\n",
" 'model.encoder.layers.2.norm1.weight',\n",
" 'model.encoder.layers.2.norm1.bias',\n",
" 'model.encoder.layers.2.norm2.weight',\n",
" 'model.encoder.layers.2.norm2.bias',\n",
" 'model.encoder.layers.3.self_attn.in_proj_weight',\n",
" 'model.encoder.layers.3.self_attn.in_proj_bias',\n",
" 'model.encoder.layers.3.self_attn.out_proj.weight',\n",
" 'model.encoder.layers.3.self_attn.out_proj.bias',\n",
" 'model.encoder.layers.3.linear1.weight',\n",
" 'model.encoder.layers.3.linear1.bias',\n",
" 'model.encoder.layers.3.linear2.weight',\n",
" 'model.encoder.layers.3.linear2.bias',\n",
" 'model.encoder.layers.3.norm1.weight',\n",
" 'model.encoder.layers.3.norm1.bias',\n",
" 'model.encoder.layers.3.norm2.weight',\n",
" 'model.encoder.layers.3.norm2.bias',\n",
" 'model.action_head.weight',\n",
" 'model.action_head.bias',\n",
" 'model.is_pad_head.weight',\n",
" 'model.is_pad_head.bias',\n",
" 'model.query_embed.weight',\n",
" 'model.input_proj.weight',\n",
" 'model.input_proj.bias',\n",
" 'model.backbones.0.0.body.conv1.weight',\n",
" 'model.backbones.0.0.body.bn1.weight',\n",
" 'model.backbones.0.0.body.bn1.bias',\n",
" 'model.backbones.0.0.body.bn1.running_mean',\n",
" 'model.backbones.0.0.body.bn1.running_var',\n",
" 'model.backbones.0.0.body.bn1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer1.0.conv1.weight',\n",
" 'model.backbones.0.0.body.layer1.0.bn1.weight',\n",
" 'model.backbones.0.0.body.layer1.0.bn1.bias',\n",
" 'model.backbones.0.0.body.layer1.0.bn1.running_mean',\n",
" 'model.backbones.0.0.body.layer1.0.bn1.running_var',\n",
" 'model.backbones.0.0.body.layer1.0.bn1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer1.0.conv2.weight',\n",
" 'model.backbones.0.0.body.layer1.0.bn2.weight',\n",
" 'model.backbones.0.0.body.layer1.0.bn2.bias',\n",
" 'model.backbones.0.0.body.layer1.0.bn2.running_mean',\n",
" 'model.backbones.0.0.body.layer1.0.bn2.running_var',\n",
" 'model.backbones.0.0.body.layer1.0.bn2.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer1.1.conv1.weight',\n",
" 'model.backbones.0.0.body.layer1.1.bn1.weight',\n",
" 'model.backbones.0.0.body.layer1.1.bn1.bias',\n",
" 'model.backbones.0.0.body.layer1.1.bn1.running_mean',\n",
" 'model.backbones.0.0.body.layer1.1.bn1.running_var',\n",
" 'model.backbones.0.0.body.layer1.1.bn1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer1.1.conv2.weight',\n",
" 'model.backbones.0.0.body.layer1.1.bn2.weight',\n",
" 'model.backbones.0.0.body.layer1.1.bn2.bias',\n",
" 'model.backbones.0.0.body.layer1.1.bn2.running_mean',\n",
" 'model.backbones.0.0.body.layer1.1.bn2.running_var',\n",
" 'model.backbones.0.0.body.layer1.1.bn2.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer2.0.conv1.weight',\n",
" 'model.backbones.0.0.body.layer2.0.bn1.weight',\n",
" 'model.backbones.0.0.body.layer2.0.bn1.bias',\n",
" 'model.backbones.0.0.body.layer2.0.bn1.running_mean',\n",
" 'model.backbones.0.0.body.layer2.0.bn1.running_var',\n",
" 'model.backbones.0.0.body.layer2.0.bn1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer2.0.conv2.weight',\n",
" 'model.backbones.0.0.body.layer2.0.bn2.weight',\n",
" 'model.backbones.0.0.body.layer2.0.bn2.bias',\n",
" 'model.backbones.0.0.body.layer2.0.bn2.running_mean',\n",
" 'model.backbones.0.0.body.layer2.0.bn2.running_var',\n",
" 'model.backbones.0.0.body.layer2.0.bn2.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer2.0.downsample.0.weight',\n",
" 'model.backbones.0.0.body.layer2.0.downsample.1.weight',\n",
" 'model.backbones.0.0.body.layer2.0.downsample.1.bias',\n",
" 'model.backbones.0.0.body.layer2.0.downsample.1.running_mean',\n",
" 'model.backbones.0.0.body.layer2.0.downsample.1.running_var',\n",
" 'model.backbones.0.0.body.layer2.0.downsample.1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer2.1.conv1.weight',\n",
" 'model.backbones.0.0.body.layer2.1.bn1.weight',\n",
" 'model.backbones.0.0.body.layer2.1.bn1.bias',\n",
" 'model.backbones.0.0.body.layer2.1.bn1.running_mean',\n",
" 'model.backbones.0.0.body.layer2.1.bn1.running_var',\n",
" 'model.backbones.0.0.body.layer2.1.bn1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer2.1.conv2.weight',\n",
" 'model.backbones.0.0.body.layer2.1.bn2.weight',\n",
" 'model.backbones.0.0.body.layer2.1.bn2.bias',\n",
" 'model.backbones.0.0.body.layer2.1.bn2.running_mean',\n",
" 'model.backbones.0.0.body.layer2.1.bn2.running_var',\n",
" 'model.backbones.0.0.body.layer2.1.bn2.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer3.0.conv1.weight',\n",
" 'model.backbones.0.0.body.layer3.0.bn1.weight',\n",
" 'model.backbones.0.0.body.layer3.0.bn1.bias',\n",
" 'model.backbones.0.0.body.layer3.0.bn1.running_mean',\n",
" 'model.backbones.0.0.body.layer3.0.bn1.running_var',\n",
" 'model.backbones.0.0.body.layer3.0.bn1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer3.0.conv2.weight',\n",
" 'model.backbones.0.0.body.layer3.0.bn2.weight',\n",
" 'model.backbones.0.0.body.layer3.0.bn2.bias',\n",
" 'model.backbones.0.0.body.layer3.0.bn2.running_mean',\n",
" 'model.backbones.0.0.body.layer3.0.bn2.running_var',\n",
" 'model.backbones.0.0.body.layer3.0.bn2.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer3.0.downsample.0.weight',\n",
" 'model.backbones.0.0.body.layer3.0.downsample.1.weight',\n",
" 'model.backbones.0.0.body.layer3.0.downsample.1.bias',\n",
" 'model.backbones.0.0.body.layer3.0.downsample.1.running_mean',\n",
" 'model.backbones.0.0.body.layer3.0.downsample.1.running_var',\n",
" 'model.backbones.0.0.body.layer3.0.downsample.1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer3.1.conv1.weight',\n",
" 'model.backbones.0.0.body.layer3.1.bn1.weight',\n",
" 'model.backbones.0.0.body.layer3.1.bn1.bias',\n",
" 'model.backbones.0.0.body.layer3.1.bn1.running_mean',\n",
" 'model.backbones.0.0.body.layer3.1.bn1.running_var',\n",
" 'model.backbones.0.0.body.layer3.1.bn1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer3.1.conv2.weight',\n",
" 'model.backbones.0.0.body.layer3.1.bn2.weight',\n",
" 'model.backbones.0.0.body.layer3.1.bn2.bias',\n",
" 'model.backbones.0.0.body.layer3.1.bn2.running_mean',\n",
" 'model.backbones.0.0.body.layer3.1.bn2.running_var',\n",
" 'model.backbones.0.0.body.layer3.1.bn2.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer4.0.conv1.weight',\n",
" 'model.backbones.0.0.body.layer4.0.bn1.weight',\n",
" 'model.backbones.0.0.body.layer4.0.bn1.bias',\n",
" 'model.backbones.0.0.body.layer4.0.bn1.running_mean',\n",
" 'model.backbones.0.0.body.layer4.0.bn1.running_var',\n",
" 'model.backbones.0.0.body.layer4.0.bn1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer4.0.conv2.weight',\n",
" 'model.backbones.0.0.body.layer4.0.bn2.weight',\n",
" 'model.backbones.0.0.body.layer4.0.bn2.bias',\n",
" 'model.backbones.0.0.body.layer4.0.bn2.running_mean',\n",
" 'model.backbones.0.0.body.layer4.0.bn2.running_var',\n",
" 'model.backbones.0.0.body.layer4.0.bn2.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer4.0.downsample.0.weight',\n",
" 'model.backbones.0.0.body.layer4.0.downsample.1.weight',\n",
" 'model.backbones.0.0.body.layer4.0.downsample.1.bias',\n",
" 'model.backbones.0.0.body.layer4.0.downsample.1.running_mean',\n",
" 'model.backbones.0.0.body.layer4.0.downsample.1.running_var',\n",
" 'model.backbones.0.0.body.layer4.0.downsample.1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer4.1.conv1.weight',\n",
" 'model.backbones.0.0.body.layer4.1.bn1.weight',\n",
" 'model.backbones.0.0.body.layer4.1.bn1.bias',\n",
" 'model.backbones.0.0.body.layer4.1.bn1.running_mean',\n",
" 'model.backbones.0.0.body.layer4.1.bn1.running_var',\n",
" 'model.backbones.0.0.body.layer4.1.bn1.num_batches_tracked',\n",
" 'model.backbones.0.0.body.layer4.1.conv2.weight',\n",
" 'model.backbones.0.0.body.layer4.1.bn2.weight',\n",
" 'model.backbones.0.0.body.layer4.1.bn2.bias',\n",
" 'model.backbones.0.0.body.layer4.1.bn2.running_mean',\n",
" 'model.backbones.0.0.body.layer4.1.bn2.running_var',\n",
" 'model.backbones.0.0.body.layer4.1.bn2.num_batches_tracked',\n",
" 'model.input_proj_robot_state.weight',\n",
" 'model.input_proj_robot_state.bias',\n",
" 'model.cls_embed.weight',\n",
" 'model.encoder_action_proj.weight',\n",
" 'model.encoder_action_proj.bias',\n",
" 'model.encoder_joint_proj.weight',\n",
" 'model.encoder_joint_proj.bias',\n",
" 'model.latent_proj.weight',\n",
" 'model.latent_proj.bias',\n",
" 'model.latent_out_proj.weight',\n",
" 'model.latent_out_proj.bias',\n",
" 'model.additional_pos_embed.weight']\n"
]
}
],
"source": [
"orig = list(a.keys())\n",
"pprint(orig)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"a = torch.load(original_ckpt_path)\n",
"\n",
"to_remove_startswith = ['model.transformer.decoder.layers.1.',\n",
" 'model.transformer.decoder.layers.2.',\n",
" 'model.transformer.decoder.layers.3.',\n",
" 'model.transformer.decoder.layers.4.',\n",
" 'model.transformer.decoder.layers.5.',\n",
" 'model.transformer.decoder.layers.6.',\n",
" 'model.transformer.decoder.norm.',\n",
" 'model.is_pad_head']\n",
"\n",
"to_remove_in = ['num_batches_tracked',]\n",
"\n",
"conv = {}\n",
"\n",
"keys = list(a.keys())\n",
"for k in keys:\n",
" if any(k.startswith(tr) for tr in to_remove_startswith):\n",
" a.pop(k)\n",
" continue\n",
" if any(tr in k for tr in to_remove_in):\n",
" a.pop(k)\n",
" continue\n",
" if k.startswith('model.transformer.encoder.layers.'):\n",
" conv[k.replace('transformer.', '')] = a.pop(k)\n",
" if k.startswith('model.transformer.decoder.layers.0.'):\n",
" conv[k.replace('transformer.', '')] = a.pop(k)\n",
" if k.startswith('model.encoder.layers.'):\n",
" conv[k.replace('encoder.', 'vae_encoder.')] = a.pop(k)\n",
" if k.startswith('model.action_head.'):\n",
" conv[k] = a.pop(k)\n",
" if k.startswith('model.pos_table'):\n",
" conv[k.replace('pos_table', 'vae_encoder_pos_enc')] = a.pop(k)\n",
" if k.startswith('model.query_embed.'):\n",
" conv[k.replace('query_embed', 'decoder_pos_embed')] = a.pop(k)\n",
" if k.startswith('model.input_proj.'):\n",
" conv[k.replace('input_proj.', 'encoder_img_feat_input_proj.')] = a.pop(k)\n",
" if k.startswith('model.input_proj_robot_state.'):\n",
" conv[k.replace('input_proj_robot_state.', 'encoder_robot_state_input_proj.')] = a.pop(k)\n",
" if k.startswith('model.backbones.0.0.body.'):\n",
" conv[k.replace('backbones.0.0.body', 'backbone')] = a.pop(k)\n",
" if k.startswith('model.cls_embed.'):\n",
" conv[k.replace('cls_embed', 'vae_encoder_cls_embed')] = a.pop(k)\n",
" if k.startswith('model.encoder_action_proj.'):\n",
" conv[k.replace('encoder_action_proj', 'vae_encoder_action_input_proj')] = a.pop(k)\n",
" if k.startswith('model.encoder_joint_proj.'):\n",
" conv[k.replace('encoder_joint_proj', 'vae_encoder_robot_state_input_proj')] = a.pop(k)\n",
" if k.startswith('model.latent_proj.'):\n",
" conv[k.replace('latent_proj', 'vae_encoder_latent_output_proj')] = a.pop(k)\n",
" if k.startswith('model.latent_out_proj.'):\n",
" conv[k.replace('latent_out_proj', 'encoder_latent_input_proj')] = a.pop(k)\n",
" if k.startswith('model.additional_pos_embed.'):\n",
" conv[k.replace('additional_pos_embed', 'encoder_robot_and_latent_pos_embed')] = a.pop(k)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"OrderedDict()"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"for k, v in conv.items():\n",
" assert b[k].shape == v.shape\n",
" b[k] = v"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"save_file(b, converted_ckpt_path)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'/home/thomwolf/Documents/Github/ACT/checkpoints/blue_red_sort/config.yaml'"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Now also copy the config files\n",
"import shutil\n",
"shutil.copy(comparison_config_json_path, converted_ckpt_path.replace('model.safetensors', 'config.json'))\n",
"shutil.copy(comparison_config_yaml_path, converted_ckpt_path.replace('model.safetensors', 'config.yaml'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "lerobot",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
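The notebook cells above drop and rename checkpoint keys by prefix to map the original ACT state dict onto the LeRobot layout. A minimal, self-contained sketch of that remapping pattern (hypothetical `convert_state_dict` helper and toy keys, not the notebook's exact rule set):

```python
def convert_state_dict(state_dict, drop_prefixes, rename_rules):
    """Return a new dict: skip keys starting with any drop prefix,
    then rewrite the first matching (old, new) substring for the rest."""
    out = {}
    for key, value in state_dict.items():
        if any(key.startswith(p) for p in drop_prefixes):
            continue  # e.g. unused decoder layers, is_pad head
        new_key = key
        for old, new in rename_rules:
            if old in key:
                new_key = key.replace(old, new)
                break
        out[new_key] = value
    return out

# toy example mirroring two of the notebook's rules
sd = {
    "model.transformer.decoder.layers.0.norm1.weight": 1,
    "model.transformer.decoder.layers.1.norm1.weight": 2,
    "model.query_embed.weight": 3,
}
converted = convert_state_dict(
    sd,
    drop_prefixes=["model.transformer.decoder.layers.1."],
    rename_rules=[("transformer.", ""), ("query_embed", "decoder_pos_embed")],
)
```

As in the notebook, the conversion is purely a key-space operation; tensor shapes are checked separately before copying values into the target state dict.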


@@ -1,8 +0,0 @@
from gymnasium.envs.registration import register
register(
id="gym_real_world/RealEnv-v0",
entry_point="gym_real_world.gym_environment:RealEnv",
max_episode_steps=300,
nondeterministic=True,
)


@@ -1,363 +0,0 @@
# ruff: noqa
"""From Alexander Koch low_cost_robot codebase at https://github.com/AlexanderKoch-Koch/low_cost_robot
Dynamixel class to control the dynamixel servos
"""
from __future__ import annotations
import enum
import math
import os
import time
from dataclasses import dataclass
import numpy as np
from dynamixel_sdk import * # Uses Dynamixel SDK library
def pos2pwm(pos: np.ndarray) -> np.ndarray:
"""
:param pos: numpy array of joint positions in range [-pi, pi]
:return: numpy array of pwm values in range [0, 4096]
"""
return ((pos / 3.14 + 1.0) * 2048).astype(np.int64)
def pwm2pos(pwm: np.ndarray) -> np.ndarray:
"""
:param pwm: numpy array of pwm values in range [0, 4096]
:return: numpy array of joint positions in range [-pi, pi]
"""
return (pwm / 2048 - 1) * 3.14
def pwm2vel(pwm: np.ndarray) -> np.ndarray:
"""
:param pwm: numpy array of pwm/s joint velocities
:return: numpy array of rad/s joint velocities
"""
return pwm * 3.14 / 2048
def vel2pwm(vel: np.ndarray) -> np.ndarray:
"""
:param vel: numpy array of rad/s joint velocities
:return: numpy array of pwm/s joint velocities
"""
return (vel * 2048 / 3.14).astype(np.int64)
class ReadAttribute(enum.Enum):
TEMPERATURE = 146
VOLTAGE = 145
VELOCITY = 128
POSITION = 132
CURRENT = 126
PWM = 124
HARDWARE_ERROR_STATUS = 70
HOMING_OFFSET = 20
BAUDRATE = 8
class OperatingMode(enum.Enum):
VELOCITY = 1
POSITION = 3
CURRENT_CONTROLLED_POSITION = 5
PWM = 16
UNKNOWN = -1
class Dynamixel:
ADDR_TORQUE_ENABLE = 64
ADDR_GOAL_POSITION = 116
ADDR_VELOCITY_LIMIT = 44
ADDR_GOAL_PWM = 100
OPERATING_MODE_ADDR = 11
POSITION_I = 82
POSITION_P = 84
ADDR_ID = 7
@dataclass
class Config:
def instantiate(self):
return Dynamixel(self)
baudrate: int = 57600
protocol_version: float = 2.0
device_name: str = ""  # e.g. "/dev/tty.usbserial-1120"
dynamixel_id: int = 1
def __init__(self, config: Config):
self.config = config
self.connect()
def connect(self):
if self.config.device_name == "":
for port_name in os.listdir("/dev"):
if "ttyUSB" in port_name or "ttyACM" in port_name:
self.config.device_name = "/dev/" + port_name
print(f"using device {self.config.device_name}")
self.portHandler = PortHandler(self.config.device_name)
# self.portHandler.LA
self.packetHandler = PacketHandler(self.config.protocol_version)
if not self.portHandler.openPort():
raise Exception(f"Failed to open port {self.config.device_name}")
if not self.portHandler.setBaudRate(self.config.baudrate):
raise Exception(f"failed to set baudrate to {self.config.baudrate}")
# self.operating_mode = OperatingMode.UNKNOWN
# self.torque_enabled = False
# self._disable_torque()
self.operating_modes = [None for _ in range(32)]
self.torque_enabled = [None for _ in range(32)]
return True
def disconnect(self):
self.portHandler.closePort()
def set_goal_position(self, motor_id, goal_position):
# if self.operating_modes[motor_id] is not OperatingMode.POSITION:
# self._disable_torque(motor_id)
# self.set_operating_mode(motor_id, OperatingMode.POSITION)
# if not self.torque_enabled[motor_id]:
# self._enable_torque(motor_id)
# self._enable_torque(motor_id)
dxl_comm_result, dxl_error = self.packetHandler.write4ByteTxRx(
self.portHandler, motor_id, self.ADDR_GOAL_POSITION, goal_position
)
# self._process_response(dxl_comm_result, dxl_error)
# print(f'set position of motor {motor_id} to {goal_position}')
def set_pwm_value(self, motor_id: int, pwm_value, tries=3):
if self.operating_modes[motor_id] is not OperatingMode.PWM:
self._disable_torque(motor_id)
self.set_operating_mode(motor_id, OperatingMode.PWM)
if not self.torque_enabled[motor_id]:
self._enable_torque(motor_id)
# print(f'enabling torque')
dxl_comm_result, dxl_error = self.packetHandler.write2ByteTxRx(
self.portHandler, motor_id, self.ADDR_GOAL_PWM, pwm_value
)
# self._process_response(dxl_comm_result, dxl_error)
# print(f'set pwm of motor {motor_id} to {pwm_value}')
if dxl_comm_result != COMM_SUCCESS:
if tries <= 1:
raise ConnectionError(f"dxl_comm_result: {self.packetHandler.getTxRxResult(dxl_comm_result)}")
else:
print(f"dynamixel pwm setting failure trying again with {tries - 1} tries")
self.set_pwm_value(motor_id, pwm_value, tries=tries - 1)
elif dxl_error != 0:
print(f"dxl error {dxl_error}")
raise ConnectionError(f"dynamixel error: {self.packetHandler.getTxRxResult(dxl_error)}")
def read_temperature(self, motor_id: int):
return self._read_value(motor_id, ReadAttribute.TEMPERATURE, 1)
def read_velocity(self, motor_id: int):
vel = self._read_value(motor_id, ReadAttribute.VELOCITY, 4)
if vel > 2**31:
vel -= 2**32  # decode two's complement for signed velocities
return vel
def read_position(self, motor_id: int):
pos = self._read_value(motor_id, ReadAttribute.POSITION, 4)
if pos > 2**31:
pos -= 2**32
# print(f'read position {pos} for motor {motor_id}')
return pos
def read_position_degrees(self, motor_id: int) -> float:
return (self.read_position(motor_id) / 4096) * 360
def read_position_radians(self, motor_id: int) -> float:
return (self.read_position(motor_id) / 4096) * 2 * math.pi
def read_current(self, motor_id: int):
current = self._read_value(motor_id, ReadAttribute.CURRENT, 2)
if current > 2**15:
current -= 2**16
return current
def read_present_pwm(self, motor_id: int):
return self._read_value(motor_id, ReadAttribute.PWM, 2)
def read_hardware_error_status(self, motor_id: int):
return self._read_value(motor_id, ReadAttribute.HARDWARE_ERROR_STATUS, 1)
def set_id(self, old_id, new_id, use_broadcast_id: bool = False):
"""
sets the id of the dynamixel servo
@param old_id: current id of the servo
@param new_id: new id
@param use_broadcast_id: if True, set the ids of all connected dynamixels;
if False, change only the servo with id old_id
@return:
"""
if use_broadcast_id:
current_id = 254
else:
current_id = old_id
dxl_comm_result, dxl_error = self.packetHandler.write1ByteTxRx(
self.portHandler, current_id, self.ADDR_ID, new_id
)
self._process_response(dxl_comm_result, dxl_error, old_id)
self.config.dynamixel_id = new_id
def _enable_torque(self, motor_id):
dxl_comm_result, dxl_error = self.packetHandler.write1ByteTxRx(
self.portHandler, motor_id, self.ADDR_TORQUE_ENABLE, 1
)
self._process_response(dxl_comm_result, dxl_error, motor_id)
self.torque_enabled[motor_id] = True
def _disable_torque(self, motor_id):
dxl_comm_result, dxl_error = self.packetHandler.write1ByteTxRx(
self.portHandler, motor_id, self.ADDR_TORQUE_ENABLE, 0
)
self._process_response(dxl_comm_result, dxl_error, motor_id)
self.torque_enabled[motor_id] = False
def _process_response(self, dxl_comm_result: int, dxl_error: int, motor_id: int):
if dxl_comm_result != COMM_SUCCESS:
raise ConnectionError(
f"dxl_comm_result for motor {motor_id}: {self.packetHandler.getTxRxResult(dxl_comm_result)}"
)
elif dxl_error != 0:
print(f"dxl error {dxl_error}")
raise ConnectionError(
f"dynamixel error for motor {motor_id}: {self.packetHandler.getTxRxResult(dxl_error)}"
)
def set_operating_mode(self, motor_id: int, operating_mode: OperatingMode):
dxl_comm_result, dxl_error = self.packetHandler.write2ByteTxRx(
self.portHandler, motor_id, self.OPERATING_MODE_ADDR, operating_mode.value
)
self._process_response(dxl_comm_result, dxl_error, motor_id)
self.operating_modes[motor_id] = operating_mode
def set_pwm_limit(self, motor_id: int, limit: int):
dxl_comm_result, dxl_error = self.packetHandler.write2ByteTxRx(self.portHandler, motor_id, 36, limit)
self._process_response(dxl_comm_result, dxl_error, motor_id)
def set_velocity_limit(self, motor_id: int, velocity_limit):
dxl_comm_result, dxl_error = self.packetHandler.write4ByteTxRx(
self.portHandler, motor_id, self.ADDR_VELOCITY_LIMIT, velocity_limit
)
self._process_response(dxl_comm_result, dxl_error, motor_id)
def set_P(self, motor_id: int, P: int):
dxl_comm_result, dxl_error = self.packetHandler.write2ByteTxRx(
self.portHandler, motor_id, self.POSITION_P, P
)
self._process_response(dxl_comm_result, dxl_error, motor_id)
def set_I(self, motor_id: int, I: int):
dxl_comm_result, dxl_error = self.packetHandler.write2ByteTxRx(
self.portHandler, motor_id, self.POSITION_I, I
)
self._process_response(dxl_comm_result, dxl_error, motor_id)
def read_home_offset(self, motor_id: int):
self._disable_torque(motor_id)
# dxl_comm_result, dxl_error = self.packetHandler.write4ByteTxRx(self.portHandler, motor_id,
# ReadAttribute.HOMING_OFFSET.value, home_position)
home_offset = self._read_value(motor_id, ReadAttribute.HOMING_OFFSET, 4)
# self._process_response(dxl_comm_result, dxl_error)
self._enable_torque(motor_id)
return home_offset
def set_home_offset(self, motor_id: int, home_position: int):
self._disable_torque(motor_id)
dxl_comm_result, dxl_error = self.packetHandler.write4ByteTxRx(
self.portHandler, motor_id, ReadAttribute.HOMING_OFFSET.value, home_position
)
self._process_response(dxl_comm_result, dxl_error, motor_id)
self._enable_torque(motor_id)
def set_baudrate(self, motor_id: int, baudrate):
# translate baudrate into the dynamixel baudrate register value
baudrate_ids = {57600: 1, 1_000_000: 3, 2_000_000: 4, 3_000_000: 5, 4_000_000: 6}
if baudrate not in baudrate_ids:
raise ValueError(f"baudrate {baudrate} not implemented")
baudrate_id = baudrate_ids[baudrate]
self._disable_torque(motor_id)
dxl_comm_result, dxl_error = self.packetHandler.write1ByteTxRx(
self.portHandler, motor_id, ReadAttribute.BAUDRATE.value, baudrate_id
)
self._process_response(dxl_comm_result, dxl_error, motor_id)
def _read_value(self, motor_id, attribute: ReadAttribute, num_bytes: int, tries=10):
try:
if num_bytes == 1:
value, dxl_comm_result, dxl_error = self.packetHandler.read1ByteTxRx(
self.portHandler, motor_id, attribute.value
)
elif num_bytes == 2:
value, dxl_comm_result, dxl_error = self.packetHandler.read2ByteTxRx(
self.portHandler, motor_id, attribute.value
)
elif num_bytes == 4:
value, dxl_comm_result, dxl_error = self.packetHandler.read4ByteTxRx(
self.portHandler, motor_id, attribute.value
)
except Exception:
if tries == 0:
raise
else:
return self._read_value(motor_id, attribute, num_bytes, tries=tries - 1)
if dxl_comm_result != COMM_SUCCESS:
if tries <= 1:
# print("%s" % self.packetHandler.getTxRxResult(dxl_comm_result))
raise ConnectionError(f"dxl_comm_result {dxl_comm_result} for servo {motor_id} value {value}")
else:
print(f"dynamixel read failure for servo {motor_id} trying again with {tries - 1} tries")
time.sleep(0.02)
return self._read_value(motor_id, attribute, num_bytes, tries=tries - 1)
elif dxl_error != 0:
# print("%s" % self.packetHandler.getRxPacketError(dxl_error))
# raise ConnectionError(f'dxl_error {dxl_error} binary ' + "{0:b}".format(37))
if tries == 0 and dxl_error != 128:
raise Exception(f"Failed to read value from motor {motor_id} error is {dxl_error}")
else:
return self._read_value(motor_id, attribute, num_bytes, tries=tries - 1)
return value
def set_home_position(self, motor_id: int):
print(f"setting home position for motor {motor_id}")
self.set_home_offset(motor_id, 0)
current_position = self.read_position(motor_id)
print(f"position before {current_position}")
self.set_home_offset(motor_id, -current_position)
# dynamixel.set_home_offset(motor_id, -4096)
# dynamixel.set_home_offset(motor_id, -4294964109)
current_position = self.read_position(motor_id)
# print(f'signed position {current_position - 2** 32}')
print(f"position after {current_position}")
if __name__ == "__main__":
dynamixel = Dynamixel.Config(baudrate=1_000_000, device_name="/dev/tty.usbmodem57380045631").instantiate()
motor_id = 1
pos = dynamixel.read_position(motor_id)
for i in range(10):
s = time.monotonic()
pos = dynamixel.read_position(motor_id)
delta = time.monotonic() - s
print(f"read position took {delta}")
print(f"position {pos}")
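The deleted `dynamixel.py` above converts between joint angles in radians and 12-bit servo counts via `pos2pwm`/`pwm2pos`. A self-contained sketch of that round trip, with the two helpers re-declared here exactly as in the file (including its 3.14 approximation of pi):

```python
import numpy as np

def pos2pwm(pos: np.ndarray) -> np.ndarray:
    # joint positions in [-pi, pi] rad -> pwm counts in [0, 4096]
    return ((pos / 3.14 + 1.0) * 2048).astype(np.int64)

def pwm2pos(pwm: np.ndarray) -> np.ndarray:
    # pwm counts in [0, 4096] -> joint positions in [-pi, pi] rad
    return (pwm / 2048 - 1) * 3.14

angles = np.array([-3.14, 0.0, 1.57])
counts = pos2pwm(angles)       # -> array([   0, 2048, 3072])
recovered = pwm2pos(counts)    # round-trips back to the input angles
```

Note the file hardcodes 3.14 rather than using `math.pi`, so the conversion carries a small systematic error (about 0.05%) in addition to the quantization to integer counts.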


@@ -1,192 +0,0 @@
import time
from unittest.mock import MagicMock
import cv2
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from .dynamixel import pos2pwm, pwm2pos
from .robot import Robot
FPS = 30
CAMERAS_SHAPES = {
"images.high": (480, 640, 3),
"images.low": (480, 640, 3),
}
CAMERAS_PORTS = {
"images.high": "/dev/video6",
"images.low": "/dev/video0",
}
LEADER_PORT = "/dev/ttyACM1"
FOLLOWER_PORT = "/dev/ttyACM0"
MockRobot = MagicMock()
MockRobot.read_position = MagicMock()
MockRobot.read_position.return_value = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
MockCamera = MagicMock()
MockCamera.isOpened = MagicMock(return_value=True)
MockCamera.read = MagicMock(return_value=(True, np.zeros((480, 640, 3), dtype=np.uint8)))
def capture_image(cam, cam_width, cam_height):
# Capture a single frame
_, frame = cam.read()
image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
# # Define your crop coordinates (top left corner and bottom right corner)
# x1, y1 = 400, 0 # Example starting coordinates (top left of the crop rectangle)
# x2, y2 = 1600, 900 # Example ending coordinates (bottom right of the crop rectangle)
# # Crop the image
# image = image[y1:y2, x1:x2]
# Resize the image
image = cv2.resize(image, (cam_width, cam_height), interpolation=cv2.INTER_AREA)
return image
class RealEnv(gym.Env):
metadata = {}
def __init__(
self,
record: bool = False,
num_joints: int = 6,
cameras_shapes: dict = CAMERAS_SHAPES,
cameras_ports: dict = CAMERAS_PORTS,
follower_port: str = FOLLOWER_PORT,
leader_port: str = LEADER_PORT,
warmup_steps: int = 100,
trigger_torque=70,
fps: int = FPS,
fps_tolerance: float = 0.1,
mock: bool = False,
):
self.num_joints = num_joints
self.cameras_shapes = cameras_shapes
self.cameras_ports = cameras_ports
self.warmup_steps = warmup_steps
assert len(self.cameras_shapes) == len(self.cameras_ports), "Number of cameras and shapes must match."
self.follower_port = follower_port
self.leader_port = leader_port
self.record = record
self.fps = fps
self.fps_tolerance = fps_tolerance
# Initialize the robot
self.follower = Robot(device_name=self.follower_port) if not mock else MockRobot
if self.record:
self.leader = Robot(device_name=self.leader_port) if not mock else MockRobot
self.leader.set_trigger_torque(trigger_torque)
# Initialize the cameras - sorted by camera names
self.cameras = {}
for cn, p in sorted(self.cameras_ports.items()):
self.cameras[cn] = cv2.VideoCapture(p) if not mock else MockCamera
if not self.cameras[cn].isOpened():
raise OSError(
f"Cannot open camera port {p} for {cn}."
f" Make sure the camera is connected and the port is correct."
f" Also check you are not spinning several instances of the same environment (eval.batch_size)."
)
# Specify gym action and observation spaces
observation_space = {}
if self.num_joints > 0:
observation_space["agent_pos"] = spaces.Box(
low=-1000.0,
high=1000.0,
shape=(num_joints,),
dtype=np.float64,
)
if self.record:
observation_space["leader_pos"] = spaces.Box(
low=-1000.0,
high=1000.0,
shape=(num_joints,),
dtype=np.float64,
)
if self.cameras_shapes:
for cn, hwc_shape in self.cameras_shapes.items():
# Assumes images are unsigned int8 in [0,255]
observation_space[cn] = spaces.Box(
low=0,
high=255,
# height x width x channels (e.g. 480 x 640 x 3)
shape=hwc_shape,
dtype=np.uint8,
)
self.observation_space = spaces.Dict(observation_space)
self.action_space = spaces.Box(low=-1, high=1, shape=(num_joints,), dtype=np.float32)
self._observation = {}
self._terminated = False
self.timestamps = []
def _get_obs(self):
qpos = self.follower.read_position()
self._observation["agent_pos"] = pwm2pos(qpos)
for cn, c in self.cameras.items():
self._observation[cn] = capture_image(c, self.cameras_shapes[cn][1], self.cameras_shapes[cn][0])
if self.record:
action = self.leader.read_position()
self._observation["leader_pos"] = pwm2pos(action)
def reset(self, seed: int | None = None):
# Reset the robot and sync the leader and follower if we are recording
for _ in range(self.warmup_steps):
self._get_obs()
if self.record:
self.follower.set_goal_pos(pos2pwm(self._observation["leader_pos"]))
self._terminated = False
info = {}
self.timestamps = []
return self._observation, info
def step(self, action: np.ndarray | None = None):
if self.timestamps:
# wait the right amount of time to stay at the desired fps
time.sleep(max(0, 1 / self.fps - (time.time() - self.timestamps[-1])))
self.timestamps.append(time.time())
# Get the observation
self._get_obs()
if self.record:
# Teleoperate the leader
self.follower.set_goal_pos(pos2pwm(self._observation["leader_pos"]))
else:
# Apply the action to the follower
self.follower.set_goal_pos(pos2pwm(action))
reward = 0
terminated = truncated = self._terminated
info = {"timestamp": self.timestamps[-1] - self.timestamps[0], "fps_error": False}
# Check if we are able to keep up with the desired fps
if len(self.timestamps) > 1 and (self.timestamps[-1] - self.timestamps[-2]) > 1 / (
self.fps - self.fps_tolerance
):
print(
f"Error: recording fps {1 / (self.timestamps[-1] - self.timestamps[-2]):.5f} is lower"
f" than min admited fps {(self.fps - self.fps_tolerance):.5f}"
f" at frame {len(self.timestamps)}"
)
info["fps_error"] = True
return self._observation, reward, terminated, truncated, info
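The pacing logic in `step` can be isolated into two checks: sleep just long enough that consecutive frames are `1 / fps` apart, and flag an error when the measured frame period exceeds what `fps - fps_tolerance` allows. A self-contained sketch (function names are illustrative, not from the codebase):

```python
def sleep_needed(fps: float, last_timestamp: float, now: float) -> float:
    # Remaining time to wait so that frames end up 1 / fps seconds apart.
    return max(0.0, 1 / fps - (now - last_timestamp))

def fps_error(period: float, fps: float, fps_tolerance: float) -> bool:
    # True when the measured frame period is longer than the slowest admissible one.
    return period > 1 / (fps - fps_tolerance)
```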
def render(self): ...
def close(self):
self.follower._disable_torque()
if self.record:
self.leader._disable_torque()


@@ -1,168 +0,0 @@
# ruff: noqa
"""From Alexander Koch low_cost_robot codebase at https://github.com/AlexanderKoch-Koch/low_cost_robot
Class to control the robot using dynamixel servos.
"""
from enum import Enum, auto
from typing import Union
import numpy as np
from dynamixel_sdk import DXL_HIBYTE, DXL_HIWORD, DXL_LOBYTE, DXL_LOWORD, GroupSyncRead, GroupSyncWrite
from .dynamixel import Dynamixel, OperatingMode, ReadAttribute
class MotorControlType(Enum):
PWM = auto()
POSITION_CONTROL = auto()
DISABLED = auto()
UNKNOWN = auto()
class Robot:
def __init__(self, device_name: str, baudrate=1_000_000, servo_ids=[1, 2, 3, 4, 5, 6]) -> None:
self.servo_ids = servo_ids
self.dynamixel = Dynamixel.Config(baudrate=baudrate, device_name=device_name).instantiate()
self._init_motors()
def _init_motors(self):
self.position_reader = GroupSyncRead(
self.dynamixel.portHandler, self.dynamixel.packetHandler, ReadAttribute.POSITION.value, 4
)
for id in self.servo_ids:
self.position_reader.addParam(id)
self.velocity_reader = GroupSyncRead(
self.dynamixel.portHandler, self.dynamixel.packetHandler, ReadAttribute.VELOCITY.value, 4
)
for id in self.servo_ids:
self.velocity_reader.addParam(id)
self.pos_writer = GroupSyncWrite(
self.dynamixel.portHandler, self.dynamixel.packetHandler, self.dynamixel.ADDR_GOAL_POSITION, 4
)
for id in self.servo_ids:
self.pos_writer.addParam(id, [2048])
self.pwm_writer = GroupSyncWrite(
self.dynamixel.portHandler, self.dynamixel.packetHandler, self.dynamixel.ADDR_GOAL_PWM, 2
)
for id in self.servo_ids:
self.pwm_writer.addParam(id, [2048])
self._disable_torque()
self.motor_control_state = MotorControlType.DISABLED
def read_position(self, tries=2):
"""
Reads the joint positions of the robot. 2048 is the center position. 0 and 4096 are 180 degrees in each direction.
:param tries: maximum number of tries to read the position
:return: list of joint positions in range [0, 4096]
"""
result = self.position_reader.txRxPacket()
if result != 0:
if tries > 0:
return self.read_position(tries=tries - 1)
else:
print("failed to read position!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
positions = []
for id in self.servo_ids:
position = self.position_reader.getData(id, ReadAttribute.POSITION.value, 4)
if position > 2**31:
position -= 2**32
positions.append(position)
return np.array(positions)
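The `position > 2**31` branch above is a two's-complement reinterpretation: the SDK returns the 4-byte register as an unsigned integer, and values past 2**31 actually encode negative positions. As a standalone helper:

```python
def to_signed32(value: int) -> int:
    # Reinterpret an unsigned 32-bit register value as a signed integer.
    return value - 2**32 if value > 2**31 else value
```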
def read_velocity(self):
"""
Reads the joint velocities of the robot.
:return: list of joint velocities,
"""
self.velocity_reader.txRxPacket()
velocties = []
for id in self.servo_ids:
velocity = self.velocity_reader.getData(id, ReadAttribute.VELOCITY.value, 4)
if velocity > 2**31:
velocity -= 2**32
velocties.append(velocity)
return np.array(velocties)
def set_goal_pos(self, action):
"""
:param action: list or numpy array of target joint positions in range [0, 4096]
"""
if self.motor_control_state is not MotorControlType.POSITION_CONTROL:
self._set_position_control()
for i, motor_id in enumerate(self.servo_ids):
data_write = [
DXL_LOBYTE(DXL_LOWORD(action[i])),
DXL_HIBYTE(DXL_LOWORD(action[i])),
DXL_LOBYTE(DXL_HIWORD(action[i])),
DXL_HIBYTE(DXL_HIWORD(action[i])),
]
self.pos_writer.changeParam(motor_id, data_write)
self.pos_writer.txPacket()
def set_pwm(self, action):
"""
Sets the pwm values for the servos.
:param action: list or numpy array of pwm values in range [0, 885]
"""
if self.motor_control_state is not MotorControlType.PWM:
self._set_pwm_control()
for i, motor_id in enumerate(self.servo_ids):
data_write = [
DXL_LOBYTE(DXL_LOWORD(action[i])),
DXL_HIBYTE(DXL_LOWORD(action[i])),
]
self.pwm_writer.changeParam(motor_id, data_write)
self.pwm_writer.txPacket()
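The `DXL_LOBYTE`/`DXL_HIBYTE`/`DXL_LOWORD`/`DXL_HIWORD` macros used in `set_goal_pos` and `set_pwm` split an integer into little-endian bytes for the sync-write packet. Their combined effect can be reproduced in plain Python:

```python
def to_le_bytes(value: int, n_bytes: int) -> list[int]:
    # Split an integer into n_bytes little-endian bytes, as the chained DXL macros do.
    return [(value >> (8 * i)) & 0xFF for i in range(n_bytes)]
```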
def set_trigger_torque(self, torque: int):
"""
Sets a constant torque for the last servo in the chain. This is useful for the trigger of the leader arm.
"""
self.dynamixel._enable_torque(self.servo_ids[-1])
self.dynamixel.set_pwm_value(self.servo_ids[-1], torque)
def limit_pwm(self, limit: Union[int, list, np.ndarray]):
"""
Limits the PWM values for the servos in position control mode.
@param limit: 0 ~ 885
@return:
"""
if isinstance(limit, int):
limits = [limit] * 5
else:
limits = limit
self._disable_torque()
for motor_id, limit in zip(self.servo_ids, limits, strict=False):
self.dynamixel.set_pwm_limit(motor_id, limit)
self._enable_torque()
def _disable_torque(self):
print(f"disabling torque for servos {self.servo_ids}")
for motor_id in self.servo_ids:
self.dynamixel._disable_torque(motor_id)
def _enable_torque(self):
print(f"enabling torque for servos {self.servo_ids}")
for motor_id in self.servo_ids:
self.dynamixel._enable_torque(motor_id)
def _set_pwm_control(self):
self._disable_torque()
for motor_id in self.servo_ids:
self.dynamixel.set_operating_mode(motor_id, OperatingMode.PWM)
self._enable_torque()
self.motor_control_state = MotorControlType.PWM
def _set_position_control(self):
self._disable_torque()
for motor_id in self.servo_ids:
self.dynamixel.set_operating_mode(motor_id, OperatingMode.POSITION)
self._enable_torque()
self.motor_control_state = MotorControlType.POSITION_CONTROL


@@ -1,237 +0,0 @@
"""This script demonstrates how to record a LeRobot dataset of training data
using a very simple gym environment (see in examples/real_robot_example/gym_real_world/gym_environment.py).
"""
import argparse
import copy
import os
from pathlib import Path
import gym_real_world # noqa: F401
import gymnasium as gym
import numpy as np
import torch
from datasets import Dataset, Features, Sequence, Value
from omegaconf import OmegaConf
from tqdm import tqdm
from lerobot.common.datasets.compute_stats import compute_stats
from lerobot.common.datasets.lerobot_dataset import CODEBASE_VERSION, DATA_DIR, LeRobotDataset
from lerobot.common.datasets.push_dataset_to_hub.utils import concatenate_episodes, save_images_concurrently
from lerobot.common.datasets.utils import (
hf_transform_to_torch,
)
from lerobot.common.datasets.video_utils import VideoFrame, encode_video_frames
from lerobot.scripts.push_dataset_to_hub import push_meta_data_to_hub, push_videos_to_hub, save_meta_data
# parse the repo_id name via command line
parser = argparse.ArgumentParser()
parser.add_argument("--repo-id", type=str, default="thomwolf/blue_red_sort")
parser.add_argument("--num-episodes", type=int, default=2)
parser.add_argument("--num-frames", type=int, default=400)
parser.add_argument("--num-workers", type=int, default=16)
parser.add_argument("--keep-last", action="store_true")
parser.add_argument("--data_dir", type=str, default=None)
parser.add_argument("--push-to-hub", action="store_true")
parser.add_argument("--fps", type=int, default=30, help="Frames per second of the recording.")
parser.add_argument(
"--fps_tolerance",
type=float,
default=0.5,
help="Tolerance in fps for the recording before dropping episodes.",
)
parser.add_argument(
"--revision", type=str, default=CODEBASE_VERSION, help="Codebase version used to generate the dataset."
)
parser.add_argument("--gym-config", type=str, default=None, help="Path to the gym config file.")
parser.add_argument("--mock_robot", action="store_true")
args = parser.parse_args()
repo_id = args.repo_id
num_episodes = args.num_episodes
num_frames = args.num_frames
revision = args.revision
fps = args.fps
fps_tolerance = args.fps_tolerance
out_data = DATA_DIR / repo_id if args.data_dir is None else Path(args.data_dir)
# During data collection, frames are stored as png images in `images_dir`
images_dir = out_data / "images"
# After data collection, png images of each episode are encoded into a mp4 file stored in `videos_dir`
videos_dir = out_data / "videos"
meta_data_dir = out_data / "meta_data"
gym_config = None
if args.gym_config is not None:
gym_config = OmegaConf.load(args.gym_config)
# Create image and video directories
os.makedirs(images_dir, exist_ok=True)
os.makedirs(videos_dir, exist_ok=True)
if __name__ == "__main__":
# Create the gym environment - check the kwargs in gym_real_world/gym_environment.py
gym_handle = "gym_real_world/RealEnv-v0"
gym_kwargs = {}
if gym_config is not None:
gym_kwargs = OmegaConf.to_container(gym_config.gym_kwargs)
env = gym.make(
gym_handle, disable_env_checker=True, record=True, fps=fps, fps_tolerance=fps_tolerance, mock=args.mock_robot
)
ep_dicts = []
episode_data_index = {"from": [], "to": []}
ep_fps = []
id_from = 0
id_to = 0
os.system('spd-say "gym environment created"')
ep_idx = 0
while ep_idx < num_episodes:
# bring the follower to the leader and start camera
env.reset()
os.system(f'spd-say "go {ep_idx}"')
# init buffers
obs_replay = {k: [] for k in env.observation_space}
drop_episode = False
timestamps = []
for _ in tqdm(range(num_frames)):
# Apply the next action
observation, _, _, _, info = env.step(action=None)
# images_stacked = np.hstack(list(observation['pixels'].values()))
# images_stacked = cv2.cvtColor(images_stacked, cv2.COLOR_RGB2BGR)
# cv2.imshow('frame', images_stacked)
if info["fps_error"]:
os.system(f'spd-say "Error fps too low, dropping episode {ep_idx}"')
drop_episode = True
break
# store data
for key in observation:
obs_replay[key].append(copy.deepcopy(observation[key]))
timestamps.append(info["timestamp"])
# if cv2.waitKey(1) & 0xFF == ord('q'):
# break
os.system('spd-say "stop"')
if not drop_episode:
os.system(f'spd-say "saving episode {ep_idx}"')
ep_dict = {}
# store images in png and create the video
for img_key in env.cameras:
save_images_concurrently(
obs_replay[img_key],
images_dir / f"{img_key}_episode_{ep_idx:06d}",
args.num_workers,
)
fname = f"{img_key}_episode_{ep_idx:06d}.mp4"
# store the reference to the video frame
ep_dict[f"observation.{img_key}"] = [
{"path": f"videos/{fname}", "timestamp": tstp} for tstp in timestamps
]
state = torch.tensor(np.array(obs_replay["agent_pos"]))
action = torch.tensor(np.array(obs_replay["leader_pos"]))
next_done = torch.zeros(num_frames, dtype=torch.bool)
next_done[-1] = True
ep_dict["observation.state"] = state
ep_dict["action"] = action
ep_dict["episode_index"] = torch.tensor([ep_idx] * num_frames, dtype=torch.int64)
ep_dict["frame_index"] = torch.arange(0, num_frames, 1)
ep_dict["timestamp"] = torch.tensor(timestamps)
ep_dict["next.done"] = next_done
ep_fps.append(num_frames / timestamps[-1])
ep_dicts.append(ep_dict)
print(f"Episode {ep_idx} done, fps: {ep_fps[-1]:.2f}")
episode_data_index["from"].append(id_from)
episode_data_index["to"].append(
id_from + num_frames if args.keep_last else id_from + num_frames - 1
)
id_to = id_from + num_frames if args.keep_last else id_from + num_frames - 1
id_from = id_to
ep_idx += 1
env.close()
os.system('spd-say "encode video frames"')
for ep_idx in range(num_episodes):
for img_key in env.cameras:
# If necessary, we may want to encode the video
# with variable frame rate: https://superuser.com/questions/1661901/encoding-video-from-vfr-still-images
encode_video_frames(
images_dir / f"{img_key}_episode_{ep_idx:06d}",
videos_dir / f"{img_key}_episode_{ep_idx:06d}.mp4",
ep_fps[ep_idx],
)
os.system('spd-say "concatenate episodes"')
data_dict = concatenate_episodes(
ep_dicts, drop_episodes_last_frame=not args.keep_last
) # Since our fps varies we are sometimes off tolerance for the last frame
features = {}
keys = [key for key in data_dict if "observation.images." in key]
for key in keys:
features[key] = VideoFrame()
features["observation.state"] = Sequence(
length=data_dict["observation.state"].shape[1], feature=Value(dtype="float32", id=None)
)
features["action"] = Sequence(
length=data_dict["action"].shape[1], feature=Value(dtype="float32", id=None)
)
features["episode_index"] = Value(dtype="int64", id=None)
features["frame_index"] = Value(dtype="int64", id=None)
features["timestamp"] = Value(dtype="float32", id=None)
features["next.done"] = Value(dtype="bool", id=None)
features["index"] = Value(dtype="int64", id=None)
hf_dataset = Dataset.from_dict(data_dict, features=Features(features))
hf_dataset.set_transform(hf_transform_to_torch)
info = {
"fps": sum(ep_fps) / len(ep_fps), # to have a good tolerance in data processing for the slowest video
"video": 1,
}
os.system('spd-say "from preloaded"')
lerobot_dataset = LeRobotDataset.from_preloaded(
repo_id=repo_id,
version=revision,
hf_dataset=hf_dataset,
episode_data_index=episode_data_index,
info=info,
videos_dir=videos_dir,
)
os.system('spd-say "compute stats"')
stats = compute_stats(lerobot_dataset)
os.system('spd-say "save to disk"')
hf_dataset = hf_dataset.with_format(None)  # to remove transforms that can't be saved
hf_dataset.save_to_disk(str(out_data / "train"))
save_meta_data(info, stats, episode_data_index, meta_data_dir)
if args.push_to_hub:
hf_dataset.push_to_hub(repo_id, token=True, revision="main")
hf_dataset.push_to_hub(repo_id, token=True, revision=revision)
push_meta_data_to_hub(repo_id, meta_data_dir, revision="main")
push_meta_data_to_hub(repo_id, meta_data_dir, revision=revision)
push_videos_to_hub(repo_id, videos_dir, revision="main")
push_videos_to_hub(repo_id, videos_dir, revision=revision)
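The `episode_data_index` bookkeeping in the recording loop above maintains, for each saved episode, its frame range in the concatenated dataset, with the last frame of each episode dropped unless `--keep-last` is set. A minimal sketch of that accumulation, with illustrative names:

```python
def build_episode_index(episode_lengths: list[int], keep_last: bool = False) -> dict[str, list[int]]:
    # Accumulate per-episode frame ranges into flat "from"/"to" lists,
    # mirroring the id_from/id_to bookkeeping in the recording script.
    index = {"from": [], "to": []}
    id_from = 0
    for num_frames in episode_lengths:
        n = num_frames if keep_last else num_frames - 1
        index["from"].append(id_from)
        index["to"].append(id_from + n)
        id_from += n
    return index
```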


@@ -1,60 +0,0 @@
import argparse
import logging
from pathlib import Path
import gym_real_world # noqa: F401
import gymnasium as gym # noqa: F401
from huggingface_hub import snapshot_download
from huggingface_hub.utils._errors import RepositoryNotFoundError
from huggingface_hub.utils._validators import HFValidationError
from lerobot.common.utils.utils import init_logging
from lerobot.scripts.eval import eval
if __name__ == "__main__":
init_logging()
parser = argparse.ArgumentParser(
description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
)
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument(
"-p",
"--pretrained-policy-name-or-path",
help=(
"Either the repo ID of a model hosted on the Hub or a path to a directory containing weights "
"saved using `Policy.save_pretrained`. If not provided, the policy is initialized from scratch "
"(useful for debugging). This argument is mutually exclusive with `--config`."
),
)
parser.add_argument("--revision", help="Optionally provide the Hugging Face Hub revision ID.")
parser.add_argument(
"overrides",
nargs="*",
help="Any key=value arguments to override config values (use dots for.nested=overrides)",
)
args = parser.parse_args()
try:
pretrained_policy_path = Path(
snapshot_download(args.pretrained_policy_name_or_path, revision=args.revision)
)
except (HFValidationError, RepositoryNotFoundError) as e:
if isinstance(e, HFValidationError):
error_message = (
"The provided pretrained_policy_name_or_path is not a valid Hugging Face Hub repo ID."
)
else:
error_message = (
"The provided pretrained_policy_name_or_path was not found on the Hugging Face Hub."
)
logging.warning(f"{error_message} Treating it as a local directory.")
pretrained_policy_path = Path(args.pretrained_policy_name_or_path)
if not pretrained_policy_path.is_dir() or not pretrained_policy_path.exists():
raise ValueError(
"The provided pretrained_policy_name_or_path is not a valid/existing Hugging Face Hub "
"repo ID, nor is it an existing local directory."
)
eval(pretrained_policy_path=pretrained_policy_path, config_overrides=args.overrides)


@@ -1,19 +0,0 @@
# @package _global_
fps: 30
env:
name: real_world
task: RealEnv-v0
state_dim: 6
action_dim: 6
fps: ${fps}
episode_length: 200
real_world: true
gym:
cameras_shapes:
images.high: [480, 640, 3]
images.low: [480, 640, 3]
cameras_ports:
images.high: /dev/video6
images.low: /dev/video0


@@ -1,19 +0,0 @@
# @package _global_
fps: 30
env:
name: real_world
task: RealEnv-v0
state_dim: 6
action_dim: 6
fps: ${fps}
episode_length: 200
real_world: true
gym:
cameras_shapes:
images.top: [480, 640, 3]
images.front: [480, 640, 3]
cameras_ports:
images.top: /dev/video6
images.front: /dev/video0


@@ -1,103 +0,0 @@
# @package _global_
# Use `act_real.yaml` to train on real-world Aloha/Aloha2 datasets.
# Compared to `act.yaml`, it contains 2 cameras (i.e. top, front) instead of 1 camera (i.e. top).
# Also, `training.eval_freq` is set to -1. This config is used
# to evaluate checkpoints at a certain frequency of training steps. When it is set to -1, it deactivates evaluation.
# This is because real-world evaluation is done through [dora-lerobot](https://github.com/dora-rs/dora-lerobot).
# Look at its README for more information on how to evaluate a checkpoint in the real-world.
#
# Example of usage for training:
# ```bash
# python lerobot/scripts/train.py \
# policy=act_real \
# env=aloha_real
# ```
seed: 1000
dataset_repo_id: ???
override_dataset_stats:
observation.images.top:
# stats from imagenet, since we use a pretrained vision model
mean: [[[0.485]], [[0.456]], [[0.406]]] # (c,1,1)
std: [[[0.229]], [[0.224]], [[0.225]]] # (c,1,1)
observation.images.front:
# stats from imagenet, since we use a pretrained vision model
mean: [[[0.485]], [[0.456]], [[0.406]]] # (c,1,1)
std: [[[0.229]], [[0.224]], [[0.225]]] # (c,1,1)
training:
offline_steps: 1000
online_steps: 0
eval_freq: -1
save_freq: 1000
log_freq: 100
save_checkpoint: true
batch_size: 8
lr: 1e-5
lr_backbone: 1e-5
weight_decay: 1e-4
grad_clip_norm: 10
online_steps_between_rollouts: 1
delta_timestamps:
action: "[i / ${fps} for i in range(1, ${policy.chunk_size} + 1)]"
eval:
n_episodes: 1
batch_size: 1
# See `configuration_act.py` for more details.
policy:
name: act
# Input / output structure.
n_obs_steps: 1
chunk_size: 100 # chunk_size
n_action_steps: 100
input_shapes:
# TODO(rcadene, alexander-soare): add variables for height and width from the dataset/env?
observation.images.top: [3, 480, 640]
observation.images.front: [3, 480, 640]
observation.state: ["${env.state_dim}"]
output_shapes:
action: ["${env.action_dim}"]
# Normalization / Unnormalization
input_normalization_modes:
observation.images.top: mean_std
observation.images.front: mean_std
observation.state: mean_std
output_normalization_modes:
action: mean_std
# Architecture.
# Vision backbone.
vision_backbone: resnet18
pretrained_backbone_weights: ResNet18_Weights.IMAGENET1K_V1
replace_final_stride_with_dilation: false
# Transformer layers.
pre_norm: false
dim_model: 512
n_heads: 8
dim_feedforward: 3200
feedforward_activation: relu
n_encoder_layers: 4
# Note: Although the original ACT implementation has 7 for `n_decoder_layers`, there is a bug in the code
# that means only the first layer is used. Here we match the original implementation by setting this to 1.
# See this issue https://github.com/tonyzhaozh/act/issues/25#issue-2258740521.
n_decoder_layers: 1
# VAE.
use_vae: true
latent_dim: 32
n_vae_encoder_layers: 4
# Inference.
temporal_ensemble_momentum: null
# Training and loss computation.
dropout: 0.1
kl_weight: 10.0


@@ -56,7 +56,7 @@ def make_dataset(cfg, split: str = "train") -> LeRobotDataset | MultiLeRobotData
)
# A soft check to warn if the environment matches the dataset. Don't check if we are using a real world env (dora).
if not cfg.env.real_world:
if cfg.env.name != "dora":
if isinstance(cfg.dataset_repo_id, str):
dataset_repo_ids = [cfg.dataset_repo_id] # single dataset
else:


@@ -43,6 +43,9 @@ def get_cameras(hdf5_data):
def check_format(raw_dir) -> bool:
# only frames from simulation are uncompressed
compressed_images = "sim" not in raw_dir.name
hdf5_paths = list(raw_dir.glob("episode_*.hdf5"))
assert len(hdf5_paths) != 0
for hdf5_path in hdf5_paths:
@@ -59,15 +62,17 @@ def check_format(raw_dir) -> bool:
for camera in get_cameras(data):
assert num_frames == data[f"/observations/images/{camera}"].shape[0]
# ndim 2 when image are compressed and 4 when uncompressed
assert data[f"/observations/images/{camera}"].ndim in [2, 4]
if data[f"/observations/images/{camera}"].ndim == 4:
if compressed_images:
assert data[f"/observations/images/{camera}"].ndim == 2
else:
assert data[f"/observations/images/{camera}"].ndim == 4
b, h, w, c = data[f"/observations/images/{camera}"].shape
assert c < h and c < w, f"Expect (h,w,c) image format but ({h=},{w=},{c=}) provided."
def load_from_raw(raw_dir, out_dir, fps, video, debug):
# only frames from simulation are uncompressed
compressed_images = "sim" not in raw_dir.name
hdf5_files = list(raw_dir.glob("*.hdf5"))
ep_dicts = []
@@ -94,7 +99,7 @@ def load_from_raw(raw_dir, out_dir, fps, video, debug):
for camera in get_cameras(ep):
img_key = f"observation.images.{camera}"
if ep[f"/observations/images/{camera}"].ndim == 2:
if compressed_images:
import cv2
# load one compressed image after the other in RAM and uncompress


@@ -0,0 +1,189 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Contains utilities to process raw data format of HDF5 files like in: https://github.com/tonyzhaozh/act
"""
import gc
import re
import shutil
from pathlib import Path
import h5py
import torch
import tqdm
from datasets import Dataset, Features, Image, Sequence, Value
from lerobot.common.datasets.push_dataset_to_hub.utils import concatenate_episodes
from lerobot.common.datasets.utils import (
hf_transform_to_torch,
)
from lerobot.common.datasets.video_utils import VideoFrame
def get_cameras(hdf5_data):
# ignore depth channel, not currently handled
# TODO(rcadene): add depth
rgb_cameras = [key for key in hdf5_data["/observations/images_ids"].keys() if "depth" not in key] # noqa: SIM118
return rgb_cameras
def check_format(raw_dir) -> bool:
hdf5_paths = list(raw_dir.glob("episode_*.hdf5"))
assert len(hdf5_paths) != 0
for hdf5_path in hdf5_paths:
with h5py.File(hdf5_path, "r") as data:
assert "/action" in data
assert "/observations/qpos" in data
assert data["/action"].ndim == 2
assert data["/observations/qpos"].ndim == 2
num_frames = data["/action"].shape[0]
assert num_frames == data["/observations/qpos"].shape[0]
for camera in get_cameras(data):
assert num_frames == data[f"/observations/images_ids/{camera}"].shape[0]
assert (raw_dir / hdf5_path.name.replace(".hdf5", f"_{camera}.mp4")).exists()
# assert data[f"/observations/images_ids/{camera}"].ndim == 4
# b, h, w, c = data[f"/observations/images_ids/{camera}"].shape
# assert c < h and c < w, f"Expect (h,w,c) image format but ({h=},{w=},{c=}) provided."
def load_from_raw(raw_dir, out_dir, fps, video, debug):
hdf5_files = list(raw_dir.glob("*.hdf5"))
ep_dicts = []
episode_data_index = {"from": [], "to": []}
id_from = 0
for ep_idx, ep_path in tqdm.tqdm(enumerate(hdf5_files), total=len(hdf5_files)):
match = re.search(r"_(\d+).hdf5", ep_path.name)
if not match:
raise ValueError(ep_path.name)
raw_ep_idx = int(match.group(1))
with h5py.File(ep_path, "r") as ep:
num_frames = ep["/action"].shape[0]
# last step of demonstration is considered done
done = torch.zeros(num_frames, dtype=torch.bool)
done[-1] = True
state = torch.from_numpy(ep["/observations/qpos"][:])
action = torch.from_numpy(ep["/action"][:])
if "/observations/qvel" in ep:
velocity = torch.from_numpy(ep["/observations/qvel"][:])
if "/observations/effort" in ep:
effort = torch.from_numpy(ep["/observations/effort"][:])
ep_dict = {}
videos_dir = out_dir / "videos"
videos_dir.mkdir(parents=True, exist_ok=True)
for camera in get_cameras(ep):
img_key = f"observation.images.{camera}"
raw_fname = f"episode_{raw_ep_idx}_{camera}.mp4"
new_fname = f"{img_key}_episode_{ep_idx:06d}.mp4"
shutil.copy(str(raw_dir / raw_fname), str(videos_dir / new_fname))
# store the reference to the video frame
ep_dict[img_key] = [
{"path": f"videos/{new_fname}", "timestamp": i / fps} for i in range(num_frames)
]
ep_dict["observation.state"] = state
if "/observations/velocity" in ep:
ep_dict["observation.velocity"] = velocity
if "/observations/effort" in ep:
ep_dict["observation.effort"] = effort
ep_dict["action"] = action
ep_dict["episode_index"] = torch.tensor([ep_idx] * num_frames)
ep_dict["frame_index"] = torch.arange(0, num_frames, 1)
ep_dict["timestamp"] = torch.arange(0, num_frames, 1) / fps
ep_dict["next.done"] = done
# TODO(rcadene): add reward and success by computing them in sim
assert isinstance(ep_idx, int)
ep_dicts.append(ep_dict)
episode_data_index["from"].append(id_from)
episode_data_index["to"].append(id_from + num_frames)
id_from += num_frames
gc.collect()
# process first episode only
if debug:
break
data_dict = concatenate_episodes(ep_dicts)
return data_dict, episode_data_index
def to_hf_dataset(data_dict, video) -> Dataset:
features = {}
keys = [key for key in data_dict if "observation.images." in key]
for key in keys:
if video:
features[key] = VideoFrame()
else:
features[key] = Image()
features["observation.state"] = Sequence(
length=data_dict["observation.state"].shape[1], feature=Value(dtype="float32", id=None)
)
if "observation.velocity" in data_dict:
features["observation.velocity"] = Sequence(
length=data_dict["observation.velocity"].shape[1], feature=Value(dtype="float32", id=None)
)
if "observation.effort" in data_dict:
features["observation.effort"] = Sequence(
length=data_dict["observation.effort"].shape[1], feature=Value(dtype="float32", id=None)
)
features["action"] = Sequence(
length=data_dict["action"].shape[1], feature=Value(dtype="float32", id=None)
)
features["episode_index"] = Value(dtype="int64", id=None)
features["frame_index"] = Value(dtype="int64", id=None)
features["timestamp"] = Value(dtype="float32", id=None)
features["next.done"] = Value(dtype="bool", id=None)
features["index"] = Value(dtype="int64", id=None)
hf_dataset = Dataset.from_dict(data_dict, features=Features(features))
hf_dataset.set_transform(hf_transform_to_torch)
return hf_dataset
def from_raw_to_lerobot_format(raw_dir: Path, out_dir: Path, fps=None, video=True, debug=False):
# sanity check
check_format(raw_dir)
if fps is None:
fps = 30
data_dir, episode_data_index = load_from_raw(raw_dir, out_dir, fps, video, debug)
hf_dataset = to_hf_dataset(data_dir, video)
info = {
"fps": fps,
"video": video,
}
return hf_dataset, episode_data_index, info
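In `load_from_raw` above, each camera frame is stored as a `{"path": ..., "timestamp": i / fps}` reference rather than as pixel data; the video decoder later seeks the episode's mp4 to that timestamp. The reference list for one episode can be sketched as:

```python
def video_frame_refs(fname: str, num_frames: int, fps: float) -> list[dict]:
    # One {path, timestamp} reference per frame, pointing into the episode's mp4.
    return [{"path": f"videos/{fname}", "timestamp": i / fps} for i in range(num_frames)]
```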


@@ -21,24 +21,19 @@ import PIL
import torch
def concatenate_episodes(ep_dicts, drop_episodes_last_frame=False):
def concatenate_episodes(ep_dicts):
data_dict = {}
keys = ep_dicts[0].keys()
for key in keys:
if torch.is_tensor(ep_dicts[0][key][0]):
if drop_episodes_last_frame:
data_dict[key] = torch.cat([ep_dict[key][:-1] for ep_dict in ep_dicts])
else:
data_dict[key] = torch.cat([ep_dict[key] for ep_dict in ep_dicts])
data_dict[key] = torch.cat([ep_dict[key] for ep_dict in ep_dicts])
else:
if key not in data_dict:
data_dict[key] = []
for ep_dict in ep_dicts:
for x in ep_dict[key]:
data_dict[key].append(x)
if drop_episodes_last_frame:
data_dict[key].pop()
total_frames = data_dict["frame_index"].shape[0]
data_dict["index"] = torch.arange(0, total_frames, 1)


@@ -29,12 +29,10 @@ def preprocess_observation(observations: dict[str, np.ndarray]) -> dict[str, Ten
# map to expected inputs for the policy
return_observations = {}
if "pixels" in observations and isinstance(observations["pixels"], dict):
if isinstance(observations["pixels"], dict):
imgs = {f"observation.images.{key}": img for key, img in observations["pixels"].items()}
elif "pixels" in observations and isinstance(observations["pixels"], np.ndarray):
imgs = {"observation.image": observations["pixels"]}
else:
imgs = {f"observation.{key}": img for key, img in observations.items() if "images" in key}
imgs = {"observation.image": observations["pixels"]}
for imgkey, img in imgs.items():
img = torch.from_numpy(img)


@@ -233,9 +233,6 @@ class Logger:
if self._wandb is not None:
for k, v in d.items():
if not isinstance(v, (int, float, str)):
logging.warning(
f'WandB logging of key "{k}" was ignored as its type is not handled by this wrapper.'
)
continue
self._wandb.log({f"{mode}/{k}": v}, step=step)


@@ -129,9 +129,7 @@ class ACTConfig:
# Note: Although the original ACT implementation has 7 for `n_decoder_layers`, there is a bug in the code
# that means only the first layer is used. Here we match the original implementation by setting this to 1.
# See this issue https://github.com/tonyzhaozh/act/issues/25#issue-2258740521.
# As a consequence we also remove the final, unused layer normalization, by default
n_decoder_layers: int = 1
decoder_norm: bool = False
# VAE.
use_vae: bool = True
latent_dim: int = 32


@@ -139,25 +139,26 @@ class ACTPolicy(nn.Module, PyTorchModelHubMixin):
batch = self.normalize_targets(batch)
actions_hat, (mu_hat, log_sigma_x2_hat) = self.model(batch)
l1_loss = (
F.l1_loss(batch["action"], actions_hat, reduction="none") * ~batch["action_is_pad"].unsqueeze(-1)
).mean()
bsize = actions_hat.shape[0]
l1_loss = F.l1_loss(batch["action"], actions_hat, reduction="none")
l1_loss = l1_loss * ~batch["action_is_pad"].unsqueeze(-1)
l1_loss = l1_loss.view(bsize, -1).mean(dim=1)
out_dict = {}
out_dict["l1_loss"] = l1_loss
loss_dict = {"l1_loss": l1_loss.item()}
if self.config.use_vae:
# Calculate Dₖₗ(latent_pdf || standard_normal). Note: After computing the KL-divergence for
# each dimension independently, we sum over the latent dimension to get the total
# KL-divergence per batch element, then take the mean over the batch.
# (See App. B of https://arxiv.org/abs/1312.6114 for more details).
mean_kld = (
(-0.5 * (1 + log_sigma_x2_hat - mu_hat.pow(2) - (log_sigma_x2_hat).exp())).sum(-1).mean()
)
loss_dict["kld_loss"] = mean_kld.item()
loss_dict["loss"] = l1_loss + mean_kld * self.config.kl_weight
kld_loss = (-0.5 * (1 + log_sigma_x2_hat - mu_hat.pow(2) - (log_sigma_x2_hat).exp())).sum(-1)
out_dict["loss"] = l1_loss + kld_loss * self.config.kl_weight
else:
loss_dict["loss"] = l1_loss
out_dict["loss"] = l1_loss
return loss_dict
out_dict["action"] = self.unnormalize_outputs({"action": actions_hat})["action"]
return out_dict
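The KL term computed in the hunk above is the closed-form divergence between the encoder's diagonal Gaussian and a standard normal, summed over latent dimensions, as in Appendix B of the VAE paper cited in the comment. For a single batch element, in plain Python:

```python
import math

def gaussian_kld(mu: list[float], log_sigma_x2: list[float]) -> float:
    # Closed-form KL divergence D_KL(N(mu, sigma^2) || N(0, 1)) summed over
    # latent dimensions: -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2).
    return sum(
        -0.5 * (1 + ls - m**2 - math.exp(ls)) for m, ls in zip(mu, log_sigma_x2)
    )
```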
class ACT(nn.Module):
@@ -315,14 +316,8 @@ class ACT(nn.Module):
pos_embed = self.vae_encoder_pos_enc.clone().detach() # (1, S+2, D)
# Forward pass through VAE encoder to get the latent PDF parameters.
cls_joint_is_pad = torch.full((batch_size, 2), False).to(
batch["observation.state"].device
) # False: not a padding
key_padding_mask = torch.cat([cls_joint_is_pad, batch["action_is_pad"]], axis=1) # (bs, seq+1)
cls_token_out = self.vae_encoder(
vae_encoder_input.permute(1, 0, 2),
pos_embed=pos_embed.permute(1, 0, 2),
key_padding_mask=key_padding_mask,
vae_encoder_input.permute(1, 0, 2), pos_embed=pos_embed.permute(1, 0, 2)
)[0] # select the class token, with shape (B, D)
latent_pdf_params = self.vae_encoder_latent_output_proj(cls_token_out)
mu = latent_pdf_params[:, : self.config.latent_dim]
@@ -408,11 +403,9 @@ class ACTEncoder(nn.Module):
self.layers = nn.ModuleList([ACTEncoderLayer(config) for _ in range(config.n_encoder_layers)])
self.norm = nn.LayerNorm(config.dim_model) if config.pre_norm else nn.Identity()
def forward(
self, x: Tensor, pos_embed: Tensor | None = None, key_padding_mask: Tensor | None = None
) -> Tensor:
def forward(self, x: Tensor, pos_embed: Tensor | None = None) -> Tensor:
for layer in self.layers:
x = layer(x, pos_embed=pos_embed, key_padding_mask=key_padding_mask)
x = layer(x, pos_embed=pos_embed)
x = self.norm(x)
return x
@@ -435,14 +428,12 @@ class ACTEncoderLayer(nn.Module):
self.activation = get_activation_fn(config.feedforward_activation)
self.pre_norm = config.pre_norm
def forward(self, x, pos_embed: Tensor | None = None, key_padding_mask: Tensor | None = None) -> Tensor:
def forward(self, x, pos_embed: Tensor | None = None) -> Tensor:
skip = x
if self.pre_norm:
x = self.norm1(x)
q = k = x if pos_embed is None else x + pos_embed
x = self.self_attn(q, k, value=x, key_padding_mask=key_padding_mask)[
0
] # select just the output, not the attention weights
x = self.self_attn(q, k, value=x)[0] # select just the output, not the attention weights
x = skip + self.dropout1(x)
if self.pre_norm:
skip = x
@@ -462,10 +453,7 @@ class ACTDecoder(nn.Module):
"""Convenience module for running multiple decoder layers followed by normalization."""
super().__init__()
self.layers = nn.ModuleList([ACTDecoderLayer(config) for _ in range(config.n_decoder_layers)])
if config.decoder_norm:
self.norm = nn.LayerNorm(config.dim_model)
else:
self.norm = nn.Identity()
self.norm = nn.LayerNorm(config.dim_model)
def forward(
self,
@@ -478,7 +466,8 @@ class ACTDecoder(nn.Module):
x = layer(
x, encoder_out, decoder_pos_embed=decoder_pos_embed, encoder_pos_embed=encoder_pos_embed
)
x = self.norm(x)
if self.norm is not None:
x = self.norm(x)
return x


@@ -147,7 +147,7 @@ class Normalize(nn.Module):
assert not torch.isinf(min).any(), _no_stats_error_str("min")
assert not torch.isinf(max).any(), _no_stats_error_str("max")
# normalize to [0,1]
batch[key] = (batch[key] - min) / (max - min + 1e-8)
batch[key] = (batch[key] - min) / (max - min)
# normalize to [-1, 1]
batch[key] = batch[key] * 2 - 1
else:
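The hunk above touches the epsilon guard in the denominator of the min-max normalization. A minimal sketch of why the guard matters when a feature is constant (pure Python, illustrative values, not the library's API):

```python
def normalize_min_max(x, lo, hi, eps=1e-8):
    """Map x from [lo, hi] to [-1, 1]; eps keeps the division finite when hi == lo."""
    x01 = (x - lo) / (hi - lo + eps)
    return x01 * 2 - 1

print(normalize_min_max(0.0, 0.0, 10.0))  # -1.0
print(normalize_min_max(5.0, 5.0, 5.0))   # -1.0 (no ZeroDivisionError)
```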


@@ -50,8 +50,6 @@ eval:
batch_size: 1
# `use_async_envs` specifies whether to use asynchronous environments (multiprocessing).
use_async_envs: false
# Specify the number of episodes to render during evaluation.
max_episodes_rendered: 10
wandb:
enable: false


@@ -9,7 +9,6 @@ env:
action_dim: 14
fps: ${fps}
episode_length: 400
real_world: false
gym:
obs_type: pixels_agent_pos
render_mode: rgb_array


@@ -9,6 +9,5 @@ env:
action_dim: 14
fps: ${fps}
episode_length: 400
real_world: true
gym:
fps: ${fps}


@@ -0,0 +1,13 @@
# @package _global_
fps: 30
env:
name: dora
task: DoraReachy2-v0
state_dim: 22
action_dim: 22
fps: ${fps}
episode_length: 400
gym:
fps: ${fps}


@@ -10,7 +10,6 @@ env:
action_dim: 2
fps: ${fps}
episode_length: 300
real_world: false
gym:
obs_type: pixels_agent_pos
render_mode: rgb_array


@@ -10,7 +10,6 @@ env:
action_dim: 4
fps: ${fps}
episode_length: 25
real_world: false
gym:
obs_type: pixels_agent_pos
render_mode: rgb_array


@@ -25,7 +25,7 @@ training:
online_steps_between_rollouts: 1
delta_timestamps:
action: "[i / ${fps} for i in range(${policy.chunk_size})]"
action: "[i / ${fps} for i in range(1, ${policy.chunk_size} + 1)]"
eval:
n_episodes: 50
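The `delta_timestamps` change above shifts the action chunk by one frame: it now starts at the next frame rather than the current one. A quick illustration (`fps` and `chunk_size` values are made up for the example):

```python
fps = 50          # illustrative value
chunk_size = 4    # illustrative value

old_ts = [i / fps for i in range(chunk_size)]         # includes the current frame at t=0
new_ts = [i / fps for i in range(1, chunk_size + 1)]  # starts one frame in the future

print(old_ts)  # [0.0, 0.02, 0.04, 0.06]
print(new_ts)  # [0.02, 0.04, 0.06, 0.08]
```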


@@ -1,8 +1,8 @@
# @package _global_
# Use `act_real.yaml` to train on real-world Aloha/Aloha2 datasets.
# Compared to `act.yaml`, it contains 4 cameras (i.e. right_wrist, left_wrist, images,
# low) instead of 1 camera (i.e. top). Also, `training.eval_freq` is set to -1. This config is used
# Compared to `act.yaml`, it contains 4 cameras (i.e. cam_right_wrist, cam_left_wrist, images,
# cam_low) instead of 1 camera (i.e. top). Also, `training.eval_freq` is set to -1. This config is used
# to evaluate checkpoints at a certain frequency of training steps. When it is set to -1, it deactivates evaluation.
# This is because real-world evaluation is done through [dora-lerobot](https://github.com/dora-rs/dora-lerobot).
# Look at its README for more information on how to evaluate a checkpoint in the real-world.
@@ -11,27 +11,23 @@
# ```bash
# python lerobot/scripts/train.py \
# policy=act_real \
# env=aloha_real
# env=dora_aloha_real
# ```
seed: 1000
dataset_repo_id: ???
dataset_repo_id: cadene/reachy2_teleop_remi
override_dataset_stats:
observation.images.high:
# stats from imagenet, since we use a pretrained vision model
mean: [[[0.485]], [[0.456]], [[0.406]]] # (c,1,1)
std: [[[0.229]], [[0.224]], [[0.225]]] # (c,1,1)
observation.images.low:
observation.images.cam_trunk:
# stats from imagenet, since we use a pretrained vision model
mean: [[[0.485]], [[0.456]], [[0.406]]] # (c,1,1)
std: [[[0.229]], [[0.224]], [[0.225]]] # (c,1,1)
training:
offline_steps: 1000
offline_steps: 80000
online_steps: 0
eval_freq: -1
save_freq: 1000
save_freq: 10000
log_freq: 100
save_checkpoint: true
@@ -46,8 +42,8 @@ training:
action: "[i / ${fps} for i in range(1, ${policy.chunk_size} + 1)]"
eval:
n_episodes: 1
batch_size: 1
n_episodes: 50
batch_size: 50
# See `configuration_act.py` for more details.
policy:
@@ -60,16 +56,14 @@ policy:
input_shapes:
# TODO(rcadene, alexander-soare): add variables for height and width from the dataset/env?
observation.images.high: [3, 480, 640]
observation.images.low: [3, 480, 640]
observation.images.cam_trunk: [3, 800, 1280]
observation.state: ["${env.state_dim}"]
output_shapes:
action: ["${env.action_dim}"]
# Normalization / Unnormalization
input_normalization_modes:
observation.images.high: mean_std
observation.images.low: mean_std
observation.images.cam_trunk: mean_std
observation.state: mean_std
output_normalization_modes:
action: mean_std


@@ -51,7 +51,7 @@ training:
online_steps_between_rollouts: 1
delta_timestamps:
action: "[i / ${fps} for i in range(${policy.chunk_size})]"
action: "[i / ${fps} for i in range(1, ${policy.chunk_size} + 1)]"
eval:
n_episodes: 50


@@ -49,7 +49,7 @@ training:
online_steps_between_rollouts: 1
delta_timestamps:
action: "[i / ${fps} for i in range(${policy.chunk_size})]"
action: "[i / ${fps} for i in range(1, ${policy.chunk_size} + 1)]"
eval:
n_episodes: 50


@@ -44,7 +44,6 @@ https://huggingface.co/lerobot/diffusion_pusht/tree/main.
import argparse
import json
import logging
import os
import threading
import time
from contextlib import nullcontext
@@ -165,10 +164,7 @@ def rollout(
# VectorEnv stores is_success in `info["final_info"][env_index]["is_success"]`. "final_info" isn't
# available if none of the envs finished.
if "final_info" in info:
successes = [
i["is_success"] if (i is not None and "is_success" in i) else False
for i in info["final_info"]
]
successes = [info["is_success"] if info is not None else False for info in info["final_info"]]
else:
successes = [False] * env.num_envs
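The guard around `final_info` can be exercised in isolation. Below is a hypothetical helper mirroring the defensive variant above (per the comment in the diff, VectorEnv only populates `final_info` when at least one sub-env finished, and still-running envs get `None` entries):

```python
def extract_successes(info: dict, num_envs: int) -> list[bool]:
    # Mirrors the rollout logic above: "final_info" is only present when
    # at least one sub-env finished; entries for still-running envs are None.
    if "final_info" in info:
        return [
            i["is_success"] if (i is not None and "is_success" in i) else False
            for i in info["final_info"]
        ]
    return [False] * num_envs

print(extract_successes({"final_info": [{"is_success": True}, None]}, 2))  # [True, False]
print(extract_successes({}, 3))  # [False, False, False]
```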
@@ -520,7 +516,6 @@ def eval(
out_dir = (
f"outputs/eval/{dt.now().strftime('%Y-%m-%d/%H-%M-%S')}_{hydra_cfg.env.name}_{hydra_cfg.policy.name}"
)
os.makedirs(out_dir, exist_ok=True)
if out_dir is None:
raise NotImplementedError()
@@ -550,7 +545,7 @@ def eval(
env,
policy,
hydra_cfg.eval.n_episodes,
max_episodes_rendered=hydra_cfg.eval.max_episodes_rendered,
max_episodes_rendered=10,
video_dir=Path(out_dir) / "eval",
start_seed=hydra_cfg.seed,
enable_progbar=True,


@@ -86,6 +86,8 @@ def get_from_raw_to_lerobot_format_fn(raw_format):
from lerobot.common.datasets.push_dataset_to_hub.aloha_hdf5_format import from_raw_to_lerobot_format
elif raw_format == "aloha_dora":
from lerobot.common.datasets.push_dataset_to_hub.aloha_dora_format import from_raw_to_lerobot_format
elif raw_format == "reachy2_hdf5":
from lerobot.common.datasets.push_dataset_to_hub.reachy2_hdf5_format import from_raw_to_lerobot_format
elif raw_format == "xarm_pkl":
from lerobot.common.datasets.push_dataset_to_hub.xarm_pkl_format import from_raw_to_lerobot_format
else:


@@ -107,7 +107,7 @@ def update_policy(
with torch.autocast(device_type=device.type) if use_amp else nullcontext():
output_dict = policy.forward(batch)
# TODO(rcadene): policy.unnormalize_outputs(out_dict)
loss = output_dict["loss"]
loss = output_dict["loss"].mean()
grad_scaler.scale(loss).backward()
# Unscale the gradient of the optimizer's assigned params in-place **prior to gradient clipping**.
@@ -406,8 +406,7 @@ def train(cfg: DictConfig, out_dir: str | None = None, job_name: str | None = No
step += 1
if cfg.training.eval_freq > 0:
eval_env.close()
eval_env.close()
logging.info("End of training")


@@ -30,48 +30,46 @@ Examples:
- Visualize data stored on a local machine:
```
local$ python lerobot/scripts/visualize_dataset.py \
--repo-id lerobot/pusht \
--episode-index 0
--repo-id lerobot/pusht
local$ open http://localhost:9090
```
- Visualize data stored on a distant machine with a local viewer:
```
distant$ python lerobot/scripts/visualize_dataset.py \
--repo-id lerobot/pusht
local$ ssh -L 9090:localhost:9090 distant # create a ssh tunnel
local$ open http://localhost:9090
```
- Select episodes to visualize:
```
python lerobot/scripts/visualize_dataset.py \
--repo-id lerobot/pusht \
--episode-index 0 \
--save 1 \
--output-dir path/to/directory
local$ scp distant:path/to/directory/lerobot_pusht_episode_0.rrd .
local$ rerun lerobot_pusht_episode_0.rrd
--episode-indices 7 3 5 1 4
```
- Visualize data stored on a distant machine through streaming:
(You need to forward the websocket port to the distant machine, with
`ssh -L 9087:localhost:9087 username@remote-host`)
```
distant$ python lerobot/scripts/visualize_dataset.py \
--repo-id lerobot/pusht \
--episode-index 0 \
--mode distant \
--ws-port 9087
local$ rerun ws://localhost:9087
```
"""
import argparse
import gc
import http.server
import logging
import time
import os
import shutil
import socketserver
from pathlib import Path
import rerun as rr
import torch
import tqdm
import yaml
from bs4 import BeautifulSoup
from huggingface_hub import snapshot_download
from safetensors.torch import load_file, save_file
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
from lerobot.common.policies.act.modeling_act import ACTPolicy
from lerobot.common.utils.utils import init_logging
class EpisodeSampler(torch.utils.data.Sampler):
@@ -87,33 +85,307 @@ class EpisodeSampler(torch.utils.data.Sampler):
return len(self.frame_ids)
def to_hwc_uint8_numpy(chw_float32_torch):
assert chw_float32_torch.dtype == torch.float32
assert chw_float32_torch.ndim == 3
c, h, w = chw_float32_torch.shape
assert c < h and c < w, f"expect channel-first images, but instead got shape {chw_float32_torch.shape}"
hwc_uint8_numpy = (chw_float32_torch * 255).type(torch.uint8).permute(1, 2, 0).numpy()
return hwc_uint8_numpy
class NoCacheHTTPRequestHandler(http.server.SimpleHTTPRequestHandler):
def end_headers(self):
self.send_header("Cache-Control", "no-store, no-cache, must-revalidate")
self.send_header("Pragma", "no-cache")
self.send_header("Expires", "0")
super().end_headers()
def visualize_dataset(
repo_id: str,
episode_index: int,
batch_size: int = 32,
num_workers: int = 0,
mode: str = "local",
web_port: int = 9090,
ws_port: int = 9087,
save: bool = False,
output_dir: Path | None = None,
) -> Path | None:
if save:
assert (
output_dir is not None
), "Set an output directory in which to write .rrd files with `--output-dir path/to/directory`."
def run_server(path, port):
# Change directory to serve 'index.html` as front page
os.chdir(path)
logging.info("Loading dataset")
dataset = LeRobotDataset(repo_id)
with socketserver.TCPServer(("", port), NoCacheHTTPRequestHandler) as httpd:
logging.info(f"Serving HTTP on 0.0.0.0 port {port} (http://0.0.0.0:{port}/) ...")
httpd.serve_forever()
def create_html_page(page_title: str):
"""Create an HTML page with BeautifulSoup, with default doctype, meta, head and title."""
soup = BeautifulSoup("", "html.parser")
doctype = soup.new_tag("!DOCTYPE html")
soup.append(doctype)
html = soup.new_tag("html", lang="en")
soup.append(html)
head = soup.new_tag("head")
html.append(head)
meta_charset = soup.new_tag("meta", charset="UTF-8")
head.append(meta_charset)
meta_viewport = soup.new_tag(
"meta", attrs={"name": "viewport", "content": "width=device-width, initial-scale=1.0"}
)
head.append(meta_viewport)
title = soup.new_tag("title")
title.string = page_title
head.append(title)
body = soup.new_tag("body")
html.append(body)
main_div = soup.new_tag("div")
body.append(main_div)
return soup, head, body
def write_episode_data_csv(output_dir, file_name, episode_index, dataset, inference_results=None):
"""Write a csv file containing timeseries data of an episode (e.g. state and action).
This file will be loaded by Dygraph javascript to plot data in real time."""
from_idx = dataset.episode_data_index["from"][episode_index]
to_idx = dataset.episode_data_index["to"][episode_index]
has_state = "observation.state" in dataset.hf_dataset.features
has_action = "action" in dataset.hf_dataset.features
has_inference = inference_results is not None
# init header of csv with state and action names
header = ["timestamp"]
if has_state:
dim_state = len(dataset.hf_dataset["observation.state"][0])
header += [f"state_{i}" for i in range(dim_state)]
if has_action:
dim_action = len(dataset.hf_dataset["action"][0])
header += [f"action_{i}" for i in range(dim_action)]
if has_inference:
assert "actions" in inference_results
assert "loss" in inference_results
dim_pred_action = inference_results["actions"].shape[2]
header += [f"pred_action_{i}" for i in range(dim_pred_action)]
header += ["loss"]
columns = ["timestamp"]
if has_state:
columns += ["observation.state"]
if has_action:
columns += ["action"]
rows = []
data = dataset.hf_dataset.select_columns(columns)
for i in range(from_idx, to_idx):
row = [data[i]["timestamp"].item()]
if has_state:
row += data[i]["observation.state"].tolist()
if has_action:
row += data[i]["action"].tolist()
rows.append(row)
if has_inference:
num_frames = len(rows)
assert num_frames == inference_results["actions"].shape[0]
assert num_frames == inference_results["loss"].shape[0]
for i in range(num_frames):
rows[i] += inference_results["actions"][i, 0].tolist()
rows[i] += [inference_results["loss"][i].item()]
output_dir.mkdir(parents=True, exist_ok=True)
with open(output_dir / file_name, "w") as f:
f.write(",".join(header) + "\n")
for row in rows:
row_str = [str(col) for col in row]
f.write(",".join(row_str) + "\n")
def write_episode_data_js(output_dir, file_name, ep_csv_fname, dataset):
"""Write a javascript file containing logic to synchronize camera feeds and timeseries."""
s = ""
s += "document.addEventListener('DOMContentLoaded', function () {\n"
for i, key in enumerate(dataset.video_frame_keys):
s += f" const video{i} = document.getElementById('video_{key}');\n"
s += " const slider = document.getElementById('videoControl');\n"
s += " const playButton = document.getElementById('playButton');\n"
s += f" const dygraph = new Dygraph(document.getElementById('graph'), '{ep_csv_fname}', " + "{\n"
s += " pixelsPerPoint: 0.01,\n"
s += " legend: 'always',\n"
s += " labelsDiv: document.getElementById('labels'),\n"
s += " labelsSeparateLines: true,\n"
s += " labelsKMB: true,\n"
s += " highlightCircleSize: 1.5,\n"
s += " highlightSeriesOpts: {\n"
s += " strokeWidth: 1.5,\n"
s += " strokeBorderWidth: 1,\n"
s += " highlightCircleSize: 3\n"
s += " }\n"
s += " });\n"
s += "\n"
s += " // Function to play both videos\n"
s += " playButton.addEventListener('click', function () {\n"
for i in range(len(dataset.video_frame_keys)):
s += f" video{i}.play();\n"
s += " // playButton.disabled = true; // Optional: disable button after playing\n"
s += " });\n"
s += "\n"
s += " // Update the video time when the slider value changes\n"
s += " slider.addEventListener('input', function () {\n"
s += " const sliderValue = slider.value;\n"
for i in range(len(dataset.video_frame_keys)):
s += f" const time{i} = (video{i}.duration * sliderValue) / 100;\n"
for i in range(len(dataset.video_frame_keys)):
s += f" video{i}.currentTime = time{i};\n"
s += " });\n"
s += "\n"
s += " // Synchronize slider with the video's current time\n"
s += " const syncSlider = (video) => {\n"
s += " video.addEventListener('timeupdate', function () {\n"
s += " if (video.duration) {\n"
s += " const pc = (100 / video.duration) * video.currentTime;\n"
s += " slider.value = pc;\n"
s += " const index = Math.floor(pc * dygraph.numRows() / 100);\n"
s += " dygraph.setSelection(index, undefined, true, true);\n"
s += " }\n"
s += " });\n"
s += " };\n"
s += "\n"
for i in range(len(dataset.video_frame_keys)):
s += f" syncSlider(video{i});\n"
s += "\n"
s += "});\n"
output_dir.mkdir(parents=True, exist_ok=True)
with open(output_dir / file_name, "w", encoding="utf-8") as f:
f.write(s)
def write_episode_data_html(output_dir, file_name, js_fname, ep_index, dataset):
"""Write an html file containing video feeds and timeseries associated with an episode."""
soup, head, body = create_html_page("")
css_style = soup.new_tag("style")
css_style.string = ""
css_style.string += "#labels > span.highlight {\n"
css_style.string += " border: 1px solid grey;\n"
css_style.string += "}"
head.append(css_style)
# Add videos from camera feeds
videos_control_div = soup.new_tag("div")
body.append(videos_control_div)
videos_div = soup.new_tag("div")
videos_control_div.append(videos_div)
def create_video(id, src):
video = soup.new_tag("video", id=id, width="320", height="240", controls="")
source = soup.new_tag("source", src=src, type="video/mp4")
video.string = "Your browser does not support the video tag."
video.append(source)
return video
# get first frame of episode (hack to get video_path of the episode)
first_frame_idx = dataset.episode_data_index["from"][ep_index].item()
for key in dataset.video_frame_keys:
# Example of video_path: 'videos/observation.image_episode_000004.mp4'
video_path = dataset.hf_dataset.select_columns(key)[first_frame_idx][key]["path"]
videos_div.append(create_video(f"video_{key}", video_path))
# Add controls for videos and graph
control_div = soup.new_tag("div")
videos_control_div.append(control_div)
button_div = soup.new_tag("div")
control_div.append(button_div)
button = soup.new_tag("button", id="playButton")
button.string = "Play Videos"
button_div.append(button)
slider_div = soup.new_tag("div")
control_div.append(slider_div)
slider = soup.new_tag("input", type="range", id="videoControl", min="0", max="100", value="0", step="1")
control_div.append(slider)
# Add graph of states/actions, and its labels
graph_labels_div = soup.new_tag("div", style="display: flex;")
body.append(graph_labels_div)
graph_div = soup.new_tag("div", id="graph", style="flex: 1; width: 85%")
graph_labels_div.append(graph_div)
labels_div = soup.new_tag("div", id="labels", style="flex: 1; width: 15%")
graph_labels_div.append(labels_div)
# add dygraph library
script = soup.new_tag("script", type="text/javascript", src=js_fname)
body.append(script)
script_dygraph = soup.new_tag(
"script",
type="text/javascript",
src="https://cdn.jsdelivr.net/npm/dygraphs@2.1.0/dist/dygraph.min.js",
)
body.append(script_dygraph)
link_dygraph = soup.new_tag(
"link", rel="stylesheet", href="https://cdn.jsdelivr.net/npm/dygraphs@2.1.0/dist/dygraph.min.css"
)
body.append(link_dygraph)
# Write as a html file
output_dir.mkdir(parents=True, exist_ok=True)
with open(output_dir / file_name, "w", encoding="utf-8") as f:
f.write(soup.prettify())
def write_episodes_list_html(output_dir, file_name, ep_indices, ep_html_fnames, dataset):
"""Write an html file containing information related to the dataset and a list of links to
html pages of episodes."""
soup, head, body = create_html_page("TODO")
h3 = soup.new_tag("h3")
h3.string = "TODO"
body.append(h3)
ul_info = soup.new_tag("ul")
body.append(ul_info)
li_info = soup.new_tag("li")
li_info.string = f"Number of samples/frames: {dataset.num_samples}"
ul_info.append(li_info)
li_info = soup.new_tag("li")
li_info.string = f"Number of episodes: {dataset.num_episodes}"
ul_info.append(li_info)
li_info = soup.new_tag("li")
li_info.string = f"Frames per second: {dataset.fps}"
ul_info.append(li_info)
# li_info = soup.new_tag("li")
# li_info.string = f"Size: {format_big_number(dataset.hf_dataset.info.size_in_bytes)}B"
# ul_info.append(li_info)
ul = soup.new_tag("ul")
body.append(ul)
for ep_idx, ep_html_fname in zip(ep_indices, ep_html_fnames, strict=False):
li = soup.new_tag("li")
ul.append(li)
a = soup.new_tag("a", href=ep_html_fname)
a.string = f"Episode number {ep_idx}"
li.append(a)
output_dir.mkdir(parents=True, exist_ok=True)
with open(output_dir / file_name, "w", encoding="utf-8") as f:
f.write(soup.prettify())
def run_inference(dataset, episode_index, policy, num_workers=4, batch_size=32, device="cuda"):
policy.eval()
policy.to(device)
logging.info("Loading dataloader")
episode_sampler = EpisodeSampler(dataset, episode_index)
@@ -124,70 +396,104 @@ def visualize_dataset(
sampler=episode_sampler,
)
logging.info("Starting Rerun")
if mode not in ["local", "distant"]:
raise ValueError(mode)
spawn_local_viewer = mode == "local" and not save
rr.init(f"{repo_id}/episode_{episode_index}", spawn=spawn_local_viewer)
# Manually call python garbage collector after `rr.init` to avoid hanging in a blocking flush
# when iterating on a dataloader with `num_workers` > 0
# TODO(rcadene): remove `gc.collect` when rerun version 0.16 is out, which includes a fix
gc.collect()
if mode == "distant":
rr.serve(open_browser=False, web_port=web_port, ws_port=ws_port)
logging.info("Logging to Rerun")
logging.info("Running inference")
inference_results = {}
for batch in tqdm.tqdm(dataloader, total=len(dataloader)):
# iterate over the batch
for i in range(len(batch["index"])):
rr.set_time_sequence("frame_index", batch["frame_index"][i].item())
rr.set_time_seconds("timestamp", batch["timestamp"][i].item())
batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
with torch.inference_mode():
output_dict = policy.forward(batch)
# display each camera image
for key in dataset.camera_keys:
# TODO(rcadene): add `.compress()`? is it lossless?
rr.log(key, rr.Image(to_hwc_uint8_numpy(batch[key][i])))
for key in output_dict:
if key not in inference_results:
inference_results[key] = []
inference_results[key].append(output_dict[key].to("cpu"))
# display each dimension of action space (e.g. actuators command)
if "action" in batch:
for dim_idx, val in enumerate(batch["action"][i]):
rr.log(f"action/{dim_idx}", rr.Scalar(val.item()))
for key in inference_results:
inference_results[key] = torch.cat(inference_results[key])
# display each dimension of observed state space (e.g. agent position in joint space)
if "observation.state" in batch:
for dim_idx, val in enumerate(batch["observation.state"][i]):
rr.log(f"state/{dim_idx}", rr.Scalar(val.item()))
return inference_results
if "next.done" in batch:
rr.log("next.done", rr.Scalar(batch["next.done"][i].item()))
if "next.reward" in batch:
rr.log("next.reward", rr.Scalar(batch["next.reward"][i].item()))
def visualize_dataset(
repo_id: str,
episode_indices: list[int] = None,
output_dir: Path | None = None,
serve: bool = True,
port: int = 9090,
force_overwrite: bool = True,
policy_repo_id: str | None = None,
policy_ckpt_path: Path | None = None,
batch_size: int = 32,
num_workers: int = 4,
) -> Path | None:
init_logging()
if "next.success" in batch:
rr.log("next.success", rr.Scalar(batch["next.success"][i].item()))
has_policy = policy_repo_id or policy_ckpt_path
if mode == "local" and save:
# save .rrd locally
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
repo_id_str = repo_id.replace("/", "_")
rrd_path = output_dir / f"{repo_id_str}_episode_{episode_index}.rrd"
rr.save(rrd_path)
return rrd_path
if has_policy:
logging.info("Loading policy")
if policy_repo_id:
pretrained_policy_path = Path(snapshot_download(policy_repo_id))
elif policy_ckpt_path:
pretrained_policy_path = Path(policy_ckpt_path)
policy = ACTPolicy.from_pretrained(pretrained_policy_path)
with open(pretrained_policy_path / "config.yaml") as f:
cfg = yaml.safe_load(f)
delta_timestamps = cfg["training"]["delta_timestamps"]
else:
delta_timestamps = None
elif mode == "distant":
# stop the process from exiting since it is serving the websocket connection
try:
while True:
time.sleep(1)
except KeyboardInterrupt:
print("Ctrl-C received. Exiting.")
logging.info("Loading dataset")
dataset = LeRobotDataset(repo_id, delta_timestamps=delta_timestamps)
if not dataset.video:
raise NotImplementedError(f"Image datasets ({dataset.video=}) are currently not supported.")
if output_dir is None:
output_dir = f"outputs/visualize_dataset/{repo_id}"
output_dir = Path(output_dir)
if force_overwrite and output_dir.exists():
shutil.rmtree(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
# Create a symlink from the dataset video folder containing mp4 files to the output directory
# so that the http server can get access to the mp4 files.
ln_videos_dir = output_dir / "videos"
if not ln_videos_dir.exists():
ln_videos_dir.symlink_to(dataset.videos_dir.resolve())
if episode_indices is None:
episode_indices = list(range(dataset.num_episodes))
logging.info("Writing html")
ep_html_fnames = []
for episode_index in tqdm.tqdm(episode_indices):
inference_results = None
if has_policy:
inference_results_path = output_dir / f"episode_{episode_index}.safetensors"
if inference_results_path.exists():
inference_results = load_file(inference_results_path)
else:
inference_results = run_inference(dataset, episode_index, policy)
save_file(inference_results, inference_results_path)
# write states and actions in a csv
ep_csv_fname = f"episode_{episode_index}.csv"
write_episode_data_csv(output_dir, ep_csv_fname, episode_index, dataset, inference_results)
js_fname = f"episode_{episode_index}.js"
write_episode_data_js(output_dir, js_fname, ep_csv_fname, dataset)
# write a html page to view videos and timeseries
ep_html_fname = f"episode_{episode_index}.html"
write_episode_data_html(output_dir, ep_html_fname, js_fname, episode_index, dataset)
ep_html_fnames.append(ep_html_fname)
write_episodes_list_html(output_dir, "index.html", episode_indices, ep_html_fnames, dataset)
if serve:
run_server(output_dir, port)
def main():
@@ -197,13 +503,51 @@ def main():
"--repo-id",
type=str,
required=True,
help="Name of the Hugging Face repository containing a LeRobotDataset dataset (e.g. `lerobot/pusht`).",
help="Name of the Hugging Face repository containing a LeRobotDataset dataset (e.g. `lerobot/pusht` for https://huggingface.co/datasets/lerobot/pusht).",
)
parser.add_argument(
"--episode-index",
"--episode-indices",
type=int,
required=True,
help="Episode to visualize.",
nargs="*",
default=None,
help="Episode indices to visualize (e.g. `0 1 5 6` to load episodes of index 0, 1, 5 and 6). By default loads all episodes.",
)
parser.add_argument(
"--output-dir",
type=str,
default=None,
help="Directory path to write html files and kick off a web server. By default, writes them to 'outputs/visualize_dataset/REPO_ID'.",
)
parser.add_argument(
"--serve",
type=int,
default=1,
help="Launch web server.",
)
parser.add_argument(
"--port",
type=int,
default=9090,
help="Web port used by the http server.",
)
parser.add_argument(
"--force-overwrite",
type=int,
default=1,
help="Delete the output directory if it exists already.",
)
parser.add_argument(
"--policy-repo-id",
type=str,
default=None,
help="Name of the Hugging Face repository containing a pretrained policy (e.g. `lerobot/diffusion_pusht` for https://huggingface.co/lerobot/diffusion_pusht).",
)
parser.add_argument(
"--policy-ckpt-path",
type=str,
default=None,
help="Path to a local checkpoint directory containing a pretrained policy.",
)
parser.add_argument(
"--batch-size",
@@ -217,44 +561,6 @@ def main():
default=4,
help="Number of processes of Dataloader for loading the data.",
)
parser.add_argument(
"--mode",
type=str,
default="local",
help=(
"Mode of viewing between 'local' or 'distant'. "
"'local' requires data to be on a local machine. It spawns a viewer to visualize the data locally. "
"'distant' creates a server on the distant machine where the data is stored. "
"Visualize the data by connecting to the server with `rerun ws://localhost:PORT` on the local machine."
),
)
parser.add_argument(
"--web-port",
type=int,
default=9090,
help="Web port for rerun.io when `--mode distant` is set.",
)
parser.add_argument(
"--ws-port",
type=int,
default=9087,
help="Web socket port for rerun.io when `--mode distant` is set.",
)
parser.add_argument(
"--save",
type=int,
default=0,
help=(
"Save a .rrd file in the directory provided by `--output-dir`. "
"It also deactivates the spawning of a viewer. "
"Visualize the data by running `rerun path/to/file.rrd` on your local machine."
),
)
parser.add_argument(
"--output-dir",
type=str,
help="Directory path to write a .rrd file when `--save 1` is set.",
)
args = parser.parse_args()
visualize_dataset(**vars(args))


@@ -0,0 +1,263 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Visualize data of **all** frames of any episode of a dataset of type LeRobotDataset.
Note: The last frame of the episode doesn't always correspond to a final state.
That's because our datasets are composed of transitions from state to state, up to
the antepenultimate state, which is associated with the ultimate action that leads to the final state.
However, there might not be a transition from the final state to another state.
Note: This script aims to visualize the data used to train the neural networks.
~What you see is what you get~. When visualizing the image modality, you can often expect to observe
lossy compression artifacts, since these images have been decoded from compressed mp4 videos to
save disk space. The compression factor applied has been tuned to not affect success rate.
Examples:
- Visualize data stored on a local machine:
```
local$ python lerobot/scripts/visualize_dataset.py \
--repo-id lerobot/pusht \
--episode-index 0
```
- Visualize data stored on a distant machine with a local viewer:
```
distant$ python lerobot/scripts/visualize_dataset.py \
--repo-id lerobot/pusht \
--episode-index 0 \
--save 1 \
--output-dir path/to/directory
local$ scp distant:path/to/directory/lerobot_pusht_episode_0.rrd .
local$ rerun lerobot_pusht_episode_0.rrd
```
- Visualize data stored on a distant machine through streaming:
(You need to forward the websocket port to the distant machine, with
`ssh -L 9087:localhost:9087 username@remote-host`)
```
distant$ python lerobot/scripts/visualize_dataset.py \
--repo-id lerobot/pusht \
--episode-index 0 \
--mode distant \
--ws-port 9087
local$ rerun ws://localhost:9087
```
"""
import argparse
import gc
import logging
import time
from pathlib import Path
import rerun as rr
import torch
import tqdm
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
class EpisodeSampler(torch.utils.data.Sampler):
def __init__(self, dataset, episode_index):
from_idx = dataset.episode_data_index["from"][episode_index].item()
to_idx = dataset.episode_data_index["to"][episode_index].item()
self.frame_ids = range(from_idx, to_idx)
def __iter__(self):
return iter(self.frame_ids)
def __len__(self):
return len(self.frame_ids)
def to_hwc_uint8_numpy(chw_float32_torch):
assert chw_float32_torch.dtype == torch.float32
assert chw_float32_torch.ndim == 3
c, h, w = chw_float32_torch.shape
assert c < h and c < w, f"expected channel-first images, but got shape {chw_float32_torch.shape}"
hwc_uint8_numpy = (chw_float32_torch * 255).type(torch.uint8).permute(1, 2, 0).numpy()
return hwc_uint8_numpy
def visualize_dataset(
repo_id: str,
episode_index: int,
batch_size: int = 32,
num_workers: int = 0,
mode: str = "local",
web_port: int = 9090,
ws_port: int = 9087,
save: bool = False,
output_dir: Path | None = None,
) -> Path | None:
if save:
assert (
output_dir is not None
), "Set an output directory where to write .rrd files with `--output-dir path/to/directory`."
logging.info("Loading dataset")
dataset = LeRobotDataset(repo_id)
logging.info("Loading dataloader")
episode_sampler = EpisodeSampler(dataset, episode_index)
dataloader = torch.utils.data.DataLoader(
dataset,
num_workers=num_workers,
batch_size=batch_size,
sampler=episode_sampler,
)
logging.info("Starting Rerun")
if mode not in ["local", "distant"]:
raise ValueError(mode)
spawn_local_viewer = mode == "local" and not save
rr.init(f"{repo_id}/episode_{episode_index}", spawn=spawn_local_viewer)
# Manually call python garbage collector after `rr.init` to avoid hanging in a blocking flush
# when iterating on a dataloader with `num_workers` > 0
# TODO(rcadene): remove `gc.collect` when rerun version 0.16 is out, which includes a fix
gc.collect()
if mode == "distant":
rr.serve(open_browser=False, web_port=web_port, ws_port=ws_port)
logging.info("Logging to Rerun")
for batch in tqdm.tqdm(dataloader, total=len(dataloader)):
# iterate over the batch
for i in range(len(batch["index"])):
rr.set_time_sequence("frame_index", batch["frame_index"][i].item())
rr.set_time_seconds("timestamp", batch["timestamp"][i].item())
# display each camera image
for key in dataset.camera_keys:
# TODO(rcadene): add `.compress()`? is it lossless?
rr.log(key, rr.Image(to_hwc_uint8_numpy(batch[key][i])))
# display each dimension of action space (e.g. actuators command)
if "action" in batch:
for dim_idx, val in enumerate(batch["action"][i]):
rr.log(f"action/{dim_idx}", rr.Scalar(val.item()))
# display each dimension of observed state space (e.g. agent position in joint space)
if "observation.state" in batch:
for dim_idx, val in enumerate(batch["observation.state"][i]):
rr.log(f"state/{dim_idx}", rr.Scalar(val.item()))
if "next.done" in batch:
rr.log("next.done", rr.Scalar(batch["next.done"][i].item()))
if "next.reward" in batch:
rr.log("next.reward", rr.Scalar(batch["next.reward"][i].item()))
if "next.success" in batch:
rr.log("next.success", rr.Scalar(batch["next.success"][i].item()))
if mode == "local" and save:
# save .rrd locally
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
repo_id_str = repo_id.replace("/", "_")
rrd_path = output_dir / f"{repo_id_str}_episode_{episode_index}.rrd"
rr.save(rrd_path)
return rrd_path
elif mode == "distant":
# stop the process from exiting since it is serving the websocket connection
try:
while True:
time.sleep(1)
except KeyboardInterrupt:
print("Ctrl-C received. Exiting.")
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--repo-id",
type=str,
required=True,
help="Name of hugging face repositery containing a LeRobotDataset dataset (e.g. `lerobot/pusht`).",
)
parser.add_argument(
"--episode-index",
type=int,
required=True,
help="Episode to visualize.",
)
parser.add_argument(
"--batch-size",
type=int,
default=32,
help="Batch size loaded by DataLoader.",
)
parser.add_argument(
"--num-workers",
type=int,
default=4,
help="Number of processes of Dataloader for loading the data.",
)
parser.add_argument(
"--mode",
type=str,
default="local",
help=(
"Mode of viewing between 'local' or 'distant'. "
"'local' requires data to be on a local machine. It spawns a viewer to visualize the data locally. "
"'distant' creates a server on the distant machine where the data is stored. Visualize the data by connecting to the server with `rerun ws://localhost:PORT` on the local machine."
),
)
parser.add_argument(
"--web-port",
type=int,
default=9090,
help="Web port for rerun.io when `--mode distant` is set.",
)
parser.add_argument(
"--ws-port",
type=int,
default=9087,
help="Web socket port for rerun.io when `--mode distant` is set.",
)
parser.add_argument(
"--save",
type=int,
default=0,
help=(
"Save a .rrd file in the directory provided by `--output-dir`. "
"It also deactivates the spawning of a viewer. ",
"Visualize the data by running `rerun path/to/file.rrd` on your local machine.",
),
)
parser.add_argument(
"--output-dir",
type=str,
help="Directory path to write a .rrd file when `--save 1` is set.",
)
args = parser.parse_args()
visualize_dataset(**vars(args))
if __name__ == "__main__":
main()
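The episode slicing performed by `EpisodeSampler` above can be sketched in isolation. The
following is a minimal stand-in with no torch dependency; the `episode_data_index` boundaries
are hypothetical examples, standing in for the ones a real `LeRobotDataset` would provide:

```python
# Minimal sketch of the EpisodeSampler logic above, without torch.
# The episode boundaries below are hypothetical, standing in for the
# `episode_data_index` a LeRobotDataset would provide.

class EpisodeSampler:
    """Yield the frame indices belonging to a single episode."""

    def __init__(self, episode_data_index, episode_index):
        from_idx = episode_data_index["from"][episode_index]
        to_idx = episode_data_index["to"][episode_index]
        self.frame_ids = range(from_idx, to_idx)

    def __iter__(self):
        return iter(self.frame_ids)

    def __len__(self):
        return len(self.frame_ids)


# Two episodes: frames 0..2 belong to episode 0, frames 3..6 to episode 1.
episode_data_index = {"from": [0, 3], "to": [3, 7]}
sampler = EpisodeSampler(episode_data_index, episode_index=1)
print(list(sampler))  # [3, 4, 5, 6]
print(len(sampler))   # 4
```

Passing such a sampler to a `torch.utils.data.DataLoader` restricts iteration to exactly one
episode's frames, which is why the script can batch-load a single episode for logging.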


@@ -29,8 +29,8 @@ def _find_and_replace(text: str, finds_and_replaces: list[tuple[str, str]]) -> s
return text
def _run_script(path, args=None):
subprocess.run([sys.executable, path] + args if args is not None else [], check=True)
def _run_script(path):
subprocess.run([sys.executable, path], check=True)
def _read_file(path):
@@ -126,22 +126,3 @@ def test_examples_basic2_basic3_advanced1():
# Restore stdout to its original state
sys.stdout = sys.__stdout__
assert "Average loss on validation set" in printed_output
def test_real_world_recording():
path = "examples/real_robot_example/record_training_data.py"
_run_script(
path,
[
"--data_dir",
"outputs/examples",
"--repo-id",
"real_world_debug",
"--num-episodes",
"2",
"--num-frames",
"10",
"--mock-robot",
],
)
assert Path("outputs/examples/real_world_debug/video/episode_0.mp4").exists()


@@ -25,9 +25,8 @@ from lerobot.scripts.visualize_dataset import visualize_dataset
def test_visualize_dataset(tmpdir, repo_id):
rrd_path = visualize_dataset(
repo_id,
episode_index=0,
batch_size=32,
save=True,
episode_indices=[0],
output_dir=tmpdir,
serve=False,
)
assert rrd_path.exists()