forked from tangger/lerobot

Compare commits: refactor/u… ... user/miche… (8 commits)

- 26d2a13218
- 2be7f3a3ff
- 0cf864870c
- 1786916a16
- 0507ad4f68
- bed90e3a41
- 6163daaaa4
- 8e2a394442
@@ -360,7 +360,7 @@ with profile(
 If you want, you can cite this work with:
 ```bibtex
 @misc{cadene2024lerobot,
-    author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Wolf, Thomas},
+    author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascale, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas},
     title = {LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
     howpublished = "\url{https://github.com/huggingface/lerobot}",
     year = {2024}
@@ -9,4 +9,6 @@
     title: Assemble SO-101
   - local: getting_started_real_world_robot
     title: Getting Started with Real-World Robots
+  - local: hilserl
+    title: Getting Started with Reinforcement Learning
   title: "Tutorials"
docs/source/hilserl.mdx (new file, 512 lines)

@@ -0,0 +1,512 @@
# HIL-SERL Real Robot Training Workflow Guide

Human-in-the-Loop Sample-Efficient Reinforcement Learning (HIL-SERL) with LeRobot is a workflow for taking a policy from "zero" to real-world robot mastery in just a couple of hours.
It combines three ingredients:
1. **Offline demonstrations & reward classifier:** a handful of human-teleop episodes plus a vision-based success detector give the policy a shaped starting point.
2. **On-robot actor / learner loop with human interventions:** a distributed SAC/RLPD learner updates the policy while an actor explores on the physical robot; the human can jump in at any time to correct dangerous or unproductive behaviour.
3. **Safety & efficiency tools:** joint/EE bounds, impedance control, crop-ROI preprocessing and WandB monitoring keep the data useful and the hardware safe.

Together these elements let HIL-SERL reach near-perfect task success and faster cycle times than imitation-only baselines.

<p align="center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/hilserl-main-figure.png" alt="HIL-SERL workflow" title="HIL-SERL workflow" width="100%"></img>
</p>

<p align="center"><i>HIL-SERL workflow, Luo et al. 2024</i></p>

This guide provides step-by-step instructions for training a policy on a real robot with LeRobot's HIL-SERL implementation.
# 1. Real Robot Training Workflow

## 1.1 Understanding Configuration

The training process begins with proper configuration of the HIL-SERL environment. The configuration class of interest is `HILSerlRobotEnvConfig` in `lerobot/common/envs/configs.py`, defined as:

```python
class HILSerlRobotEnvConfig(EnvConfig):
    robot: Optional[RobotConfig] = None          # Main robot agent (defined in `lerobot/common/robots`)
    teleop: Optional[TeleoperatorConfig] = None  # Teleoperator agent, e.g., gamepad or leader arm (defined in `lerobot/common/teleoperators`)
    wrapper: Optional[EnvTransformConfig] = None # Environment wrapper settings; check `lerobot/scripts/server/gym_manipulator.py`
    fps: int = 10                                # Control frequency
    name: str = "real_robot"                     # Environment name
    mode: str = None                             # "record", "replay", or None (for training)
    repo_id: Optional[str] = None                # LeRobot dataset repository ID
    dataset_root: Optional[str] = None           # Local dataset root (optional)
    task: str = ""                               # Task identifier
    num_episodes: int = 10                       # Number of episodes to record
    episode: int = 0                             # Episode index for replay
    device: str = "cuda"                         # Compute device
    push_to_hub: bool = True                     # Whether to push the recorded datasets to the Hub
    pretrained_policy_name_or_path: Optional[str] = None     # For policy loading
    reward_classifier_pretrained_path: Optional[str] = None  # For the reward model
```
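If you prefer building the configuration in Python rather than JSON, a minimal sketch might look like the following. The field names are taken from the class definition above; the `robot`, `teleop`, and `wrapper` sub-configs are omitted here and would normally point at your hardware-specific configs.

```python
# Minimal sketch: a record-mode HILSerlRobotEnvConfig built in Python.
# Only fields shown in the class definition above are used; robot/teleop
# sub-configs are left out for brevity.
from lerobot.common.envs.configs import HILSerlRobotEnvConfig

env_cfg = HILSerlRobotEnvConfig(
    mode="record",                      # "record", "replay", or None (training)
    repo_id="username/pick_lift_cube",  # dataset repository to create on the Hub
    task="pick_and_lift",
    num_episodes=15,
    fps=10,
    device="cuda",
    push_to_hub=True,
)
```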
## 1.2 Finding Robot Workspace Bounds

Before collecting demonstrations, you need to determine the appropriate operational bounds for your robot.

This simplifies learning on the real robot by limiting the robot's operational space to a region that solves the task and avoids unnecessary or unsafe exploration.

### 1.2.1 Using find_joint_limits.py

This script helps you find the safe operational bounds for your robot's end-effector. Given a follower and a leader arm, you can use the script to find the bounds for the follower arm that will be applied during training.
Bounding the action space reduces redundant exploration by the agent and helps guarantee safety.

```bash
python -m lerobot.scripts.find_joint_limits \
    --robot.type=so100_follower \
    --robot.port=/dev/tty.usbmodem58760431541 \
    --robot.id=black \
    --teleop.type=so100_leader \
    --teleop.port=/dev/tty.usbmodem58760431551 \
    --teleop.id=blue
```

### 1.2.2 Workflow

1. Run the script and move the robot through the space that solves the task.
2. The script records the minimum and maximum end-effector positions and joint angles and prints them to the console, for example:
```
Max ee position [0.24170487 0.201285 0.10273342]
Min ee position [0.16631757 -0.08237468 0.03364977]
Max joint positions [-20.0, -20.0, -20.0, -20.0, -20.0, -20.0]
Min joint positions [50.0, 50.0, 50.0, 50.0, 50.0, 50.0]
```
3. Use these values in the configuration of your teleoperation device (`TeleoperatorConfig`) under the `end_effector_bounds` field.

### 1.2.3 Example Configuration

```json
"end_effector_bounds": {
    "max": [0.24, 0.20, 0.10],
    "min": [0.16, -0.08, 0.03]
}
```
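To make concrete what `end_effector_bounds` buys you, here is an illustrative sketch (not LeRobot's internal code) of how a commanded end-effector target can be clamped to the measured workspace box:

```python
# Illustrative only: every commanded end-effector position is kept inside the
# min/max box measured with find_joint_limits.py.
import numpy as np

end_effector_bounds = {
    "max": np.array([0.24, 0.20, 0.10]),
    "min": np.array([0.16, -0.08, 0.03]),
}

def clamp_ee_target(target_xyz: np.ndarray) -> np.ndarray:
    """Clamp the commanded x, y, z to the measured workspace box."""
    return np.clip(target_xyz, end_effector_bounds["min"], end_effector_bounds["max"])

print(clamp_ee_target(np.array([0.30, 0.00, 0.02])))  # -> [0.24 0.   0.03]
```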
## 1.3 Collecting Demonstrations

With the bounds defined, you can safely collect demonstrations for training. Because the RL algorithm is off-policy, it can use offline datasets collected beforehand, which improves the efficiency of the learning process.

### 1.3.1 Setting Up Record Mode

Create a configuration file for recording demonstrations (or edit an existing one like `env_config_so100.json`):

1. Set `mode` to `"record"`
2. Specify a unique `repo_id` for your dataset (e.g., "username/task_name")
3. Set `num_episodes` to the number of demonstrations you want to collect
4. Set `crop_params_dict` to `null` initially (we'll determine crops later)
5. Configure `robot`, `cameras`, and other hardware settings

Example configuration section:
```json
"mode": "record",
"repo_id": "username/pick_lift_cube",
"dataset_root": null,
"task": "pick_and_lift",
"num_episodes": 15,
"episode": 0,
"push_to_hub": true
```

### 1.3.2 Gamepad Controls

<p align="center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/gamepad_guide.jpg?raw=true" alt="Figure shows the control mappings on a Logitech gamepad." title="Gamepad Control Mapping" width="100%"></img>
</p>
<p align="center"><i>Gamepad button mapping for robot control and episode management</i></p>

### 1.3.3 Recording Demonstrations

Start the recording process:

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path lerobot/configs/env_config_so100.json
```

During recording (see the sketch after this list for the episode/reward logic):
1. The robot resets to the initial position defined by `fixed_reset_position` in the configuration file
2. Control the robot with the gamepad by setting `"control_mode": "gamepad"` in the configuration file
3. Complete the task successfully
4. The episode ends with a reward of 1 when you press the "success" button
5. If the time limit is reached, or the fail button is pressed, the episode ends with a reward of 0
6. You can re-record an episode by pressing the "rerecord" button
7. The process automatically continues to the next episode
8. After recording all episodes, the dataset is pushed to the Hugging Face Hub (optional) and saved locally
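The sketch below illustrates the episode and reward logic described above. It is not the actual `gym_manipulator.py` implementation; the `read_action` callable and the `success_button`/`fail_button` info keys are hypothetical stand-ins for the gamepad handling.

```python
# Sketch of one recorded episode: success button -> reward 1, fail button or
# time limit -> reward 0 (illustrative, assuming a Gymnasium-style env API).
import time
from typing import Any, Callable

def record_episode(env: Any, read_action: Callable[[], Any], control_time_s: float = 15.0):
    obs, info = env.reset()                 # robot moves to fixed_reset_position
    transitions = []
    start = time.perf_counter()
    while time.perf_counter() - start < control_time_s:
        action = read_action()              # e.g. gamepad deltas when control_mode="gamepad"
        obs, reward, terminated, truncated, info = env.step(action)
        transitions.append((obs, action))
        if info.get("success_button"):      # hypothetical key: success pressed
            return transitions, 1.0
        if info.get("fail_button") or terminated or truncated:
            return transitions, 0.0
    return transitions, 0.0                 # time limit reached
```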
## 1.4 Processing the Dataset

After collecting demonstrations, process them to determine optimal camera crops.
Reinforcement learning is sensitive to background distractions, so it is important to crop the images to the relevant workspace area.
Note: if you already know the crop parameters, you can skip this step and simply set `crop_params_dict` in the configuration file during recording.

### 1.4.1 Determining Crop Parameters

Use the `crop_dataset_roi.py` script to interactively select regions of interest in your camera images:

```bash
python lerobot/scripts/rl/crop_dataset_roi.py --repo-id username/pick_lift_cube
```

1. For each camera view, the script displays the first frame
2. Draw a rectangle around the relevant workspace area
3. Press 'c' to confirm the selection
4. Repeat for all camera views
5. The script outputs the cropping parameters and creates a new cropped dataset

Example output:
```
Selected Rectangular Regions of Interest (top, left, height, width):
observation.images.side: [180, 207, 180, 200]
observation.images.front: [180, 250, 120, 150]
```

<p align="center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/crop_dataset.gif" width="600"/>
</p>

<p align="center"><i>Interactive cropping tool for selecting regions of interest</i></p>

### 1.4.2 Updating Configuration

Add these crop parameters to your training configuration:

```json
"crop_params_dict": {
    "observation.images.side": [180, 207, 180, 200],
    "observation.images.front": [180, 250, 120, 150]
},
"resize_size": [128, 128]
```
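The following sketch shows what this preprocessing amounts to: each camera image is cropped to its `(top, left, height, width)` ROI and resized to `resize_size`. It mirrors the wrapper's behaviour conceptually and is not the wrapper's code.

```python
# Illustrative preprocessing matching the configuration above.
import torch
import torchvision.transforms.v2.functional as F

crop_params_dict = {
    "observation.images.side": [180, 207, 180, 200],
    "observation.images.front": [180, 250, 120, 150],
}
resize_size = (128, 128)

def preprocess_images(obs: dict) -> dict:
    out = dict(obs)
    for key, (top, left, height, width) in crop_params_dict.items():
        img = obs[key]                                   # (C, H, W) tensor
        img = F.crop(img, top, left, height, width)      # crop to the ROI
        out[key] = F.resize(img, list(resize_size))      # resize to 128x128
    return out

dummy = {k: torch.rand(3, 480, 640) for k in crop_params_dict}
print({k: tuple(v.shape) for k, v in preprocess_images(dummy).items()})
```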
## 1.5 Training with Actor-Learner

The LeRobot system uses a distributed actor-learner architecture for training. You will need to start two processes: a learner and an actor.

### 1.5.1 Configuration Setup

Create a training configuration file (see the example `train_config_hilserl_so100.json`). The training config is based on the main `TrainPipelineConfig` class in `lerobot/configs/train.py`:

1. Set `mode` to `null` (for training mode)
2. Configure the policy settings (`type`, `device`, etc.)
3. Set `dataset` to your cropped dataset
4. Configure environment settings with crop parameters
5. Check the other parameters related to SAC
6. Verify that the `policy` config has the right `input_features` and `output_features` for your task

### 1.5.2 Starting the Learner

First, start the learner server process:

```bash
python lerobot/scripts/rl/learner.py --config_path lerobot/configs/train_config_hilserl_so100.json
```

The learner:
- Initializes the policy network
- Prepares replay buffers
- Opens a gRPC server to communicate with actors
- Processes transitions and updates the policy

### 1.5.3 Starting the Actor

In a separate terminal, start the actor process with the same configuration:

```bash
python lerobot/scripts/rl/actor.py --config_path lerobot/configs/train_config_hilserl_so100.json
```

The actor:
- Connects to the learner via gRPC
- Initializes the environment
- Executes rollouts of the policy to collect experience
- Sends transitions to the learner
- Receives updated policy parameters

### 1.5.4 Training Flow

Training proceeds automatically (a toy sketch of this hand-off follows the list):

1. The actor executes the policy in the environment
2. Transitions are collected and sent to the learner
3. The learner updates the policy based on these transitions
4. Updated policy parameters are sent back to the actor
5. The process continues until the specified step limit is reached
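As a mental model, the hand-off between the two processes looks roughly like the toy sketch below, where plain Python queues stand in for the gRPC transition and parameter streams used by the real implementation.

```python
# Toy sketch of the actor/learner hand-off (queues stand in for gRPC).
import queue
import threading
import time

transitions_q: "queue.Queue" = queue.Queue()  # actor -> learner
params_q: "queue.Queue" = queue.Queue()       # learner -> actor

def actor(steps: int = 20) -> None:
    policy_version = 0
    for step in range(steps):
        transitions_q.put({"step": step, "policy_version": policy_version})  # rollout data
        try:
            policy_version = params_q.get_nowait()   # pick up newer weights if available
        except queue.Empty:
            pass
        time.sleep(0.01)

def learner() -> None:
    version = 0
    while True:
        batch = transitions_q.get()
        if batch is None:                             # shutdown sentinel
            break
        version += 1                                  # pretend we did a SAC update
        params_q.put(version)

t = threading.Thread(target=learner)
t.start()
actor()
transitions_q.put(None)
t.join()
```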
### 1.5.5 Human in the Loop

- The key to learning efficiently is human intervention: providing corrective feedback and completing the task helps guide the policy's learning and exploration.
- To intervene, press the upper right trigger button on the gamepad. This pauses the policy's actions and lets you take over.
- A successful experiment is one where the human intervenes heavily at the start and then reduces the amount of intervention as the policy improves. You can monitor the intervention rate in the `wandb` dashboard.

<p align="center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/hil_effect.png?raw=true" alt="Plot of episodic reward over interaction steps, with and without human interventions." title="Effect of human interventions" width="100%"></img>
</p>

<p align="center"><i>Example showing how human interventions help guide policy learning over time</i></p>

- The figure plots the episodic reward against interaction steps and shows the effect of human interventions on policy learning.
- The orange curve is an experiment without any human interventions, while the pink and blue curves are experiments with human interventions.
- We can observe that the number of steps needed for the policy to start achieving the maximum reward is cut by a quarter when human interventions are present.

#### Guide to Human Interventions
The strategy is to intervene heavily at the start of training and then reduce the amount of intervention as training progresses (see the sketch after this list for how an intervention overrides the policy action). Some tips and hints:
- Intervene for almost the entire episode during the first few episodes.
- When the policy becomes less chaotic, gradually reduce the intervention time within an episode and let the policy explore for longer.
- Once the policy starts guiding the robot towards achieving the task, even if it is not perfect, limit your interventions to simple quick actions like a grasp, or a grasp-and-lift.
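Conceptually, an intervention simply swaps which action gets executed and logged, as in this illustrative sketch (the real wrapper handles this for you; the `is_intervention` flag name here is just an example):

```python
# Illustrative only: while the intervention trigger is held, the teleop action
# replaces the policy action, and the transition is flagged so the learner and
# W&B can track the intervention rate.
from typing import Any, Callable

def step_with_intervention(
    env: Any,
    policy_action: Any,
    intervention_active: bool,
    read_teleop_action: Callable[[], Any],
):
    """Return the transition that is actually sent to the learner."""
    action = read_teleop_action() if intervention_active else policy_action
    next_obs, reward, terminated, truncated, info = env.step(action)
    info["is_intervention"] = intervention_active
    return action, next_obs, reward, terminated, truncated, info
```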
## 1.6 Monitoring and Debugging

If you have `wandb.enable` set to `true` in your configuration, you can monitor training progress in real time through the [Weights & Biases](https://wandb.ai/site/) dashboard.
# 2. Training a Reward Classifier with LeRobot

This section explains how to train a reward classifier for LeRobot's human-in-the-loop reinforcement learning implementation. Reward classifiers learn to predict the reward value of a given state, which can be used in an RL setup to train a policy.

The reward classifier implementation in `modeling_classifier.py` uses a pretrained vision model to process the images. It can output either a single value for binary rewards (success/fail) or multiple values for multi-class settings.

## 2.1 Collecting a Dataset

Before training, you need to collect a dataset with labeled examples. The `record_dataset` function in `gym_manipulator.py` enables collecting a dataset of observations, actions, and rewards.

To collect a dataset, you need to modify some parameters in the environment configuration based on `HILSerlRobotEnvConfig`.

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path lerobot/configs/reward_classifier_train_config.json
```

### 2.1.1 Key Parameters for Data Collection

- **mode**: set to "record" to collect a dataset
- **repo_id**: "hf_username/dataset_name", name of the dataset and repo on the Hub
- **num_episodes**: number of episodes to record
- **number_of_steps_after_success**: number of additional frames to record after a success (reward=1) is detected
- **fps**: number of frames per second to record
- **push_to_hub**: whether to push the dataset to the Hub

The `number_of_steps_after_success` parameter is crucial because it lets you collect more positive examples. When a success is detected, the system continues recording for the specified number of steps while keeping the reward=1 label. Without it, there would not be enough states labeled 1 in the dataset to train a good classifier. A sketch of this labeling logic follows the example configuration below.

Example configuration section for data collection:

```json
{
    "mode": "record",
    "repo_id": "hf_username/dataset_name",
    "dataset_root": "data/your_dataset",
    "num_episodes": 20,
    "push_to_hub": true,
    "fps": 10,
    "number_of_steps_after_success": 15
}
```
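The labeling behaviour of `number_of_steps_after_success` can be pictured with this small illustrative sketch (not the actual `record_dataset` code):

```python
# Illustrative: once success is detected, the next N frames also get reward=1
# so the classifier sees enough positive examples.
def label_rewards(success_step, episode_len: int, steps_after_success: int = 15):
    """Return one reward label per frame of the episode."""
    rewards = [0.0] * episode_len
    if success_step is not None:
        last_positive = min(episode_len, success_step + steps_after_success + 1)
        for t in range(success_step, last_positive):
            rewards[t] = 1.0
    return rewards

print(label_rewards(success_step=42, episode_len=60))  # frames 42..57 labeled 1.0
```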
## 2.2 Reward Classifier Configuration

The reward classifier is configured via `configuration_classifier.py`. The key parameters are:

- **model_name**: base model architecture (e.g., we mainly use "helper2424/resnet10")
- **model_type**: "cnn" or "transformer"
- **num_cameras**: number of camera inputs
- **num_classes**: number of output classes (typically 2 for binary success/failure)
- **hidden_dim**: size of the hidden representation
- **dropout_rate**: regularization parameter
- **learning_rate**: learning rate for the optimizer

Example configuration from `reward_classifier_train_config.json`:

```json
{
    "policy": {
        "type": "reward_classifier",
        "model_name": "helper2424/resnet10",
        "model_type": "cnn",
        "num_cameras": 2,
        "num_classes": 2,
        "hidden_dim": 256,
        "dropout_rate": 0.1,
        "learning_rate": 1e-4,
        "device": "cuda",
        "use_amp": true,
        "input_features": {
            "observation.images.front": {
                "type": "VISUAL",
                "shape": [3, 128, 128]
            },
            "observation.images.side": {
                "type": "VISUAL",
                "shape": [3, 128, 128]
            }
        }
    }
}
```
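To see what such a configuration describes, here is a minimal sketch of a two-camera binary classifier in the same spirit: one vision backbone per camera, concatenated features, and a small MLP head. It uses a ResNet-18 stand-in for the `resnet10` backbone and is not LeRobot's `modeling_classifier.py`.

```python
# Minimal sketch of a two-camera success/failure classifier (illustrative).
import torch
import torch.nn as nn
import torchvision

class TinyRewardClassifier(nn.Module):
    def __init__(self, num_cameras: int = 2, num_classes: int = 2, hidden_dim: int = 256):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)  # stand-in for resnet10
        backbone.fc = nn.Identity()                           # expose 512-d features
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512 * num_cameras, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, images: list) -> torch.Tensor:
        feats = [self.backbone(img) for img in images]        # one (B, 512) per camera
        return self.head(torch.cat(feats, dim=-1))            # (B, num_classes) logits

clf = TinyRewardClassifier()
logits = clf([torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)])
print(logits.shape)  # torch.Size([1, 2])
```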
## 2.3 Training the Classifier

To train the classifier, use the `train.py` script with your configuration:

```bash
python lerobot/scripts/train.py --config_path lerobot/configs/reward_classifier_train_config.json
```

## 2.4 Deploying and Testing the Model

To use your trained reward classifier, configure the `HILSerlRobotEnvConfig` to use your model:

```python
env_config = HILSerlRobotEnvConfig(
    reward_classifier_pretrained_path="path_to_your_pretrained_model",
    # Other environment parameters
)
```

or set the argument in the JSON config file:

```json
{
    "reward_classifier_pretrained_path": "path_to_your_pretrained_model"
}
```

Run `gym_manipulator.py` to test the model:

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path lerobot/configs/env_config.json
```

The reward classifier will automatically provide rewards based on the visual input from the robot's cameras.
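Under the hood, the idea is simply to turn the classifier's output into a sparse reward, roughly as in this sketch (LeRobot wires this up for you when `reward_classifier_pretrained_path` is set; the threshold value here is an assumption):

```python
# Illustrative: binary classifier logits -> sparse reward signal.
import torch

@torch.no_grad()
def classifier_reward(classifier, images: list, threshold: float = 0.5) -> float:
    logits = classifier(images)                              # (1, 2) logits
    success_prob = torch.softmax(logits, dim=-1)[0, 1].item()
    return 1.0 if success_prob >= threshold else 0.0
```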
## 2.5 Example Workflow

1. **Create the configuration files**:
   Create the necessary JSON configuration files for the reward classifier and the environment. Check the `json_examples` directory for examples.

2. **Collect a dataset**:
   ```bash
   python lerobot/scripts/rl/gym_manipulator.py --config_path lerobot/configs/env_config.json
   ```

3. **Train the classifier**:
   ```bash
   python lerobot/scripts/train.py --config_path lerobot/configs/reward_classifier_train_config.json
   ```

4. **Test the classifier**:
   ```bash
   python lerobot/scripts/rl/gym_manipulator.py --config_path lerobot/configs/env_config.json
   ```
# 3. Using gym_hil Simulation Environments with LeRobot

This section explains how to use the `gym_hil` simulation environments as an alternative to real robots when working with the LeRobot framework for Human-In-the-Loop (HIL) reinforcement learning.

`gym_hil` is a package that provides Gymnasium-compatible simulation environments specifically designed for Human-In-the-Loop reinforcement learning. These environments allow you to:

- Train policies in simulation to test the RL stack before training on real robots
- Collect demonstrations in sim using external devices like gamepads or keyboards
- Perform human interventions during policy learning

Currently, the main environment is a Franka Panda robot simulation based on MuJoCo, with tasks like picking up a cube.

## 3.1 Installation

First, install the `gym_hil` package within the LeRobot environment:

```bash
pip install gym_hil

# Or from within LeRobot
cd lerobot
pip install -e .[hilserl]
```
## 3.2 Configuration

To use `gym_hil` with LeRobot, you need to create a configuration file. An example is provided in `gym_hil_env.json`. Key configuration sections include:

### 3.2.1 Environment Type and Task

```json
{
    "type": "hil",
    "name": "franka_sim",
    "task": "PandaPickCubeGamepad-v0",
    "device": "cuda"
}
```

Available tasks:
- `PandaPickCubeBase-v0`: Basic environment
- `PandaPickCubeGamepad-v0`: With gamepad control
- `PandaPickCubeKeyboard-v0`: With keyboard control

### 3.2.2 Gym Wrappers Configuration

```json
"wrapper": {
    "gripper_penalty": -0.02,
    "control_time_s": 15.0,
    "use_gripper": true,
    "fixed_reset_joint_positions": [0.0, 0.195, 0.0, -2.43, 0.0, 2.62, 0.785],
    "end_effector_step_sizes": {
        "x": 0.025,
        "y": 0.025,
        "z": 0.025
    },
    "control_mode": "gamepad"
}
```

Important parameters:
- `gripper_penalty`: Penalty for excessive gripper movement
- `use_gripper`: Whether to enable gripper control
- `end_effector_step_sizes`: Size of the steps along the x, y and z axes of the end-effector
- `control_mode`: Set to "gamepad" to use a gamepad controller
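As an illustration of what `end_effector_step_sizes` controls, the sketch below maps a normalized gamepad reading in [-1, 1] to a per-step end-effector displacement; which stick drives which axis is only an example.

```python
# Illustrative only: converting a unitless stick reading into a per-step
# end-effector displacement (in metres) using end_effector_step_sizes.
end_effector_step_sizes = {"x": 0.025, "y": 0.025, "z": 0.025}

def stick_to_delta(stick: dict) -> dict:
    """stick values are assumed to be in [-1, 1] for each of x, y, z."""
    return {axis: end_effector_step_sizes[axis] * float(stick.get(axis, 0.0)) for axis in "xyz"}

print(stick_to_delta({"x": 1.0, "y": -0.5, "z": 0.0}))  # {'x': 0.025, 'y': -0.0125, 'z': 0.0}
```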
## 3.3 Running HIL RL with LeRobot

### 3.3.1 Basic Usage

To run the environment, set `mode` to `null`:

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path path/to/gym_hil_env.json
```

### 3.3.2 Recording a Dataset

To collect a dataset, set `mode` to `"record"` and define the `repo_id` and the number of episodes to record:

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path path/to/gym_hil_env.json
```

### 3.3.3 Training a Policy

To train a policy, check out the example config in `train_gym_hil_env.json` and run the actor server:

```bash
python lerobot/scripts/rl/actor.py --config_path path/to/train_gym_hil_env.json
```

In a different terminal, run the learner server:

```bash
python lerobot/scripts/rl/learner.py --config_path path/to/train_gym_hil_env.json
```

The simulation environment provides a safe and repeatable way to develop and test your Human-In-the-Loop reinforcement learning components before deploying to real robots.

Paper citation:

```bibtex
@article{luo2024precise,
    title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
    author={Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey},
    journal={arXiv preprint arXiv:2410.21845},
    year={2024}
}
```
@@ -55,7 +55,7 @@ conda install ffmpeg -c conda-forge

 Install 🤗 LeRobot:
 ```bash
-cd lerobot && pip install ".[feetech]"
+cd lerobot && pip install -e ".[feetech]"
 ```

 ## Troubleshooting
@@ -141,7 +141,7 @@ python lerobot/scripts/configure_motor.py \
   --ID 1
 ```

-Note: These motors are currently limitated. They can take values between 0 and 4096 only, which corresponds to a full turn. They can't turn more than that. 2048 is at the middle of this range, so we can take -2048 steps (180 degrees anticlockwise) and reach the maximum range, or take +2048 steps (180 degrees clockwise) and reach the maximum range. The configuration step also sets the homing offset to 0, so that if you misassembled the arm, you can always update the homing offset to account for a shift up to ± 2048 steps (± 180 degrees).
+Note: These motors are currently limited. They can take values between 0 and 4096 only, which corresponds to a full turn. They can't turn more than that. 2048 is at the middle of this range, so we can take -2048 steps (180 degrees anticlockwise) and reach the maximum range, or take +2048 steps (180 degrees clockwise) and reach the maximum range. The configuration step also sets the homing offset to 0, so that if you misassembled the arm, you can always update the homing offset to account for a shift up to ± 2048 steps (± 180 degrees).

 Then unplug your motor and plug the second motor and set its ID to 2.
 ```bash
@@ -61,7 +61,7 @@ conda install ffmpeg -c conda-forge

 Install 🤗 LeRobot:
 ```bash
-cd lerobot && pip install ".[feetech]"
+cd lerobot && pip install -e ".[feetech]"
 ```

 > [!NOTE]
@@ -106,7 +106,7 @@ def worker_process(queue: queue.Queue, num_threads: int):
 class AsyncImageWriter:
     """
     This class abstract away the initialisation of processes or/and threads to
-    save images on disk asynchrounously, which is critical to control a robot and record data
+    save images on disk asynchronously, which is critical to control a robot and record data
     at a high frame rate.

     When `num_processes=0`, it creates a threads pool of size `num_threads`.
@@ -944,7 +944,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
     def stop_image_writer(self) -> None:
         """
         Whenever wrapping this dataset inside a parallelized DataLoader, this needs to be called first to
-        remove the image_writer in order for the LeRobotDataset object to be pickleable and parallelized.
+        remove the image_writer in order for the LeRobotDataset object to be picklable and parallelized.
         """
         if self.image_writer is not None:
             self.image_writer.stop()
@@ -101,7 +101,7 @@ def decode_video_frames_torchvision(
     keyframes_only = False
     torchvision.set_video_backend(backend)
     if backend == "pyav":
-        keyframes_only = True  # pyav doesnt support accuracte seek
+        keyframes_only = True  # pyav doesn't support accurate seek

     # set a video stream reader
     # TODO(rcadene): also load audio stream at the same time
@@ -357,7 +357,7 @@ class PI0Policy(PreTrainedPolicy):
         if self.config.resize_imgs_with_padding is not None:
             img = resize_with_pad(img, *self.config.resize_imgs_with_padding, pad_value=0)

-        # Normalize from range [0,1] to [-1,1] as expacted by siglip
+        # Normalize from range [0,1] to [-1,1] as expected by siglip
         img = img * 2.0 - 1.0

         bsize = img.shape[0]
@@ -516,7 +516,7 @@ class PI0FAST(nn.Module):
             interpolate_like_pi=self.config.interpolate_like_pi,
         )

-        # Normalize from range [0,1] to [-1,1] as expacted by siglip
+        # Normalize from range [0,1] to [-1,1] as expected by siglip
         img = img * 2.0 - 1.0

         bsize = img.shape[0]
@@ -243,6 +243,11 @@ def control_loop(

     timestamp = 0
     start_episode_t = time.perf_counter()

+    # Controls starts, if policy is given it needs cleaning up
+    if policy is not None:
+        policy.reset()
+
     while timestamp < control_time_s:
         start_loop_t = time.perf_counter()
@@ -63,13 +63,13 @@ dependencies = [
     "opencv-python-headless>=4.9.0",
     "packaging>=24.2",
     "av>=14.2.0",
-    "pymunk>=6.6.0",
+    "pymunk>=6.6.0,<7.0.0",
     "pynput>=1.7.7",
     "pyzmq>=26.2.1",
     "rerun-sdk>=0.21.0",
     "termcolor>=2.4.0",
-    "torch>=2.2.1,<2.7",
-    "torchcodec==0.2.1; sys_platform != 'win32' and (sys_platform != 'linux' or (platform_machine != 'aarch64' and platform_machine != 'arm64' and platform_machine != 'armv7l')) and (sys_platform != 'darwin' or platform_machine != 'x86_64')",
+    "torch>=2.2.1",
+    "torchcodec>=0.2.1; sys_platform != 'win32' and (sys_platform != 'linux' or (platform_machine != 'aarch64' and platform_machine != 'arm64' and platform_machine != 'armv7l')) and (sys_platform != 'darwin' or platform_machine != 'x86_64')",
     "torchvision>=0.21.0",
     "wandb>=0.16.3",
     "zarr>=2.17.0",
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0389a716d51c1c615fb2a3bfa386d89f00b0deca08c4fa21b23e020a939d0213
+oid sha256:6b1e600768a8771c5fe650e038a1193597e3810f032041b2a0d021e4496381c1
 size 3686488
@@ -28,7 +28,7 @@ from lerobot.common.datasets.transforms import (
 from lerobot.common.utils.random_utils import seeded_context

 ARTIFACT_DIR = Path("tests/artifacts/image_transforms")
-DATASET_REPO_ID = "lerobot/aloha_mobile_shrimp"
+DATASET_REPO_ID = "lerobot/aloha_static_cups_open"


 def save_default_config_transform(original_frame: torch.Tensor, output_dir: Path):
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0dc691503e7d90b2086bb408e89a65f772ce5ee6e3562ef8c127bcb09bd90851
+oid sha256:9d4ebab73eabddc58879a4e770289d19e00a1a4cf2fa5fa33cd3a3246992bc90
 size 40551392
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cc67af1d60f95d84c98d6c9ebd648990e0f0705368bd6b72d2b39533950b0179
+oid sha256:f3e4c8e85e146b043fd4e4984947c2a6f01627f174a19f18b5914cf690579d77
 size 5104
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:64518cf652105d15f5fd2cfc13d0681f66a4ec4797dc5d5dc2f7b0d91fe5dfd6
+oid sha256:1a7a8b1a457149109f843c32bcbb047d09de2201847b9b79f7501b447f77ecf4
 size 31672
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:32b6d14fab4244b5140adb345e47f662b6739c04974e04b21c3127caa988abbb
+oid sha256:5e6ce85296b2009e7c2060d336c0429b1c7197d9adb159e7df0ba18003067b36
 size 68
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e1904ef0338f7b6efdec70ec235ee931b5751008bf4eb433edb0b3fa0838a4f1
+oid sha256:9b5f557e30aead3731c38cbd85af8c706395d8689a918ad88805b5a886245603
 size 33400
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fa544a97f00bf46393a09b006b44c2499bbf7d177782360a8c21cacbf200c07a
+oid sha256:2e6625cabfeb4800abc80252cf9112a9271c154edd01eb291658f143c951610b
 size 515400
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:83c7a8ae912300b5cedba31904f7ba22542059fd60dd86548a95e415713f719e
+oid sha256:224b5fa4828aa88171b68c036e8919c1eae563e2113f03b6461eadf5bf8525a6
 size 31672
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5a010633237b3a1141603c65174c551daa9e7b4c474af5a1376d73e5425bfb5d
+oid sha256:016d2fa8fe5f58017dfd46f4632fdc19dfd751e32a2c7cde2077c6f95546d6bd
 size 68
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ec8b5c440e9fcec190c9be48b28ebb79f82ae63626afe7c811e4bb0c3dd08842
+oid sha256:021562ee3e4814425e367ed0c144d6fbe2eb28838247085716cf0b58fd69a075
 size 33400
@@ -16,6 +16,7 @@

 import pytest
 import torch
+from packaging import version
 from safetensors.torch import load_file
 from torchvision.transforms import v2
 from torchvision.transforms.v2 import functional as F  # noqa: N812
@@ -253,7 +254,14 @@ def test_backward_compatibility_single_transforms(


 @require_x86_64_kernel
+@pytest.mark.skipif(
+    version.parse(torch.__version__) < version.parse("2.7.0"),
+    reason="Test artifacts were generated with PyTorch >= 2.7.0 which has different multinomial behavior",
+)
 def test_backward_compatibility_default_config(img_tensor, default_transforms):
+    # NOTE: PyTorch versions have different randomness, it might break this test.
+    # See this PR: https://github.com/huggingface/lerobot/pull/1127.
+
     cfg = ImageTransformsConfig(enable=True)
     default_tf = ImageTransforms(cfg)
@@ -37,7 +37,6 @@ def test_diffuser_scheduler(optimizer):
         "base_lrs": [0.001],
         "last_epoch": 1,
         "lr_lambdas": [None],
-        "verbose": False,
     }
     assert scheduler.state_dict() == expected_state_dict

@@ -56,7 +55,6 @@ def test_vqbet_scheduler(optimizer):
         "base_lrs": [0.001],
         "last_epoch": 1,
         "lr_lambdas": [None],
-        "verbose": False,
     }
     assert scheduler.state_dict() == expected_state_dict

@@ -77,7 +75,6 @@ def test_cosine_decay_with_warmup_scheduler(optimizer):
         "base_lrs": [0.001],
         "last_epoch": 1,
         "lr_lambdas": [None],
-        "verbose": False,
     }
     assert scheduler.state_dict() == expected_state_dict
@@ -20,6 +20,7 @@ from pathlib import Path

 import einops
 import pytest
 import torch
+from packaging import version
 from safetensors.torch import load_file

 from lerobot import available_policies
@@ -408,7 +409,16 @@ def test_backward_compatibility(ds_repo_id: str, policy_name: str, policy_kwargs
     4. Check that this test now passes.
     5. Remember to restore `tests/scripts/save_policy_to_safetensors.py` to its original state.
     6. Remember to stage and commit the resulting changes to `tests/artifacts`.
+
+    NOTE: If the test does not pass, and you don't change the policy, it is likely that the test artifact
+    is out of date. For example, some PyTorch versions have different randomness, see this PR:
+    https://github.com/huggingface/lerobot/pull/1127.
+
     """
+    # NOTE: ACT policy has different randomness, after PyTorch 2.7.0
+    if policy_name == "act" and version.parse(torch.__version__) < version.parse("2.7.0"):
+        pytest.skip(f"Skipping act policy test with PyTorch {torch.__version__}. Requires PyTorch >= 2.7.0")
+
     ds_name = ds_repo_id.split("/")[-1]
     artifact_dir = Path("tests/artifacts/policies") / f"{ds_name}_{policy_name}_{file_name_extra}"
     saved_output_dict = load_file(artifact_dir / "output_dict.safetensors")