diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 0e83a1fe..ea80e825 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -5,8 +5,10 @@
title: Installation
title: Get started
- sections:
- - local: getting_started_real_world_robot
- title: Getting Started with Real-World Robots
+ - local: il_robots
+ title: Imitation Learning for Robots
+ - local: il_sim
+ title: Imitation Learning in Sim
- local: cameras
title: Cameras
- local: integrate_hardware
@@ -30,6 +32,10 @@
- local: lekiwi
title: LeKiwi
title: "Robots"
+- sections:
+ - local: notebooks
+ title: Notebooks
+ title: "Resources"
- sections:
- local: contributing
title: Contribute to LeRobot
diff --git a/docs/source/getting_started_real_world_robot.mdx b/docs/source/il_robots.mdx
similarity index 98%
rename from docs/source/getting_started_real_world_robot.mdx
rename to docs/source/il_robots.mdx
index 19392d4c..d13e431c 100644
--- a/docs/source/getting_started_real_world_robot.mdx
+++ b/docs/source/il_robots.mdx
@@ -1,4 +1,4 @@
-# Getting Started with Real-World Robots
+# Imitation Learning on Real-World Robots
This tutorial will explain how to train a neural network to control a real robot autonomously.
@@ -273,6 +273,9 @@ python lerobot/scripts/train.py \
--resume=true
```
+#### Train using Google Colab
+If your local computer doesn't have a powerful GPU, you can use Google Colab to train your model by following the [ACT training notebook](./notebooks#training-act).
+
#### Upload policy checkpoints
Once training is done, upload the latest checkpoint with:
diff --git a/docs/source/il_sim.mdx b/docs/source/il_sim.mdx
new file mode 100644
index 00000000..625b2fc0
--- /dev/null
+++ b/docs/source/il_sim.mdx
@@ -0,0 +1,152 @@
+# Imitation Learning in Sim
+
+This tutorial will explain how to train a neural network to control a robot in simulation with imitation learning.
+
+**You'll learn:**
+1. How to record a dataset in simulation with [gym-hil](https://github.com/huggingface/gym-hil) and visualize the dataset.
+2. How to train a policy using your data.
+3. How to evaluate your policy in simulation and visualize the results.
+
+For the simulation environment we use the same [repo](https://github.com/huggingface/gym-hil) that is also used for Human-In-the-Loop (HIL) reinforcement learning.
+This environment is based on [MuJoCo](https://mujoco.org) and allows you to record datasets in LeRobotDataset format.
+Teleoperation is easiest with a controller like the Logitech F710, but you can also use your keyboard if you are up for the challenge.
+
+## Installation
+
+First, install the `gym_hil` package within the LeRobot environment: go to your LeRobot folder and run this command:
+
+```bash
+pip install -e ".[hilserl]"
+```
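+
+To confirm the install succeeded, a quick sanity check (assuming the package is importable as `gym_hil`, as the docs above suggest) is:
+
+```bash
+# The import should complete without errors if gym_hil was installed correctly
+python -c "import gym_hil"
+```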
+
+## Teleoperate and Record a Dataset
+
+To use `gym_hil` with LeRobot you need a configuration file. An example config file can be found [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/env_config_gym_hil_il.json).
+
+To teleoperate and collect a dataset, modify this config file: add your `repo_id` (e.g. `"repo_id": "il_gym"`), set `"num_episodes": 30`, and make sure `"mode"` is set to `"record"`.
+
+If you do not have an Nvidia GPU, also change the `"device": "cuda"` parameter in the config file (for example, to `"mps"` on macOS).
+
+By default the config file assumes you use a gamepad controller. To use your keyboard instead, change the environment specified at `"task"` in the config file to `"PandaPickCubeKeyboard-v0"`.
+
+Then we can run this command to start:
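+
+The exact command depends on where you saved the config file; a typical invocation, assuming the `gym_manipulator.py` entry point under `lerobot/scripts/rl/` and its `--config_path` argument, looks like this:
+
+```bash
+# Start the simulated environment and record episodes using the gym_hil config (config path is an assumption)
+python lerobot/scripts/rl/gym_manipulator.py --config_path path/to/env_config_gym_hil_il.json
+```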
+
+*Gamepad button mapping for robot control and episode management*
+
+**Keyboard controls**
+
+For keyboard control, use the `spacebar` to enable control and the following keys to move the robot:
+```bash
+  Arrow keys: Move in X-Y plane
+  Shift and Shift_R: Move in Z axis
+  Right Ctrl and Left Ctrl: Open and close gripper
+  ESC: Exit
+```
+
+## Visualize a dataset
+
+If you uploaded your dataset to the hub, you can [visualize your dataset online](https://huggingface.co/spaces/lerobot/visualize_dataset) by copy-pasting your repo id.
+
+*Dataset visualizer*
+
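+You can also inspect the dataset locally. A minimal sketch, assuming the `visualize_dataset.py` script shipped with LeRobot and its `--repo-id`/`--episode-index` flags:
+
+```bash
+# Visualize episode 0 of the recorded dataset locally (script name and flags assumed)
+python lerobot/scripts/visualize_dataset.py \
+  --repo-id ${HF_USER}/il_gym \
+  --episode-index 0
+```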
+
+## Train a policy
+
+To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
+```bash
+python lerobot/scripts/train.py \
+  --dataset.repo_id=${HF_USER}/il_gym \
+  --policy.type=act \
+  --output_dir=outputs/train/il_sim_test \
+  --job_name=il_sim_test \
+  --policy.device=cuda \
+  --wandb.enable=true
+```
+
+Let's explain the command:
+1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/il_gym` (where `${HF_USER}` is your Hugging Face Hub username).
+2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor states, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.
+3. We provided `policy.device=cuda` since we are training on an Nvidia GPU, but you could use `policy.device=mps` to train on Apple silicon.
+4. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional, but if you use it, make sure you are logged in by running `wandb login`.
+
+Training should take several hours; 100k steps (the default) takes about 1 hour on an Nvidia A100. You will find checkpoints in `outputs/train/il_sim_test/checkpoints`.
+
+#### Train using Google Colab
+If your local computer doesn't have a powerful GPU, you can use Google Colab to train your model by following the [ACT training notebook](./notebooks#training-act).
+
+#### Upload policy checkpoints
+
+Once training is done, upload the latest checkpoint with:
+```bash
+huggingface-cli upload ${HF_USER}/il_sim_test \
+  outputs/train/il_sim_test/checkpoints/last/pretrained_model
+```
+
+You can also upload intermediate checkpoints with:
+```bash
+CKPT=010000
+huggingface-cli upload ${HF_USER}/il_sim_test${CKPT} \
+  outputs/train/il_sim_test/checkpoints/${CKPT}/pretrained_model
+```
+
+## Evaluate your policy in Sim
+
+To evaluate your policy, use the config file that can be found [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/eval_config_gym_hil.json).
+
+Make sure to replace the `repo_id` with the dataset you trained on, for example `pepijn223/il_sim_dataset`, and replace the `pretrained_policy_name_or_path` with your model id, for example `pepijn223/il_sim_model`.
+
+Then you can run this command to visualize your trained policy:
+
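+A likely invocation, assuming the `eval_policy.py` script added under `lerobot/scripts/rl/` and the `--config_path` argument handled by LeRobot's config parser (adjust the path to where you saved the eval config):
+
+```bash
+# Run the trained policy in the simulated environment using the eval config (config path is an assumption)
+python lerobot/scripts/rl/eval_policy.py --config_path path/to/eval_config_gym_hil.json
+```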
diff --git a/lerobot/scripts/rl/eval_policy.py b/lerobot/scripts/rl/eval_policy.py
new file mode 100644
index 00000000..3762719b
--- /dev/null
+++ b/lerobot/scripts/rl/eval_policy.py
@@ -0,0 +1,74 @@
+#!/usr/bin/env python
+
+# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import logging
+
+from lerobot.common.cameras import opencv # noqa: F401
+from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
+from lerobot.common.policies.factory import make_policy
+from lerobot.common.robots import ( # noqa: F401
+ RobotConfig,
+ make_robot_from_config,
+ so100_follower,
+)
+from lerobot.common.teleoperators import (
+ gamepad, # noqa: F401
+ so101_leader, # noqa: F401
+)
+from lerobot.configs import parser
+from lerobot.configs.train import TrainRLServerPipelineConfig
+from lerobot.scripts.rl.gym_manipulator import make_robot_env
+
+logging.basicConfig(level=logging.INFO)
+
+
+def eval_policy(env, policy, n_episodes):
+    """Roll out the policy for `n_episodes` episodes and log per-episode rewards."""
+    sum_reward_episode = []
+    for _ in range(n_episodes):
+        obs, _ = env.reset()
+        episode_reward = 0.0
+        while True:
+            # Query the policy for an action and step the environment until the episode ends.
+            action = policy.select_action(obs)
+            obs, reward, terminated, truncated, _ = env.step(action)
+            episode_reward += reward
+            if terminated or truncated:
+                break
+        sum_reward_episode.append(episode_reward)
+
+    logging.info(f"Per-episode rewards: {sum_reward_episode}")
+    logging.info(f"Success rate (mean episode reward): {sum(sum_reward_episode) / len(sum_reward_episode)}")
+
+
+@parser.wrap()
+def main(cfg: TrainRLServerPipelineConfig):
+ env_cfg = cfg.env
+ env = make_robot_env(env_cfg)
+ dataset_cfg = cfg.dataset
+ dataset = LeRobotDataset(repo_id=dataset_cfg.repo_id)
+ dataset_meta = dataset.meta
+
+ policy = make_policy(
+ cfg=cfg.policy,
+ # env_cfg=cfg.env,
+ ds_meta=dataset_meta,
+ )
+    # `from_pretrained` returns a new policy instance loaded with the pretrained weights,
+    # so keep the returned object instead of discarding it.
+    policy = policy.from_pretrained(env_cfg.pretrained_policy_name_or_path)
+ policy.eval()
+
+ eval_policy(env, policy=policy, n_episodes=10)
+
+
+if __name__ == "__main__":
+ main()