# Getting Started with Real-World Robots

This tutorial explains how to train a neural network to autonomously control a real robot.

**You'll learn:**

1. How to record and visualize your dataset.
2. How to train a policy using your data and prepare it for evaluation.
3. How to evaluate your policy and visualize the results.

By following these steps, you'll be able to replicate tasks like picking up a Lego block and placing it in a bin with a high success rate, as demonstrated in [this video](https://x.com/RemiCadene/status/1814680760592572934).

This tutorial is not written for one specific robot: the commands and API examples we show can be used and easily adapted for any supported robot.

During the data collection phase, you will control the robot with a "teleop" device (a leader arm or a keyboard). This process is known as "teleoperation" and is used to collect robot trajectories. Once you have collected trajectories, you'll train a neural network to imitate them and deploy the trained network so your robot can operate autonomously.

If you encounter any issues at any step of the tutorial, feel free to seek help on [Discord](https://discord.com/invite/s3KuuzsPFb).

## Setup and Calibrate

If you haven't yet set up and calibrated your robot and teleop device, please do so by following the robot-specific tutorial.

## Teleoperate

To teleoperate a robot with a leader arm (in this example we use the SO101), run the command below. For each command we also give an API example that does the same thing.

```bash
python -m lerobot.teleoperate \
    --robot.type=so101_follower \
    --robot.port=/dev/tty.usbmodem58760431541 \
    --robot.cameras="{}" \
    --robot.id=my_red_robot_arm \
    --teleop.type=so101_leader \
    --teleop.port=/dev/tty.usbmodem58760431551 \
    --teleop.id=my_blue_leader_arm
```

```python
from lerobot.common.teleoperators.so101_leader import SO101LeaderConfig, SO101Leader
from lerobot.common.robots.so101_follower import SO101FollowerConfig, SO101Follower

robot_config = SO101FollowerConfig(
    port="/dev/tty.usbmodem58760431541",
    id="my_red_robot_arm",
)

teleop_config = SO101LeaderConfig(
    port="/dev/tty.usbmodem58760431551",
    id="my_blue_leader_arm",
)

robot = SO101Follower(robot_config)
teleop_device = SO101Leader(teleop_config)
robot.connect()
teleop_device.connect()

while True:
    action = teleop_device.get_action()
    robot.send_action(action)
```

The teleoperate command will automatically:

1. Identify any missing calibrations and initiate the calibration procedure.
2. Connect the robot and teleop device and start teleoperation.

## Cameras

To add cameras to your setup, follow this [Guide](./cameras#setup-cameras).

## Teleoperate with cameras

We can now teleoperate again while visualizing the cameras and joint positions with `rerun`. In this example we use the Koch arm.
```bash
python -m lerobot.teleoperate \
    --robot.type=koch_follower \
    --robot.port=/dev/tty.usbmodem58760431541 \
    --robot.id=my_koch_robot \
    --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 1920, height: 1080, fps: 30}}" \
    --teleop.type=koch_leader \
    --teleop.port=/dev/tty.usbmodem58760431551 \
    --teleop.id=my_koch_teleop \
    --display_data=true
```

```python
from lerobot.common.cameras.opencv.configuration_opencv import OpenCVCameraConfig
from lerobot.common.teleoperators.koch_leader import KochLeaderConfig, KochLeader
from lerobot.common.robots.koch_follower import KochFollowerConfig, KochFollower

camera_config = {
    "front": OpenCVCameraConfig(index_or_path=0, width=1920, height=1080, fps=30)
}

robot_config = KochFollowerConfig(
    port="/dev/tty.usbmodem58760431541",
    id="my_koch_robot",
    cameras=camera_config,
)

teleop_config = KochLeaderConfig(
    port="/dev/tty.usbmodem58760431551",
    id="my_koch_teleop",
)

robot = KochFollower(robot_config)
teleop_device = KochLeader(teleop_config)
robot.connect()
teleop_device.connect()

while True:
    observation = robot.get_observation()
    action = teleop_device.get_action()
    robot.send_action(action)
```

## Teleoperate LeKiwi

> [!TIP]
> If you're using a Mac, you might need to give Terminal permission to access your keyboard. Go to System Preferences > Security & Privacy > Input Monitoring and check the box for Terminal.

TODO(pepijn): modify these commands with the new API, and explain that you can now also map the arms to a keyboard with the new API.

To teleoperate, SSH into your Raspberry Pi, run `conda activate lerobot`, and start this script:

```bash
python -m lerobot.common.robots.lekiwi.lekiwi_host
```

Then on your laptop, also run `conda activate lerobot` and this script:

```bash
python -m lerobot.teleoperate \
    --robot.type=lekiwi \
    --robot.port=/dev/tty.usbmodem58760431541 \
    --robot.cameras="{}" \
    --robot.id=my_lekiwi \
    --teleop.type=so101_leader \
    --teleop.port=/dev/tty.usbmodem58760431551 \
    --teleop.id=my_blue_leader_arm
```

> **NOTE:** To visualize the data, enable `--control.display_data=true`. This streams the data using `rerun`. For `--control.type=remote_robot` you will also need to set `--control.viewer_ip` and `--control.viewer_port`.

You should see something like this on your laptop:

```
[INFO] Connected to remote robot at tcp://172.17.133.91:5555 and video stream at tcp://172.17.133.91:5556.
```

Now you can move the leader arm and use the keyboard (W, A, S, D) to drive forward, left, backward, and right, (Z, X) to turn left or right, and (R, F) to increase and decrease the speed of the mobile base. There are three speed modes, see the table below:

| Speed Mode | Linear Speed (m/s) | Rotation Speed (deg/s) |
| ---------- | ------------------ | ---------------------- |
| Fast       | 0.4                | 90                     |
| Medium     | 0.25               | 60                     |
| Slow       | 0.1                | 30                     |

| Key | Action         |
| --- | -------------- |
| W   | Move forward   |
| A   | Move left      |
| S   | Move backward  |
| D   | Move right     |
| Z   | Turn left      |
| X   | Turn right     |
| R   | Increase speed |
| F   | Decrease speed |

> [!TIP]
> If you use a different keyboard, you can change the keys for each command in the [`LeKiwiConfig`](../lerobot/common/robot_devices/robots/configs.py).

### Wired version

If you have the **wired** LeKiwi version, please run all commands, including both teleoperation commands above, on your laptop.

## Record a dataset

Once you're familiar with teleoperation, you can record your first dataset. We use the Hugging Face Hub features to upload your dataset.
If you haven't previously used the Hub, make sure you can log in via the CLI using a write-access token. This token can be generated from the [Hugging Face settings](https://huggingface.co/settings/tokens). Add your token to the CLI by running this command:

```bash
huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential
```

Then store your Hugging Face username in a variable:

```bash
HF_USER=$(huggingface-cli whoami | head -n 1)
echo $HF_USER
```

Now you can record a dataset. To record 2 episodes and upload your dataset to the Hub, execute this command tailored to the SO101:

```bash
python -m lerobot.record \
    --robot.type=so101_follower \
    --robot.port=/dev/tty.usbmodem585A0076841 \
    --robot.id=my_red_robot_arm \
    --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 1920, height: 1080, fps: 30}}" \
    --teleop.type=so101_leader \
    --teleop.port=/dev/tty.usbmodem58760431551 \
    --teleop.id=my_blue_leader_arm \
    --display_data=true \
    --dataset.repo_id=${HF_USER}/so101_test \
    --dataset.num_episodes=2 \
    --dataset.single_task="Grab the black cube"
```

You will see many lines like this one:

```
INFO 2024-08-10 15:02:58 ol_robot.py:219 dt:33.34 (30.0hz) dtRlead: 5.06 (197.5hz) dtWfoll: 0.25 (3963.7hz) dtRfoll: 6.22 (160.7hz) dtRlaptop: 32.57 (30.7hz) dtRphone: 33.84 (29.5hz)
```

| Field | Meaning |
| :--- | :--- |
| `2024-08-10 15:02:58` | Timestamp when the log line was printed. |
| `ol_robot.py:219` | Source file and line number of the log call (`lerobot/scripts/control_robot.py` at line `219`). |
| `dt: 33.34 (30.0 Hz)` | Delta time (ms) between teleop steps (target: 30.0 Hz, `--fps 30`). Shown in yellow if a step is too slow. |
| `dtRlead: 5.06 (197.5 Hz)` | Delta time (ms) for reading the present position from the **leader arm**. |
| `dtWfoll: 0.25 (3963.7 Hz)` | Delta time (ms) for writing the goal position to the **follower arm** (asynchronous). |
| `dtRfoll: 6.22 (160.7 Hz)` | Delta time (ms) for reading the present position from the **follower arm**. |
| `dtRlaptop: 32.57 (30.7 Hz)` | Delta time (ms) for capturing an image from the **laptop camera** (async thread). |
| `dtRphone: 33.84 (29.5 Hz)` | Delta time (ms) for capturing an image from the **phone camera** (async thread). |

#### Dataset upload

Locally, your dataset is stored in `~/.cache/huggingface/lerobot/{repo-id}` (e.g. `~/.cache/huggingface/lerobot/cadene/so101_test`). At the end of data recording, your dataset is uploaded to your Hugging Face page (e.g. https://huggingface.co/datasets/cadene/so101_test), whose URL you can obtain by running:

```bash
echo https://huggingface.co/datasets/${HF_USER}/so101_test
```

Your dataset will be automatically tagged with `LeRobot` so the community can find it easily, and you can also add custom tags (for example `tutorial`). You can look for other LeRobot datasets on the Hub by searching for `LeRobot` [tags](https://huggingface.co/datasets?other=LeRobot).

#### Record function

The `record` function provides a suite of tools for capturing and managing data during robot operation:

##### 1. Data Storage

- Data is stored in the `LeRobotDataset` format and written to disk during recording.
- By default, the dataset is pushed to your Hugging Face page after recording.
- To disable uploading, use `--dataset.push_to_hub=False`.

##### 2. Checkpointing and Resuming

- Checkpoints are automatically created during recording.
- If an issue occurs, you can resume by re-running the same command with `--control.resume=true`.
- To start recording from scratch, **manually delete** the dataset directory.

##### 3. Recording Parameters

Set the flow of data recording using command-line arguments:

- `--warmup_time_s=10`
  Number of seconds before starting data collection (default: **10 seconds**). It allows devices to warm up and synchronize.
- `--dataset.episode_time_s=60`
  Duration of each data recording episode (default: **60 seconds**).
- `--dataset.reset_time_s=60`
  Duration for resetting the environment after each episode (default: **60 seconds**).
- `--dataset.num_episodes=50`
  Total number of episodes to record (default: **50**).

##### 4. Keyboard Controls During Recording

Control the data recording flow using keyboard shortcuts:

- Press **Right Arrow (`→`)**: Stop the current episode or reset time early and move on to the next.
- Press **Left Arrow (`←`)**: Cancel the current episode and re-record it.
- Press **Escape (`ESC`)**: Immediately stop the session, encode videos, and upload the dataset.

#### Tips for gathering data

Once you're comfortable with data recording, you can create a larger dataset for training. A good starting task is grasping an object at different locations and placing it in a bin. We suggest recording at least 50 episodes, with 10 episodes per location. Keep the cameras fixed and maintain consistent grasping behavior throughout the recordings. Also make sure the object you are manipulating is visible in the cameras. A good rule of thumb is that you should be able to do the task yourself by looking only at the camera images.

In the following sections, you'll train your neural network. After achieving reliable grasping performance, you can start introducing more variations during data collection, such as additional grasp locations, different grasping techniques, and altered camera positions.

Avoid adding too much variation too quickly, as it may hinder your results.

If you want to dive deeper into this important topic, you can check out the [blog post](https://huggingface.co/blog/lerobot-datasets#what-makes-a-good-dataset) we wrote on what makes a good dataset.

#### Troubleshooting

- On Linux, if the left and right arrow keys and escape key don't have any effect during data recording, make sure you've set the `$DISPLAY` environment variable. See [pynput limitations](https://pynput.readthedocs.io/en/latest/limitations.html#linux).

## Visualize a dataset

If you uploaded your dataset to the Hub with `--dataset.push_to_hub=true` (the default), you can [visualize your dataset online](https://huggingface.co/spaces/lerobot/visualize_dataset) by copy-pasting your repo id, given by:

```bash
echo ${HF_USER}/so101_test
```

If you didn't upload the dataset (i.e. you used `--dataset.push_to_hub=false`), you can visualize it locally in a browser window at `http://127.0.0.1:9090` with the visualization tool:

```bash
python lerobot/scripts/visualize_dataset_html.py \
    --repo-id ${HF_USER}/so101_test \
    --local-files-only 1
```

This will launch a local web server where you can browse your episodes.
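You can also inspect a recorded dataset programmatically. Below is a minimal sketch using the `LeRobotDataset` class; the repo id is assumed to be the one recorded earlier in this tutorial, and attribute names may differ slightly between LeRobot versions.

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Load your dataset (downloaded from the Hub if it is not already cached locally).
# Replace the repo id with your own, e.g. the output of `echo ${HF_USER}/so101_test`.
dataset = LeRobotDataset("cadene/so101_test")

# High-level statistics about the recording.
print(dataset.num_episodes)  # number of recorded episodes
print(dataset.num_frames)    # total number of frames across all episodes
print(dataset.fps)           # recording frequency

# Each item is a dict of tensors: camera frames, joint positions, actions, timestamps, ...
frame = dataset[0]
print(frame.keys())
```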
## Replay an episode

A useful feature is the `replay` function, which allows you to replay on your robot any episode that you've recorded, or episodes from any other dataset. This function helps you test the repeatability of your robot's actions and assess transferability across robots of the same model.

You can replay the first episode on your robot with:

```bash
python -m lerobot.replay \
    --robot.type=so101_follower \
    --robot.port=/dev/tty.usbmodem58760431541 \
    --robot.id=black \
    --dataset.repo_id=${HF_USER}/so101_test \
    --dataset.episode=0
```

Your robot should replicate movements similar to those you recorded. For example, check out [this video](https://x.com/RemiCadene/status/1793654950905680090) where we use `replay` on an Aloha robot from [Trossen Robotics](https://www.trossenrobotics.com).

## Train a policy

To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:

```bash
python lerobot/scripts/train.py \
    --dataset.repo_id=${HF_USER}/so101_test \
    --policy.type=act \
    --output_dir=outputs/train/act_so101_test \
    --job_name=act_so101_test \
    --policy.device=cuda \
    --wandb.enable=true
```

Let's explain the command:

1. We provided the dataset as an argument with `--dataset.repo_id=${HF_USER}/so101_test`.
2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](../lerobot/common/policies/act/configuration_act.py). Importantly, this policy automatically adapts to the number of motor states, motor actions, and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset.
3. We provided `policy.device=cuda` since we are training on an Nvidia GPU, but you could use `policy.device=mps` to train on Apple silicon.
4. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional, but if you use it, make sure you are logged in by running `wandb login`.

Training should take several hours. You will find checkpoints in `outputs/train/act_so101_test/checkpoints`.

To resume training from a checkpoint, here is an example command to resume from the `last` checkpoint of the `act_so101_test` run:

```bash
python lerobot/scripts/train.py \
    --config_path=outputs/train/act_so101_test/checkpoints/last/pretrained_model/train_config.json \
    --resume=true
```

#### Upload policy checkpoints

Once training is done, upload the latest checkpoint with:

```bash
huggingface-cli upload ${HF_USER}/act_so101_test \
    outputs/train/act_so101_test/checkpoints/last/pretrained_model
```

You can also upload intermediate checkpoints with:

```bash
CKPT=010000
huggingface-cli upload ${HF_USER}/act_so101_test_${CKPT} \
    outputs/train/act_so101_test/checkpoints/${CKPT}/pretrained_model
```

## Evaluate your policy

TODO(pepijn): modify this command further

You can use the `record` script from [`lerobot/record.py`](https://github.com/huggingface/lerobot/blob/main/lerobot/record.py), but with a policy checkpoint as input.
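Before recording evaluation episodes on the robot, you can quickly sanity-check that the checkpoint loads and that its configuration matches your setup. Below is a minimal sketch, assuming you trained the ACT policy from the example above; `from_pretrained` accepts either the local checkpoint directory or the Hub repo id you uploaded to.

```python
from lerobot.common.policies.act.modeling_act import ACTPolicy

# Load the checkpoint trained above, either from the local checkpoint directory
# or from the Hub repository you uploaded (e.g. "<HF_USER>/act_so101_test").
policy = ACTPolicy.from_pretrained("outputs/train/act_so101_test/checkpoints/last/pretrained_model")
policy.eval()

# Print the policy configuration to check that the expected cameras and motor
# counts match the robot and dataset used for training.
print(policy.config)
```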
For example, run the following command to record 10 evaluation episodes:

```bash
python -m lerobot.record \
    --robot.type=so101_follower \
    --robot.port=/dev/tty.usbmodem585A0076841 \
    --robot.id=my_red_robot_arm \
    --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 1920, height: 1080, fps: 30}}" \
    --teleop.type=so101_leader \
    --teleop.port=/dev/tty.usbmodem58760431551 \
    --teleop.id=my_blue_leader_arm \
    --display_data=true \
    --dataset.repo_id=${HF_USER}/eval_act_so101_test \
    --dataset.num_episodes=10 \
    --policy.checkpoint_path=outputs/train/act_so101_test/checkpoints/last/pretrained_model
```

As you can see, it's almost the same command as the one previously used to record your training dataset. Two things changed:

1. There is an additional `--policy.checkpoint_path` argument which indicates the path to your policy checkpoint (e.g. `outputs/train/act_so101_test/checkpoints/last/pretrained_model`). You can also use the model repository if you uploaded a model checkpoint to the Hub (e.g. `${HF_USER}/act_so101_test`).
2. The name of the dataset begins with `eval` to reflect that you are running inference (e.g. `${HF_USER}/eval_act_so101_test`).
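The `record` script above is the recommended way to evaluate on the robot, but if you want a quick offline sanity check from Python, you can run the policy on a frame of your training dataset and compare its output with the recorded action. This is a minimal sketch, not the exact preprocessing used by `lerobot.record`; it assumes the observation keys stored in your dataset match the features the policy was trained on, and exact APIs may differ between LeRobot versions.

```python
import torch

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
from lerobot.common.policies.act.modeling_act import ACTPolicy

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the trained checkpoint and the dataset it was trained on.
# Paths and repo ids are the ones used earlier in this tutorial.
policy = ACTPolicy.from_pretrained("outputs/train/act_so101_test/checkpoints/last/pretrained_model")
policy.to(device)
policy.eval()
policy.reset()

dataset = LeRobotDataset("cadene/so101_test")  # replace with your own, e.g. f"{HF_USER}/so101_test"

# Take one recorded frame and build a batch from its observation keys.
frame = dataset[0]
batch = {
    key: value.unsqueeze(0).to(device)
    for key, value in frame.items()
    if key.startswith("observation")
}

with torch.no_grad():
    predicted_action = policy.select_action(batch)

# Compare the policy's prediction with the action that was actually recorded.
print("predicted:", predicted_action.squeeze(0).cpu())
print("recorded: ", frame["action"])
```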