From f98200297d40700f34170c29e14c631955526342 Mon Sep 17 00:00:00 2001 From: Remi Date: Tue, 20 Aug 2024 16:41:39 +0200 Subject: [PATCH 01/11] Slightly improve tutorial and README (#370) Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com> --- README.md | 5 ++- examples/7_get_started_with_real_robot.md | 52 +++++++---------------- 2 files changed, 18 insertions(+), 39 deletions(-) diff --git a/README.md b/README.md index 9b847484..ff34e82c 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@

-

Hot News: New tutorial: Getting starting with Real-World Robots

+

Hot new tutorial: Getting started with real-world robots

@@ -31,11 +31,12 @@

We just dropped an in-depth tutorial on how to build your own robot!

Teach it new skills by showing it a few moves with just a laptop.

Then watch your homemade robot act autonomously 🤯

+

For more info, see [our thread on X](https://x.com/RemiCadene/status/1825455895561859185) or [our tutorial page](https://github.com/huggingface/lerobot/blob/main/examples/7_get_started_with_real_robot.md).

-

State-of-the-art Machine Learning for real-world robotics

+

State-of-the-art AI for real-world robotics

--- diff --git a/examples/7_get_started_with_real_robot.md b/examples/7_get_started_with_real_robot.md index 3a4b59bb..23765363 100644 --- a/examples/7_get_started_with_real_robot.md +++ b/examples/7_get_started_with_real_robot.md @@ -13,9 +13,9 @@ By following these steps, you'll be able to replicate tasks like picking up a Le Although this tutorial is general and can be easily adapted to various types of robots by changing the configuration, it is specifically based on the [Koch v1.1](https://github.com/jess-moss/koch-v1-1), an affordable robot. The Koch v1.1 consists of a leader arm and a follower arm, each with 6 motors. It can work with one or several cameras to record the scene, which serve as visual sensors for the robot. -During the data collection phase, you'll control the follower arm by moving the leader arm, a process known as "teleoperation." This technique is used to collect robot trajectories. Afterward, you'll train a neural network to imitate these trajectories and deploy the network to enable your robot to operate autonomously. +During the data collection phase, you will control the follower arm by moving the leader arm. This process is known as "teleoperation." This technique is used to collect robot trajectories. Afterward, you'll train a neural network to imitate these trajectories and deploy the network to enable your robot to operate autonomously. -If you encounter any issues at any step of the tutorial, feel free to seek help on [Discord](https://discord.com/invite/s3KuuzsPFb). +If you encounter any issues at any step of the tutorial, feel free to seek help on [Discord](https://discord.com/invite/s3KuuzsPFb) or don't hesitate to iterate with us on the tutorial by creating issues or pull requests. Thanks! ## 1. Order and Assemble your Koch v1.1 @@ -25,9 +25,9 @@ Follow the sourcing and assembling instructions provided on the [Koch v1.1 Githu Koch v1.1 leader and follower arms -For a visual walkthrough of the assembly process, you can refer to [this video tutorial](https://youtu.be/5mdxvMlxoos). +For a visual walkthrough of the assembly process, you can refer to [this video tutorial](https://youtu.be/8nQIg9BwwTk). -## 2. Configure motors, Calibrate arms, Teleoperate your Koch v1.1 +## 2. Configure motors, calibrate arms, teleoperate your Koch v1.1 First, install the additional dependencies required for Koch v1.1 by running one of the following commands. @@ -49,7 +49,7 @@ Finally, connect both arms to your computer via USB. Note that the USB doesn't p *Copy pasting python code* -In the upcoming sections, you'll learn about our classes and functions by running some python code, in an interactive session, or by copy-pasting it in a python file. If it's your first time using the tutorial, we highly recommend going through these steps to get deeper intuition about how things work. Once you're more familiar, you can streamline the process by directly running the teleoperate script (which is detailed further in the tutorial): +In the upcoming sections, you'll learn about our classes and functions by running some python code, in an interactive session, or by copy-pasting it in a python file. If this is your first time using the tutorial., we highly recommend going through these steps to get deeper intuition about how things work. Once you're more familiar, you can streamline the process by directly running the teleoperate script (which is detailed further in the tutorial): ```bash python lerobot/scripts/control_robot.py teleoperate \ --robot-path lerobot/configs/robot/koch.yaml \ @@ -57,7 +57,7 @@ python lerobot/scripts/control_robot.py teleoperate \ ``` It will automatically: -1. Detect and help you correct any motor configurations issue. +1. Detect and help you correct any motor configuration issues. 2. Identify any missing calibrations and initiate the calibration procedure. 3. Connect the robot and start teleoperation. @@ -67,7 +67,7 @@ You can use the [`DynamixelMotorsBus`](../lerobot/common/robot_devices/motors/dy **Instantiate the DynamixelMotorsBus** -To begin, you'll need to create two instances of the [`DynamixelMotorsBus`](../lerobot/common/robot_devices/motors/dynamixel.py), one for each arm, using their corresponding USB ports (e.g. `DynamixelMotorsBus(port="/dev/tty.usbmodem575E0031751"`). +To begin, create two instances of the [`DynamixelMotorsBus`](../lerobot/common/robot_devices/motors/dynamixel.py), one for each arm, using their corresponding USB ports (e.g. `DynamixelMotorsBus(port="/dev/tty.usbmodem575E0031751"`). To find the correct ports for each arm, run the utility script twice: ```bash @@ -144,7 +144,7 @@ follower_arm = DynamixelMotorsBus( *Updating the YAML Configuration File* -Then, update the port values in the YAML configuration file for the Koch robot at [`lerobot/configs/robot/koch.yaml`](../lerobot/configs/robot/koch.yaml) with the ports you've identified: +Next, update the port values in the YAML configuration file for the Koch robot at [`lerobot/configs/robot/koch.yaml`](../lerobot/configs/robot/koch.yaml) with the ports you've identified: ```yaml [...] leader_arms: @@ -192,7 +192,7 @@ When you connect the leader arm for the first time, you might see an output simi Read failed due to communication error on port /dev/tty.usbmodem575E0032081 for group_key ID_shoulder_pan_shoulder_lift_elbow_flex_wrist_flex_wrist_roll_gripper: [TxRxResult] There is no status packet! /!\ A configuration issue has been detected with your motors: -If it's the first time that you use these motors, press enter to configure your motors... but before verify that all the cables are connected the proper way. If you find an issue, before making a modification, kill the python process, unplug the power cord to not damage the motors, rewire correctly, then plug the power again and relaunch the script. +If this is the first time you are using these motors, press enter to configure your motors... but before verify that all the cables are connected the proper way. If you find an issue, before making a modification, kill the python process, unplug the power cord to not damage the motors, rewire correctly, then plug the power again and relaunch the script. Motor indices detected: {9600: [1]} @@ -327,37 +327,15 @@ When you connect your robot for the first time, the [`KochRobot`](../lerobot/com Here are the positions you'll move the follower arm to: -
-
- Koch v1.1 follower arm zero position -

1. Zero position

-
-
- Koch v1.1 follower arm rotated position -

2. Rotated position

-
-
- Koch v1.1 follower arm rest position -

3. Rest position

-
-
+| 1. Zero position | 2. Rotated position | 3. Rest position | +|---|---|---| +| Koch v1.1 follower arm zero position | Koch v1.1 follower arm rotated position | Koch v1.1 follower arm rest position | And here are the corresponding positions for the leader arm: -
-
- Koch v1.1 leader arm zero position -

1. Zero position

-
-
- Koch v1.1 leader arm rotated position -

2. Rotated position

-
-
- Koch v1.1 leader arm rest position -

3. Rest position

-
-
+| 1. Zero position | 2. Rotated position | 3. Rest position | +|---|---|---| +| Koch v1.1 leader arm zero position | Koch v1.1 leader arm rotated position | Koch v1.1 leader arm rest position | You can watch a [video tutorial of the calibration procedure](https://youtu.be/8drnU9uRY24) for more details. From 996468bcceafe6688a74d321550fced20f8ec83b Mon Sep 17 00:00:00 2001 From: Remi Date: Tue, 20 Aug 2024 16:45:57 +0200 Subject: [PATCH 02/11] Update README.md --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index ff34e82c..409a77cf 100644 --- a/README.md +++ b/README.md @@ -31,12 +31,13 @@

We just dropped an in-depth tutorial on how to build your own robot!

Teach it new skills by showing it a few moves with just a laptop.

Then watch your homemade robot act autonomously 🤯

-

For more info, see [our thread on X](https://x.com/RemiCadene/status/1825455895561859185) or [our tutorial page](https://github.com/huggingface/lerobot/blob/main/examples/7_get_started_with_real_robot.md).

+

For more info, see our thread on X or our tutorial page.

+

-

State-of-the-art AI for real-world robotics

+

LeRobot: State-of-the-art AI for real-world robotics

--- From b5ad79a7d3b769af540475187cb6b2f23be755a8 Mon Sep 17 00:00:00 2001 From: ellacroix <57907121+ellacroix@users.noreply.github.com> Date: Wed, 21 Aug 2024 14:14:01 +0200 Subject: [PATCH 03/11] Fix typo in tutorial (#371) --- examples/7_get_started_with_real_robot.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/7_get_started_with_real_robot.md b/examples/7_get_started_with_real_robot.md index 23765363..f738ec29 100644 --- a/examples/7_get_started_with_real_robot.md +++ b/examples/7_get_started_with_real_robot.md @@ -752,7 +752,7 @@ Before trying `record`, if you want to push your dataset to the hub, make sure y ```bash huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential ``` -Also, store your Hugging Face repositery name in a variable (e.g. `cadene` or `lerobot`). For instance, run this to use your Hugging Face user name as repositery: +Also, store your Hugging Face repository name in a variable (e.g. `cadene` or `lerobot`). For instance, run this to use your Hugging Face user name as repository: ```bash HF_USER=$(huggingface-cli whoami | head -n 1) echo $HF_USER From a2592a5563d20fecf0a8239e4af848d2bccbe4cd Mon Sep 17 00:00:00 2001 From: Zhuoheng Li Date: Fri, 23 Aug 2024 18:00:35 +0800 Subject: [PATCH 04/11] Provide more information to the user (#358) Co-authored-by: Alexander Soare Co-authored-by: Remi --- README.md | 9 +++++++- examples/4_train_policy_with_script.md | 30 ++++++++++++++++++++++++++ lerobot/common/utils/utils.py | 17 ++++++++++++++- lerobot/configs/default.yaml | 2 +- lerobot/configs/env/aloha.yaml | 5 +++++ lerobot/configs/env/xarm.yaml | 5 +++++ lerobot/scripts/eval.py | 22 +++++++++---------- lerobot/scripts/train.py | 1 + 8 files changed, 76 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index 409a77cf..703e6488 100644 --- a/README.md +++ b/README.md @@ -267,13 +267,20 @@ checkpoints │ └── training_state.pth # optimizer/scheduler/rng state and training step ``` +To resume training from a checkpoint, you can add these to the `train.py` python command: +```bash + hydra.run.dir=your/original/experiment/dir resume=true +``` + +It will load the pretrained model, optimizer and scheduler states for training. For more information please see our tutorial on training resumption [here](https://github.com/huggingface/lerobot/blob/main/examples/5_resume_training.md). + To use wandb for logging training and evaluation curves, make sure you've run `wandb login` as a one-time setup step. Then, when running the training command above, enable WandB in the configuration by adding: ```bash wandb.enable=true ``` -A link to the wandb logs for the run will also show up in yellow in your terminal. Here is an example of what they look like in your browser: +A link to the wandb logs for the run will also show up in yellow in your terminal. Here is an example of what they look like in your browser. Please also check [here](https://github.com/huggingface/lerobot/blob/main/examples/4_train_policy_with_script.md#typical-logs-and-metrics) for the explaination of some commonly used metrics in logs. ![](media/wandb.png) diff --git a/examples/4_train_policy_with_script.md b/examples/4_train_policy_with_script.md index db9840a7..05c865dc 100644 --- a/examples/4_train_policy_with_script.md +++ b/examples/4_train_policy_with_script.md @@ -170,6 +170,36 @@ python lerobot/scripts/train.py --config-dir outputs/train/my_experiment/checkpo Note that you may still use the regular syntax for config parameter overrides (eg: by adding `training.offline_steps=200000`). +## Typical logs and metrics + +When you start the training process, you will first see your full configuration being printed in the terminal. You can check it to make sure that you config it correctly and your config is not overrided by other files. The final configuration will also be saved with the checkpoint. + +After that, you will see training log like this one: + +``` +INFO 2024-08-14 13:35:12 ts/train.py:192 step:0 smpl:64 ep:1 epch:0.00 loss:1.112 grdn:15.387 lr:2.0e-07 updt_s:1.738 data_s:4.774 +``` + +or evaluation log like: + +``` +INFO 2024-08-14 13:38:45 ts/train.py:226 step:100 smpl:6K ep:52 epch:0.25 ∑rwrd:20.693 success:0.0% eval_s:120.266 +``` + +These logs will also be saved in wandb if `wandb.enable` is set to `true`. Here are the meaning of some abbreviations: + +- `smpl`: number of samples seen during training. +- `ep`: number of episodes seen during training. An episode contains multiple samples in a complete manipulation task. +- `epch`: number of time all unique samples are seen (epoch). +- `grdn`: gradient norm. +- `∑rwrd`: compute the sum of rewards in every evaluation episode and then take an average of them. +- `success`: average success rate of eval episodes. Reward and success are usually different except for the sparsing reward setting, where reward=1 only when the task is completed successfully. +- `eval_s`: time to evaluate the policy in the environment, in second. +- `updt_s`: time to update the network parameters, in second. +- `data_s`: time to load a batch of data, in second. + +Some metrics are useful for initial performance profiling. For example, if you find the current GPU utilization is low via the `nvidia-smi` command and `data_s` sometimes is too high, you may need to modify batch size or number of dataloading workers to accelerate dataloading. We also recommend [pytorch profiler](https://github.com/huggingface/lerobot?tab=readme-ov-file#improve-your-code-with-profiling) for detailed performance probing. + --- So far we've seen how to train Diffusion Policy for PushT and ACT for ALOHA. Now, what if we want to train ACT for PushT? Well, there are aspects of the ACT configuration that are specific to the ALOHA environments, and these happen to be incompatible with PushT. Therefore, trying to run the following will almost certainly raise an exception of sorts (eg: feature dimension mismatch): diff --git a/lerobot/common/utils/utils.py b/lerobot/common/utils/utils.py index ef5e8375..a7cb6374 100644 --- a/lerobot/common/utils/utils.py +++ b/lerobot/common/utils/utils.py @@ -14,6 +14,7 @@ # See the License for the specific language governing permissions and # limitations under the License. import logging +import os import os.path as osp import random from contextlib import contextmanager @@ -27,6 +28,12 @@ import torch from omegaconf import DictConfig +def inside_slurm(): + """Check whether the python process was launched through slurm""" + # TODO(rcadene): return False for interactive mode `--pty bash` + return "SLURM_JOB_ID" in os.environ + + def get_safe_torch_device(cfg_device: str, log: bool = False) -> torch.device: """Given a string, return a torch.device with checks on whether the device is available.""" match cfg_device: @@ -158,7 +165,15 @@ def init_hydra_config(config_path: str, overrides: list[str] | None = None) -> D version_base="1.2", ) cfg = hydra.compose(Path(config_path).stem, overrides) - + if cfg.eval.batch_size > cfg.eval.n_episodes: + raise ValueError( + "The eval batch size is greater than the number of eval episodes " + f"({cfg.eval.batch_size} > {cfg.eval.n_episodes}). As a result, {cfg.eval.batch_size} " + f"eval environments will be instantiated, but only {cfg.eval.n_episodes} will be used. " + "This might significantly slow down evaluation. To fix this, you should update your command " + f"to increase the number of episodes to match the batch size (e.g. `eval.n_episodes={cfg.eval.batch_size}`), " + f"or lower the batch size (e.g. `eval.batch_size={cfg.eval.n_episodes}`)." + ) return cfg diff --git a/lerobot/configs/default.yaml b/lerobot/configs/default.yaml index a3ff1d41..7945513a 100644 --- a/lerobot/configs/default.yaml +++ b/lerobot/configs/default.yaml @@ -120,7 +120,7 @@ eval: # `batch_size` specifies the number of environments to use in a gym.vector.VectorEnv. batch_size: 1 # `use_async_envs` specifies whether to use asynchronous environments (multiprocessing). - use_async_envs: false + use_async_envs: true wandb: enable: false diff --git a/lerobot/configs/env/aloha.yaml b/lerobot/configs/env/aloha.yaml index 296a4481..6ea3cced 100644 --- a/lerobot/configs/env/aloha.yaml +++ b/lerobot/configs/env/aloha.yaml @@ -2,6 +2,11 @@ fps: 50 +eval: + # `use_async_envs` specifies whether to use asynchronous environments (multiprocessing). + # set it to false to avoid some problems of the aloha env + use_async_envs: false + env: name: aloha task: AlohaInsertion-v0 diff --git a/lerobot/configs/env/xarm.yaml b/lerobot/configs/env/xarm.yaml index 4320379a..8e3d9c51 100644 --- a/lerobot/configs/env/xarm.yaml +++ b/lerobot/configs/env/xarm.yaml @@ -2,6 +2,11 @@ fps: 15 +eval: + # `use_async_envs` specifies whether to use asynchronous environments (multiprocessing). + # set it to false to avoid some problems of the aloha env + use_async_envs: false + env: name: xarm task: XarmLift-v0 diff --git a/lerobot/scripts/eval.py b/lerobot/scripts/eval.py index a07f3530..980c373c 100644 --- a/lerobot/scripts/eval.py +++ b/lerobot/scripts/eval.py @@ -70,7 +70,13 @@ from lerobot.common.policies.factory import make_policy from lerobot.common.policies.policy_protocol import Policy from lerobot.common.policies.utils import get_device_from_parameters from lerobot.common.utils.io_utils import write_video -from lerobot.common.utils.utils import get_safe_torch_device, init_hydra_config, init_logging, set_global_seed +from lerobot.common.utils.utils import ( + get_safe_torch_device, + init_hydra_config, + init_logging, + inside_slurm, + set_global_seed, +) def rollout( @@ -79,7 +85,6 @@ def rollout( seeds: list[int] | None = None, return_observations: bool = False, render_callback: Callable[[gym.vector.VectorEnv], None] | None = None, - enable_progbar: bool = False, ) -> dict: """Run a batched policy rollout once through a batch of environments. @@ -109,7 +114,6 @@ def rollout( are returned optionally because they typically take more memory to cache. Defaults to False. render_callback: Optional rendering callback to be used after the environments are reset, and after every step. - enable_progbar: Enable a progress bar over rollout steps. Returns: The dictionary described above. """ @@ -136,7 +140,7 @@ def rollout( progbar = trange( max_steps, desc=f"Running rollout with at most {max_steps} steps", - disable=not enable_progbar, + disable=inside_slurm(), # we dont want progress bar when we use slurm, since it clutters the logs leave=False, ) while not np.all(done): @@ -210,8 +214,6 @@ def eval_policy( videos_dir: Path | None = None, return_episode_data: bool = False, start_seed: int | None = None, - enable_progbar: bool = False, - enable_inner_progbar: bool = False, ) -> dict: """ Args: @@ -224,8 +226,6 @@ def eval_policy( the "episodes" key of the returned dictionary. start_seed: The first seed to use for the first individual rollout. For all subsequent rollouts the seed is incremented by 1. If not provided, the environments are not manually seeded. - enable_progbar: Enable progress bar over batches. - enable_inner_progbar: Enable progress bar over steps in each batch. Returns: Dictionary with metrics and data regarding the rollouts. """ @@ -266,7 +266,8 @@ def eval_policy( if return_episode_data: episode_data: dict | None = None - progbar = trange(n_batches, desc="Stepping through eval batches", disable=not enable_progbar) + # we dont want progress bar when we use slurm, since it clutters the logs + progbar = trange(n_batches, desc="Stepping through eval batches", disable=inside_slurm()) for batch_ix in progbar: # Cache frames for rendering videos. Each item will be (b, h, w, c), and the list indexes the rollout # step. @@ -285,7 +286,6 @@ def eval_policy( seeds=list(seeds) if seeds else None, return_observations=return_episode_data, render_callback=render_frame if max_episodes_rendered > 0 else None, - enable_progbar=enable_inner_progbar, ) # Figure out where in each rollout sequence the first done condition was encountered (results after @@ -487,8 +487,6 @@ def main( max_episodes_rendered=10, videos_dir=Path(out_dir) / "videos", start_seed=hydra_cfg.seed, - enable_progbar=True, - enable_inner_progbar=True, ) print(info["aggregated"]) diff --git a/lerobot/scripts/train.py b/lerobot/scripts/train.py index d8fdfc1f..2fa7ae80 100644 --- a/lerobot/scripts/train.py +++ b/lerobot/scripts/train.py @@ -241,6 +241,7 @@ def train(cfg: DictConfig, out_dir: str | None = None, job_name: str | None = No raise NotImplementedError() init_logging() + logging.info(pformat(OmegaConf.to_container(cfg))) if cfg.training.online_steps > 0 and isinstance(cfg.dataset_repo_id, ListConfig): raise NotImplementedError("Online training with LeRobotMultiDataset is not implemented.") From 9c7649f140fb437471f915423439ba7096da422b Mon Sep 17 00:00:00 2001 From: Alexander Soare Date: Fri, 23 Aug 2024 12:27:08 +0100 Subject: [PATCH 05/11] Make sure `init_hydra_config` does not require any keys (#376) --- lerobot/common/utils/utils.py | 9 --------- lerobot/scripts/eval.py | 10 ++++++++++ lerobot/scripts/train.py | 10 ++++++++++ tests/test_utils.py | 9 +++++++++ 4 files changed, 29 insertions(+), 9 deletions(-) diff --git a/lerobot/common/utils/utils.py b/lerobot/common/utils/utils.py index a7cb6374..1aa0bc2d 100644 --- a/lerobot/common/utils/utils.py +++ b/lerobot/common/utils/utils.py @@ -165,15 +165,6 @@ def init_hydra_config(config_path: str, overrides: list[str] | None = None) -> D version_base="1.2", ) cfg = hydra.compose(Path(config_path).stem, overrides) - if cfg.eval.batch_size > cfg.eval.n_episodes: - raise ValueError( - "The eval batch size is greater than the number of eval episodes " - f"({cfg.eval.batch_size} > {cfg.eval.n_episodes}). As a result, {cfg.eval.batch_size} " - f"eval environments will be instantiated, but only {cfg.eval.n_episodes} will be used. " - "This might significantly slow down evaluation. To fix this, you should update your command " - f"to increase the number of episodes to match the batch size (e.g. `eval.n_episodes={cfg.eval.batch_size}`), " - f"or lower the batch size (e.g. `eval.batch_size={cfg.eval.n_episodes}`)." - ) return cfg diff --git a/lerobot/scripts/eval.py b/lerobot/scripts/eval.py index 980c373c..482af786 100644 --- a/lerobot/scripts/eval.py +++ b/lerobot/scripts/eval.py @@ -454,6 +454,16 @@ def main( else: hydra_cfg = init_hydra_config(hydra_cfg_path, config_overrides) + if hydra_cfg.eval.batch_size > hydra_cfg.eval.n_episodes: + raise ValueError( + "The eval batch size is greater than the number of eval episodes " + f"({hydra_cfg.eval.batch_size} > {hydra_cfg.eval.n_episodes}). As a result, {hydra_cfg.eval.batch_size} " + f"eval environments will be instantiated, but only {hydra_cfg.eval.n_episodes} will be used. " + "This might significantly slow down evaluation. To fix this, you should update your command " + f"to increase the number of episodes to match the batch size (e.g. `eval.n_episodes={hydra_cfg.eval.batch_size}`), " + f"or lower the batch size (e.g. `eval.batch_size={hydra_cfg.eval.n_episodes}`)." + ) + if out_dir is None: out_dir = f"outputs/eval/{dt.now().strftime('%Y-%m-%d/%H-%M-%S')}_{hydra_cfg.env.name}_{hydra_cfg.policy.name}" diff --git a/lerobot/scripts/train.py b/lerobot/scripts/train.py index 2fa7ae80..45807503 100644 --- a/lerobot/scripts/train.py +++ b/lerobot/scripts/train.py @@ -288,6 +288,16 @@ def train(cfg: DictConfig, out_dir: str | None = None, job_name: str | None = No "you meant to resume training, please use `resume=true` in your command or yaml configuration." ) + if cfg.eval.batch_size > cfg.eval.n_episodes: + raise ValueError( + "The eval batch size is greater than the number of eval episodes " + f"({cfg.eval.batch_size} > {cfg.eval.n_episodes}). As a result, {cfg.eval.batch_size} " + f"eval environments will be instantiated, but only {cfg.eval.n_episodes} will be used. " + "This might significantly slow down evaluation. To fix this, you should update your command " + f"to increase the number of episodes to match the batch size (e.g. `eval.n_episodes={cfg.eval.batch_size}`), " + f"or lower the batch size (e.g. `eval.batch_size={cfg.eval.n_episodes}`)." + ) + # log metrics to terminal and wandb logger = Logger(cfg, out_dir, wandb_job_name=job_name) diff --git a/tests/test_utils.py b/tests/test_utils.py index d4a8e34a..e5ba2267 100644 --- a/tests/test_utils.py +++ b/tests/test_utils.py @@ -1,5 +1,6 @@ import random from typing import Callable +from uuid import uuid4 import numpy as np import pytest @@ -13,6 +14,7 @@ from lerobot.common.datasets.utils import ( ) from lerobot.common.utils.utils import ( get_global_random_state, + init_hydra_config, seeded_context, set_global_random_state, set_global_seed, @@ -83,3 +85,10 @@ def test_reset_episode_index(): correct_episode_index = [0, 0, 1, 2, 2, 2] dataset = reset_episode_index(dataset) assert dataset["episode_index"] == correct_episode_index + + +def test_init_hydra_config_empty(): + test_file = f"/tmp/test_init_hydra_config_empty_{uuid4().hex}.yaml" + with open(test_file, "w") as f: + f.write("\n") + init_hydra_config(test_file) From 97086cdcdf46e2b2663447ba16de2c9b09ceef29 Mon Sep 17 00:00:00 2001 From: Alexander Soare Date: Mon, 26 Aug 2024 12:28:16 +0100 Subject: [PATCH 06/11] Make gripper_open_degree a config param (#379) --- lerobot/common/robot_devices/robots/koch.py | 17 +++++++++++------ lerobot/configs/robot/koch.yaml | 3 +++ 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/lerobot/common/robot_devices/robots/koch.py b/lerobot/common/robot_devices/robots/koch.py index f5966999..064ea629 100644 --- a/lerobot/common/robot_devices/robots/koch.py +++ b/lerobot/common/robot_devices/robots/koch.py @@ -26,7 +26,6 @@ URL_TEMPLATE = ( # In nominal degree range ]-180, +180[ ZERO_POSITION_DEGREE = 0 ROTATED_POSITION_DEGREE = 90 -GRIPPER_OPEN_DEGREE = 35.156 def assert_drive_mode(drive_mode): @@ -165,6 +164,11 @@ class KochRobotConfig: follower_arms: dict[str, MotorsBus] = field(default_factory=lambda: {}) cameras: dict[str, Camera] = field(default_factory=lambda: {}) + # Optionally set the leader arm in torque mode with the gripper motor set to this angle. This makes it + # possible to squeeze the gripper and have it spring back to an open position on its own. If None, the + # gripper is not put in torque mode. + gripper_open_degree: float | None = None + class KochRobot: # TODO(rcadene): Implement force feedback @@ -339,11 +343,12 @@ class KochRobot: print(f"Activating torque on {name} follower arm.") self.follower_arms[name].write("Torque_Enable", 1) - # Enable torque on the gripper of the leader arms, and move it to 45 degrees, - # so that we can use it as a trigger to close the gripper of the follower arms. - for name in self.leader_arms: - self.leader_arms[name].write("Torque_Enable", 1, "gripper") - self.leader_arms[name].write("Goal_Position", GRIPPER_OPEN_DEGREE, "gripper") + if self.config.gripper_open_degree is not None: + # Set the leader arm in torque mode with the gripper motor set to an angle. This makes it possible + # to squeeze the gripper and have it spring back to an open position on its own. + for name in self.leader_arms: + self.leader_arms[name].write("Torque_Enable", 1, "gripper") + self.leader_arms[name].write("Goal_Position", self.config.gripper_open_degree, "gripper") # Connect the cameras for name in self.cameras: diff --git a/lerobot/configs/robot/koch.yaml b/lerobot/configs/robot/koch.yaml index 224040ab..084b3624 100644 --- a/lerobot/configs/robot/koch.yaml +++ b/lerobot/configs/robot/koch.yaml @@ -37,3 +37,6 @@ cameras: fps: 30 width: 640 height: 480 +# Sets the leader arm in torque mode with the gripper motor set to this angle. This makes it possible +# to squeeze the gripper and have it spring back to an open position on its own. +gripper_open_degree: 35.156 From 9ce98bb93cacfde18c12e6fd5f1432bc8a0a3eff Mon Sep 17 00:00:00 2001 From: Alexander Soare Date: Mon, 26 Aug 2024 14:30:18 +0100 Subject: [PATCH 07/11] Add safety limits on relative action target (#373) --- lerobot/common/robot_devices/robots/koch.py | 93 ++++++++++++++++++--- lerobot/configs/robot/koch.yaml | 4 + 2 files changed, 85 insertions(+), 12 deletions(-) diff --git a/lerobot/common/robot_devices/robots/koch.py b/lerobot/common/robot_devices/robots/koch.py index 064ea629..cdd81250 100644 --- a/lerobot/common/robot_devices/robots/koch.py +++ b/lerobot/common/robot_devices/robots/koch.py @@ -1,7 +1,9 @@ +import logging import pickle import time from dataclasses import dataclass, field, replace from pathlib import Path +from typing import Sequence import numpy as np import torch @@ -164,11 +166,30 @@ class KochRobotConfig: follower_arms: dict[str, MotorsBus] = field(default_factory=lambda: {}) cameras: dict[str, Camera] = field(default_factory=lambda: {}) + # Optionally limit the magnitude of the relative positional target vector for safety purposes. + # Set this to a positive scalar to have the same value for all motors, or a list that is the same length + # as the number of motors in your follower arms (assumes all follower arms have the same number of + # motors). + max_relative_target: list[float] | float | None = None + # Optionally set the leader arm in torque mode with the gripper motor set to this angle. This makes it # possible to squeeze the gripper and have it spring back to an open position on its own. If None, the # gripper is not put in torque mode. gripper_open_degree: float | None = None + def __setattr__(self, prop: str, val): + if prop == "max_relative_target" and val is not None and isinstance(val, Sequence): + for name in self.follower_arms: + if len(self.follower_arms[name].motors) != len(val): + raise ValueError( + f"len(max_relative_target)={len(val)} but the follower arm with name {name} has " + f"{len(self.follower_arms[name].motors)} motors. Please make sure that the " + f"`max_relative_target` list has as many parameters as there are motors per arm. " + "Note: This feature does not yet work with robots where different follower arms have " + "different numbers of motors." + ) + super().__setattr__(prop, val) + class KochRobot: # TODO(rcadene): Implement force feedback @@ -210,7 +231,10 @@ class KochRobot: }, ), } - robot = KochRobot(leader_arms, follower_arms) + robot = KochRobot( + leader_arms=leader_arms, + follower_arms=follower_arms, + ) # Connect motors buses and cameras if any (Required) robot.connect() @@ -222,7 +246,10 @@ class KochRobot: Example of highest frequency data collection without camera: ```python # Assumes leader and follower arms have been instantiated already (see first example) - robot = KochRobot(leader_arms, follower_arms) + robot = KochRobot( + leader_arms=leader_arms, + follower_arms=follower_arms, + ) robot.connect() while True: observation, action = robot.teleop_step(record_data=True) @@ -240,7 +267,11 @@ class KochRobot: } # Assumes leader and follower arms have been instantiated already (see first example) - robot = KochRobot(leader_arms, follower_arms, cameras) + robot = KochRobot( + leader_arms=leader_arms, + follower_arms=follower_arms, + cameras=cameras, + ) robot.connect() while True: observation, action = robot.teleop_step(record_data=True) @@ -249,7 +280,11 @@ class KochRobot: Example of controlling the robot with a policy (without running multiple policies in parallel to ensure highest frequency): ```python # Assumes leader and follower arms + cameras have been instantiated already (see previous example) - robot = KochRobot(leader_arms, follower_arms, cameras) + robot = KochRobot( + leader_arms=leader_arms, + follower_arms=follower_arms, + cameras=cameras, + ) robot.connect() while True: # Uses the follower arms and cameras to capture an observation @@ -397,7 +432,7 @@ class KochRobot: # Send action for name in self.follower_arms: before_fwrite_t = time.perf_counter() - self.follower_arms[name].write("Goal_Position", follower_goal_pos[name]) + self.send_action(torch.tensor(follower_goal_pos[name]), [name]) self.logs[f"write_follower_{name}_goal_pos_dt_s"] = time.perf_counter() - before_fwrite_t # Early exit when recording data is not requested @@ -479,21 +514,55 @@ class KochRobot: obs_dict[f"observation.images.{name}"] = torch.from_numpy(images[name]) return obs_dict - def send_action(self, action: torch.Tensor): - """The provided action is expected to be a vector.""" + def send_action(self, action: torch.Tensor, follower_names: list[str] | None = None): + """Command the follower arms to move to a target joint configuration. + + The relative action magnitude may be clipped depending on the configuration parameter + `max_relative_target`. + + Args: + action: tensor containing the concatenated joint positions for the follower arms. + follower_names: Pass follower arm names to only control a subset of all the follower arms. + """ if not self.is_connected: raise RobotDeviceNotConnectedError( "KochRobot is not connected. You need to run `robot.connect()`." ) + if follower_names is None: + follower_names = list(self.follower_arms) + elif not set(follower_names).issubset(self.follower_arms): + raise ValueError( + f"You provided {follower_names=} but only the following arms are registered: " + f"{list(self.follower_arms)}" + ) + from_idx = 0 to_idx = 0 follower_goal_pos = {} - for name in self.follower_arms: - if name in self.follower_arms: - to_idx += len(self.follower_arms[name].motor_names) - follower_goal_pos[name] = action[from_idx:to_idx].numpy() - from_idx = to_idx + for name in follower_names: + to_idx += len(self.follower_arms[name].motor_names) + this_action = action[from_idx:to_idx] + + if self.config.max_relative_target is not None: + if not isinstance(self.config.max_relative_target, list): + max_relative_target = [self.config.max_relative_target for _ in range(from_idx, to_idx)] + max_relative_target = torch.tensor(self.config.max_relative_target) + # Cap relative action target magnitude for safety. + current_pos = torch.tensor(self.follower_arms[name].read("Present_Position")) + diff = this_action - current_pos + safe_diff = torch.minimum(diff, max_relative_target) + safe_diff = torch.maximum(safe_diff, -max_relative_target) + safe_action = current_pos + safe_diff + if not torch.allclose(safe_action, action): + logging.warning( + "Relative action magnitude had to be clamped to be safe.\n" + f" requested relative action target: {diff}\n" + f" clamped relative action target: {safe_diff}" + ) + + follower_goal_pos[name] = safe_action.numpy() + from_idx = to_idx for name in self.follower_arms: self.follower_arms[name].write("Goal_Position", follower_goal_pos[name].astype(np.int32)) diff --git a/lerobot/configs/robot/koch.yaml b/lerobot/configs/robot/koch.yaml index 084b3624..d40d5ff3 100644 --- a/lerobot/configs/robot/koch.yaml +++ b/lerobot/configs/robot/koch.yaml @@ -37,6 +37,10 @@ cameras: fps: 30 width: 640 height: 480 +# `max_relative_target` limits the magnitude of the relative positional target vector for safety purposes. +# Set this to a positive scalar to have the same value for all motors, or a list that is the same length as +# the number of motors in your follower arms. +max_relative_target: null # Sets the leader arm in torque mode with the gripper motor set to this angle. This makes it possible # to squeeze the gripper and have it spring back to an open position on its own. gripper_open_degree: 35.156 From aad59e6b6b44abe732c9316d2cd822f7bea385c4 Mon Sep 17 00:00:00 2001 From: Mishig Date: Mon, 26 Aug 2024 17:38:48 +0200 Subject: [PATCH 08/11] Fix videos in visualize_dataset are not in sync (#382) --- .../templates/visualize_dataset_template.html | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/lerobot/templates/visualize_dataset_template.html b/lerobot/templates/visualize_dataset_template.html index 16ca0fa3..e28473c1 100644 --- a/lerobot/templates/visualize_dataset_template.html +++ b/lerobot/templates/visualize_dataset_template.html @@ -75,7 +75,7 @@ {% for video_info in videos_info %}

{{ video_info.filename }}

-
+ + {% if videos_info[0].language_instruction %} +

+ Language Instruction: {{ videos_info[0].language_instruction }} +

+ {% endif %} +