Release cleanup (#132)

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Cadene <re.cadene@gmail.com>
2024-05-06 03:03:14 +02:00
parent 6eaffbef1d
commit f5e76393eb
19 changed files with 312 additions and 237 deletions
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,11 +1,15 @@
-# What does this PR do?
+## What this does
+Explain what this PR does. Feel free to tag your PR with the appropriate label(s).

 Examples:
- Fixes # (issue)
- Adds new dataset
- Optimizes something
+|  Title               | Label           |
+|----------------------|-----------------|
+| Fixes #[issue]       | (🐛 Bug)        |
+| Adds new dataset     | (🗃️ Dataset)    |
+| Optimizes something  | (⚡️ Performance) |

-## How was it tested?
+## How it was tested
+Explain/show how you tested your changes.

 Examples:
 - Added `test_something` in `tests/test_stuff.py`.
@@ -13,6 +17,7 @@ Examples:
 - Optimized `some_function`, it now runs X times faster than previously.

 ## How to checkout & try? (for the reviewer)
+Provide a simple way for the reviewer to try out your changes.

 Examples:
 ```bash
@@ -22,11 +27,8 @@ DATA_DIR=tests/data pytest -sx tests/test_stuff.py::test_something
 python lerobot/scripts/train.py --some.option=true
 ```

-## Before submitting
-Please read the [contributor guideline](https://github.com/huggingface/lerobot/blob/main/CONTRIBUTING.md#submitting-a-pull-request-pr).
-
-
-## Who can review?
-
-Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
+## SECTION TO REMOVE BEFORE SUBMITTING YOUR PR
+**Note**: Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
 members/contributors who may be interested in your PR. Try to avoid tagging more than 3 people.
+
+**Note**: Before submitting this PR, please read the [contributor guideline](https://github.com/huggingface/lerobot/blob/main/CONTRIBUTING.md#submitting-a-pull-request-pr).
--- a/README.md
+++ b/README.md
@@ -29,15 +29,15 @@
 ---


-🤗 LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch. The goal is to lower the barrier for entry to robotics so that everyone can contribute and benefit from sharing datasets and pretrained models.
+🤗 LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch. The goal is to lower the barrier to entry to robotics so that everyone can contribute and benefit from sharing datasets and pretrained models.

 🤗 LeRobot contains state-of-the-art approaches that have been shown to transfer to the real-world with a focus on imitation learning and reinforcement learning.

-🤗 LeRobot already provides a set of pretrained models, datasets with human collected demonstrations, and simulated environments so that everyone can get started. In the coming weeks, the plan is to add more and more support for real-world robotics on the most affordable and capable robots out there.
+🤗 LeRobot already provides a set of pretrained models, datasets with human collected demonstrations, and simulation environments to get started without assembling a robot. In the coming weeks, the plan is to add more and more support for real-world robotics on the most affordable and capable robots out there.

-🤗 LeRobot hosts pretrained models and datasets on this HuggingFace community page: [huggingface.co/lerobot](https://huggingface.co/lerobot)
+🤗 LeRobot hosts pretrained models and datasets on this Hugging Face community page: [huggingface.co/lerobot](https://huggingface.co/lerobot)

-#### Examples of pretrained models and environments
+#### Examples of pretrained models on simulation environments

 <table>
  <tr>
@@ -54,10 +54,12 @@

 ### Acknowledgment

- ACT policy and ALOHA environment are adapted from [ALOHA](https://tonyzhaozh.github.io/aloha/)
- Diffusion policy and Pusht environment are adapted from [Diffusion Policy](https://diffusion-policy.cs.columbia.edu/)
- TDMPC policy and Simxarm environment are adapted from [FOWM](https://www.yunhaifeng.com/FOWM/)
- Abstractions and utilities for Reinforcement Learning come from [TorchRL](https://github.com/pytorch/rl)
+- Thanks to Tony Zhao, Zipeng Fu and colleagues for open sourcing ACT policy, ALOHA environments and datasets. Ours are adapted from [ALOHA](https://tonyzhaozh.github.io/aloha) and [Mobile ALOHA](https://mobile-aloha.github.io).
+- Thanks to Cheng Chi, Zhenjia Xu and colleagues for open sourcing Diffusion policy, Pusht environment and datasets, as well as UMI datasets. Ours are adapted from [Diffusion Policy](https://diffusion-policy.cs.columbia.edu) and [UMI Gripper](https://umi-gripper.github.io).
+- Thanks to Nicklas Hansen, Yunhai Feng and colleagues for open sourcing TDMPC policy, Simxarm environments and datasets. Ours are adapted from [TDMPC](https://github.com/nicklashansen/tdmpc) and [FOWM](https://www.yunhaifeng.com/FOWM).
+- Thanks to Vincent Moens and colleagues for open sourcing [TorchRL](https://github.com/pytorch/rl). It allowed for quick experimentation on the design of `LeRobot`.
+- Thanks to Antonio Loquercio and Ashish Kumar for their early support.
+

 ## Installation

@@ -86,7 +88,7 @@ For instance, to install 🤗 LeRobot with aloha and pusht, use:
 pip install ".[aloha, pusht]"
 ```

-To use [Weights and Biases](https://docs.wandb.ai/quickstart) for experiments tracking, log in with
+To use [Weights and Biases](https://docs.wandb.ai/quickstart) for experiment tracking, log in with
 ```bash
 wandb login
 ```
@@ -95,6 +97,7 @@ wandb login

 ```
 .
+├── examples             # contains demonstration examples, start here to learn about LeRobot
 ├── lerobot
 |   ├── configs          # contains hydra yaml files with all options that you can override in the command line
 |   |   ├── default.yaml   # selected by default, it loads pusht environment and diffusion policy
@@ -103,69 +106,84 @@ wandb login
 |   ├── common           # contains classes and utilities
 |   |   ├── datasets       # various datasets of human demonstrations: aloha, pusht, xarm
 |   |   ├── envs           # various sim environments: aloha, pusht, xarm
-|   |   └── policies       # various policies: act, diffusion, tdmpc
-|   └── scripts                  # contains functions to execute via command line
-|       ├── visualize_dataset.py  # load a dataset and render its demonstrations
-|       ├── eval.py               # load policy and evaluate it on an environment
-|       └── train.py              # train a policy via imitation learning and/or reinforcement learning
+|   |   ├── policies       # various policies: act, diffusion, tdmpc
+|   |   └── utils          # various utilities
+|   └── scripts          # contains functions to execute via command line
+|       ├── eval.py                 # load policy and evaluate it on an environment
+|       ├── train.py                # train a policy via imitation learning and/or reinforcement learning
+|       ├── push_dataset_to_hub.py  # convert your dataset into LeRobot dataset format and upload it to the Hugging Face hub
+|       └── visualize_dataset.py    # load a dataset and render its demonstrations
 ├── outputs               # contains results of scripts execution: logs, videos, model checkpoints
-├── .github
-|   └── workflows
-|       └── test.yml      # defines install settings for continuous integration and specifies end-to-end tests
 └── tests                 # contains pytest utilities for continuous integration
-
 ```

 ### Visualize datasets

-Check out [examples](./examples) to see how you can import our dataset class, download the data from the HuggingFace hub and use our rendering utilities.
+Check out [example 1](./examples/1_load_lerobot_dataset.py) that illustrates how to use our dataset class, which automatically downloads data from the Hugging Face hub.

-Or you can achieve the same result by executing our script from the command line:
+You can also locally visualize episodes from a dataset by executing our script from the command line:
 ```bash
 python lerobot/scripts/visualize_dataset.py \
-env=pusht \
-hydra.run.dir=outputs/visualize_dataset/example
-# >>> ['outputs/visualize_dataset/example/episode_0.mp4']
+    --repo-id lerobot/pusht \
+    --episode-index 0
 ```

+It will open `rerun.io` and display the camera streams, robot states and actions, like this:
+
+https://github-production-user-asset-6210df.s3.amazonaws.com/4681518/328035972-fd46b787-b532-47e2-bb6f-fd536a55a7ed.mov?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240505%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240505T172924Z&X-Amz-Expires=300&X-Amz-Signature=d680b26c532eeaf80740f08af3320d22ad0b8a4e4da1bcc4f33142c15b509eda&X-Amz-SignedHeaders=host&actor_id=24889239&key_id=0&repo_id=748713144
+
+
+Our script can also visualize datasets stored on a remote server. See `python lerobot/scripts/visualize_dataset.py --help` for more instructions.
+
 ### Evaluate a pretrained policy

-Check out [examples](./examples) to see how you can load a pretrained policy from HuggingFace hub, load up the corresponding environment and model, and run an evaluation.
+Check out [example 2](./examples/2_evaluate_pretrained_policy.py) that illustrates how to download a pretrained policy from the Hugging Face hub, and run an evaluation on its corresponding environment.

-Or you can achieve the same result by executing our script from the command line:
+We also provide a more capable script to parallelize the evaluation over multiple environments during the same rollout. Here is an example with a pretrained model hosted on [lerobot/diffusion_pusht](https://huggingface.co/lerobot/diffusion_pusht):
 ```bash
 python lerobot/scripts/eval.py \
-p lerobot/diffusion_pusht \
-eval_episodes=10 \
-hydra.run.dir=outputs/eval/example_hub
+    -p lerobot/diffusion_pusht \
+    eval.n_episodes=10 \
+    eval.batch_size=10
 ```

-After training your own policy, you can also re-evaluate the checkpoints with:
-
+Note: After training your own policy, you can re-evaluate the checkpoints with:
 ```bash
 python lerobot/scripts/eval.py \
-p PATH/TO/TRAIN/OUTPUT/FOLDER \
-eval_episodes=10 \
-hydra.run.dir=outputs/eval/example_dir
+    -p PATH/TO/TRAIN/OUTPUT/FOLDER
 ```

 See `python lerobot/scripts/eval.py --help` for more instructions.

 ### Train your own policy

-Check out [examples](./examples) to see how you can start training a model on a dataset, which will be automatically downloaded if needed.
+Check out [example 3](./examples/3_train_policy.py) that illustrates how to start training a model.

-In general, you can use our training script to easily train any policy on any environment:
+In general, you can use our training script to easily train any policy. To use wandb for logging training and evaluation curves, make sure you ran `wandb login`. Here is an example of training the ACT policy on trajectories collected by humans on the Aloha simulation environment for the insertion task:
 ```bash
 python lerobot/scripts/train.py \
-env=aloha \
-task=sim_insertion \
-repo_id=lerobot/aloha_sim_insertion_scripted \
-policy=act \
-hydra.run.dir=outputs/train/aloha_act
+    policy=act \
+    env=aloha \
+    env.task=AlohaInsertion-v0 \
+    dataset_repo_id=lerobot/aloha_sim_insertion_human
 ```

-After training, you may want to revisit model evaluation to change the evaluation settings. In fact, during training every checkpoint is already evaluated but on a low number of episodes for efficiency. Check out [example](./examples) to evaluate any model checkpoint on more episodes to increase statistical significance.
+The experiment directory is automatically generated and will show up in yellow in your terminal. It looks like `outputs/train/2024-05-05/20-21-12_aloha_act_default`. You can manually specify an experiment directory by adding this argument to the `train.py` python command:
+```bash
+    hydra.run.dir=your/new/experiment/dir
+```
+
+A link to the wandb logs for the run will also show up in yellow in your terminal. Here is an example of logs from wandb:
+![](media/wandb.png)
+
+You can deactivate wandb by adding these arguments to the `train.py` python command:
+```bash
+    wandb.disable_artifact=true \
+    wandb.enable=false
+```
+
+Note: For efficiency, during training every checkpoint is evaluated on a low number of episodes. After training, you may want to re-evaluate your best checkpoints on more episodes or change the evaluation settings. See `python lerobot/scripts/eval.py --help` for more instructions.
+

 ## Contribute

@@ -173,98 +191,40 @@ If you would like to contribute to 🤗 LeRobot, please check out our [contribut

 ### Add a new dataset

-```python
-# TODO(rcadene, AdilZouitine): rewrite this section
-```
-
-To add a dataset to the hub, first login and use a token generated from [huggingface settings](https://huggingface.co/settings/tokens) with write access:
+To add a dataset to the hub, you need to log in using a write-access token, which can be generated from the [Hugging Face settings](https://huggingface.co/settings/tokens):
 ```bash
 huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential
 ```

-Then you can upload it to the hub with:
+Then move your dataset folder into the `data` directory (e.g. `data/aloha_ping_pong`), and push your dataset to the hub with:
 ```bash
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli upload $HF_USER/$DATASET data/$DATASET \
--repo-type dataset  \
--revision v1.0
+python lerobot/scripts/push_dataset_to_hub.py \
+--data-dir data \
+--dataset-id aloha_ping_pong \
+--raw-format aloha_hdf5 \
+--community-id lerobot
 ```

-You will need to set the corresponding version as a default argument in your dataset class:
-```python
-  version: str | None = "v1.1",
-```
-See: [`lerobot/common/datasets/pusht.py`](https://github.com/Cadene/lerobot/blob/main/lerobot/common/datasets/pusht.py)
+See `python lerobot/scripts/push_dataset_to_hub.py --help` for more instructions.

-For instance, for [lerobot/pusht](https://huggingface.co/datasets/lerobot/pusht), we used:
-```bash
-HF_USER=lerobot
-DATASET=pusht
-```
+If your dataset format is not supported, implement your own in `lerobot/common/datasets/push_dataset_to_hub/${raw_format}_format.py` by copying examples like [pusht_zarr](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/pusht_zarr_format.py), [umi_zarr](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/umi_zarr_format.py), [aloha_hdf5](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/aloha_hdf5_format.py), or [xarm_pkl](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/xarm_pkl_format.py).

-If you want to improve an existing dataset, you can download it locally with:
-```bash
-mkdir -p data/$DATASET
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download ${HF_USER}/$DATASET \
--repo-type dataset \
--local-dir data/$DATASET \
--local-dir-use-symlinks=False \
--revision v1.0
-```
-
-Iterate on your code and dataset with:
-```bash
-DATA_DIR=data python train.py
-```
-
-Upload a new version (v2.0 or v1.1 if the changes are respectively more or less significant):
-```bash
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli upload $HF_USER/$DATASET data/$DATASET \
--repo-type dataset \
--revision v1.1 \
--delete "*"
-```
-
-Then you will need to set the corresponding version as a default argument in your dataset class:
-```python
-  version: str | None = "v1.1",
-```
-See: [`lerobot/common/datasets/pusht.py`](https://github.com/Cadene/lerobot/blob/main/lerobot/common/datasets/pusht.py)
-
-
-Finally, you might want to mock the dataset if you need to update the unit tests as well:
-```bash
-python tests/scripts/mock_dataset.py --in-data-dir data/$DATASET --out-data-dir tests/data/$DATASET
-```

 ### Add a pretrained policy

-```python
-# TODO(rcadene, alexander-soare): rewrite this section
-```
-
-Once you have trained a policy you may upload it to the HuggingFace hub.
-
-Firstly, make sure you have a model repository set up on the hub. The hub ID looks like HF_USER/REPO_NAME.
-
-Secondly, assuming you have trained a policy, you need the following (which should all be in any of the subdirectories of `checkpoints` in your training output folder, if you've used the LeRobot training script):
+Once you have trained a policy you may upload it to the Hugging Face hub using a hub id that looks like `${hf_user}/${repo_name}` (e.g. [lerobot/diffusion_pusht](https://huggingface.co/lerobot/diffusion_pusht)).

+You first need to find the checkpoint located inside your experiment directory (e.g. `outputs/train/2024-05-05/20-21-12_aloha_act_default/checkpoints/002500`). It should contain:
 - `config.json`: A serialized version of the policy configuration (following the policy's dataclass config).
- `model.safetensors`: The `torch.nn.Module` parameters saved in [Hugging Face Safetensors](https://huggingface.co/docs/safetensors/index) format.
- `config.yaml`: This is the consolidated Hydra training configuration containing the policy, environment, and dataset configs. The policy configuration should match `config.json` exactly. The environment config is useful for anyone who wants to evaluate your policy. The dataset config just serves as a paper trail for reproducibility.
-
-To upload these to the hub, run the following with a desired revision ID.
+- `model.safetensors`: A set of `torch.nn.Module` parameters, saved in [Hugging Face Safetensors](https://huggingface.co/docs/safetensors/index) format.
+- `config.yaml`: A consolidated Hydra training configuration containing the policy, environment, and dataset configs. The policy configuration should match `config.json` exactly. The environment config is useful for anyone who wants to evaluate your policy. The dataset config just serves as a paper trail for reproducibility.

+To upload these to the hub, run the following:
 ```bash
-huggingface-cli upload $HUB_ID PATH/TO/OUTPUT/DIR --revision $REVISION_ID
+huggingface-cli upload ${hf_user}/${repo_name} path/to/checkpoint/dir
 ```

-If you want this to be the default revision also run the following (don't worry, it won't upload the files again; it will just adjust the file pointers):
-
-```bash
-huggingface-cli upload $HUB_ID PATH/TO/OUTPUT/DIR
-```
-
-See `eval.py` for an example of how a user may use your policy.
+See [eval.py](https://github.com/huggingface/lerobot/blob/main/lerobot/scripts/eval.py) for an example of how other people may use your policy.


 ### Improve your code with profiling
@@ -291,9 +251,14 @@ with profile(
            # insert code to profile, potentially whole body of eval_policy function
 ```

-```bash
-python lerobot/scripts/eval.py \
--config outputs/pusht/.hydra/config.yaml \
-pretrained_model_path=outputs/pusht/model.pt \
-eval_episodes=7
+## Citation
+
+If you want, you can cite this work with:
+```
+@misc{cadene2024lerobot,
+    author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Wolf, Thomas},
+    title = {LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
+    howpublished = "\url{https://github.com/huggingface/lerobot}",
+    year = {2024}
+}
 ```
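
Note on the README's "Add a pretrained policy" section above: it lists three files that a checkpoint directory should contain before upload. A minimal sketch of a pre-upload check follows, using only the standard library. The helper name `missing_checkpoint_files` and the temporary demo directory are hypothetical and are not part of LeRobot; only the three file names come from the README.

```python
from pathlib import Path
import tempfile

# File names taken from the README's "Add a pretrained policy" section.
EXPECTED_CHECKPOINT_FILES = ("config.json", "model.safetensors", "config.yaml")


def missing_checkpoint_files(checkpoint_dir: Path) -> list[str]:
    """Return the expected file names that are absent from checkpoint_dir (hypothetical helper)."""
    return [name for name in EXPECTED_CHECKPOINT_FILES if not (checkpoint_dir / name).is_file()]


# Demonstration on a throwaway directory that deliberately lacks config.yaml.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "config.json").write_text("{}")
    (demo_dir / "model.safetensors").write_bytes(b"")
    demo_missing = missing_checkpoint_files(demo_dir)
    print(demo_missing)
```

Running such a check on the checkpoint path before calling `huggingface-cli upload` would catch an incomplete directory early.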
--- a/examples/1_load_lerobot_dataset.py
+++ b/examples/1_load_lerobot_dataset.py
@@ -14,6 +14,7 @@ The script ends with examples of how to batch process data using PyTorch's DataL
 """

 from pathlib import Path
+from pprint import pprint

 import imageio
 import torch
@@ -21,39 +22,36 @@ import torch
 import lerobot
 from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

-print("List of available datasets", lerobot.available_datasets)
-# # >>> ['lerobot/aloha_sim_insertion_human', 'lerobot/aloha_sim_insertion_scripted',
-# #     'lerobot/aloha_sim_transfer_cube_human', 'lerobot/aloha_sim_transfer_cube_scripted',
-# #     'lerobot/pusht', 'lerobot/xarm_lift_medium']
+print("List of available datasets:")
+pprint(lerobot.available_datasets)

+# Let's take one for this example
 repo_id = "lerobot/pusht"

-# You can easily load a dataset from a Hugging Face repositery
+# You can easily load a dataset from a Hugging Face repository
 dataset = LeRobotDataset(repo_id)

-# LeRobotDataset is actually a thin wrapper around an underlying Hugging Face dataset  (see https://huggingface.co/docs/datasets/index for more information).
-# TODO(rcadene): update to make the print pretty
-print(f"{dataset=}")
-print(f"{dataset.hf_dataset=}")
+# LeRobotDataset is actually a thin wrapper around an underlying Hugging Face dataset
+# (see https://huggingface.co/docs/datasets/index for more information).
+print(dataset)
+print(dataset.hf_dataset)

-# and provides additional utilities for robotics and compatibility with pytorch
-print(f"number of samples/frames: {dataset.num_samples=}")
-print(f"number of episodes: {dataset.num_episodes=}")
-print(f"average number of frames per episode: {dataset.num_samples / dataset.num_episodes:.3f}")
+# And provides additional utilities for robotics and compatibility with PyTorch
+print(f"\naverage number of frames per episode: {dataset.num_samples / dataset.num_episodes:.3f}")
 print(f"frames per second used during data collection: {dataset.fps=}")
-print(f"keys to access images from cameras: {dataset.image_keys=}")
+print(f"keys to access images from cameras: {dataset.camera_keys=}\n")

 # Access frame indexes associated to first episode
 episode_index = 0
 from_idx = dataset.episode_data_index["from"][episode_index].item()
 to_idx = dataset.episode_data_index["to"][episode_index].item()

-# LeRobot datasets actually subclass PyTorch datasets so you can do everything you know and love from working with the latter, like iterating through the dataset.
-# Here we grab all the image frames.
+# LeRobot datasets actually subclass PyTorch datasets so you can do everything you know and love from working
+# with the latter, like iterating through the dataset. Here we grab all the image frames.
 frames = [dataset[idx]["observation.image"] for idx in range(from_idx, to_idx)]

-# Video frames are now float32 in range [0,1] channel first (c,h,w) to follow pytorch convention.
-# To visualize them, we convert to uint8 range [0,255]
+# Video frames are now float32 in range [0,1] channel first (c,h,w) to follow PyTorch convention. To visualize
+# them, we convert to uint8 in range [0,255]
 frames = [(frame * 255).type(torch.uint8) for frame in frames]
 # and to channel last (h,w,c).
 frames = [frame.permute((1, 2, 0)).numpy() for frame in frames]
@@ -62,9 +60,9 @@ frames = [frame.permute((1, 2, 0)).numpy() for frame in frames]
 Path("outputs/examples/1_load_lerobot_dataset").mkdir(parents=True, exist_ok=True)
 imageio.mimsave("outputs/examples/1_load_lerobot_dataset/episode_0.mp4", frames, fps=dataset.fps)

-# For many machine learning applications we need to load the history of past observations or trajectories of future actions.
-# Our datasets can load previous and future frames for each key/modality,
-# using timestamps differences with the current loaded frame. For instance:
+# For many machine learning applications we need to load the history of past observations or trajectories of
+# future actions. Our datasets can load previous and future frames for each key/modality, using timestamp
+# differences with the current loaded frame. For instance:
 delta_timestamps = {
    # loads 4 images: 1 second before current frame, 500 ms before, 200 ms before, and current frame
    "observation.image": [-1, -0.5, -0.20, 0],
@@ -74,12 +72,12 @@ delta_timestamps = {
    "action": [t / dataset.fps for t in range(64)],
 }
 dataset = LeRobotDataset(repo_id, delta_timestamps=delta_timestamps)
-print(f"{dataset[0]['observation.image'].shape=}")  # (4,c,h,w)
+print(f"\n{dataset[0]['observation.image'].shape=}")  # (4,c,h,w)
 print(f"{dataset[0]['observation.state'].shape=}")  # (8,c)
-print(f"{dataset[0]['action'].shape=}")  # (64,c)
+print(f"{dataset[0]['action'].shape=}\n")  # (64,c)

-# Finally, our datasets are fully compatible with PyTorch dataloaders and samplers
-# because they are just PyTorch datasets.
+# Finally, our datasets are fully compatible with PyTorch dataloaders and samplers because they are just
+# PyTorch datasets.
 dataloader = torch.utils.data.DataLoader(
    dataset,
    num_workers=0,
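
Note on `delta_timestamps` in the example above: each value is a time offset in seconds relative to the current frame. As a rough illustration, the sketch below converts such offsets into whole-frame offsets for a given frame rate. This is not how `LeRobotDataset` resolves timestamps internally; the helper `frame_offsets` is hypothetical and the frame rate of 10 is an assumption chosen for the demonstration.

```python
def frame_offsets(delta_timestamps: dict[str, list[float]], fps: float) -> dict[str, list[int]]:
    """Map time offsets in seconds to the nearest whole-frame offsets (hypothetical helper)."""
    return {key: [round(delta * fps) for delta in deltas] for key, deltas in delta_timestamps.items()}


example_fps = 10  # assumed frame rate, for illustration only
offsets = frame_offsets({"observation.image": [-1, -0.5, -0.20, 0]}, example_fps)
print(offsets)
```

Under this assumption, an offset of -0.5 seconds corresponds to the frame recorded five steps before the current one.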
--- a/examples/2_evaluate_pretrained_policy.py
+++ b/examples/2_evaluate_pretrained_policy.py
@@ -5,23 +5,108 @@ training outputs directory. In the latter case, you might want to run examples/3

 from pathlib import Path

+import gym_pusht  # noqa: F401
+import gymnasium as gym
+import imageio
+import numpy
+import torch
 from huggingface_hub import snapshot_download

-from lerobot.scripts.eval import eval
+from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy

-# Get a pretrained policy from the hub.
-pretrained_policy_name = "lerobot/diffusion_pusht"
-pretrained_policy_path = Path(snapshot_download(pretrained_policy_name))
+# Create a directory to store the video of the evaluation
+output_directory = Path("outputs/eval/example_pusht_diffusion")
+output_directory.mkdir(parents=True, exist_ok=True)
+
+device = torch.device("cuda")
+
+# Download the diffusion policy for pusht environment
+pretrained_policy_path = Path(snapshot_download("lerobot/diffusion_pusht"))
 # OR uncomment the following to evaluate a policy from the local outputs/train folder.
 # pretrained_policy_path = Path("outputs/train/example_pusht_diffusion")

-# Override some config parameters to do with evaluation.
-overrides = [
-    "eval.n_episodes=10",
-    "eval.batch_size=10",
-    "device=cuda",
-]
+policy = DiffusionPolicy.from_pretrained(pretrained_policy_path)
+policy.eval()
+policy.to(device)

-# Evaluate the policy and save the outputs including metrics and videos.
-# TODO(rcadene, alexander-soare): dont call eval, but add the minimal code snippet to rollout
-eval(pretrained_policy_path=pretrained_policy_path)
+# Initialize evaluation environment to render two observation types:
+# an image of the scene and state/position of the agent. The environment
+# also automatically stops running after 300 interactions/steps.
+env = gym.make(
+    "gym_pusht/PushT-v0",
+    obs_type="pixels_agent_pos",
+    max_episode_steps=300,
+)
+
+# Reset the policy and environment to prepare for rollout
+policy.reset()
+numpy_observation, info = env.reset(seed=42)
+
+# Prepare to collect all the rewards and frames of the episode,
+# from initial state to final state.
+rewards = []
+frames = []
+
+# Render frame of the initial state
+frames.append(env.render())
+
+step = 0
+done = False
+while not done:
+    # Prepare observation for the policy running in PyTorch
+    state = torch.from_numpy(numpy_observation["agent_pos"])
+    image = torch.from_numpy(numpy_observation["pixels"])
+
+    # Convert to float32 with image from channel last in [0,255]
+    # to channel first in [0,1]
+    state = state.to(torch.float32)
+    image = image.to(torch.float32) / 255
+    image = image.permute(2, 0, 1)
+
+    # Send data tensors from CPU to GPU
+    state = state.to(device, non_blocking=True)
+    image = image.to(device, non_blocking=True)
+
+    # Add extra (empty) batch dimension, required to forward the policy
+    state = state.unsqueeze(0)
+    image = image.unsqueeze(0)
+
+    # Create the policy input dictionary
+    observation = {
+        "observation.state": state,
+        "observation.image": image,
+    }
+
+    # Predict the next action with respect to the current observation
+    with torch.inference_mode():
+        action = policy.select_action(observation)
+
+    # Prepare the action for the environment
+    numpy_action = action.squeeze(0).to("cpu").numpy()
+
+    # Step through the environment and receive a new observation
+    numpy_observation, reward, terminated, truncated, info = env.step(numpy_action)
+    print(f"{step=} {reward=} {terminated=}")
+
+    # Keep track of all the rewards and frames
+    rewards.append(reward)
+    frames.append(env.render())
+
+    # The rollout is considered done when the success state is reached (i.e. terminated is True),
+    # or the maximum number of iterations is reached (i.e. truncated is True)
+    done = terminated | truncated | done
+    step += 1
+
+if terminated:
+    print("Success!")
+else:
+    print("Failure!")
+
+# Get the speed of environment (i.e. its number of frames per second).
+fps = env.metadata["render_fps"]
+
+# Encode all frames into a mp4 video.
+video_path = output_directory / "rollout.mp4"
+imageio.mimsave(str(video_path), numpy.stack(frames), fps=fps)
+
+print(f"Video of the evaluation is available in '{video_path}'.")
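
Note on the rollout loop above: it stops when the environment reports either `terminated` (success) or `truncated` (step limit reached). The sketch below isolates that control flow with a toy environment so it can run without `gym_pusht` or a GPU. `CountdownEnv` and `rollout` are hypothetical names made up for this illustration; the environment only mimics the return shapes of the Gymnasium `reset`/`step` calls used in the example.

```python
class CountdownEnv:
    """Hypothetical toy environment that mimics the Gymnasium reset/step return shapes."""

    def __init__(self, goal: int, max_episode_steps: int):
        self.goal = goal
        self.max_episode_steps = max_episode_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action: int):
        self.position += action
        self.steps += 1
        terminated = self.position >= self.goal  # success state reached
        truncated = self.steps >= self.max_episode_steps  # step budget exhausted
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def rollout(env, policy):
    """Run one episode and return the collected rewards and whether it ended in success."""
    observation, _ = env.reset()
    rewards = []
    terminated = truncated = False
    while not (terminated or truncated):
        observation, reward, terminated, truncated, _ = env.step(policy(observation))
        rewards.append(reward)
    return rewards, terminated


# A policy that always moves one unit forward.
success_rewards, success = rollout(CountdownEnv(goal=3, max_episode_steps=5), lambda obs: 1)
failure_rewards, failure = rollout(CountdownEnv(goal=10, max_episode_steps=5), lambda obs: 1)
print(success, failure)
```

The first episode ends through `terminated`, the second through `truncated`, which is the distinction the example uses to print "Success!" or "Failure!".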
--- a/examples/3_train_policy.py
+++ b/examples/3_train_policy.py
@@ -4,36 +4,42 @@ Once you have trained a model with this script, you can try to evaluate it on
 examples/2_evaluate_pretrained_policy.py
 """

-import os
 from pathlib import Path

 import torch
-from omegaconf import OmegaConf

-from lerobot.common.datasets.factory import make_dataset
+from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
 from lerobot.common.policies.diffusion.configuration_diffusion import DiffusionConfig
 from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy
-from lerobot.common.utils.utils import init_hydra_config

+# Create a directory to store the training checkpoint.
 output_directory = Path("outputs/train/example_pusht_diffusion")
-os.makedirs(output_directory, exist_ok=True)
+output_directory.mkdir(parents=True, exist_ok=True)

-# Number of offline training steps (we'll only do offline training for this example.
+# Number of offline training steps (we'll only do offline training for this example.)
 # Adjust as you prefer. 5000 steps are needed to get something worth evaluating.
 training_steps = 5000
 device = torch.device("cuda")
 log_freq = 250

 # Set up the dataset.
-hydra_cfg = init_hydra_config("lerobot/configs/default.yaml", overrides=["env=pusht"])
-dataset = make_dataset(hydra_cfg)
+delta_timestamps = {
+    # Load the previous image and state at -0.1 seconds before current frame,
+    # then load current image and state corresponding to 0.0 seconds.
+    "observation.image": [-0.1, 0.0],
+    "observation.state": [-0.1, 0.0],
+    # Load the previous action (-0.1), the next action to be executed (0.0),
+    # and 14 future actions with a 0.1 seconds spacing. All these actions will be
+    # used to supervise the policy.
+    "action": [-0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4],
+}
+dataset = LeRobotDataset("lerobot/pusht", delta_timestamps=delta_timestamps)

 # Set up the policy.
 # Policies are initialized with a configuration class, in this case `DiffusionConfig`.
 # For this example, no arguments need to be passed because the defaults are set up for PushT.
 # If you're doing something different, you will likely need to change at least some of the defaults.
 cfg = DiffusionConfig()
-# TODO(alexander-soare): Remove LR scheduler from the policy.
 policy = DiffusionPolicy(cfg, dataset_stats=dataset.stats)
 policy.train()
 policy.to(device)
@@ -69,7 +75,5 @@ while not done:
            done = True
            break

-# Save the policy.
+# Save a policy checkpoint.
 policy.save_pretrained(output_directory)
-# Save the Hydra configuration so we have the environment configuration for eval.
-OmegaConf.save(hydra_cfg, output_directory / "config.yaml")
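
Note on the `action` entry of `delta_timestamps` in the training example above: the 16 offsets are written out by hand. A minimal sketch of generating the same list follows, assuming a frame rate of 10 so that one frame equals the 0.1 second spacing used in the example. The variable names are illustrative only.

```python
fps = 10  # assumed frame rate, consistent with the 0.1 s spacing in the example

# One previous action (-0.1), the next action to execute (0.0), and 14 future actions.
action_deltas = [i / fps for i in range(-1, 15)]
print(len(action_deltas), action_deltas[0], action_deltas[-1])
```

Deriving the offsets from the frame rate keeps them aligned with the dataset if a different `fps` is used.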
--- a/lerobot/__init__.py
+++ b/lerobot/__init__.py
@@ -85,13 +85,6 @@ available_datasets = list(
    itertools.chain(*available_datasets_per_env.values(), available_real_world_datasets)
 )

-# TODO(rcadene, aliberts, alexander-soare): Add real-world env with a gym API
-available_datasets_without_env = ["lerobot/umi_cup_in_the_wild"]
-
-available_datasets = list(
-    itertools.chain(*available_datasets_per_env.values(), available_datasets_without_env)
-)
-
 available_policies = [
    "act",
    "diffusion",
--- a/lerobot/common/datasets/_video_benchmark/README.md
+++ b/lerobot/common/datasets/_video_benchmark/README.md
@@ -37,16 +37,16 @@ How to decode videos?
 ## Variables

 **Image content**
-We don't expect the same optimal settings for a dataset of images from a simulation, or from real-world in an appartment, or in a factory, or outdoor, etc. Hence, we run this bechmark on two datasets: `pusht` (simulation) and `umi` (real-world outdoor).
+We don't expect the same optimal settings for a dataset of images from a simulation, or from the real world in an apartment, or in a factory, or outdoors, etc. Hence, we run this benchmark on two datasets: `pusht` (simulation) and `umi` (real-world outdoor).

 **Requested timestamps**
-In this benchmark, we focus on the loading time of random access, so we are not interested about sequentially loading all frames of a video like in a movie. However, the number of consecutive timestamps requested and their spacing can greatly affect the `load_time_factor`. In fact, it is expected to get faster loading time by decoding a large number of consecutive frames from a video, than to load the same data from individual images. To reflect our robotics use case, we consider a few settings:
+In this benchmark, we focus on the loading time of random access, so we are not interested in sequentially loading all frames of a video like in a movie. However, the number of consecutive timestamps requested and their spacing can greatly affect the `load_time_factor`. In fact, it is expected to get faster loading time by decoding a large number of consecutive frames from a video, than to load the same data from individual images. To reflect our robotics use case, we consider a few settings:
 - `single_frame`: 1 frame,
 - `2_frames`: 2 consecutive frames (e.g. `[t, t + 1 / fps]`),
 - `2_frames_4_space`: 2 consecutive frames with 4 frames of spacing (e.g. `[t, t + 4 / fps]`),

 **Data augmentations**
-We might revisit this benchmark and find better settings if we train our policies with various data augmentations to make them more robusts (e.g. robust to color changes, compression, etc.).
+We might revisit this benchmark and find better settings if we train our policies with various data augmentations to make them more robust (e.g. robust to color changes, compression, etc.).


 ## Results
--- a/lerobot/common/datasets/lerobot_dataset.py
+++ b/lerobot/common/datasets/lerobot_dataset.py
@@ -47,6 +47,7 @@ class LeRobotDataset(torch.utils.data.Dataset):

    @property
    def fps(self) -> int:
+        """Frames per second used during data collection."""
        return self.info["fps"]

    @property
@@ -61,15 +62,22 @@ class LeRobotDataset(torch.utils.data.Dataset):
        return self.hf_dataset.features

    @property
-    def image_keys(self) -> list[str]:
-        image_keys = []
+    def camera_keys(self) -> list[str]:
+        """Keys to access image and video stream from cameras."""
+        keys = []
        for key, feats in self.hf_dataset.features.items():
-            if isinstance(feats, datasets.Image):
-                image_keys.append(key)
-        return image_keys + self.video_frame_keys
+            if isinstance(feats, (datasets.Image, VideoFrame)):
+                keys.append(key)
+        return keys

    @property
-    def video_frame_keys(self):
+    def video_frame_keys(self) -> list[str]:
+        """Keys to access video frames that need to be decoded into images.
+
+        Note: It is empty if the dataset contains images only,
+        or equal to `self.camera_keys` if the dataset contains videos only,
+        or can even be a subset of `self.camera_keys` in the case of a mixed image/video dataset.
+        """
        video_frame_keys = []
        for key, feats in self.hf_dataset.features.items():
            if isinstance(feats, VideoFrame):
@@ -78,10 +86,12 @@ class LeRobotDataset(torch.utils.data.Dataset):

    @property
    def num_samples(self) -> int:
+        """Number of samples/frames."""
        return len(self.hf_dataset)

    @property
    def num_episodes(self) -> int:
+        """Number of episodes."""
        return len(self.hf_dataset.unique("episode_index"))

    @property
@@ -121,6 +131,22 @@ class LeRobotDataset(torch.utils.data.Dataset):

        return item

+    def __repr__(self):
+        return (
+            f"{self.__class__.__name__}(\n"
+            f"  Repository ID: '{self.repo_id}',\n"
+            f"  Version: '{self.version}',\n"
+            f"  Split: '{self.split}',\n"
+            f"  Number of Samples: {self.num_samples},\n"
+            f"  Number of Episodes: {self.num_episodes},\n"
+            f"  Type: {'video (.mp4)' if self.video else 'image (.png)'},\n"
+            f"  Recorded Frames per Second: {self.fps},\n"
+            f"  Camera Keys: {self.camera_keys},\n"
+            f"  Video Frame Keys: {self.video_frame_keys if self.video else 'N/A'},\n"
+            f"  Transformations: {self.transform},\n"
+            f")"
+        )
+
    @classmethod
    def from_preloaded(
        cls,
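To make the relationship between `camera_keys` and `video_frame_keys` concrete, here is a minimal sketch that mirrors the `isinstance` filtering shown in the hunks above. The `Image`, `VideoFrame`, and `Value` classes are stand-ins defined locally so the snippet runs without `datasets` or lerobot installed, and the feature names are hypothetical.

```python
from dataclasses import dataclass


# Local stand-ins for `datasets.Image`, lerobot's `VideoFrame`, and
# `datasets.Value`; they exist only to make this sketch self-contained.
@dataclass
class Image:
    pass


@dataclass
class VideoFrame:
    pass


@dataclass
class Value:
    dtype: str


def camera_keys(features):
    """Keys backed by either stored images or video frames."""
    return [key for key, feat in features.items() if isinstance(feat, (Image, VideoFrame))]


def video_frame_keys(features):
    """Keys whose frames must be decoded from a video."""
    return [key for key, feat in features.items() if isinstance(feat, VideoFrame)]


# Hypothetical mixed dataset: one camera stored as video, one as images.
features = {
    "observation.images.top": VideoFrame(),
    "observation.images.wrist": Image(),
    "observation.state": Value("float32"),
}

print(camera_keys(features))
print(video_frame_keys(features))
```

In this mixed case the video frame keys are a strict subset of the camera keys, which is the situation the `video_frame_keys` docstring describes.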
--- a/lerobot/common/datasets/push_dataset_to_hub/aloha_hdf5_format.py
+++ b/lerobot/common/datasets/push_dataset_to_hub/aloha_hdf5_format.py
@@ -142,12 +142,12 @@ def load_from_raw(raw_dir, out_dir, fps, video, debug):
 def to_hf_dataset(data_dict, video) -> Dataset:
    features = {}

-    image_keys = [key for key in data_dict if "observation.images." in key]
-    for image_key in image_keys:
+    keys = [key for key in data_dict if "observation.images." in key]
+    for key in keys:
        if video:
-            features[image_key] = VideoFrame()
+            features[key] = VideoFrame()
        else:
-            features[image_key] = Image()
+            features[key] = Image()

    features["observation.state"] = Sequence(
        length=data_dict["observation.state"].shape[1], feature=Value(dtype="float32", id=None)
--- a/lerobot/configs/default.yaml
+++ b/lerobot/configs/default.yaml
@@ -25,7 +25,7 @@ training:
  eval_freq: ???
  save_freq: ???
  log_freq: 250
-  save_model: false
+  save_model: true

 eval:
  n_episodes: 1
--- a/lerobot/scripts/eval.py
+++ b/lerobot/scripts/eval.py
@@ -583,17 +583,18 @@ if __name__ == "__main__":
            pretrained_policy_path = Path(
                snapshot_download(args.pretrained_policy_name_or_path, revision=args.revision)
            )
-        except HFValidationError:
-            logging.warning(
-                "The provided pretrained_policy_name_or_path is not a valid Hugging Face Hub repo ID. "
-                "Treating it as a local directory."
-            )
-        except RepositoryNotFoundError:
-            logging.warning(
-                "The provided pretrained_policy_name_or_path was not found on the Hugging Face Hub. Treating "
-                "it as a local directory."
-            )
-        pretrained_policy_path = Path(args.pretrained_policy_name_or_path)
+        except (HFValidationError, RepositoryNotFoundError) as e:
+            if isinstance(e, HFValidationError):
+                error_message = (
+                    "The provided pretrained_policy_name_or_path is not a valid Hugging Face Hub repo ID."
+                )
+            else:
+                error_message = (
+                    "The provided pretrained_policy_name_or_path was not found on the Hugging Face Hub."
+                )
+
+            logging.warning(f"{error_message} Treating it as a local directory.")
+            pretrained_policy_path = Path(args.pretrained_policy_name_or_path)
        if not pretrained_policy_path.is_dir() or not pretrained_policy_path.exists():
            raise ValueError(
                "The provided pretrained_policy_name_or_path is not a valid/existing Hugging Face Hub "
--- a/lerobot/scripts/push_dataset_to_hub.py
+++ b/lerobot/scripts/push_dataset_to_hub.py
@@ -60,7 +60,7 @@ import torch
 from huggingface_hub import HfApi
 from safetensors.torch import save_file

-from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
+from lerobot.common.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDataset
 from lerobot.common.datasets.push_dataset_to_hub._download_raw import download_raw
 from lerobot.common.datasets.push_dataset_to_hub.compute_stats import compute_stats
 from lerobot.common.datasets.utils import flatten_dict
@@ -252,7 +252,7 @@ def main():
    parser.add_argument(
        "--revision",
        type=str,
-        default="v1.2",
+        default=CODEBASE_VERSION,
        help="Codebase version used to generate the dataset.",
    )
    parser.add_argument(
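The change above replaces a hard-coded default with a shared constant, so the script and the dataset class cannot drift apart. A minimal sketch of the pattern follows; the value assigned to `CODEBASE_VERSION` here is a placeholder for illustration only, since the real value lives in `lerobot/common/datasets/lerobot_dataset.py` and is not shown in this diff.

```python
import argparse

# Placeholder for illustration; not the real value used by lerobot.
CODEBASE_VERSION = "vX.Y"


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--revision",
        type=str,
        default=CODEBASE_VERSION,
        help="Codebase version used to generate the dataset.",
    )
    return parser


# Without the flag, the shared constant is used.
default_args = build_parser().parse_args([])
# An explicit flag still overrides it.
override_args = build_parser().parse_args(["--revision", "v1.2"])

print(default_args.revision, override_args.revision)
```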
--- a/lerobot/scripts/train.py
+++ b/lerobot/scripts/train.py
@@ -8,7 +8,6 @@ import hydra
 import torch
 from datasets import concatenate_datasets
 from datasets.utils import disable_progress_bars, enable_progress_bars
-from diffusers.optimization import get_scheduler

 from lerobot.common.datasets.factory import make_dataset
 from lerobot.common.datasets.utils import cycle
@@ -55,6 +54,8 @@ def make_optimizer_and_scheduler(cfg, policy):
            cfg.training.adam_weight_decay,
        )
        assert cfg.training.online_steps == 0, "Diffusion Policy does not handle online training."
+        from diffusers.optimization import get_scheduler
+
        lr_scheduler = get_scheduler(
            cfg.training.lr_scheduler,
            optimizer=optimizer,
@@ -336,7 +337,7 @@ def train(cfg: dict, out_dir=None, job_name=None):
    logging.info(f"{num_total_params=} ({format_big_number(num_total_params)})")

    # Note: this helper will be used in offline and online training loops.
-    def _maybe_eval_and_maybe_save(step):
+    def evaluate_and_checkpoint_if_needed(step):
        if step % cfg.training.eval_freq == 0:
            logging.info(f"Eval policy at step {step}")
            eval_info = eval_policy(
@@ -392,9 +393,9 @@ def train(cfg: dict, out_dir=None, job_name=None):
        if step % cfg.training.log_freq == 0:
            log_train_info(logger, train_info, step, cfg, offline_dataset, is_offline)

-        # Note: _maybe_eval_and_maybe_save happens **after** the `step`th training update has completed, so we pass in
-        # step + 1.
-        _maybe_eval_and_maybe_save(step + 1)
+        # Note: evaluate_and_checkpoint_if_needed happens **after** the `step`th training update has completed,
+        # so we pass in step + 1.
+        evaluate_and_checkpoint_if_needed(step + 1)

        step += 1

@@ -460,9 +461,9 @@ def train(cfg: dict, out_dir=None, job_name=None):
            if step % cfg.training.log_freq == 0:
                log_train_info(logger, train_info, step, cfg, online_dataset, is_offline)

-            # Note: _maybe_eval_and_maybe_save happens **after** the `step`th training update has completed, so we pass
-            # in step + 1.
-            _maybe_eval_and_maybe_save(step + 1)
+            # Note: evaluate_and_checkpoint_if_needed happens **after** the `step`th training update has completed,
+            # so we pass in step + 1.
+            evaluate_and_checkpoint_if_needed(step + 1)

            step += 1
            online_step += 1
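The `step + 1` convention in the comments above can be sketched with a toy loop. This is a simplified stand-in, not the code in `train.py`: the frequencies are passed as plain arguments, and the separate `save_freq` check is an assumption, since only the `eval_freq` check is visible in this diff.

```python
def run_training(num_steps, eval_freq, save_freq):
    events = []

    def evaluate_and_checkpoint_if_needed(step):
        # `step` counts completed training updates, so it starts at 1.
        if step % eval_freq == 0:
            events.append(("eval", step))
        if step % save_freq == 0:  # assumed check, not shown in the diff
            events.append(("save", step))

    step = 0
    while step < num_steps:
        # ... one training update would run here ...
        # The update for index `step` has finished, so pass `step + 1`.
        evaluate_and_checkpoint_if_needed(step + 1)
        step += 1
    return events


events = run_training(num_steps=6, eval_freq=2, save_freq=3)
print(events)
```

Passing `step + 1` means nothing is evaluated or saved before the first update, and the final update is still covered when the total number of steps is a multiple of the frequency.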
--- a/lerobot/scripts/visualize_dataset.py
+++ b/lerobot/scripts/visualize_dataset.py
@@ -32,7 +32,7 @@ local$ rerun lerobot_pusht_episode_0.rrd
 ```

 - Visualize data stored on a distant machine through streaming:
-(You need to forward the websocket port to the distant machine, with 
+(You need to forward the websocket port to the distant machine, with
 `ssh -L 9087:localhost:9087 username@remote-host`)
 ```
 distant$ python lerobot/scripts/visualize_dataset.py \
@@ -131,7 +131,7 @@ def visualize_dataset(
            rr.set_time_seconds("timestamp", batch["timestamp"][i].item())

            # display each camera image
-            for key in dataset.image_keys:
+            for key in dataset.camera_keys:
                # TODO(rcadene): add `.compress()`? is it lossless?
                rr.log(key, rr.Image(to_hwc_uint8_numpy(batch[key][i])))

--- a/media/wandb.png
+++ b/media/wandb.png
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,9 +4,9 @@ version = "0.1.0"
 description = "🤗 LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch"
 authors = [
    "Rémi Cadène <re.cadene@gmail.com>",
+    "Simon Alibert <alibert.sim@gmail.com>",
    "Alexander Soare <alexander.soare159@gmail.com>",
    "Quentin Gallouédec <quentin.gallouedec@ec-lyon.fr>",
-    "Simon Alibert <alibert.sim@gmail.com>",
    "Adil Zouitine <adilzouitinegm@gmail.com>",
    "Thomas Wolf <thomaswolfcontact@gmail.com>",
 ]
--- a/tests/test_available.py
+++ b/tests/test_available.py
@@ -15,7 +15,7 @@ from tests.utils import require_env
 def test_available_env_task(env_name: str, task_name: list):
    """
    This test verifies that all environments listed in `lerobot/__init__.py` can
-    be sucessfully imported — if they're installed — and that their
+    be successfully imported — if they're installed — and that their
    `available_tasks_per_env` are valid.
    """
    package_name = f"gym_{env_name}"
--- a/tests/test_datasets.py
+++ b/tests/test_datasets.py
@@ -41,7 +41,7 @@ def test_factory(env_name, repo_id, policy_name):
    )
    dataset = make_dataset(cfg)
    delta_timestamps = dataset.delta_timestamps
-    image_keys = dataset.image_keys
+    camera_keys = dataset.camera_keys

    item = dataset[0]

@@ -71,7 +71,7 @@ def test_factory(env_name, repo_id, policy_name):
        else:
            assert item[key].ndim == ndim, f"{key}"

-        if key in image_keys:
+        if key in camera_keys:
            assert item[key].dtype == torch.float32, f"{key}"
            # TODO(rcadene): we assume for now that image normalization takes place in the model
            assert item[key].max() <= 1.0, f"{key}"
--- a/tests/test_examples.py
+++ b/tests/test_examples.py
@@ -46,7 +46,7 @@ def test_examples_3_and_2():
    # Pass empty globals to allow dictionary comprehension https://stackoverflow.com/a/32897127/4391249.
    exec(file_contents, {})

-    for file_name in ["model.safetensors", "config.json", "config.yaml"]:
+    for file_name in ["model.safetensors", "config.json"]:
        assert Path(f"outputs/train/example_pusht_diffusion/{file_name}").exists()

    path = "examples/2_evaluate_pretrained_policy.py"
@@ -58,16 +58,16 @@ def test_examples_3_and_2():
    file_contents = _find_and_replace(
        file_contents,
        [
-            ('pretrained_policy_name = "lerobot/diffusion_pusht"', ""),
-            ("pretrained_policy_path = Path(snapshot_download(pretrained_policy_name))", ""),
+            ('pretrained_policy_path = Path(snapshot_download("lerobot/diffusion_pusht"))', ""),
            (
                '# pretrained_policy_path = Path("outputs/train/example_pusht_diffusion")',
                'pretrained_policy_path = Path("outputs/train/example_pusht_diffusion")',
            ),
-            ('"eval.n_episodes=10"', '"eval.n_episodes=1"'),
-            ('"eval.batch_size=10"', '"eval.batch_size=1"'),
-            ('"device=cuda"', '"device=cpu"'),
+            ('device = torch.device("cuda")', 'device = torch.device("cpu")'),
+            ("step += 1", "break"),
        ],
    )

-    assert Path("outputs/train/example_pusht_diffusion").exists()
+    exec(file_contents, {})
+
+    assert Path("outputs/eval/example_pusht_diffusion/rollout.mp4").exists()
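The test above patches an example script's source text and then runs it with `exec`. A minimal sketch of that approach is given below. The `find_and_replace` function is a hypothetical reimplementation written for this illustration (the real `_find_and_replace` helper is not shown in this diff), and the script string is a toy stand-in for an example file.

```python
def find_and_replace(text, pairs):
    """Apply each (old, new) substitution, failing loudly if `old` is absent."""
    for old, new in pairs:
        assert old in text, f"pattern not found: {old!r}"
        text = text.replace(old, new)
    return text


# Toy stand-in for the contents of an example script.
script = 'device = "cuda"\nresult = device\n'

patched = find_and_replace(script, [('device = "cuda"', 'device = "cpu"')])

# Run the patched source in a fresh namespace, then inspect what it defined.
namespace = {}
exec(patched, namespace)
print(namespace["result"])
```

Asserting that each pattern is present is a design choice in this sketch: if an example script is edited and a pattern no longer matches, the test fails instead of silently running the unpatched code.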