diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 5f5a509c7..9f5de8230 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -19,9 +19,13 @@
     title: Train RL in Simulation
   - local: async
     title: Use Async Inference
+  title: "Tutorials"
+- sections:
+  - local: lerobot-dataset-v3
+    title: Using LeRobotDataset
   - local: porting_datasets_v3
     title: Porting Large Datasets
-  title: "Tutorials"
+  title: "Datasets"
 - sections:
   - local: smolvla
     title: Finetune SmolVLA
diff --git a/docs/source/lerobot-dataset-v3.mdx b/docs/source/lerobot-dataset-v3.mdx
new file mode 100644
index 000000000..4f33d9a25
--- /dev/null
+++ b/docs/source/lerobot-dataset-v3.mdx
@@ -0,0 +1,169 @@
+# LeRobotDataset v3.0
+
+`LeRobotDataset v3.0` is a standardized format for robot learning data. It provides unified access to multi-modal time-series data (sensorimotor signals and multi-camera video), along with rich metadata for indexing, search, and visualization on the Hugging Face Hub.
+
+This guide shows you how to:
+
+- Understand the v3.0 design and directory layout
+- Record a dataset and push it to the Hub
+- Load datasets for training with `LeRobotDataset`
+- Stream datasets without downloading using `StreamingLeRobotDataset`
+- Migrate existing `v2.1` datasets to `v3.0`
+
+## What’s new in `v3`
+
+- **File-based storage**: Many episodes per Parquet/MP4 file (v2 used one file per episode).
+- **Relational metadata**: Episode boundaries and lookups are resolved through metadata, not filenames.
+- **Hub-native streaming**: Consume datasets directly from the Hub with `StreamingLeRobotDataset`.
+- **Lower file-system pressure**: Fewer, larger files ⇒ faster initialization and fewer issues at scale.
+- **Unified organization**: Clean directory layout with consistent path templates across data and videos.
+
+## Installation
+
+`LeRobotDataset v3.0` will be included in `lerobot >= 0.4.0`.
+
+Until that stable release, you can use the `main` branch by following the [build from source instructions](./installation#from-source).
+
+## Record a dataset
+
+Run the command below to record a dataset with the SO-101 and push it to the Hub:
+
+```bash
+lerobot-record \
+    --robot.type=so101_follower \
+    --robot.port=/dev/tty.usbmodem585A0076841 \
+    --robot.id=my_awesome_follower_arm \
+    --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 1920, height: 1080, fps: 30}}" \
+    --teleop.type=so101_leader \
+    --teleop.port=/dev/tty.usbmodem58760431551 \
+    --teleop.id=my_awesome_leader_arm \
+    --display_data=true \
+    --dataset.repo_id=${HF_USER}/record-test \
+    --dataset.num_episodes=5 \
+    --dataset.single_task="Grab the black cube"
+```
+
+See the [recording guide](./il_robots#record-a-dataset) for more details.
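+Once the recording finishes, you can load the dataset back for training. Below is a minimal sketch; the import path matches a recent `main` checkout (older releases exposed `lerobot.common.datasets.lerobot_dataset`), and the repo id placeholder stands in for the one you pushed above:
+
+```python
+import torch
+
+# Import path assumes a recent main checkout; older releases used
+# lerobot.common.datasets.lerobot_dataset instead.
+from lerobot.datasets.lerobot_dataset import LeRobotDataset
+
+# Replace with the repo id you pushed above, e.g. "<HF_USER>/record-test".
+dataset = LeRobotDataset("<HF_USER>/record-test")
+print(dataset.num_episodes, dataset.num_frames)
+
+# Each item is a dict of tensors: sensorimotor signals plus decoded camera frames.
+frame = dataset[0]
+
+# The dataset is a regular torch Dataset, so a DataLoader works directly.
+loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
+batch = next(iter(loader))
+```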
+## Format design
+
+A core v3 principle is **decoupling storage from the user API**: data is stored efficiently (few large files), while the public API exposes intuitive episode-level access.
+
+`v3` is built on three pillars:
+
+1. **Tabular data**: Low-dimensional, high-frequency signals (states, actions, timestamps) stored in **Apache Parquet**. Access is memory-mapped or streamed via the `datasets` stack.
+2. **Visual data**: Camera frames concatenated and encoded into **MP4**. Frames from the same episode are grouped, and videos are sharded per camera to keep file sizes practical.
+3. **Metadata**: JSON/Parquet records describing the schema (feature names, dtypes, shapes), frame rates, normalization stats, and **episode segmentation** (start/end offsets into the shared Parquet/MP4 files).
+
+> To scale to millions of episodes, tabular rows and video frames from multiple episodes are **concatenated** into larger files. Episode-specific views are reconstructed **via metadata**, not file boundaries.
+
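+To build intuition for that metadata-driven lookup, here is a small illustrative sketch. The table and column names (`dataset_from_index`, `dataset_to_index`) are assumptions for the example, not the exact v3 schema:
+
+```python
+import pandas as pd
+
+# Illustrative episode metadata: each episode maps to a row range inside a
+# shared Parquet file. Column names here are assumed for the example.
+episodes = pd.DataFrame(
+    {
+        "episode_index": [0, 1, 2],
+        "dataset_from_index": [0, 480, 910],   # first frame of the episode
+        "dataset_to_index": [480, 910, 1355],  # one past the last frame
+    }
+)
+
+def episode_bounds(episode_index: int) -> tuple[int, int]:
+    """Resolve an episode to its frame range via metadata, not file boundaries."""
+    row = episodes.set_index("episode_index").loc[episode_index]
+    return int(row["dataset_from_index"]), int(row["dataset_to_index"])
+
+start, end = episode_bounds(1)  # episode 1 spans frames 480..909
+```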
+
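+Streaming builds on the same principle: frames are fetched from the Hub on demand instead of being downloaded up front. A minimal sketch, assuming the `StreamingLeRobotDataset` import path on current `main`:
+
+```python
+# Import path is an assumption based on current main; check your installed version.
+from lerobot.datasets.streaming_dataset import StreamingLeRobotDataset
+
+# No local copy is created; frames arrive as you iterate.
+dataset = StreamingLeRobotDataset("<HF_USER>/record-test")
+
+for frame in dataset:
+    # Each frame is a dict of tensors, as with LeRobotDataset.
+    break
+```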