Dataset v3 (#1412)

Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Co-authored-by: Remi Cadene <re.cadene@gmail.com>
Co-authored-by: Tavish <tavish9.chen@gmail.com>
Co-authored-by: fracapuano <francesco.capuano@huggingface.co>
Co-authored-by: CarolinePascal <caroline8.pascal@gmail.com>
This commit is contained in:
Michel Aractingi
2025-09-15 09:53:30 +02:00
committed by GitHub
parent d602e8169c
commit f55c6e89f0
50 changed files with 4642 additions and 4092 deletions

View File

@@ -233,7 +233,7 @@ Under the hood, the `LeRobotDataset` format makes use of several ways to seriali
Here are the important details and internal structure organization of a typical `LeRobotDataset` instantiated with `dataset = LeRobotDataset("lerobot/aloha_static_coffee")`. The exact features will change from dataset to dataset but not the main aspects:
```
````
dataset attributes:
├ hf_dataset: a Hugging Face dataset (backed by Arrow/parquet). Typical features example:
│ ├ observation.images.cam_high (VideoFrame):
@@ -246,20 +246,30 @@ dataset attributes:
│ ├ timestamp (float32): timestamp in the episode
│ ├ next.done (bool): indicates the end of an episode ; True for the last frame in each episode
│ └ index (int64): general index in the whole dataset
├ episode_data_index: contains 2 tensors with the start and end indices of each episode
│ ├ from (1D int64 tensor): first frame index for each episode — shape (num episodes,) starts with 0
│ └ to (1D int64 tensor): last frame index for each episode — shape (num episodes,)
├ stats: a dictionary of statistics (max, mean, min, std) for each feature in the dataset, for instance
│ ├ observation.images.cam_high: {'max': tensor with same number of dimensions (e.g. `(c, 1, 1)` for images, `(c,)` for states), etc.}
...
├ info: a dictionary of metadata on the dataset
│ ├ codebase_version (str): this is to keep track of the codebase version the dataset was created with
│ ├ fps (float): frames per second the dataset is recorded/synchronized to
│ ├ video (bool): indicates if frames are encoded in mp4 video files to save space or stored as png files
│ └ encoding (dict): if video, this documents the main options that were used with ffmpeg to encode the videos
├ videos_dir (Path): where the mp4 videos or png images are stored/accessed
└ camera_keys (list of string): the keys to access camera features in the item returned by the dataset (e.g. `["observation.images.cam_high", ...]`)
```
├ meta: a LeRobotDatasetMetadata object containing:
│ ├ info: a dictionary of metadata on the dataset
│ │ ├ codebase_version (str): this is to keep track of the codebase version the dataset was created with
│ │ ├ fps (int): frames per second the dataset is recorded/synchronized to
│ │ ├ features (dict): all features contained in the dataset with their shapes and types
│ │ ├ total_episodes (int): total number of episodes in the dataset
│ │ ├ total_frames (int): total number of frames in the dataset
│ │ ├ robot_type (str): robot type used for recording
│ │ ├ data_path (str): formattable string for the parquet files
│ │ └ video_path (str): formattable string for the video files (if using videos)
│ ├ episodes: a DataFrame containing episode metadata with columns:
│ │ ├ episode_index (int): index of the episode
│ │ ├ tasks (list): list of tasks for this episode
│ │ ├ length (int): number of frames in this episode
│ │ ├ dataset_from_index (int): start index of this episode in the dataset
│ │ └ dataset_to_index (int): end index of this episode in the dataset
│ ├ stats: a dictionary of statistics (max, mean, min, std) for each feature in the dataset, for instance
│ │ ├ observation.images.front_cam: {'max': tensor with same number of dimensions (e.g. `(c, 1, 1)` for images, `(c,)` for states), etc.}
│ │ └ ...
│ └ tasks: a DataFrame containing task information with task names as index and task_index as values
├ root (Path): local directory where the dataset is stored
├ image_transforms (Callable): optional image transformations to apply to visual modalities
├ delta_timestamps (dict): optional delta timestamps for temporal queries
└ video_backend (str): video backend used for decoding videos (e.g., 'pyav', 'torchcodec')
A `LeRobotDataset` is serialised using several widespread file formats for each of its parts, namely:
@@ -283,7 +293,7 @@ lerobot-eval \
--eval.n_episodes=10 \
--policy.use_amp=false \
--policy.device=cuda
```
````
Note: After training your own policy, you can re-evaluate the checkpoints with:

View File

@@ -108,7 +108,8 @@ def save_decoded_frames(
def save_first_episode(imgs_dir: Path, dataset: LeRobotDataset) -> None:
ep_num_images = dataset.episode_data_index["to"][0].item()
episode_index = 0
ep_num_images = dataset.meta.episodes["length"][episode_index]
if imgs_dir.exists() and len(list(imgs_dir.glob("frame_*.png"))) == ep_num_images:
return
@@ -265,7 +266,8 @@ def benchmark_encoding_decoding(
overwrite=True,
)
ep_num_images = dataset.episode_data_index["to"][0].item()
episode_index = 0
ep_num_images = dataset.meta.episodes["length"][episode_index]
width, height = tuple(dataset[0][dataset.meta.camera_keys[0]].shape[-2:])
num_pixels = width * height
video_size_bytes = video_path.stat().st_size

View File

@@ -19,6 +19,8 @@
title: Train RL in Simulation
- local: async
title: Use Async Inference
- local: porting_datasets_v3
title: Porting Large Datasets
title: "Tutorials"
- sections:
- local: smolvla

View File

@@ -0,0 +1,321 @@
# Porting Large Datasets to LeRobot Dataset v3.0
This tutorial explains how to port large-scale robotic datasets to the LeRobot Dataset v3.0 format. We'll use the **DROID 1.0.1** dataset as our primary example: it shows how to handle a multi-terabyte dataset with thousands of shards on a SLURM cluster.
## File Organization: v2.1 vs v3.0
Dataset v3.0 fundamentally changes how data is organized and stored:
**v2.1 Structure (Episode-based)**:
```
dataset/
├── data/chunk-000/episode_000000.parquet
├── data/chunk-000/episode_000001.parquet
├── videos/chunk-000/camera/episode_000000.mp4
└── meta/episodes.jsonl
```
**v3.0 Structure (File-based)**:
```
dataset/
├── data/chunk-000/file-000.parquet # Multiple episodes per file
├── videos/camera/chunk-000/file-000.mp4 # Consolidated video chunks
└── meta/episodes/chunk-000/file-000.parquet # Structured metadata
```
This transition from individual episode files to file-based chunks dramatically improves performance and reduces storage overhead.
## What's New in Dataset v3.0
Dataset v3.0 introduces significant improvements for handling large datasets:
### 🏗️ **Enhanced File Organization**
- **File-based structure**: Episodes are now grouped into chunked files rather than individual episode files
- **Configurable file sizes**: target sizes for data and video files can be tuned
- **Improved storage efficiency**: Better compression and reduced overhead
### 📊 **Modern Metadata Management**
- **Parquet-based metadata**: Replaced JSON Lines with efficient parquet format
- **Structured episode access**: Direct pandas DataFrame access via `dataset.meta.episodes` (see the snippet below)
- **Per-episode statistics**: Enhanced statistics tracking at episode level
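
For example, episode boundaries that previously required the `episode_data_index` tensors can now be read directly from the episodes DataFrame. A minimal sketch (the repo id is a placeholder):

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("your_id/your_dataset")  # placeholder repo id

episode_index = 0
length = dataset.meta.episodes["length"][episode_index]
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]
print(f"Episode {episode_index}: {length} frames, dataset rows [{from_idx}, {to_idx})")
```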
### 🚀 **Performance Enhancements**
- **Memory-mapped access**: Improved RAM usage through PyArrow memory mapping
- **Faster loading**: Significantly reduced dataset initialization time
- **Better scalability**: Designed for datasets with millions of episodes
## Prerequisites
Before porting large datasets, ensure you have:
- **LeRobot installed** with v3.0 support. Follow our [Installation Guide](./installation).
- **Sufficient storage**: Raw datasets can be very large (e.g., DROID requires 2TB)
- **Cluster access** (recommended for large datasets): SLURM or similar job scheduler
- **Dataset-specific dependencies**: For DROID, you'll need TensorFlow Dataset utilities
## Understanding the DROID Dataset
[DROID 1.0.1](https://droid-dataset.github.io/droid/the-droid-dataset) is an excellent example of a large-scale robotic dataset:
- **Size**: 1.7TB (RLDS format), 8.7TB (raw data)
- **Structure**: 2048 pre-defined TensorFlow dataset shards
- **Content**: 76,000+ robot manipulation trajectories from Franka Emika Panda robots
- **Scope**: Real-world manipulation tasks across multiple environments and objects
- **Format**: Originally in TensorFlow Records/RLDS format, requiring conversion to LeRobot format
- **Hosting**: Google Cloud Storage with public access via `gsutil`
The dataset contains diverse manipulation demonstrations with:
- Multiple camera views (wrist camera, exterior cameras)
- Natural language task descriptions
- Robot proprioceptive state and actions
- Success/failure annotations
### DROID Features Schema
```python
DROID_FEATURES = {
# Episode markers
"is_first": {"dtype": "bool", "shape": (1,)},
"is_last": {"dtype": "bool", "shape": (1,)},
"is_terminal": {"dtype": "bool", "shape": (1,)},
# Language instructions
"language_instruction": {"dtype": "string", "shape": (1,)},
"language_instruction_2": {"dtype": "string", "shape": (1,)},
"language_instruction_3": {"dtype": "string", "shape": (1,)},
# Robot state
"observation.state.gripper_position": {"dtype": "float32", "shape": (1,)},
"observation.state.cartesian_position": {"dtype": "float32", "shape": (6,)},
"observation.state.joint_position": {"dtype": "float32", "shape": (7,)},
# Camera observations
"observation.images.wrist_left": {"dtype": "image"},
"observation.images.exterior_1_left": {"dtype": "image"},
"observation.images.exterior_2_left": {"dtype": "image"},
# Actions
"action.gripper_position": {"dtype": "float32", "shape": (1,)},
"action.cartesian_position": {"dtype": "float32", "shape": (6,)},
"action.joint_position": {"dtype": "float32", "shape": (7,)},
# Standard LeRobot format
"observation.state": {"dtype": "float32", "shape": (8,)}, # joints + gripper
"action": {"dtype": "float32", "shape": (8,)}, # joints + gripper
}
```
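
The porting script (`examples/port_datasets/port_droid.py`) passes a schema like this to `LeRobotDataset.create()` and then appends frames episode by episode. Below is a condensed sketch of that loop; `iterate_rlds_episodes` and `to_lerobot_frames` are hypothetical stand-ins for the RLDS iteration and frame conversion done in the script:

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset.create(
    repo_id="your_id/droid_1.0.1",  # placeholder repo id
    robot_type="Franka",
    fps=15,
    features=DROID_FEATURES,
)

for episode in iterate_rlds_episodes(raw_dir):   # hypothetical helper: yields RLDS episodes
    for frame in to_lerobot_frames(episode):     # hypothetical helper: yields dicts matching DROID_FEATURES (+ "task")
        dataset.add_frame(frame)
    dataset.save_episode()                       # writes the buffered episode to chunked parquet/video files
```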
## Approach 1: Single Computer Porting
### Step 1: Install Dependencies
For DROID specifically:
```bash
pip install tensorflow
pip install tensorflow_datasets
```
For other datasets, install the appropriate readers for your source format.
### Step 2: Download Raw Data
Download DROID from Google Cloud Storage using `gsutil`:
```bash
# Install Google Cloud SDK if not already installed
# https://cloud.google.com/sdk/docs/install
# Download the full RLDS dataset (1.7TB)
gsutil -m cp -r gs://gresearch/robotics/droid/1.0.1 /your/data/
# Or download just the 100-episode sample (2GB) for testing
gsutil -m cp -r gs://gresearch/robotics/droid_100 /your/data/
```
> [!WARNING]
> Large datasets require substantial time and storage:
>
> - **Full DROID (1.7TB)**: Several days to download depending on bandwidth
> - **Processing time**: 7+ days for local porting of full dataset
> - **Upload time**: 3+ days to push to Hugging Face Hub
> - **Local storage**: ~400GB for processed LeRobot format
### Step 3: Port the Dataset
```bash
python examples/port_datasets/port_droid.py \
--raw-dir /your/data/droid/1.0.1 \
--repo-id your_id/droid_1.0.1 \
--push-to-hub
```
### Development and Testing
For development, you can port a single shard:
```bash
python examples/port_datasets/port_droid.py \
--raw-dir /your/data/droid/1.0.1 \
--repo-id your_id/droid_1.0.1_test \
--num-shards 2048 \
--shard-index 0
```
This approach works for smaller datasets or testing, but large datasets require cluster computing.
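
Once a test shard has been ported, you can load it to sanity-check the result before scaling up. A minimal sketch using the repo id from the command above (assuming the dataset is available locally or on the Hub):

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("your_id/droid_1.0.1_test")

print(dataset.meta.total_episodes)       # number of episodes in the ported shard
print(dataset.meta.camera_keys)          # e.g. ["observation.images.wrist_left", ...]

frame = dataset[0]                       # first frame of the first episode
print(frame["observation.state"].shape)  # (8,): 7 joint positions + gripper
```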
## Approach 2: SLURM Cluster Porting (Recommended)
For large datasets like DROID, parallel processing across multiple nodes dramatically reduces processing time.
### Step 1: Install Cluster Dependencies
```bash
pip install datatrove # Hugging Face's distributed processing library
```
### Step 2: Configure Your SLURM Environment
Find your partition information:
```bash
sinfo --format="%R" # List available partitions
sinfo -N -p your_partition -h -o "%N cpus=%c mem=%m" # Check resources
```
Choose a **CPU partition** - no GPU needed for dataset porting.
### Step 3: Launch Parallel Porting Jobs
```bash
python examples/port_datasets/slurm_port_shards.py \
--raw-dir /your/data/droid/1.0.1 \
--repo-id your_id/droid_1.0.1 \
--logs-dir /your/logs \
--job-name port_droid \
--partition your_partition \
--workers 2048 \
--cpus-per-task 8 \
--mem-per-cpu 1950M
```
#### Parameter Guidelines
- **`--workers`**: Number of parallel jobs (max 2048 for DROID's shard count)
- **`--cpus-per-task`**: 8 CPUs recommended for frame encoding parallelization
- **`--mem-per-cpu`**: ~16GB total RAM (8×1950M) for loading raw frames
> [!TIP]
> Start with fewer workers (e.g., 100) to test your cluster configuration before launching thousands of jobs.
### Step 4: Monitor Progress
Check running jobs:
```bash
squeue -u $USER
```
Monitor overall progress:
```bash
jobs_status /your/logs
```
Inspect individual job logs:
```bash
less /your/logs/port_droid/slurm_jobs/JOB_ID_WORKER_ID.out
```
Debug failed jobs:
```bash
failed_logs /your/logs/port_droid
```
### Step 5: Aggregate Shards
Once all porting jobs complete:
```bash
python examples/port_datasets/slurm_aggregate_shards.py \
--repo-id your_id/droid_1.0.1 \
--logs-dir /your/logs \
--job-name aggr_droid \
--partition your_partition \
--workers 2048 \
--cpus-per-task 8 \
--mem-per-cpu 1950M
```
### Step 6: Upload to Hub
```bash
python examples/port_datasets/slurm_upload.py \
--repo-id your_id/droid_1.0.1 \
--logs-dir /your/logs \
--job-name upload_droid \
--partition your_partition \
--workers 50 \
--cpus-per-task 4 \
--mem-per-cpu 1950M
```
> [!NOTE]
> Upload uses fewer workers (50) since it's network-bound rather than compute-bound.
## Dataset v3.0 File Structure
Your completed dataset will have this modern structure:
```
dataset/
├── meta/
│ ├── episodes/
│ │ └── chunk-000/
│ │ └── file-000.parquet # Episode metadata
│ ├── tasks.parquet # Task definitions
│ ├── stats.json # Aggregated statistics
│ └── info.json # Dataset information
├── data/
│ └── chunk-000/
│ └── file-000.parquet # Consolidated episode data
└── videos/
└── camera_key/
└── chunk-000/
└── file-000.mp4 # Consolidated video files
```
This replaces the old episode-per-file structure with efficient, optimally-sized chunks.
## Migrating from Dataset v2.1
If you have existing datasets in v2.1 format, use the migration tool:
```bash
python src/lerobot/datasets/v30/convert_dataset_v21_to_v30.py \
--repo-id your_id/existing_dataset
```
This automatically:
- Converts file structure to v3.0 format
- Migrates metadata from JSON Lines to parquet
- Aggregates statistics and creates per-episode stats
- Updates version information
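
After conversion, you can quickly check the result by loading the metadata alone (a minimal sketch; the repo id is a placeholder):

```python
from lerobot.datasets.lerobot_dataset import LeRobotDatasetMetadata

meta = LeRobotDatasetMetadata("your_id/existing_dataset")

print(meta.total_episodes)         # should match the original v2.1 dataset
print(meta.episodes["length"][0])  # frames in the first episode, read from the new parquet metadata
```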
## Performance Benefits
Dataset v3.0 provides significant improvements for large datasets:
- **Faster loading**: 3-5x reduction in initialization time
- **Memory efficiency**: Better RAM usage through memory mapping
- **Scalable processing**: Handles millions of episodes efficiently
- **Storage optimization**: Reduced file count and improved compression

View File

@@ -92,11 +92,11 @@ print(dataset.hf_dataset)
# LeRobot datasets also subclasses PyTorch datasets so you can do everything you know and love from working
# with the latter, like iterating through the dataset.
# The __getitem__ iterates over the frames of the dataset. Since our datasets are also structured by
# episodes, you can access the frame indices of any episode using the episode_data_index. Here, we access
# episodes, you can access the frame indices of any episode using dataset.meta.episodes. Here, we access
# frame indices associated to the first episode:
episode_index = 0
from_idx = dataset.episode_data_index["from"][episode_index].item()
to_idx = dataset.episode_data_index["to"][episode_index].item()
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]
# Then we grab all the image frames from the first camera:
camera_key = dataset.meta.camera_keys[0]

View File

@@ -0,0 +1,85 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import json
from pathlib import Path
def find_missing_workers(completions_dir, world_size):
"""Find workers that have not completed and return their indices."""
full = list(range(world_size))
completed = []
for path in completions_dir.glob("*"):
if path.name in [".", ".."]:
continue
index = path.name.lstrip("0")
index = 0 if index == "" else int(index)
completed.append(index)
missing_workers = set(full) - set(completed)
return missing_workers
def find_output_files(slurm_dir, worker_indices):
"""Find output files associated with worker indices, and return tuples
of (worker index, output file path)
"""
out_files = []
for path in slurm_dir.glob("*.out"):
_, worker_id = path.name.replace(".out", "").split("_")
worker_id = int(worker_id)
if worker_id in worker_indices:
out_files.append((worker_id, path))
return out_files
def display_error_files(logs_dir, job_name):
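"""Print the indices of workers that did not complete, in descending order."""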
executor_path = Path(logs_dir) / job_name / "executor.json"
completions_dir = Path(logs_dir) / job_name / "completions"
with open(executor_path) as f:
executor = json.load(f)
missing_workers = find_missing_workers(completions_dir, executor["world_size"])
for missing in sorted(missing_workers)[::-1]:
print(missing)
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--logs-dir",
type=str,
help="Path to logs directory for `datatrove`.",
)
parser.add_argument(
"--job-name",
type=str,
default="port_droid",
help="Job name used in slurm, and name of the directory created inside the provided logs directory.",
)
args = parser.parse_args()
display_error_files(**vars(args))
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,430 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import logging
import time
from pathlib import Path
import numpy as np
import tensorflow_datasets as tfds
from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
from lerobot.utils.utils import get_elapsed_time_in_days_hours_minutes_seconds
DROID_SHARDS = 2048
DROID_FPS = 15
DROID_ROBOT_TYPE = "Franka"
# Dataset schema slightly adapted from: https://droid-dataset.github.io/droid/the-droid-dataset.html#-dataset-schema
DROID_FEATURES = {
# true on first step of the episode
"is_first": {
"dtype": "bool",
"shape": (1,),
"names": None,
},
# true on last step of the episode
"is_last": {
"dtype": "bool",
"shape": (1,),
"names": None,
},
# true on last step of the episode if it is a terminal step, True for demos
"is_terminal": {
"dtype": "bool",
"shape": (1,),
"names": None,
},
# language_instruction is also stored as "task" to follow LeRobot standard
"language_instruction": {
"dtype": "string",
"shape": (1,),
"names": None,
},
"language_instruction_2": {
"dtype": "string",
"shape": (1,),
"names": None,
},
"language_instruction_3": {
"dtype": "string",
"shape": (1,),
"names": None,
},
"observation.state.gripper_position": {
"dtype": "float32",
"shape": (1,),
"names": {
"axes": ["gripper"],
},
},
"observation.state.cartesian_position": {
"dtype": "float32",
"shape": (6,),
"names": {
"axes": ["x", "y", "z", "roll", "pitch", "yaw"],
},
},
"observation.state.joint_position": {
"dtype": "float32",
"shape": (7,),
"names": {
"axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6"],
},
},
# Add this new feature to follow LeRobot standard of using joint position + gripper
"observation.state": {
"dtype": "float32",
"shape": (8,),
"names": {
"axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6", "gripper"],
},
},
# Initially called wrist_image_left
"observation.images.wrist_left": {
"dtype": "video",
"shape": (180, 320, 3),
"names": [
"height",
"width",
"channels",
],
},
# Initially called exterior_image_1_left
"observation.images.exterior_1_left": {
"dtype": "video",
"shape": (180, 320, 3),
"names": [
"height",
"width",
"channels",
],
},
# Initially called exterior_image_2_left
"observation.images.exterior_2_left": {
"dtype": "video",
"shape": (180, 320, 3),
"names": [
"height",
"width",
"channels",
],
},
"action.gripper_position": {
"dtype": "float32",
"shape": (1,),
"names": {
"axes": ["gripper"],
},
},
"action.gripper_velocity": {
"dtype": "float32",
"shape": (1,),
"names": {
"axes": ["gripper"],
},
},
"action.cartesian_position": {
"dtype": "float32",
"shape": (6,),
"names": {
"axes": ["x", "y", "z", "roll", "pitch", "yaw"],
},
},
"action.cartesian_velocity": {
"dtype": "float32",
"shape": (6,),
"names": {
"axes": ["x", "y", "z", "roll", "pitch", "yaw"],
},
},
"action.joint_position": {
"dtype": "float32",
"shape": (7,),
"names": {
"axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6"],
},
},
"action.joint_velocity": {
"dtype": "float32",
"shape": (7,),
"names": {
"axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6"],
},
},
# This feature was called "action" in RLDS dataset and consists of [6x joint velocities, 1x gripper position]
"action.original": {
"dtype": "float32",
"shape": (7,),
"names": {
"axes": ["x", "y", "z", "roll", "pitch", "yaw", "gripper"],
},
},
# Add this new feature to follow LeRobot standard of using joint position + gripper
"action": {
"dtype": "float32",
"shape": (8,),
"names": {
"axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6", "gripper"],
},
},
"discount": {
"dtype": "float32",
"shape": (1,),
"names": None,
},
"reward": {
"dtype": "float32",
"shape": (1,),
"names": None,
},
# Meta data that are the same for all frames in the episode
"task_category": {
"dtype": "string",
"shape": (1,),
"names": None,
},
"building": {
"dtype": "string",
"shape": (1,),
"names": None,
},
"collector_id": {
"dtype": "string",
"shape": (1,),
"names": None,
},
"date": {
"dtype": "string",
"shape": (1,),
"names": None,
},
"camera_extrinsics.wrist_left": {
"dtype": "float32",
"shape": (6,),
"names": {
"axes": ["x", "y", "z", "roll", "pitch", "yaw"],
},
},
"camera_extrinsics.exterior_1_left": {
"dtype": "float32",
"shape": (6,),
"names": {
"axes": ["x", "y", "z", "roll", "pitch", "yaw"],
},
},
"camera_extrinsics.exterior_2_left": {
"dtype": "float32",
"shape": (6,),
"names": {
"axes": ["x", "y", "z", "roll", "pitch", "yaw"],
},
},
"is_episode_successful": {
"dtype": "bool",
"shape": (1,),
"names": None,
},
}
def is_episode_successful(tf_episode_metadata):
# Adapted from: https://github.com/droid-dataset/droid_policy_learning/blob/dd1020eb20d981f90b5ff07dc80d80d5c0cb108b/robomimic/utils/rlds_utils.py#L8
return "/success/" in tf_episode_metadata["file_path"].numpy().decode()
def generate_lerobot_frames(tf_episode):
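"""Yield LeRobot-format frame dicts for every step of a DROID RLDS episode."""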
m = tf_episode["episode_metadata"]
frame_meta = {
"task_category": m["building"].numpy().decode(),
"building": m["building"].numpy().decode(),
"collector_id": m["collector_id"].numpy().decode(),
"date": m["date"].numpy().decode(),
"camera_extrinsics.wrist_left": m["extrinsics_wrist_cam"].numpy(),
"camera_extrinsics.exterior_1_left": m["extrinsics_exterior_cam_1"].numpy(),
"camera_extrinsics.exterior_2_left": m["extrinsics_exterior_cam_2"].numpy(),
"is_episode_successful": np.array([is_episode_successful(m)]),
}
for f in tf_episode["steps"]:
# Dataset schema slightly adapted from: https://droid-dataset.github.io/droid/the-droid-dataset.html#-dataset-schema
frame = {
"is_first": np.array([f["is_first"].numpy()]),
"is_last": np.array([f["is_last"].numpy()]),
"is_terminal": np.array([f["is_terminal"].numpy()]),
"language_instruction": f["language_instruction"].numpy().decode(),
"language_instruction_2": f["language_instruction_2"].numpy().decode(),
"language_instruction_3": f["language_instruction_3"].numpy().decode(),
"observation.state.gripper_position": f["observation"]["gripper_position"].numpy(),
"observation.state.cartesian_position": f["observation"]["cartesian_position"].numpy(),
"observation.state.joint_position": f["observation"]["joint_position"].numpy(),
"observation.images.wrist_left": f["observation"]["wrist_image_left"].numpy(),
"observation.images.exterior_1_left": f["observation"]["exterior_image_1_left"].numpy(),
"observation.images.exterior_2_left": f["observation"]["exterior_image_2_left"].numpy(),
"action.gripper_position": f["action_dict"]["gripper_position"].numpy(),
"action.gripper_velocity": f["action_dict"]["gripper_velocity"].numpy(),
"action.cartesian_position": f["action_dict"]["cartesian_position"].numpy(),
"action.cartesian_velocity": f["action_dict"]["cartesian_velocity"].numpy(),
"action.joint_position": f["action_dict"]["joint_position"].numpy(),
"action.joint_velocity": f["action_dict"]["joint_velocity"].numpy(),
"discount": np.array([f["discount"].numpy()]),
"reward": np.array([f["reward"].numpy()]),
"action.original": f["action"].numpy(),
}
# language_instruction is also stored as "task" to follow LeRobot standard
frame["task"] = frame["language_instruction"]
# Add this new feature to follow LeRobot standard of using joint position + gripper
frame["observation.state"] = np.concatenate(
[frame["observation.state.joint_position"], frame["observation.state.gripper_position"]]
)
frame["action"] = np.concatenate([frame["action.joint_position"], frame["action.gripper_position"]])
# Meta data that are the same for all frames in the episode
frame.update(frame_meta)
# Cast fp64 to fp32
for key in frame:
if isinstance(frame[key], np.ndarray) and frame[key].dtype == np.float64:
frame[key] = frame[key].astype(np.float32)
yield frame
def port_droid(
raw_dir: Path,
repo_id: str,
push_to_hub: bool = False,
num_shards: int | None = None,
shard_index: int | None = None,
):
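"""Port the DROID RLDS dataset (or a single TFDS shard of it) to the LeRobot v3.0 format."""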
dataset_name = raw_dir.parent.name
version = raw_dir.name
data_dir = raw_dir.parent.parent
builder = tfds.builder(f"{dataset_name}/{version}", data_dir=data_dir, version="")
if num_shards is not None:
tfds_num_shards = builder.info.splits["train"].num_shards
if tfds_num_shards != DROID_SHARDS:
raise ValueError(
f"Number of shards of Droid dataset is expected to be {DROID_SHARDS} but is {tfds_num_shards}."
)
if num_shards != tfds_num_shards:
raise ValueError(
f"We only shard over the fixed number of shards provided by tensorflow dataset ({tfds_num_shards}), but {num_shards} shards provided instead."
)
if shard_index >= tfds_num_shards:
raise ValueError(
f"Shard index must be lower than the number of shards ({shard_index} >= {num_shards})."
)
raw_dataset = builder.as_dataset(split=f"train[{shard_index}shard]")
else:
raw_dataset = builder.as_dataset(split="train")
lerobot_dataset = LeRobotDataset.create(
repo_id=repo_id,
robot_type=DROID_ROBOT_TYPE,
fps=DROID_FPS,
features=DROID_FEATURES,
)
start_time = time.time()
num_episodes = raw_dataset.cardinality().numpy().item()
logging.info(f"Number of episodes {num_episodes}")
for episode_index, episode in enumerate(raw_dataset):
elapsed_time = time.time() - start_time
d, h, m, s = get_elapsed_time_in_days_hours_minutes_seconds(elapsed_time)
logging.info(
f"{episode_index} / {num_episodes} episodes processed (after {d} days, {h} hours, {m} minutes, {s:.3f} seconds)"
)
for frame in generate_lerobot_frames(episode):
lerobot_dataset.add_frame(frame)
lerobot_dataset.save_episode()
logging.info("Save_episode")
if push_to_hub:
lerobot_dataset.push_to_hub(
# Add openx tag, since it belongs to the openx collection of datasets
tags=["openx"],
private=False,
)
def validate_dataset(repo_id):
"""Sanity check that ensures the metadata can be loaded and all files are present."""
meta = LeRobotDatasetMetadata(repo_id)
if meta.total_episodes == 0:
raise ValueError("Number of episodes is 0.")
for ep_idx in range(meta.total_episodes):
data_path = meta.root / meta.get_data_file_path(ep_idx)
if not data_path.exists():
raise ValueError(f"Parquet file is missing in: {data_path}")
for vid_key in meta.video_keys:
vid_path = meta.root / meta.get_video_file_path(ep_idx, vid_key)
if not vid_path.exists():
raise ValueError(f"Video file is missing in: {vid_path}")
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--raw-dir",
type=Path,
required=True,
help="Directory containing input raw datasets (e.g. `path/to/dataset` or `path/to/dataset/version`).",
)
parser.add_argument(
"--repo-id",
type=str,
help="Repository identifier on Hugging Face: a community or a user name `/` the name of the dataset, required when push-to-hub is True",
)
parser.add_argument(
"--push-to-hub",
action="store_true",
help="Upload to hub.",
)
parser.add_argument(
"--num-shards",
type=int,
default=None,
help="Number of shards. Can be either None to load the full dataset, or 2048 to load one of the 2048 tensorflow dataset files.",
)
parser.add_argument(
"--shard-index",
type=int,
default=None,
help="Index of the shard. Can be either None to load the full dataset, or in [0,2047] to load one of the 2048 tensorflow dataset files.",
)
args = parser.parse_args()
port_droid(**vars(args))
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,148 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import logging
from pathlib import Path
from datatrove.executor import LocalPipelineExecutor
from datatrove.executor.slurm import SlurmPipelineExecutor
from datatrove.pipeline.base import PipelineStep
from port_datasets.droid_rlds.port_droid import DROID_SHARDS
from lerobot.datasets.aggregate import aggregate_datasets
from lerobot.utils.utils import init_logging
class AggregateDatasets(PipelineStep):
def __init__(
self,
repo_ids: list[str],
aggregated_repo_id: str,
):
super().__init__()
self.repo_ids = repo_ids
self.aggr_repo_id = aggregated_repo_id
def run(self, data=None, rank: int = 0, world_size: int = 1):
init_logging()
# Since aggregate_datasets already handles parallel processing internally,
# we only need one worker to run the entire aggregation
if rank == 0:
logging.info(f"Starting aggregation of {len(self.repo_ids)} datasets into {self.aggr_repo_id}")
aggregate_datasets(self.repo_ids, self.aggr_repo_id)
logging.info("Aggregation complete!")
else:
logging.info(f"Worker {rank} skipping - only worker 0 performs aggregation")
def make_aggregate_executor(
repo_ids, repo_id, job_name, logs_dir, workers, partition, cpus_per_task, mem_per_cpu, slurm=True
):
kwargs = {
"pipeline": [
AggregateDatasets(repo_ids, repo_id),
],
"logging_dir": str(logs_dir / job_name),
}
if slurm:
# For aggregation, we only need 1 task since aggregate_datasets handles everything
kwargs.update(
{
"job_name": job_name,
"tasks": 1, # Only need 1 task for aggregation
"workers": 1, # Only need 1 worker
"time": "08:00:00",
"partition": partition,
"cpus_per_task": cpus_per_task,
"sbatch_args": {"mem-per-cpu": mem_per_cpu},
}
)
executor = SlurmPipelineExecutor(**kwargs)
else:
kwargs.update(
{
"tasks": 1,
"workers": 1,
}
)
executor = LocalPipelineExecutor(**kwargs)
return executor
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--repo-id",
type=str,
help="Repository identifier on Hugging Face: a community or a user name `/` the name of the dataset.",
)
parser.add_argument(
"--logs-dir",
type=Path,
help="Path to logs directory for `datatrove`.",
)
parser.add_argument(
"--job-name",
type=str,
default="aggr_droid",
help="Job name used in slurm, and name of the directory created inside the provided logs directory.",
)
parser.add_argument(
"--slurm",
type=int,
default=1,
help="Launch over slurm. Use `--slurm 0` to launch sequentially (useful to debug).",
)
parser.add_argument(
"--workers",
type=int,
default=1, # Changed default to 1 since aggregation doesn't need multiple workers
help="Number of slurm workers. For aggregation, this should be 1.",
)
parser.add_argument(
"--partition",
type=str,
help="Slurm partition. Ideally a CPU partition. No need for GPU partition.",
)
parser.add_argument(
"--cpus-per-task",
type=int,
default=8,
help="Number of cpus that each slurm worker will use.",
)
parser.add_argument(
"--mem-per-cpu",
type=str,
default="1950M",
help="Memory per cpu that each worker will use.",
)
args = parser.parse_args()
kwargs = vars(args)
kwargs["slurm"] = kwargs.pop("slurm") == 1
repo_ids = [f"{args.repo_id}_world_{DROID_SHARDS}_rank_{rank}" for rank in range(DROID_SHARDS)]
aggregate_executor = make_aggregate_executor(repo_ids, **kwargs)
aggregate_executor.run()
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,162 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
from pathlib import Path
from datatrove.executor import LocalPipelineExecutor
from datatrove.executor.slurm import SlurmPipelineExecutor
from datatrove.pipeline.base import PipelineStep
from port_datasets.droid_rlds.port_droid import DROID_SHARDS
class PortDroidShards(PipelineStep):
def __init__(
self,
raw_dir: Path | str,
repo_id: str | None = None,
):
super().__init__()
self.raw_dir = Path(raw_dir)
self.repo_id = repo_id
def run(self, data=None, rank: int = 0, world_size: int = 1):
from datasets.utils.tqdm import disable_progress_bars
from port_datasets.droid_rlds.port_droid import port_droid, validate_dataset
from lerobot.utils.utils import init_logging
init_logging()
disable_progress_bars()
shard_repo_id = f"{self.repo_id}_world_{world_size}_rank_{rank}"
try:
validate_dataset(shard_repo_id)
return
except Exception:
pass # nosec B110 - Dataset doesn't exist yet, continue with porting
port_droid(
self.raw_dir,
shard_repo_id,
push_to_hub=False,
num_shards=world_size,
shard_index=rank,
)
validate_dataset(shard_repo_id)
def make_port_executor(
raw_dir, repo_id, job_name, logs_dir, workers, partition, cpus_per_task, mem_per_cpu, slurm=True
):
kwargs = {
"pipeline": [
PortDroidShards(raw_dir, repo_id),
],
"logging_dir": str(logs_dir / job_name),
}
if slurm:
kwargs.update(
{
"job_name": job_name,
"tasks": DROID_SHARDS,
"workers": workers,
"time": "08:00:00",
"partition": partition,
"cpus_per_task": cpus_per_task,
"sbatch_args": {"mem-per-cpu": mem_per_cpu},
}
)
executor = SlurmPipelineExecutor(**kwargs)
else:
kwargs.update(
{
"tasks": 1,
"workers": 1,
}
)
executor = LocalPipelineExecutor(**kwargs)
return executor
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--raw-dir",
type=Path,
required=True,
help="Directory containing input raw datasets (e.g. `path/to/dataset` or `path/to/dataset/version`).",
)
parser.add_argument(
"--repo-id",
type=str,
help="Repository identifier on Hugging Face: a community or a user name `/` the name of the dataset.",
)
parser.add_argument(
"--logs-dir",
type=Path,
help="Path to logs directory for `datatrove`.",
)
parser.add_argument(
"--job-name",
type=str,
default="port_droid",
help="Job name used in slurm, and name of the directory created inside the provided logs directory.",
)
parser.add_argument(
"--slurm",
type=int,
default=1,
help="Launch over slurm. Use `--slurm 0` to launch sequentially (useful to debug).",
)
parser.add_argument(
"--workers",
type=int,
default=2048,
help="Number of slurm workers. It should be less than the maximum number of shards.",
)
parser.add_argument(
"--partition",
type=str,
help="Slurm partition. Ideally a CPU partition. No need for GPU partition.",
)
parser.add_argument(
"--cpus-per-task",
type=int,
default=8,
help="Number of cpus that each slurm worker will use.",
)
parser.add_argument(
"--mem-per-cpu",
type=str,
default="1950M",
help="Memory per cpu that each worker will use.",
)
args = parser.parse_args()
kwargs = vars(args)
kwargs["slurm"] = kwargs.pop("slurm") == 1
port_executor = make_port_executor(**kwargs)
port_executor.run()
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,281 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import logging
import os
from pathlib import Path
from datatrove.executor import LocalPipelineExecutor
from datatrove.executor.slurm import SlurmPipelineExecutor
from datatrove.pipeline.base import PipelineStep
from huggingface_hub import HfApi
from huggingface_hub.constants import REPOCARD_NAME
from port_datasets.droid_rlds.port_droid import DROID_SHARDS
from lerobot.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDatasetMetadata
from lerobot.datasets.utils import create_lerobot_dataset_card
from lerobot.utils.utils import init_logging
class UploadDataset(PipelineStep):
def __init__(
self,
repo_id: str,
branch: str | None = None,
revision: str | None = None,
tags: list | None = None,
license: str | None = "apache-2.0",
private: bool = False,
distant_repo_id: str | None = None,
**card_kwargs,
):
super().__init__()
self.repo_id = repo_id
self.distant_repo_id = self.repo_id if distant_repo_id is None else distant_repo_id
self.branch = branch
self.tags = tags
self.license = license
self.private = private
self.card_kwargs = card_kwargs
self.revision = revision if revision else CODEBASE_VERSION
if os.environ.get("HF_HUB_ENABLE_HF_TRANSFER", "0") != "1":
logging.warning(
'HF_HUB_ENABLE_HF_TRANSFER is not set to "1". Install hf_transfer and set the env '
"variable for faster uploads:\npip install hf-transfer\nexport HF_HUB_ENABLE_HF_TRANSFER=1"
)
self.create_repo()
def create_repo(self):
logging.info(f"Loading meta data from {self.repo_id}...")
meta = LeRobotDatasetMetadata(self.repo_id)
logging.info(f"Creating repo {self.distant_repo_id}...")
hub_api = HfApi()
hub_api.create_repo(
repo_id=self.distant_repo_id,
private=self.private,
repo_type="dataset",
exist_ok=True,
)
if self.branch:
hub_api.create_branch(
repo_id=self.distant_repo_id,
branch=self.branch,
revision=self.revision,
repo_type="dataset",
exist_ok=True,
)
if not hub_api.file_exists(
self.distant_repo_id, REPOCARD_NAME, repo_type="dataset", revision=self.branch
):
card = create_lerobot_dataset_card(
tags=self.tags, dataset_info=meta.info, license=self.license, **self.card_kwargs
)
card.push_to_hub(repo_id=self.distant_repo_id, repo_type="dataset", revision=self.branch)
hub_api.create_tag(self.distant_repo_id, tag=CODEBASE_VERSION, repo_type="dataset")
def list_files_recursively(directory):
base_path = Path(directory)
return [str(file.relative_to(base_path)) for file in base_path.rglob("*") if file.is_file()]
logging.info(f"Listing all local files from {self.repo_id}...")
self.file_paths = list_files_recursively(meta.root)
self.file_paths = sorted(self.file_paths)
def create_chunks(self, lst, n):
from itertools import islice
it = iter(lst)
return [list(islice(it, size)) for size in [len(lst) // n + (i < len(lst) % n) for i in range(n)]]
def create_commits(self, additions):
import logging
import math
import random
import time
from huggingface_hub import create_commit
from huggingface_hub.utils import HfHubHTTPError
FILES_BETWEEN_COMMITS = 10 # noqa: N806
BASE_DELAY = 0.1 # noqa: N806
MAX_RETRIES = 12 # noqa: N806
# Split the files into smaller chunks for faster commit
# and avoiding "A commit has happened since" error
num_chunks = math.ceil(len(additions) / FILES_BETWEEN_COMMITS)
chunks = self.create_chunks(additions, num_chunks)
for chunk in chunks:
retries = 0
while True:
try:
create_commit(
self.distant_repo_id,
repo_type="dataset",
operations=chunk,
commit_message=f"DataTrove upload ({len(chunk)} files)",
revision=self.branch,
)
# TODO: every 100 chunks super_squash_commits()
logging.info("create_commit completed!")
break
except HfHubHTTPError as e:
if "A commit has happened since" in e.server_message:
if retries >= MAX_RETRIES:
logging.error(f"Failed to create commit after {MAX_RETRIES=}. Giving up.")
raise e
logging.info("Commit creation race condition issue. Waiting...")
time.sleep(BASE_DELAY * 2**retries + random.uniform(0, 2))
retries += 1
else:
raise e
def run(self, data=None, rank: int = 0, world_size: int = 1):
import logging
from datasets.utils.tqdm import disable_progress_bars
from huggingface_hub import CommitOperationAdd, preupload_lfs_files
from lerobot.datasets.lerobot_dataset import LeRobotDatasetMetadata
from lerobot.utils.utils import init_logging
init_logging()
disable_progress_bars()
chunks = self.create_chunks(self.file_paths, world_size)
file_paths = chunks[rank]
if len(file_paths) == 0:
raise ValueError(file_paths)
logging.info("Pre-uploading LFS files...")
for i, path in enumerate(file_paths):
logging.info(f"{i}: {path}")
meta = LeRobotDatasetMetadata(self.repo_id)
additions = [
CommitOperationAdd(path_in_repo=path, path_or_fileobj=meta.root / path) for path in file_paths
]
preupload_lfs_files(
repo_id=self.distant_repo_id, repo_type="dataset", additions=additions, revision=self.branch
)
logging.info("Creating commits...")
self.create_commits(additions)
logging.info("Done!")
def make_upload_executor(
repo_id, job_name, logs_dir, workers, partition, cpus_per_task, mem_per_cpu, slurm=True
):
kwargs = {
"pipeline": [
UploadDataset(repo_id),
],
"logging_dir": str(logs_dir / job_name),
}
if slurm:
kwargs.update(
{
"job_name": job_name,
"tasks": DROID_SHARDS,
"workers": workers,
"time": "08:00:00",
"partition": partition,
"cpus_per_task": cpus_per_task,
"sbatch_args": {"mem-per-cpu": mem_per_cpu},
}
)
executor = SlurmPipelineExecutor(**kwargs)
else:
kwargs.update(
{
"tasks": DROID_SHARDS,
"workers": 1,
}
)
executor = LocalPipelineExecutor(**kwargs)
return executor
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--repo-id",
type=str,
help="Repository identifier on Hugging Face: a community or a user name `/` the name of the dataset.",
)
parser.add_argument(
"--logs-dir",
type=Path,
help="Path to logs directory for `datatrove`.",
)
parser.add_argument(
"--job-name",
type=str,
default="upload_droid",
help="Job name used in slurm, and name of the directory created inside the provided logs directory.",
)
parser.add_argument(
"--slurm",
type=int,
default=1,
help="Launch over slurm. Use `--slurm 0` to launch sequentially (useful to debug).",
)
parser.add_argument(
"--workers",
type=int,
default=50,
help="Number of slurm workers. It should be less than the maximum number of shards.",
)
parser.add_argument(
"--partition",
type=str,
help="Slurm partition. Ideally a CPU partition. No need for GPU partition.",
)
parser.add_argument(
"--cpus-per-task",
type=int,
default=8,
help="Number of cpus that each slurm worker will use.",
)
parser.add_argument(
"--mem-per-cpu",
type=str,
default="1950M",
help="Memory per cpu that each worker will use.",
)
init_logging()
args = parser.parse_args()
kwargs = vars(args)
kwargs["slurm"] = kwargs.pop("slurm") == 1
upload_executor = make_upload_executor(**kwargs)
upload_executor.run()
if __name__ == "__main__":
main()

View File

@@ -84,7 +84,6 @@ dependencies = [
# Support dependencies
"deepdiff>=7.0.1,<9.0.0",
"flask>=3.0.3,<4.0.0",
"imageio[ffmpeg]>=2.34.0,<3.0.0",
"termcolor>=2.4.0,<4.0.0",
]

View File

@@ -0,0 +1,502 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import shutil
from pathlib import Path
import pandas as pd
import tqdm
from lerobot.datasets.compute_stats import aggregate_stats
from lerobot.datasets.lerobot_dataset import LeRobotDatasetMetadata
from lerobot.datasets.utils import (
DEFAULT_CHUNK_SIZE,
DEFAULT_DATA_FILE_SIZE_IN_MB,
DEFAULT_DATA_PATH,
DEFAULT_EPISODES_PATH,
DEFAULT_VIDEO_FILE_SIZE_IN_MB,
DEFAULT_VIDEO_PATH,
get_parquet_file_size_in_mb,
get_video_size_in_mb,
to_parquet_with_hf_images,
update_chunk_file_indices,
write_info,
write_stats,
write_tasks,
)
from lerobot.datasets.video_utils import concatenate_video_files
def validate_all_metadata(all_metadata: list[LeRobotDatasetMetadata]):
"""Validates that all dataset metadata have consistent properties.
Ensures all datasets have the same fps, robot_type, and features to guarantee
compatibility when aggregating them into a single dataset.
Args:
all_metadata: List of LeRobotDatasetMetadata objects to validate.
Returns:
tuple: A tuple containing (fps, robot_type, features) from the first metadata.
Raises:
ValueError: If any metadata has different fps, robot_type, or features
than the first metadata in the list.
"""
fps = all_metadata[0].fps
robot_type = all_metadata[0].robot_type
features = all_metadata[0].features
for meta in tqdm.tqdm(all_metadata, desc="Validate all meta data"):
if fps != meta.fps:
raise ValueError(f"Same fps is expected, but got fps={meta.fps} instead of {fps}.")
if robot_type != meta.robot_type:
raise ValueError(
f"Same robot_type is expected, but got robot_type={meta.robot_type} instead of {robot_type}."
)
if features != meta.features:
raise ValueError(
f"Same features are expected, but got features={meta.features} instead of {features}."
)
return fps, robot_type, features
def update_data_df(df, src_meta, dst_meta):
"""Updates a data DataFrame with new indices and task mappings for aggregation.
Adjusts episode indices, frame indices, and task indices to account for
previously aggregated data in the destination dataset.
Args:
df: DataFrame containing the data to be updated.
src_meta: Source dataset metadata.
dst_meta: Destination dataset metadata.
Returns:
pd.DataFrame: Updated DataFrame with adjusted indices.
"""
def _update(row):
row["episode_index"] = row["episode_index"] + dst_meta.info["total_episodes"]
row["index"] = row["index"] + dst_meta.info["total_frames"]
task = src_meta.tasks.iloc[row["task_index"]].name
row["task_index"] = dst_meta.tasks.loc[task].task_index.item()
return row
return df.apply(_update, axis=1)
def update_meta_data(
df,
dst_meta,
meta_idx,
data_idx,
videos_idx,
):
"""Updates metadata DataFrame with new chunk, file, and timestamp indices.
Adjusts all indices and timestamps to account for previously aggregated
data and videos in the destination dataset.
Args:
df: DataFrame containing the metadata to be updated.
dst_meta: Destination dataset metadata.
meta_idx: Dictionary containing current metadata chunk and file indices.
data_idx: Dictionary containing current data chunk and file indices.
videos_idx: Dictionary containing current video indices and timestamps.
Returns:
pd.DataFrame: Updated DataFrame with adjusted indices and timestamps.
"""
def _update(row):
row["meta/episodes/chunk_index"] = row["meta/episodes/chunk_index"] + meta_idx["chunk"]
row["meta/episodes/file_index"] = row["meta/episodes/file_index"] + meta_idx["file"]
row["data/chunk_index"] = row["data/chunk_index"] + data_idx["chunk"]
row["data/file_index"] = row["data/file_index"] + data_idx["file"]
for key, video_idx in videos_idx.items():
row[f"videos/{key}/chunk_index"] = row[f"videos/{key}/chunk_index"] + video_idx["chunk"]
row[f"videos/{key}/file_index"] = row[f"videos/{key}/file_index"] + video_idx["file"]
row[f"videos/{key}/from_timestamp"] = (
row[f"videos/{key}/from_timestamp"] + video_idx["latest_duration"]
)
row[f"videos/{key}/to_timestamp"] = (
row[f"videos/{key}/to_timestamp"] + video_idx["latest_duration"]
)
row["dataset_from_index"] = row["dataset_from_index"] + dst_meta.info["total_frames"]
row["dataset_to_index"] = row["dataset_to_index"] + dst_meta.info["total_frames"]
row["episode_index"] = row["episode_index"] + dst_meta.info["total_episodes"]
return row
return df.apply(_update, axis=1)
def aggregate_datasets(
repo_ids: list[str],
aggr_repo_id: str,
roots: list[Path] | None = None,
aggr_root: Path | None = None,
data_files_size_in_mb: float | None = None,
video_files_size_in_mb: float | None = None,
chunk_size: int | None = None,
):
"""Aggregates multiple LeRobot datasets into a single unified dataset.
This is the main function that orchestrates the aggregation process by:
1. Loading and validating all source dataset metadata
2. Creating a new destination dataset with unified tasks
3. Aggregating videos, data, and metadata from all source datasets
4. Finalizing the aggregated dataset with proper statistics
Args:
repo_ids: List of repository IDs for the datasets to aggregate.
aggr_repo_id: Repository ID for the aggregated output dataset.
roots: Optional list of root paths for the source datasets.
aggr_root: Optional root path for the aggregated dataset.
data_files_size_in_mb: Maximum size for data files in MB (defaults to DEFAULT_DATA_FILE_SIZE_IN_MB)
video_files_size_in_mb: Maximum size for video files in MB (defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB)
chunk_size: Maximum number of files per chunk (defaults to DEFAULT_CHUNK_SIZE)
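Example (minimal sketch; the repo ids below are placeholders):
aggregate_datasets(["user/dataset_rank_0", "user/dataset_rank_1"], "user/dataset_aggregated")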
"""
logging.info("Start aggregate_datasets")
if data_files_size_in_mb is None:
data_files_size_in_mb = DEFAULT_DATA_FILE_SIZE_IN_MB
if video_files_size_in_mb is None:
video_files_size_in_mb = DEFAULT_VIDEO_FILE_SIZE_IN_MB
if chunk_size is None:
chunk_size = DEFAULT_CHUNK_SIZE
all_metadata = (
[LeRobotDatasetMetadata(repo_id) for repo_id in repo_ids]
if roots is None
else [
LeRobotDatasetMetadata(repo_id, root=root) for repo_id, root in zip(repo_ids, roots, strict=False)
]
)
fps, robot_type, features = validate_all_metadata(all_metadata)
video_keys = [key for key in features if features[key]["dtype"] == "video"]
dst_meta = LeRobotDatasetMetadata.create(
repo_id=aggr_repo_id,
fps=fps,
robot_type=robot_type,
features=features,
root=aggr_root,
)
logging.info("Find all tasks")
unique_tasks = pd.concat([m.tasks for m in all_metadata]).index.unique()
dst_meta.tasks = pd.DataFrame({"task_index": range(len(unique_tasks))}, index=unique_tasks)
meta_idx = {"chunk": 0, "file": 0}
data_idx = {"chunk": 0, "file": 0}
videos_idx = {
key: {"chunk": 0, "file": 0, "latest_duration": 0, "episode_duration": 0} for key in video_keys
}
dst_meta.episodes = {}
for src_meta in tqdm.tqdm(all_metadata, desc="Copy data and videos"):
videos_idx = aggregate_videos(src_meta, dst_meta, videos_idx, video_files_size_in_mb, chunk_size)
data_idx = aggregate_data(src_meta, dst_meta, data_idx, data_files_size_in_mb, chunk_size)
meta_idx = aggregate_metadata(src_meta, dst_meta, meta_idx, data_idx, videos_idx)
dst_meta.info["total_episodes"] += src_meta.total_episodes
dst_meta.info["total_frames"] += src_meta.total_frames
finalize_aggregation(dst_meta, all_metadata)
logging.info("Aggregation complete.")
def aggregate_videos(src_meta, dst_meta, videos_idx, video_files_size_in_mb, chunk_size):
"""Aggregates video chunks from a source dataset into the destination dataset.
Handles video file concatenation and rotation based on file size limits.
Creates new video files when size limits are exceeded.
Args:
src_meta: Source dataset metadata.
dst_meta: Destination dataset metadata.
videos_idx: Dictionary tracking video chunk and file indices.
video_files_size_in_mb: Maximum size for video files in MB (defaults to DEFAULT_VIDEO_FILE_SIZE_IN_MB)
chunk_size: Maximum number of files per chunk (defaults to DEFAULT_CHUNK_SIZE)
Returns:
dict: Updated videos_idx with current chunk and file indices.
"""
for key, video_idx in videos_idx.items():
unique_chunk_file_pairs = {
(chunk, file)
for chunk, file in zip(
src_meta.episodes[f"videos/{key}/chunk_index"],
src_meta.episodes[f"videos/{key}/file_index"],
strict=False,
)
}
unique_chunk_file_pairs = sorted(unique_chunk_file_pairs)
chunk_idx = video_idx["chunk"]
file_idx = video_idx["file"]
for src_chunk_idx, src_file_idx in unique_chunk_file_pairs:
src_path = src_meta.root / DEFAULT_VIDEO_PATH.format(
video_key=key,
chunk_index=src_chunk_idx,
file_index=src_file_idx,
)
dst_path = dst_meta.root / DEFAULT_VIDEO_PATH.format(
video_key=key,
chunk_index=chunk_idx,
file_index=file_idx,
)
# If a new file is created, we don't want to increment the latest_duration
update_latest_duration = False
if not dst_path.exists():
# First write to this destination file
dst_path.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(str(src_path), str(dst_path))
continue # not accumulating further, already copied the file in place
# Check file sizes before appending
src_size = get_video_size_in_mb(src_path)
dst_size = get_video_size_in_mb(dst_path)
if dst_size + src_size >= video_files_size_in_mb:
# Rotate to a new chunk/file
chunk_idx, file_idx = update_chunk_file_indices(chunk_idx, file_idx, chunk_size)
dst_path = dst_meta.root / DEFAULT_VIDEO_PATH.format(
video_key=key,
chunk_index=chunk_idx,
file_index=file_idx,
)
dst_path.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(str(src_path), str(dst_path))
else:
# Get the timestamps shift for this video
timestamps_shift_s = dst_meta.info["total_frames"] / dst_meta.info["fps"]
# Append to existing video file
concatenate_video_files(
[dst_path, src_path],
dst_path,
)
# Update the latest_duration when appending (shifts timestamps!)
update_latest_duration = not update_latest_duration
# Update the videos_idx with the final chunk and file indices for this key
videos_idx[key]["chunk"] = chunk_idx
videos_idx[key]["file"] = file_idx
if update_latest_duration:
videos_idx[key]["latest_duration"] += timestamps_shift_s
return videos_idx
def aggregate_data(src_meta, dst_meta, data_idx, data_files_size_in_mb, chunk_size):
"""Aggregates data chunks from a source dataset into the destination dataset.
Reads source data files, updates indices to match the aggregated dataset,
and writes them to the destination with proper file rotation.
Args:
src_meta: Source dataset metadata.
dst_meta: Destination dataset metadata.
data_idx: Dictionary tracking data chunk and file indices.
Returns:
dict: Updated data_idx with current chunk and file indices.
"""
unique_chunk_file_ids = {
(c, f)
for c, f in zip(
src_meta.episodes["data/chunk_index"], src_meta.episodes["data/file_index"], strict=False
)
}
unique_chunk_file_ids = sorted(unique_chunk_file_ids)
for src_chunk_idx, src_file_idx in unique_chunk_file_ids:
src_path = src_meta.root / DEFAULT_DATA_PATH.format(
chunk_index=src_chunk_idx, file_index=src_file_idx
)
df = pd.read_parquet(src_path)
df = update_data_df(df, src_meta, dst_meta)
data_idx = append_or_create_parquet_file(
df,
src_path,
data_idx,
data_files_size_in_mb,
chunk_size,
DEFAULT_DATA_PATH,
contains_images=len(dst_meta.image_keys) > 0,
aggr_root=dst_meta.root,
)
return data_idx
def aggregate_metadata(src_meta, dst_meta, meta_idx, data_idx, videos_idx):
"""Aggregates metadata from a source dataset into the destination dataset.
Reads source metadata files, updates all indices and timestamps,
and writes them to the destination with proper file rotation.
Args:
src_meta: Source dataset metadata.
dst_meta: Destination dataset metadata.
meta_idx: Dictionary tracking metadata chunk and file indices.
data_idx: Dictionary tracking data chunk and file indices.
videos_idx: Dictionary tracking video indices and timestamps.
Returns:
dict: Updated meta_idx with current chunk and file indices.
"""
chunk_file_ids = {
(c, f)
for c, f in zip(
src_meta.episodes["meta/episodes/chunk_index"],
src_meta.episodes["meta/episodes/file_index"],
strict=False,
)
}
chunk_file_ids = sorted(chunk_file_ids)
for chunk_idx, file_idx in chunk_file_ids:
src_path = src_meta.root / DEFAULT_EPISODES_PATH.format(chunk_index=chunk_idx, file_index=file_idx)
df = pd.read_parquet(src_path)
df = update_meta_data(
df,
dst_meta,
meta_idx,
data_idx,
videos_idx,
)
for k in videos_idx:
videos_idx[k]["latest_duration"] += videos_idx[k]["episode_duration"]
meta_idx = append_or_create_parquet_file(
df,
src_path,
meta_idx,
DEFAULT_DATA_FILE_SIZE_IN_MB,
DEFAULT_CHUNK_SIZE,
DEFAULT_EPISODES_PATH,
contains_images=False,
aggr_root=dst_meta.root,
)
return meta_idx
def append_or_create_parquet_file(
df: pd.DataFrame,
src_path: Path,
idx: dict[str, int],
max_mb: float,
chunk_size: int,
default_path: str,
contains_images: bool = False,
aggr_root: Path | None = None,
):
"""Appends data to an existing parquet file or creates a new one based on size constraints.
Manages file rotation when size limits are exceeded to prevent individual files
from becoming too large. Handles both regular parquet files and those containing images.
Args:
df: DataFrame to write to the parquet file.
src_path: Path to the source file (used for size estimation).
idx: Dictionary containing current 'chunk' and 'file' indices.
max_mb: Maximum allowed file size in MB before rotation.
chunk_size: Maximum number of files per chunk before incrementing chunk index.
default_path: Format string for generating file paths.
contains_images: Whether the data contains images requiring special handling.
aggr_root: Root path for the aggregated dataset.
Returns:
dict: Updated index dictionary with current chunk and file indices.
"""
dst_path = aggr_root / default_path.format(chunk_index=idx["chunk"], file_index=idx["file"])
if not dst_path.exists():
dst_path.parent.mkdir(parents=True, exist_ok=True)
if contains_images:
to_parquet_with_hf_images(df, dst_path)
else:
df.to_parquet(dst_path)
return idx
src_size = get_parquet_file_size_in_mb(src_path)
dst_size = get_parquet_file_size_in_mb(dst_path)
if dst_size + src_size >= max_mb:
idx["chunk"], idx["file"] = update_chunk_file_indices(idx["chunk"], idx["file"], chunk_size)
new_path = aggr_root / default_path.format(chunk_index=idx["chunk"], file_index=idx["file"])
new_path.parent.mkdir(parents=True, exist_ok=True)
final_df = df
target_path = new_path
else:
existing_df = pd.read_parquet(dst_path)
final_df = pd.concat([existing_df, df], ignore_index=True)
target_path = dst_path
if contains_images:
to_parquet_with_hf_images(final_df, target_path)
else:
final_df.to_parquet(target_path)
return idx
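# Illustrative behaviour sketch (hypothetical sizes, assuming max_mb=100 and chunk_size=1000):
# idx = {"chunk": 0, "file": 0}
# 1st call: data/chunk-000/file-000.parquet does not exist -> file created, idx unchanged
# 2nd call: dst (60 MB) + src (30 MB) < 100 -> rows appended to the same file
# 3rd call: dst (90 MB) + src (30 MB) >= 100 -> idx becomes {"chunk": 0, "file": 1}
#           and a new file data/chunk-000/file-001.parquet is written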
def finalize_aggregation(aggr_meta, all_metadata):
"""Finalizes the dataset aggregation by writing summary files and statistics.
Writes the tasks file, info file with total counts and splits, and
aggregated statistics from all source datasets.
Args:
aggr_meta: Aggregated dataset metadata.
all_metadata: List of all source dataset metadata objects.
"""
logging.info("write tasks")
write_tasks(aggr_meta.tasks, aggr_meta.root)
logging.info("write info")
aggr_meta.info.update(
{
"total_tasks": len(aggr_meta.tasks),
"total_episodes": sum(m.total_episodes for m in all_metadata),
"total_frames": sum(m.total_frames for m in all_metadata),
"splits": {"train": f"0:{sum(m.total_episodes for m in all_metadata)}"},
}
)
write_info(aggr_meta.info, aggr_meta.root)
logging.info("write stats")
aggr_meta.stats = aggregate_stats([m.stats for m in all_metadata])
write_stats(aggr_meta.stats, aggr_meta.root)
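# Illustrative outcome (hedged sketch; counts are hypothetical):
# finalize_aggregation(dst_meta, all_metadata)
# -> meta/tasks.parquet, meta/info.json with the summed total_* counts and
#    splits = {"train": f"0:{total_episodes}"}, e.g. "0:120" for sources of 50 and 70 episodes,
#    and meta/stats.json aggregated from every source's stats via aggregate_stats().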

View File

@@ -14,33 +14,13 @@
import packaging.version
V2_MESSAGE = """
V30_MESSAGE = """
The dataset you requested ({repo_id}) is in {version} format.
We introduced a new format since v2.0 which is not backward compatible with v1.x.
Please, use our conversion script. Modify the following command with your own task description:
We introduced a new format since v3.0 which is not backward compatible with v2.1.
Please, update your dataset to the new format using this command:
```
python -m lerobot.datasets.v2.convert_dataset_v1_to_v2 \\
--repo-id {repo_id} \\
--single-task "TASK DESCRIPTION." # <---- /!\\ Replace TASK DESCRIPTION /!\\
```
A few examples to replace TASK DESCRIPTION: "Pick up the blue cube and place it into the bin.", "Insert the
peg into the socket.", "Slide open the ziploc bag.", "Take the elevator to the 1st floor.", "Open the top
cabinet, store the pot inside it then close the cabinet.", "Push the T-shaped block onto the T-shaped
target.", "Grab the spray paint on the shelf and place it in the bin on top of the robot dog.", "Fold the
sweatshirt.", ...
If you encounter a problem, contact LeRobot maintainers on [Discord](https://discord.com/invite/s3KuuzsPFb)
or open an [issue on GitHub](https://github.com/huggingface/lerobot/issues/new/choose).
"""
V21_MESSAGE = """
The dataset you requested ({repo_id}) is in {version} format.
While the current version of LeRobot is backward-compatible with it, the version of your dataset still uses global
stats instead of per-episode stats. Update your dataset stats to the new format using this command:
```
python -m lerobot.datasets.v21.convert_dataset_v20_to_v21 --repo-id={repo_id}
python -m lerobot.datasets.v30.convert_dataset_v21_to_v30 --repo-id={repo_id}
```
If you encounter a problem, contact LeRobot maintainers on [Discord](https://discord.com/invite/s3KuuzsPFb)
@@ -58,7 +38,12 @@ class CompatibilityError(Exception): ...
class BackwardCompatibilityError(CompatibilityError):
def __init__(self, repo_id: str, version: packaging.version.Version):
message = V2_MESSAGE.format(repo_id=repo_id, version=version)
if version.major == 2 and version.minor == 1:
message = V30_MESSAGE.format(repo_id=repo_id, version=version)
else:
raise NotImplementedError(
"Contact the maintainer on [Discord](https://discord.com/invite/s3KuuzsPFb)."
)
super().__init__(message)
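# Illustrative behaviour (hedged sketch of the branch above):
# BackwardCompatibilityError("user/ds", packaging.version.parse("v2.1"))  # message is V30_MESSAGE
# BackwardCompatibilityError("user/ds", packaging.version.parse("v1.6"))  # raises NotImplementedError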

File diff suppressed because it is too large

View File

@@ -337,13 +337,11 @@ def compute_sampler_weights(
if len(offline_dataset) > 0:
offline_data_mask_indices = []
for start_index, end_index in zip(
offline_dataset.episode_data_index["from"],
offline_dataset.episode_data_index["to"],
offline_dataset.meta.episodes["dataset_from_index"],
offline_dataset.meta.episodes["dataset_to_index"],
strict=True,
):
offline_data_mask_indices.extend(
range(start_index.item(), end_index.item() - offline_drop_n_last_frames)
)
offline_data_mask_indices.extend(range(start_index, end_index - offline_drop_n_last_frames))
offline_data_mask = torch.zeros(len(offline_dataset), dtype=torch.bool)
offline_data_mask[torch.tensor(offline_data_mask_indices)] = True
weights.append(

View File

@@ -21,7 +21,8 @@ import torch
class EpisodeAwareSampler:
def __init__(
self,
episode_data_index: dict,
dataset_from_indices: list[int],
dataset_to_indices: list[int],
episode_indices_to_use: list | None = None,
drop_n_first_frames: int = 0,
drop_n_last_frames: int = 0,
@@ -30,7 +31,8 @@ class EpisodeAwareSampler:
"""Sampler that optionally incorporates episode boundary information.
Args:
episode_data_index: Dictionary with keys 'from' and 'to' containing the start and end indices of each episode.
dataset_from_indices: List of indices containing the start of each episode in the dataset.
dataset_to_indices: List of indices containing the end of each episode in the dataset.
episode_indices_to_use: List of episode indices to use. If None, all episodes are used.
Assumes that episodes are indexed from 0 to N-1.
drop_n_first_frames: Number of frames to drop from the start of each episode.
@@ -39,12 +41,10 @@ class EpisodeAwareSampler:
"""
indices = []
for episode_idx, (start_index, end_index) in enumerate(
zip(episode_data_index["from"], episode_data_index["to"], strict=True)
zip(dataset_from_indices, dataset_to_indices, strict=True)
):
if episode_indices_to_use is None or episode_idx in episode_indices_to_use:
indices.extend(
range(start_index.item() + drop_n_first_frames, end_index.item() - drop_n_last_frames)
)
indices.extend(range(start_index + drop_n_first_frames, end_index - drop_n_last_frames))
self.indices = indices
self.shuffle = shuffle

View File

@@ -18,42 +18,55 @@ import importlib.resources
import json
import logging
from collections.abc import Iterator
from itertools import accumulate
from pathlib import Path
from pprint import pformat
from types import SimpleNamespace
from typing import Any
import datasets
import jsonlines
import numpy as np
import packaging.version
import pandas
import pandas as pd
import pyarrow.parquet as pq
import torch
from datasets import Dataset, concatenate_datasets
from datasets.table import embed_table_storage
from huggingface_hub import DatasetCard, DatasetCardData, HfApi
from huggingface_hub.errors import RevisionNotFoundError
from PIL import Image as PILImage
from torchvision import transforms
from lerobot.configs.types import DictLike, FeatureType, PolicyFeature
from lerobot.configs.types import FeatureType, PolicyFeature
from lerobot.datasets.backward_compatibility import (
V21_MESSAGE,
FUTURE_MESSAGE,
BackwardCompatibilityError,
ForwardCompatibilityError,
)
from lerobot.utils.utils import is_valid_numpy_dtype_string
DEFAULT_CHUNK_SIZE = 1000 # Max number of episodes per chunk
DEFAULT_CHUNK_SIZE = 1000 # Max number of files per chunk
DEFAULT_DATA_FILE_SIZE_IN_MB = 100 # Max size per file
DEFAULT_VIDEO_FILE_SIZE_IN_MB = 500 # Max size per file
INFO_PATH = "meta/info.json"
EPISODES_PATH = "meta/episodes.jsonl"
STATS_PATH = "meta/stats.json"
EPISODES_STATS_PATH = "meta/episodes_stats.jsonl"
TASKS_PATH = "meta/tasks.jsonl"
DEFAULT_VIDEO_PATH = "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4"
DEFAULT_PARQUET_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
DEFAULT_IMAGE_PATH = "images/{image_key}/episode_{episode_index:06d}/frame_{frame_index:06d}.png"
EPISODES_DIR = "meta/episodes"
DATA_DIR = "data"
VIDEO_DIR = "videos"
CHUNK_FILE_PATTERN = "chunk-{chunk_index:03d}/file-{file_index:03d}"
DEFAULT_TASKS_PATH = "meta/tasks.parquet"
DEFAULT_EPISODES_PATH = EPISODES_DIR + "/" + CHUNK_FILE_PATTERN + ".parquet"
DEFAULT_DATA_PATH = DATA_DIR + "/" + CHUNK_FILE_PATTERN + ".parquet"
DEFAULT_VIDEO_PATH = VIDEO_DIR + "/{video_key}/" + CHUNK_FILE_PATTERN + ".mp4"
DEFAULT_IMAGE_PATH = "images/{image_key}/episode-{episode_index:06d}/frame-{frame_index:06d}.png"
LEGACY_EPISODES_PATH = "meta/episodes.jsonl"
LEGACY_EPISODES_STATS_PATH = "meta/episodes_stats.jsonl"
LEGACY_TASKS_PATH = "meta/tasks.jsonl"
LEGACY_DEFAULT_VIDEO_PATH = "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4"
LEGACY_DEFAULT_PARQUET_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
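# Illustrative formatting of the new path templates (index values are hypothetical):
# DEFAULT_DATA_PATH.format(chunk_index=0, file_index=12)
#   -> "data/chunk-000/file-012.parquet"
# DEFAULT_VIDEO_PATH.format(video_key="observation.images.cam_high", chunk_index=1, file_index=0)
#   -> "videos/observation.images.cam_high/chunk-001/file-000.mp4"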
DATASET_CARD_TEMPLATE = """
---
@@ -74,6 +87,65 @@ DEFAULT_FEATURES = {
}
def get_parquet_file_size_in_mb(parquet_path: str | Path) -> float:
metadata = pq.read_metadata(parquet_path)
total_uncompressed_size = 0
for row_group in range(metadata.num_row_groups):
rg_metadata = metadata.row_group(row_group)
for column in range(rg_metadata.num_columns):
col_metadata = rg_metadata.column(column)
total_uncompressed_size += col_metadata.total_uncompressed_size
return total_uncompressed_size / (1024**2)
def get_hf_dataset_size_in_mb(hf_ds: Dataset) -> int:
return hf_ds.data.nbytes // (1024**2)
def get_hf_dataset_cache_dir(hf_ds: Dataset) -> Path | None:
if hf_ds.cache_files is None or len(hf_ds.cache_files) == 0:
return None
return Path(hf_ds.cache_files[0]["filename"]).parents[2]
def update_chunk_file_indices(chunk_idx: int, file_idx: int, chunks_size: int) -> tuple[int, int]:
if file_idx == chunks_size - 1:
file_idx = 0
chunk_idx += 1
else:
file_idx += 1
return chunk_idx, file_idx
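# Illustrative roll-over (doctest-style; chunks_size=3 is a hypothetical value):
# >>> update_chunk_file_indices(chunk_idx=0, file_idx=1, chunks_size=3)
# (0, 2)
# >>> update_chunk_file_indices(chunk_idx=0, file_idx=2, chunks_size=3)
# (1, 0)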
def load_nested_dataset(pq_dir: Path, features: datasets.Features | None = None) -> Dataset:
"""Find parquet files in provided directory {pq_dir}/chunk-xxx/file-xxx.parquet
Convert parquet files to pyarrow memory mapped in a cache folder for efficient RAM usage
Concatenate all pyarrow references to return HF Dataset format
Args:
pq_dir: Directory containing parquet files
features: Optional features schema to ensure consistent loading of complex types like images
"""
paths = sorted(pq_dir.glob("*/*.parquet"))
if len(paths) == 0:
raise FileNotFoundError(f"Provided directory does not contain any parquet file: {pq_dir}")
# TODO(rcadene): set num_proc to accelerate conversion to pyarrow
datasets = [Dataset.from_parquet(str(path), features=features) for path in paths]
return concatenate_datasets(datasets)
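# Illustrative usage (hedged; the path is hypothetical and follows the DEFAULT_DATA_PATH layout):
# hf_ds = load_nested_dataset(Path("path/to/dataset/data"))
# len(hf_ds)  # total number of rows (frames) across all chunk-xxx/file-xxx.parquet files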
def get_parquet_num_frames(parquet_path: str | Path) -> int:
metadata = pq.read_metadata(parquet_path)
return metadata.num_rows
def get_video_size_in_mb(mp4_path: Path) -> float:
file_size_bytes = mp4_path.stat().st_size
file_size_mb = file_size_bytes / (1024**2)
return file_size_mb
def flatten_dict(d: dict, parent_key: str = "", sep: str = "/") -> dict:
"""Flatten a nested dictionary structure by collapsing nested keys into one key with a separator.
@@ -82,6 +154,7 @@ def flatten_dict(d: dict, parent_key: str = "", sep: str = "/") -> dict:
>>> dct = {"a": {"b": 1, "c": {"d": 2}}, "e": 3}`
>>> print(flatten_dict(dct))
{"a/b": 1, "a/c/d": 2, "e": 3}
```
"""
items = []
for k, v in d.items():
@@ -106,23 +179,13 @@ def unflatten_dict(d: dict, sep: str = "/") -> dict:
return outdict
def get_nested_item(obj: DictLike, flattened_key: str, sep: str = "/") -> Any:
split_keys = flattened_key.split(sep)
getter = obj[split_keys[0]]
if len(split_keys) == 1:
return getter
for key in split_keys[1:]:
getter = getter[key]
return getter
def serialize_dict(stats: dict[str, torch.Tensor | np.ndarray | dict]) -> dict:
serialized_dict = {}
for key, value in flatten_dict(stats).items():
if isinstance(value, (torch.Tensor, np.ndarray)):
serialized_dict[key] = value.tolist()
elif isinstance(value, list) and isinstance(value[0], (int, float, list)):
serialized_dict[key] = value
elif isinstance(value, np.generic):
serialized_dict[key] = value.item()
elif isinstance(value, (int, float)):
@@ -152,24 +215,7 @@ def write_json(data: dict, fpath: Path) -> None:
json.dump(data, f, indent=4, ensure_ascii=False)
def load_jsonlines(fpath: Path) -> list[Any]:
with jsonlines.open(fpath, "r") as reader:
return list(reader)
def write_jsonlines(data: dict, fpath: Path) -> None:
fpath.parent.mkdir(exist_ok=True, parents=True)
with jsonlines.open(fpath, "w") as writer:
writer.write_all(data)
def append_jsonlines(data: dict, fpath: Path) -> None:
fpath.parent.mkdir(exist_ok=True, parents=True)
with jsonlines.open(fpath, "a") as writer:
writer.write(data)
def write_info(info: dict, local_dir: Path):
def write_info(info: dict, local_dir: Path) -> None:
write_json(info, local_dir / INFO_PATH)
@@ -180,65 +226,68 @@ def load_info(local_dir: Path) -> dict:
return info
def write_stats(stats: dict, local_dir: Path):
def write_stats(stats: dict, local_dir: Path) -> None:
serialized_stats = serialize_dict(stats)
write_json(serialized_stats, local_dir / STATS_PATH)
def cast_stats_to_numpy(stats) -> dict[str, dict[str, np.ndarray]]:
def cast_stats_to_numpy(stats: dict) -> dict[str, dict[str, np.ndarray]]:
stats = {key: np.array(value) for key, value in flatten_dict(stats).items()}
return unflatten_dict(stats)
def load_stats(local_dir: Path) -> dict[str, dict[str, np.ndarray]]:
def load_stats(local_dir: Path) -> dict[str, dict[str, np.ndarray]] | None:
if not (local_dir / STATS_PATH).exists():
return None
stats = load_json(local_dir / STATS_PATH)
return cast_stats_to_numpy(stats)
def write_task(task_index: int, task: dict, local_dir: Path):
task_dict = {
"task_index": task_index,
"task": task,
}
append_jsonlines(task_dict, local_dir / TASKS_PATH)
def write_tasks(tasks: pandas.DataFrame, local_dir: Path) -> None:
path = local_dir / DEFAULT_TASKS_PATH
path.parent.mkdir(parents=True, exist_ok=True)
tasks.to_parquet(path)
def load_tasks(local_dir: Path) -> tuple[dict, dict]:
tasks = load_jsonlines(local_dir / TASKS_PATH)
tasks = {item["task_index"]: item["task"] for item in sorted(tasks, key=lambda x: x["task_index"])}
task_to_task_index = {task: task_index for task_index, task in tasks.items()}
return tasks, task_to_task_index
def load_tasks(local_dir: Path) -> pandas.DataFrame:
tasks = pd.read_parquet(local_dir / DEFAULT_TASKS_PATH)
return tasks
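# Illustrative round-trip (hedged; `tasks` is the pandas DataFrame of task metadata):
# write_tasks(tasks, local_dir)   # -> local_dir / "meta/tasks.parquet"
# tasks = load_tasks(local_dir)   # reads the same parquet back into a DataFrame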
def write_episode(episode: dict, local_dir: Path):
append_jsonlines(episode, local_dir / EPISODES_PATH)
def write_episodes(episodes: Dataset, local_dir: Path) -> None:
"""Write episode metadata to a parquet file in the LeRobot v3.0 format.
This function writes episode-level metadata to a single parquet file.
Used primarily during dataset conversion (v2.1 → v3.0) and in test fixtures.
Args:
episodes: HuggingFace Dataset containing episode metadata
local_dir: Root directory where the dataset will be stored
"""
episode_size_mb = get_hf_dataset_size_in_mb(episodes)
if episode_size_mb > DEFAULT_DATA_FILE_SIZE_IN_MB:
raise NotImplementedError(
f"Episodes dataset is too large ({episode_size_mb} MB) to write to a single file. "
f"The current limit is {DEFAULT_DATA_FILE_SIZE_IN_MB} MB. "
"This function only supports single-file episode metadata. "
)
fpath = local_dir / DEFAULT_EPISODES_PATH.format(chunk_index=0, file_index=0)
fpath.parent.mkdir(parents=True, exist_ok=True)
episodes.to_parquet(fpath)
def load_episodes(local_dir: Path) -> dict:
episodes = load_jsonlines(local_dir / EPISODES_PATH)
return {item["episode_index"]: item for item in sorted(episodes, key=lambda x: x["episode_index"])}
def write_episode_stats(episode_index: int, episode_stats: dict, local_dir: Path):
# We wrap episode_stats in a dictionary since `episode_stats["episode_index"]`
# is a dictionary of stats and not an integer.
episode_stats = {"episode_index": episode_index, "stats": serialize_dict(episode_stats)}
append_jsonlines(episode_stats, local_dir / EPISODES_STATS_PATH)
def load_episodes_stats(local_dir: Path) -> dict:
episodes_stats = load_jsonlines(local_dir / EPISODES_STATS_PATH)
return {
item["episode_index"]: cast_stats_to_numpy(item["stats"])
for item in sorted(episodes_stats, key=lambda x: x["episode_index"])
}
def load_episodes(local_dir: Path) -> datasets.Dataset:
episodes = load_nested_dataset(local_dir / EPISODES_DIR)
# Select episode features/columns containing references to episode data and videos
# (e.g. tasks, dataset_from_index, dataset_to_index, data/chunk_index, data/file_index, etc.)
# This speeds up access to this data, instead of having to load the episode stats.
episodes = episodes.select_columns([key for key in episodes.features if not key.startswith("stats/")])
return episodes
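# Illustrative access (hedged): the per-episode boundaries used e.g. by EpisodeAwareSampler
# episodes = load_episodes(local_dir)
# episodes[0]["dataset_from_index"], episodes[0]["dataset_to_index"]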
def backward_compatible_episodes_stats(
stats: dict[str, dict[str, np.ndarray]], episodes: list[int]
) -> dict[str, dict[str, np.ndarray]]:
) -> dict[int, dict[str, dict[str, np.ndarray]]]:
return dict.fromkeys(episodes, stats)
@@ -254,7 +303,7 @@ def load_image_as_numpy(
return img_array
def hf_transform_to_torch(items_dict: dict[torch.Tensor | None]):
def hf_transform_to_torch(items_dict: dict[str, list[Any]]) -> dict[str, list[torch.Tensor | str]]:
"""Get a transform function that convert items from Hugging Face dataset (pyarrow)
to torch tensors. Importantly, images are converted from PIL, which corresponds to
a channel last representation (h w c) of uint8 type, to a torch image representation
@@ -299,7 +348,7 @@ def check_version_compatibility(
if v_check.major < v_current.major and enforce_breaking_major:
raise BackwardCompatibilityError(repo_id, v_check)
elif v_check.minor < v_current.minor:
logging.warning(V21_MESSAGE.format(repo_id=repo_id, version=v_check))
logging.warning(FUTURE_MESSAGE.format(repo_id=repo_id, version=v_check))
def get_repo_versions(repo_id: str) -> list[packaging.version.Version]:
@@ -476,6 +525,9 @@ def create_empty_dataset_info(
features: dict,
use_videos: bool,
robot_type: str | None = None,
chunks_size: int | None = None,
data_files_size_in_mb: int | None = None,
video_files_size_in_mb: int | None = None,
) -> dict:
return {
"codebase_version": codebase_version,
@@ -483,104 +535,17 @@ def create_empty_dataset_info(
"total_episodes": 0,
"total_frames": 0,
"total_tasks": 0,
"total_videos": 0,
"total_chunks": 0,
"chunks_size": DEFAULT_CHUNK_SIZE,
"chunks_size": chunks_size or DEFAULT_CHUNK_SIZE,
"data_files_size_in_mb": data_files_size_in_mb or DEFAULT_DATA_FILE_SIZE_IN_MB,
"video_files_size_in_mb": video_files_size_in_mb or DEFAULT_VIDEO_FILE_SIZE_IN_MB,
"fps": fps,
"splits": {},
"data_path": DEFAULT_PARQUET_PATH,
"data_path": DEFAULT_DATA_PATH,
"video_path": DEFAULT_VIDEO_PATH if use_videos else None,
"features": features,
}
def get_episode_data_index(
episode_dicts: dict[dict], episodes: list[int] | None = None
) -> dict[str, torch.Tensor]:
episode_lengths = {ep_idx: ep_dict["length"] for ep_idx, ep_dict in episode_dicts.items()}
if episodes is not None:
episode_lengths = {ep_idx: episode_lengths[ep_idx] for ep_idx in episodes}
cumulative_lengths = list(accumulate(episode_lengths.values()))
return {
"from": torch.LongTensor([0] + cumulative_lengths[:-1]),
"to": torch.LongTensor(cumulative_lengths),
}
def check_timestamps_sync(
timestamps: np.ndarray,
episode_indices: np.ndarray,
episode_data_index: dict[str, np.ndarray],
fps: int,
tolerance_s: float,
raise_value_error: bool = True,
) -> bool:
"""
This check is to make sure that each timestamp is separated from the next by (1/fps) +/- tolerance
to account for possible numerical error.
Args:
timestamps (np.ndarray): Array of timestamps in seconds.
episode_indices (np.ndarray): Array indicating the episode index for each timestamp.
episode_data_index (dict[str, np.ndarray]): A dictionary that includes 'to',
which identifies indices for the end of each episode.
fps (int): Frames per second. Used to check the expected difference between consecutive timestamps.
tolerance_s (float): Allowed deviation from the expected (1/fps) difference.
raise_value_error (bool): Whether to raise a ValueError if the check fails.
Returns:
bool: True if all checked timestamp differences lie within tolerance, False otherwise.
Raises:
ValueError: If the check fails and `raise_value_error` is True.
"""
if timestamps.shape != episode_indices.shape:
raise ValueError(
"timestamps and episode_indices should have the same shape. "
f"Found {timestamps.shape=} and {episode_indices.shape=}."
)
# Consecutive differences
diffs = np.diff(timestamps)
within_tolerance = np.abs(diffs - (1.0 / fps)) <= tolerance_s
# Mask to ignore differences at the boundaries between episodes
mask = np.ones(len(diffs), dtype=bool)
ignored_diffs = episode_data_index["to"][:-1] - 1 # indices at the end of each episode
mask[ignored_diffs] = False
filtered_within_tolerance = within_tolerance[mask]
# Check if all remaining diffs are within tolerance
if not np.all(filtered_within_tolerance):
# Track original indices before masking
original_indices = np.arange(len(diffs))
filtered_indices = original_indices[mask]
outside_tolerance_filtered_indices = np.nonzero(~filtered_within_tolerance)[0]
outside_tolerance_indices = filtered_indices[outside_tolerance_filtered_indices]
outside_tolerances = []
for idx in outside_tolerance_indices:
entry = {
"timestamps": [timestamps[idx], timestamps[idx + 1]],
"diff": diffs[idx],
"episode_index": episode_indices[idx].item()
if hasattr(episode_indices[idx], "item")
else episode_indices[idx],
}
outside_tolerances.append(entry)
if raise_value_error:
raise ValueError(
f"""One or several timestamps unexpectedly violate the tolerance inside episode range.
This might be due to synchronization issues during data collection.
\n{pformat(outside_tolerances)}"""
)
return False
return True
def check_delta_timestamps(
delta_timestamps: dict[str, list[float]], fps: int, tolerance_s: float, raise_value_error: bool = True
) -> bool:
@@ -619,7 +584,7 @@ def get_delta_indices(delta_timestamps: dict[str, list[float]], fps: int) -> dic
return delta_indices
def cycle(iterable):
def cycle(iterable: Any) -> Iterator[Any]:
"""The equivalent of itertools.cycle, but safe for Pytorch dataloaders.
See https://github.com/pytorch/pytorch/issues/23900 for information on why itertools.cycle is not safe.
@@ -632,7 +597,7 @@ def cycle(iterable):
iterator = iter(iterable)
def create_branch(repo_id, *, branch: str, repo_type: str | None = None) -> None:
def create_branch(repo_id: str, *, branch: str, repo_type: str | None = None) -> None:
"""Create a branch on a existing Hugging Face repo. Delete the branch if it already
exists before creating it.
"""
@@ -685,76 +650,28 @@ def create_lerobot_dataset_card(
)
class IterableNamespace(SimpleNamespace):
"""
A namespace object that supports both dictionary-like iteration and dot notation access.
Automatically converts nested dictionaries into IterableNamespaces.
This class extends SimpleNamespace to provide:
- Dictionary-style iteration over keys
- Access to items via both dot notation (obj.key) and brackets (obj["key"])
- Dictionary-like methods: items(), keys(), values()
- Recursive conversion of nested dictionaries
Args:
dictionary: Optional dictionary to initialize the namespace
**kwargs: Additional keyword arguments passed to SimpleNamespace
Examples:
>>> data = {"name": "Alice", "details": {"age": 25}}
>>> ns = IterableNamespace(data)
>>> ns.name
'Alice'
>>> ns.details.age
25
>>> list(ns.keys())
['name', 'details']
>>> for key, value in ns.items():
... print(f"{key}: {value}")
name: Alice
details: IterableNamespace(age=25)
"""
def __init__(self, dictionary: dict[str, Any] = None, **kwargs):
super().__init__(**kwargs)
if dictionary is not None:
for key, value in dictionary.items():
if isinstance(value, dict):
setattr(self, key, IterableNamespace(value))
else:
setattr(self, key, value)
def __iter__(self) -> Iterator[str]:
return iter(vars(self))
def __getitem__(self, key: str) -> Any:
return vars(self)[key]
def items(self):
return vars(self).items()
def values(self):
return vars(self).values()
def keys(self):
return vars(self).keys()
def validate_frame(frame: dict, features: dict):
def validate_frame(frame: dict, features: dict) -> None:
expected_features = set(features) - set(DEFAULT_FEATURES)
actual_features = set(frame)
error_message = validate_features_presence(actual_features, expected_features)
# task is a special required field that's not part of regular features
if "task" not in actual_features:
raise ValueError("Feature mismatch in `frame` dictionary:\nMissing features: {'task'}\n")
common_features = actual_features & expected_features
for name in common_features - {"task"}:
# Remove task from actual_features for regular feature validation
actual_features_for_validation = actual_features - {"task"}
error_message = validate_features_presence(actual_features_for_validation, expected_features)
common_features = actual_features_for_validation & expected_features
for name in common_features:
error_message += validate_feature_dtype_and_shape(name, features[name], frame[name])
if error_message:
raise ValueError(error_message)
def validate_features_presence(actual_features: set[str], expected_features: set[str]):
def validate_features_presence(actual_features: set[str], expected_features: set[str]) -> str:
error_message = ""
missing_features = expected_features - actual_features
extra_features = actual_features - expected_features
@@ -769,7 +686,9 @@ def validate_features_presence(actual_features: set[str], expected_features: set
return error_message
def validate_feature_dtype_and_shape(name: str, feature: dict, value: np.ndarray | PILImage.Image | str):
def validate_feature_dtype_and_shape(
name: str, feature: dict, value: np.ndarray | PILImage.Image | str
) -> str:
expected_dtype = feature["dtype"]
expected_shape = feature["shape"]
if is_valid_numpy_dtype_string(expected_dtype):
@@ -784,7 +703,7 @@ def validate_feature_dtype_and_shape(name: str, feature: dict, value: np.ndarray
def validate_feature_numpy_array(
name: str, expected_dtype: str, expected_shape: list[int], value: np.ndarray
):
) -> str:
error_message = ""
if isinstance(value, np.ndarray):
actual_dtype = value.dtype
@@ -801,7 +720,9 @@ def validate_feature_numpy_array(
return error_message
def validate_feature_image_or_video(name: str, expected_shape: list[str], value: np.ndarray | PILImage.Image):
def validate_feature_image_or_video(
name: str, expected_shape: list[str], value: np.ndarray | PILImage.Image
) -> str:
# Note: The check of pixels range ([0,1] for float and [0,255] for uint8) is done by the image writer threads.
error_message = ""
if isinstance(value, np.ndarray):
@@ -817,13 +738,13 @@ def validate_feature_image_or_video(name: str, expected_shape: list[str], value:
return error_message
def validate_feature_string(name: str, value: str):
def validate_feature_string(name: str, value: str) -> str:
if not isinstance(value, str):
return f"The feature '{name}' is expected to be of type 'str', but type '{type(value)}' provided instead.\n"
return ""
def validate_episode_buffer(episode_buffer: dict, total_episodes: int, features: dict):
def validate_episode_buffer(episode_buffer: dict, total_episodes: int, features: dict) -> None:
if "size" not in episode_buffer:
raise ValueError("size key not found in episode_buffer")
@@ -847,3 +768,11 @@ def validate_episode_buffer(episode_buffer: dict, total_episodes: int, features:
f"In episode_buffer not in features: {buffer_keys - set(features)}"
f"In features not in episode_buffer: {set(features) - buffer_keys}"
)
def to_parquet_with_hf_images(df: pandas.DataFrame, path: Path) -> None:
"""This function correctly writes to parquet a panda DataFrame that contains images encoded by HF dataset.
This way, it can be loaded by HF dataset and correctly formatted images are returned.
"""
# TODO(qlhoest): replace this weird syntax with `df.to_parquet(path)` only
datasets.Dataset.from_dict(df.to_dict(orient="list")).to_parquet(path)

View File

@@ -1,884 +0,0 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This script is for internal use to convert all datasets under the 'lerobot' hub user account to v2.
Note: Since the original Aloha datasets don't use shadow motors, you need to comment those out in
lerobot/configs/robot/aloha.yaml before running this script.
"""
import traceback
from pathlib import Path
from textwrap import dedent
from lerobot import available_datasets
from lerobot.datasets.v2.convert_dataset_v1_to_v2 import convert_dataset
from lerobot.robots.aloha.configuration_aloha import AlohaRobotConfig
LOCAL_DIR = Path("data/")
# spellchecker:off
ALOHA_MOBILE_INFO = {
"robot_config": AlohaRobotConfig(),
"license": "mit",
"url": "https://mobile-aloha.github.io/",
"paper": "https://huggingface.co/papers/2401.02117",
"citation_bibtex": dedent(r"""
@inproceedings{fu2024mobile,
author = {Fu, Zipeng and Zhao, Tony Z. and Finn, Chelsea},
title = {Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation},
booktitle = {arXiv},
year = {2024},
}""").lstrip(),
}
ALOHA_STATIC_INFO = {
"robot_config": AlohaRobotConfig(),
"license": "mit",
"url": "https://tonyzhaozh.github.io/aloha/",
"paper": "https://huggingface.co/papers/2304.13705",
"citation_bibtex": dedent(r"""
@article{Zhao2023LearningFB,
title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
author={Tony Zhao and Vikash Kumar and Sergey Levine and Chelsea Finn},
journal={RSS},
year={2023},
volume={abs/2304.13705},
url={https://huggingface.co/papers/2304.13705}
}""").lstrip(),
}
PUSHT_INFO = {
"license": "mit",
"url": "https://diffusion-policy.cs.columbia.edu/",
"paper": "https://huggingface.co/papers/2303.04137",
"citation_bibtex": dedent(r"""
@article{chi2024diffusionpolicy,
author = {Cheng Chi and Zhenjia Xu and Siyuan Feng and Eric Cousineau and Yilun Du and Benjamin Burchfiel and Russ Tedrake and Shuran Song},
title ={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
journal = {The International Journal of Robotics Research},
year = {2024},
}""").lstrip(),
}
XARM_INFO = {
"license": "mit",
"url": "https://www.nicklashansen.com/td-mpc/",
"paper": "https://huggingface.co/papers/2203.04955",
"citation_bibtex": dedent(r"""
@inproceedings{Hansen2022tdmpc,
title={Temporal Difference Learning for Model Predictive Control},
author={Nicklas Hansen and Xiaolong Wang and Hao Su},
booktitle={ICML},
year={2022}
}
"""),
}
UNITREEH_INFO = {
"license": "apache-2.0",
}
DATASETS = {
"aloha_mobile_cabinet": {
"single_task": "Open the top cabinet, store the pot inside it then close the cabinet.",
**ALOHA_MOBILE_INFO,
},
"aloha_mobile_chair": {
"single_task": "Push the chairs in front of the desk to place them against it.",
**ALOHA_MOBILE_INFO,
},
"aloha_mobile_elevator": {
"single_task": "Take the elevator to the 1st floor.",
**ALOHA_MOBILE_INFO,
},
"aloha_mobile_shrimp": {
"single_task": "Sauté the raw shrimp on both sides, then serve it in the bowl.",
**ALOHA_MOBILE_INFO,
},
"aloha_mobile_wash_pan": {
"single_task": "Pick up the pan, rinse it in the sink and then place it in the drying rack.",
**ALOHA_MOBILE_INFO,
},
"aloha_mobile_wipe_wine": {
"single_task": "Pick up the wet cloth on the faucet and use it to clean the spilled wine on the table and underneath the glass.",
**ALOHA_MOBILE_INFO,
},
"aloha_static_battery": {
"single_task": "Place the battery into the slot of the remote controller.",
**ALOHA_STATIC_INFO,
},
"aloha_static_candy": {"single_task": "Pick up the candy and unwrap it.", **ALOHA_STATIC_INFO},
"aloha_static_coffee": {
"single_task": "Place the coffee capsule inside the capsule container, then place the cup onto the center of the cup tray, then push the 'Hot Water' and 'Travel Mug' buttons.",
**ALOHA_STATIC_INFO,
},
"aloha_static_coffee_new": {
"single_task": "Place the coffee capsule inside the capsule container, then place the cup onto the center of the cup tray.",
**ALOHA_STATIC_INFO,
},
"aloha_static_cups_open": {
"single_task": "Pick up the plastic cup and open its lid.",
**ALOHA_STATIC_INFO,
},
"aloha_static_fork_pick_up": {
"single_task": "Pick up the fork and place it on the plate.",
**ALOHA_STATIC_INFO,
},
"aloha_static_pingpong_test": {
"single_task": "Transfer one of the two balls in the right glass into the left glass, then transfer it back to the right glass.",
**ALOHA_STATIC_INFO,
},
"aloha_static_pro_pencil": {
"single_task": "Pick up the pencil with the right arm, hand it over to the left arm then place it back onto the table.",
**ALOHA_STATIC_INFO,
},
"aloha_static_screw_driver": {
"single_task": "Pick up the screwdriver with the right arm, hand it over to the left arm then place it into the cup.",
**ALOHA_STATIC_INFO,
},
"aloha_static_tape": {
"single_task": "Cut a small piece of tape from the tape dispenser then place it on the cardboard box's edge.",
**ALOHA_STATIC_INFO,
},
"aloha_static_thread_velcro": {
"single_task": "Pick up the velcro cable tie with the left arm, then insert the end of the velcro tie into the other end's loop with the right arm.",
**ALOHA_STATIC_INFO,
},
"aloha_static_towel": {
"single_task": "Pick up a piece of paper towel and place it on the spilled liquid.",
**ALOHA_STATIC_INFO,
},
"aloha_static_vinh_cup": {
"single_task": "Pick up the plastic cup with the right arm, then pop its lid open with the left arm.",
**ALOHA_STATIC_INFO,
},
"aloha_static_vinh_cup_left": {
"single_task": "Pick up the plastic cup with the left arm, then pop its lid open with the right arm.",
**ALOHA_STATIC_INFO,
},
"aloha_static_ziploc_slide": {"single_task": "Slide open the ziploc bag.", **ALOHA_STATIC_INFO},
"aloha_sim_insertion_scripted": {"single_task": "Insert the peg into the socket.", **ALOHA_STATIC_INFO},
"aloha_sim_insertion_scripted_image": {
"single_task": "Insert the peg into the socket.",
**ALOHA_STATIC_INFO,
},
"aloha_sim_insertion_human": {"single_task": "Insert the peg into the socket.", **ALOHA_STATIC_INFO},
"aloha_sim_insertion_human_image": {
"single_task": "Insert the peg into the socket.",
**ALOHA_STATIC_INFO,
},
"aloha_sim_transfer_cube_scripted": {
"single_task": "Pick up the cube with the right arm and transfer it to the left arm.",
**ALOHA_STATIC_INFO,
},
"aloha_sim_transfer_cube_scripted_image": {
"single_task": "Pick up the cube with the right arm and transfer it to the left arm.",
**ALOHA_STATIC_INFO,
},
"aloha_sim_transfer_cube_human": {
"single_task": "Pick up the cube with the right arm and transfer it to the left arm.",
**ALOHA_STATIC_INFO,
},
"aloha_sim_transfer_cube_human_image": {
"single_task": "Pick up the cube with the right arm and transfer it to the left arm.",
**ALOHA_STATIC_INFO,
},
"pusht": {"single_task": "Push the T-shaped block onto the T-shaped target.", **PUSHT_INFO},
"pusht_image": {"single_task": "Push the T-shaped block onto the T-shaped target.", **PUSHT_INFO},
"unitreeh1_fold_clothes": {"single_task": "Fold the sweatshirt.", **UNITREEH_INFO},
"unitreeh1_rearrange_objects": {"single_task": "Put the object into the bin.", **UNITREEH_INFO},
"unitreeh1_two_robot_greeting": {
"single_task": "Greet the other robot with a high five.",
**UNITREEH_INFO,
},
"unitreeh1_warehouse": {
"single_task": "Grab the spray paint on the shelf and place it in the bin on top of the robot dog.",
**UNITREEH_INFO,
},
"xarm_lift_medium": {"single_task": "Pick up the cube and lift it.", **XARM_INFO},
"xarm_lift_medium_image": {"single_task": "Pick up the cube and lift it.", **XARM_INFO},
"xarm_lift_medium_replay": {"single_task": "Pick up the cube and lift it.", **XARM_INFO},
"xarm_lift_medium_replay_image": {"single_task": "Pick up the cube and lift it.", **XARM_INFO},
"xarm_push_medium": {"single_task": "Push the cube onto the target.", **XARM_INFO},
"xarm_push_medium_image": {"single_task": "Push the cube onto the target.", **XARM_INFO},
"xarm_push_medium_replay": {"single_task": "Push the cube onto the target.", **XARM_INFO},
"xarm_push_medium_replay_image": {"single_task": "Push the cube onto the target.", **XARM_INFO},
"umi_cup_in_the_wild": {
"single_task": "Put the cup on the plate.",
"license": "apache-2.0",
},
"asu_table_top": {
"tasks_col": "language_instruction",
"license": "mit",
"paper": "https://link.springer.com/article/10.1007/s10514-023-10129-1",
"citation_bibtex": dedent(r"""
@inproceedings{zhou2023modularity,
title={Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation},
author={Zhou, Yifan and Sonawani, Shubham and Phielipp, Mariano and Stepputtis, Simon and Amor, Heni},
booktitle={Conference on Robot Learning},
pages={1684--1695},
year={2023},
organization={PMLR}
}
@article{zhou2023learning,
title={Learning modular language-conditioned robot policies through attention},
author={Zhou, Yifan and Sonawani, Shubham and Phielipp, Mariano and Ben Amor, Heni and Stepputtis, Simon},
journal={Autonomous Robots},
pages={1--21},
year={2023},
publisher={Springer}
}""").lstrip(),
},
"austin_buds_dataset": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://ut-austin-rpl.github.io/BUDS-website/",
"paper": "https://huggingface.co/papers/2109.13841",
"citation_bibtex": dedent(r"""
@article{zhu2022bottom,
title={Bottom-Up Skill Discovery From Unsegmented Demonstrations for Long-Horizon Robot Manipulation},
author={Zhu, Yifeng and Stone, Peter and Zhu, Yuke},
journal={IEEE Robotics and Automation Letters},
volume={7},
number={2},
pages={4126--4133},
year={2022},
publisher={IEEE}
}""").lstrip(),
},
"austin_sailor_dataset": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://ut-austin-rpl.github.io/sailor/",
"paper": "https://huggingface.co/papers/2210.11435",
"citation_bibtex": dedent(r"""
@inproceedings{nasiriany2022sailor,
title={Learning and Retrieval from Prior Data for Skill-based Imitation Learning},
author={Soroush Nasiriany and Tian Gao and Ajay Mandlekar and Yuke Zhu},
booktitle={Conference on Robot Learning (CoRL)},
year={2022}
}""").lstrip(),
},
"austin_sirius_dataset": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://ut-austin-rpl.github.io/sirius/",
"paper": "https://huggingface.co/papers/2211.08416",
"citation_bibtex": dedent(r"""
@inproceedings{liu2022robot,
title = {Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment},
author = {Huihan Liu and Soroush Nasiriany and Lance Zhang and Zhiyao Bao and Yuke Zhu},
booktitle = {Robotics: Science and Systems (RSS)},
year = {2023}
}""").lstrip(),
},
"berkeley_autolab_ur5": {
"tasks_col": "language_instruction",
"license": "cc-by-4.0",
"url": "https://sites.google.com/view/berkeley-ur5/home",
"citation_bibtex": dedent(r"""
@misc{BerkeleyUR5Website,
title = {Berkeley {UR5} Demonstration Dataset},
author = {Lawrence Yunliang Chen and Simeon Adebola and Ken Goldberg},
howpublished = {https://sites.google.com/view/berkeley-ur5/home},
}""").lstrip(),
},
"berkeley_cable_routing": {
"tasks_col": "language_instruction",
"license": "cc-by-4.0",
"url": "https://sites.google.com/view/cablerouting/home",
"paper": "https://huggingface.co/papers/2307.08927",
"citation_bibtex": dedent(r"""
@article{luo2023multistage,
author = {Jianlan Luo and Charles Xu and Xinyang Geng and Gilbert Feng and Kuan Fang and Liam Tan and Stefan Schaal and Sergey Levine},
title = {Multi-Stage Cable Routing through Hierarchical Imitation Learning},
journal = {arXiv pre-print},
year = {2023},
url = {https://huggingface.co/papers/2307.08927},
}""").lstrip(),
},
"berkeley_fanuc_manipulation": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://sites.google.com/berkeley.edu/fanuc-manipulation",
"citation_bibtex": dedent(r"""
@article{fanuc_manipulation2023,
title={Fanuc Manipulation: A Dataset for Learning-based Manipulation with FANUC Mate 200iD Robot},
author={Zhu, Xinghao and Tian, Ran and Xu, Chenfeng and Ding, Mingyu and Zhan, Wei and Tomizuka, Masayoshi},
year={2023},
}""").lstrip(),
},
"berkeley_gnm_cory_hall": {
"tasks_col": "language_instruction",
"license": "mit",
"paper": "https://huggingface.co/papers/1709.10489",
"citation_bibtex": dedent(r"""
@inproceedings{kahn2018self,
title={Self-supervised deep reinforcement learning with generalized computation graphs for robot navigation},
author={Kahn, Gregory and Villaflor, Adam and Ding, Bosen and Abbeel, Pieter and Levine, Sergey},
booktitle={2018 IEEE international conference on robotics and automation (ICRA)},
pages={5129--5136},
year={2018},
organization={IEEE}
}""").lstrip(),
},
"berkeley_gnm_recon": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://sites.google.com/view/recon-robot",
"paper": "https://huggingface.co/papers/2104.05859",
"citation_bibtex": dedent(r"""
@inproceedings{shah2021rapid,
title={Rapid Exploration for Open-World Navigation with Latent Goal Models},
author={Dhruv Shah and Benjamin Eysenbach and Nicholas Rhinehart and Sergey Levine},
booktitle={5th Annual Conference on Robot Learning },
year={2021},
url={https://openreview.net/forum?id=d_SWJhyKfVw}
}""").lstrip(),
},
"berkeley_gnm_sac_son": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://sites.google.com/view/SACSoN-review",
"paper": "https://huggingface.co/papers/2306.01874",
"citation_bibtex": dedent(r"""
@article{hirose2023sacson,
title={SACSoN: Scalable Autonomous Data Collection for Social Navigation},
author={Hirose, Noriaki and Shah, Dhruv and Sridhar, Ajay and Levine, Sergey},
journal={arXiv preprint arXiv:2306.01874},
year={2023}
}""").lstrip(),
},
"berkeley_mvp": {
"tasks_col": "language_instruction",
"license": "mit",
"paper": "https://huggingface.co/papers/2203.06173",
"citation_bibtex": dedent(r"""
@InProceedings{Radosavovic2022,
title = {Real-World Robot Learning with Masked Visual Pre-training},
author = {Ilija Radosavovic and Tete Xiao and Stephen James and Pieter Abbeel and Jitendra Malik and Trevor Darrell},
booktitle = {CoRL},
year = {2022}
}""").lstrip(),
},
"berkeley_rpt": {
"tasks_col": "language_instruction",
"license": "mit",
"paper": "https://huggingface.co/papers/2306.10007",
"citation_bibtex": dedent(r"""
@article{Radosavovic2023,
title={Robot Learning with Sensorimotor Pre-training},
author={Ilija Radosavovic and Baifeng Shi and Letian Fu and Ken Goldberg and Trevor Darrell and Jitendra Malik},
year={2023},
journal={arXiv:2306.10007}
}""").lstrip(),
},
"cmu_franka_exploration_dataset": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://human-world-model.github.io/",
"paper": "https://huggingface.co/papers/2308.10901",
"citation_bibtex": dedent(r"""
@inproceedings{mendonca2023structured,
title={Structured World Models from Human Videos},
author={Mendonca, Russell and Bahl, Shikhar and Pathak, Deepak},
journal={RSS},
year={2023}
}""").lstrip(),
},
"cmu_play_fusion": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://play-fusion.github.io/",
"paper": "https://huggingface.co/papers/2312.04549",
"citation_bibtex": dedent(r"""
@inproceedings{chen2023playfusion,
title={PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play},
author={Chen, Lili and Bahl, Shikhar and Pathak, Deepak},
booktitle={CoRL},
year={2023}
}""").lstrip(),
},
"cmu_stretch": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://robo-affordances.github.io/",
"paper": "https://huggingface.co/papers/2304.08488",
"citation_bibtex": dedent(r"""
@inproceedings{bahl2023affordances,
title={Affordances from Human Videos as a Versatile Representation for Robotics},
author={Bahl, Shikhar and Mendonca, Russell and Chen, Lili and Jain, Unnat and Pathak, Deepak},
booktitle={CVPR},
year={2023}
}
@article{mendonca2023structured,
title={Structured World Models from Human Videos},
author={Mendonca, Russell and Bahl, Shikhar and Pathak, Deepak},
journal={CoRL},
year={2023}
}""").lstrip(),
},
"columbia_cairlab_pusht_real": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://diffusion-policy.cs.columbia.edu/",
"paper": "https://huggingface.co/papers/2303.04137",
"citation_bibtex": dedent(r"""
@inproceedings{chi2023diffusionpolicy,
title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
booktitle={Proceedings of Robotics: Science and Systems (RSS)},
year={2023}
}""").lstrip(),
},
"conq_hose_manipulation": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://sites.google.com/view/conq-hose-manipulation-dataset/home",
"citation_bibtex": dedent(r"""
@misc{ConqHoseManipData,
author={Peter Mitrano and Dmitry Berenson},
title={Conq Hose Manipulation Dataset, v1.15.0},
year={2024},
howpublished={https://sites.google.com/view/conq-hose-manipulation-dataset}
}""").lstrip(),
},
"dlr_edan_shared_control": {
"tasks_col": "language_instruction",
"license": "mit",
"paper": "https://ieeexplore.ieee.org/document/9341156",
"citation_bibtex": dedent(r"""
@inproceedings{vogel_edan_2020,
title = {EDAN - an EMG-Controlled Daily Assistant to Help People with Physical Disabilities},
language = {en},
booktitle = {2020 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems} ({IROS})},
author = {Vogel, Jörn and Hagengruber, Annette and Iskandar, Maged and Quere, Gabriel and Leipscher, Ulrike and Bustamante, Samuel and Dietrich, Alexander and Hoeppner, Hannes and Leidner, Daniel and Albu-Schäffer, Alin},
year = {2020}
}
@inproceedings{quere_shared_2020,
address = {Paris, France},
title = {Shared {Control} {Templates} for {Assistive} {Robotics}},
language = {en},
booktitle = {2020 {IEEE} {International} {Conference} on {Robotics} and {Automation} ({ICRA})},
author = {Quere, Gabriel and Hagengruber, Annette and Iskandar, Maged and Bustamante, Samuel and Leidner, Daniel and Stulp, Freek and Vogel, Joern},
year = {2020},
pages = {7},
}""").lstrip(),
},
"dlr_sara_grid_clamp": {
"tasks_col": "language_instruction",
"license": "mit",
"paper": "https://www.researchsquare.com/article/rs-3289569/v1",
"citation_bibtex": dedent(r"""
@article{padalkar2023guided,
title={A guided reinforcement learning approach using shared control templates for learning manipulation skills in the real world},
author={Padalkar, Abhishek and Quere, Gabriel and Raffin, Antonin and Silv{\'e}rio, Jo{\~a}o and Stulp, Freek},
journal={Research square preprint rs-3289569/v1},
year={2023}
}""").lstrip(),
},
"dlr_sara_pour": {
"tasks_col": "language_instruction",
"license": "mit",
"paper": "https://elib.dlr.de/193739/1/padalkar2023rlsct.pdf",
"citation_bibtex": dedent(r"""
@inproceedings{padalkar2023guiding,
title={Guiding Reinforcement Learning with Shared Control Templates},
author={Padalkar, Abhishek and Quere, Gabriel and Steinmetz, Franz and Raffin, Antonin and Nieuwenhuisen, Matthias and Silv{\'e}rio, Jo{\~a}o and Stulp, Freek},
booktitle={40th IEEE International Conference on Robotics and Automation, ICRA 2023},
year={2023},
organization={IEEE}
}""").lstrip(),
},
"droid_100": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://droid-dataset.github.io/",
"paper": "https://huggingface.co/papers/2403.12945",
"citation_bibtex": dedent(r"""
@article{khazatsky2024droid,
title = {DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset},
author = {Alexander Khazatsky and Karl Pertsch and Suraj Nair and Ashwin Balakrishna and Sudeep Dasari and Siddharth Karamcheti and Soroush Nasiriany and Mohan Kumar Srirama and Lawrence Yunliang Chen and Kirsty Ellis and Peter David Fagan and Joey Hejna and Masha Itkina and Marion Lepert and Yecheng Jason Ma and Patrick Tree Miller and Jimmy Wu and Suneel Belkhale and Shivin Dass and Huy Ha and Arhan Jain and Abraham Lee and Youngwoon Lee and Marius Memmel and Sungjae Park and Ilija Radosavovic and Kaiyuan Wang and Albert Zhan and Kevin Black and Cheng Chi and Kyle Beltran Hatch and Shan Lin and Jingpei Lu and Jean Mercat and Abdul Rehman and Pannag R Sanketi and Archit Sharma and Cody Simpson and Quan Vuong and Homer Rich Walke and Blake Wulfe and Ted Xiao and Jonathan Heewon Yang and Arefeh Yavary and Tony Z. Zhao and Christopher Agia and Rohan Baijal and Mateo Guaman Castro and Daphne Chen and Qiuyu Chen and Trinity Chung and Jaimyn Drake and Ethan Paul Foster and Jensen Gao and David Antonio Herrera and Minho Heo and Kyle Hsu and Jiaheng Hu and Donovon Jackson and Charlotte Le and Yunshuang Li and Kevin Lin and Roy Lin and Zehan Ma and Abhiram Maddukuri and Suvir Mirchandani and Daniel Morton and Tony Nguyen and Abigail O'Neill and Rosario Scalise and Derick Seale and Victor Son and Stephen Tian and Emi Tran and Andrew E. Wang and Yilin Wu and Annie Xie and Jingyun Yang and Patrick Yin and Yunchu Zhang and Osbert Bastani and Glen Berseth and Jeannette Bohg and Ken Goldberg and Abhinav Gupta and Abhishek Gupta and Dinesh Jayaraman and Joseph J Lim and Jitendra Malik and Roberto Martín-Martín and Subramanian Ramamoorthy and Dorsa Sadigh and Shuran Song and Jiajun Wu and Michael C. Yip and Yuke Zhu and Thomas Kollar and Sergey Levine and Chelsea Finn},
year = {2024},
}""").lstrip(),
},
"fmb": {
"tasks_col": "language_instruction",
"license": "cc-by-4.0",
"url": "https://functional-manipulation-benchmark.github.io/",
"paper": "https://huggingface.co/papers/2401.08553",
"citation_bibtex": dedent(r"""
@article{luo2024fmb,
title={FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning},
author={Luo, Jianlan and Xu, Charles and Liu, Fangchen and Tan, Liam and Lin, Zipeng and Wu, Jeffrey and Abbeel, Pieter and Levine, Sergey},
journal={arXiv preprint arXiv:2401.08553},
year={2024}
}""").lstrip(),
},
"iamlab_cmu_pickup_insert": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://openreview.net/forum?id=WuBv9-IGDUA",
"paper": "https://huggingface.co/papers/2401.14502",
"citation_bibtex": dedent(r"""
@inproceedings{saxena2023multiresolution,
title={Multi-Resolution Sensing for Real-Time Control with Vision-Language Models},
author={Saumya Saxena and Mohit Sharma and Oliver Kroemer},
booktitle={7th Annual Conference on Robot Learning},
year={2023},
url={https://openreview.net/forum?id=WuBv9-IGDUA}
}""").lstrip(),
},
"imperialcollege_sawyer_wrist_cam": {
"tasks_col": "language_instruction",
"license": "mit",
},
"jaco_play": {
"tasks_col": "language_instruction",
"license": "cc-by-4.0",
"url": "https://github.com/clvrai/clvr_jaco_play_dataset",
"citation_bibtex": dedent(r"""
@software{dass2023jacoplay,
author = {Dass, Shivin and Yapeter, Jullian and Zhang, Jesse and Zhang, Jiahui
and Pertsch, Karl and Nikolaidis, Stefanos and Lim, Joseph J.},
title = {CLVR Jaco Play Dataset},
url = {https://github.com/clvrai/clvr_jaco_play_dataset},
version = {1.0.0},
year = {2023}
}""").lstrip(),
},
"kaist_nonprehensile": {
"tasks_col": "language_instruction",
"license": "cc-by-4.0",
"url": "https://github.com/JaeHyung-Kim/rlds_dataset_builder",
"citation_bibtex": dedent(r"""
@article{kimpre,
title={Pre-and post-contact policy decomposition for non-prehensile manipulation with zero-shot sim-to-real transfer},
author={Kim, Minchan and Han, Junhyek and Kim, Jaehyung and Kim, Beomjoon},
booktitle={2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2023},
organization={IEEE}
}""").lstrip(),
},
"nyu_door_opening_surprising_effectiveness": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://jyopari.github.io/VINN/",
"paper": "https://huggingface.co/papers/2112.01511",
"citation_bibtex": dedent(r"""
@misc{pari2021surprising,
title={The Surprising Effectiveness of Representation Learning for Visual Imitation},
author={Jyothish Pari and Nur Muhammad Shafiullah and Sridhar Pandian Arunachalam and Lerrel Pinto},
year={2021},
eprint={2112.01511},
archivePrefix={arXiv},
primaryClass={cs.RO}
}""").lstrip(),
},
"nyu_franka_play_dataset": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://play-to-policy.github.io/",
"paper": "https://huggingface.co/papers/2210.10047",
"citation_bibtex": dedent(r"""
@article{cui2022play,
title = {From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data},
author = {Cui, Zichen Jeff and Wang, Yibin and Shafiullah, Nur Muhammad Mahi and Pinto, Lerrel},
journal = {arXiv preprint arXiv:2210.10047},
year = {2022}
}""").lstrip(),
},
"nyu_rot_dataset": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://rot-robot.github.io/",
"paper": "https://huggingface.co/papers/2206.15469",
"citation_bibtex": dedent(r"""
@inproceedings{haldar2023watch,
title={Watch and match: Supercharging imitation with regularized optimal transport},
author={Haldar, Siddhant and Mathur, Vaibhav and Yarats, Denis and Pinto, Lerrel},
booktitle={Conference on Robot Learning},
pages={32--43},
year={2023},
organization={PMLR}
}""").lstrip(),
},
"roboturk": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://roboturk.stanford.edu/dataset_real.html",
"paper": "PAPER",
"citation_bibtex": dedent(r"""
@inproceedings{mandlekar2019scaling,
title={Scaling robot supervision to hundreds of hours with roboturk: Robotic manipulation dataset through human reasoning and dexterity},
author={Mandlekar, Ajay and Booher, Jonathan and Spero, Max and Tung, Albert and Gupta, Anchit and Zhu, Yuke and Garg, Animesh and Savarese, Silvio and Fei-Fei, Li},
booktitle={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages={1048--1055},
year={2019},
organization={IEEE}
}""").lstrip(),
},
"stanford_hydra_dataset": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://sites.google.com/view/hydra-il-2023",
"paper": "https://huggingface.co/papers/2306.17237",
"citation_bibtex": dedent(r"""
@article{belkhale2023hydra,
title={HYDRA: Hybrid Robot Actions for Imitation Learning},
author={Belkhale, Suneel and Cui, Yuchen and Sadigh, Dorsa},
journal={arxiv},
year={2023}
}""").lstrip(),
},
"stanford_kuka_multimodal_dataset": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://sites.google.com/view/visionandtouch",
"paper": "https://huggingface.co/papers/1810.10191",
"citation_bibtex": dedent(r"""
@inproceedings{lee2019icra,
title={Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks},
author={Lee, Michelle A and Zhu, Yuke and Srinivasan, Krishnan and Shah, Parth and Savarese, Silvio and Fei-Fei, Li and Garg, Animesh and Bohg, Jeannette},
booktitle={2019 IEEE International Conference on Robotics and Automation (ICRA)},
year={2019},
url={https://huggingface.co/papers/1810.10191}
}""").lstrip(),
},
"stanford_robocook": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://hshi74.github.io/robocook/",
"paper": "https://huggingface.co/papers/2306.14447",
"citation_bibtex": dedent(r"""
@article{shi2023robocook,
title={RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools},
author={Shi, Haochen and Xu, Huazhe and Clarke, Samuel and Li, Yunzhu and Wu, Jiajun},
journal={arXiv preprint arXiv:2306.14447},
year={2023}
}""").lstrip(),
},
"taco_play": {
"tasks_col": "language_instruction",
"license": "cc-by-4.0",
"url": "https://www.kaggle.com/datasets/oiermees/taco-robot",
"paper": "https://huggingface.co/papers/2209.08959, https://huggingface.co/papers/2210.01911",
"citation_bibtex": dedent(r"""
@inproceedings{rosete2022tacorl,
author = {Erick Rosete-Beas and Oier Mees and Gabriel Kalweit and Joschka Boedecker and Wolfram Burgard},
title = {Latent Plans for Task Agnostic Offline Reinforcement Learning},
journal = {Proceedings of the 6th Conference on Robot Learning (CoRL)},
year = {2022}
}
@inproceedings{mees23hulc2,
title={Grounding Language with Visual Affordances over Unstructured Data},
author={Oier Mees and Jessica Borja-Diaz and Wolfram Burgard},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
year={2023},
address = {London, UK}
}""").lstrip(),
},
"tokyo_u_lsmo": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "URL",
"paper": "https://huggingface.co/papers/2107.05842",
"citation_bibtex": dedent(r"""
@Article{Osa22,
author = {Takayuki Osa},
journal = {The International Journal of Robotics Research},
title = {Motion Planning by Learning the Solution Manifold in Trajectory Optimization},
year = {2022},
number = {3},
pages = {291--311},
volume = {41},
}""").lstrip(),
},
"toto": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://toto-benchmark.org/",
"paper": "https://huggingface.co/papers/2306.00942",
"citation_bibtex": dedent(r"""
@inproceedings{zhou2023train,
author={Zhou, Gaoyue and Dean, Victoria and Srirama, Mohan Kumar and Rajeswaran, Aravind and Pari, Jyothish and Hatch, Kyle and Jain, Aryan and Yu, Tianhe and Abbeel, Pieter and Pinto, Lerrel and Finn, Chelsea and Gupta, Abhinav},
booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
title={Train Offline, Test Online: A Real Robot Learning Benchmark},
year={2023},
}""").lstrip(),
},
"ucsd_kitchen_dataset": {
"tasks_col": "language_instruction",
"license": "mit",
"citation_bibtex": dedent(r"""
@ARTICLE{ucsd_kitchens,
author = {Ge Yan and Kris Wu and Xiaolong Wang},
title = {{ucsd kitchens Dataset}},
year = {2023},
month = {August}
}""").lstrip(),
},
"ucsd_pick_and_place_dataset": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://owmcorl.github.io/#",
"paper": "https://huggingface.co/papers/2310.16029",
"citation_bibtex": dedent(r"""
@preprint{Feng2023Finetuning,
title={Finetuning Offline World Models in the Real World},
author={Yunhai Feng and Nicklas Hansen and Ziyan Xiong and Chandramouli Rajagopalan and Xiaolong Wang},
year={2023}
}""").lstrip(),
},
"uiuc_d3field": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://robopil.github.io/d3fields/",
"paper": "https://huggingface.co/papers/2309.16118",
"citation_bibtex": dedent(r"""
@article{wang2023d3field,
title={D^3Field: Dynamic 3D Descriptor Fields for Generalizable Robotic Manipulation},
author={Wang, Yixuan and Li, Zhuoran and Zhang, Mingtong and Driggs-Campbell, Katherine and Wu, Jiajun and Fei-Fei, Li and Li, Yunzhu},
journal={arXiv preprint arXiv:2309.16118},
year={2023},
}""").lstrip(),
},
"usc_cloth_sim": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://uscresl.github.io/dmfd/",
"paper": "https://huggingface.co/papers/2207.10148",
"citation_bibtex": dedent(r"""
@article{salhotra2022dmfd,
author={Salhotra, Gautam and Liu, I-Chun Arthur and Dominguez-Kuhne, Marcus and Sukhatme, Gaurav S.},
journal={IEEE Robotics and Automation Letters},
title={Learning Deformable Object Manipulation From Expert Demonstrations},
year={2022},
volume={7},
number={4},
pages={8775-8782},
doi={10.1109/LRA.2022.3187843}
}""").lstrip(),
},
"utaustin_mutex": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://ut-austin-rpl.github.io/MUTEX/",
"paper": "https://huggingface.co/papers/2309.14320",
"citation_bibtex": dedent(r"""
@inproceedings{shah2023mutex,
title={{MUTEX}: Learning Unified Policies from Multimodal Task Specifications},
author={Rutav Shah and Roberto Mart{\'\i}n-Mart{\'\i}n and Yuke Zhu},
booktitle={7th Annual Conference on Robot Learning},
year={2023},
url={https://openreview.net/forum?id=PwqiqaaEzJ}
}""").lstrip(),
},
"utokyo_pr2_opening_fridge": {
"tasks_col": "language_instruction",
"license": "mit",
"citation_bibtex": dedent(r"""
@misc{oh2023pr2utokyodatasets,
author={Jihoon Oh and Naoaki Kanazawa and Kento Kawaharazuka},
title={X-Embodiment U-Tokyo PR2 Datasets},
year={2023},
url={https://github.com/ojh6404/rlds_dataset_builder},
}""").lstrip(),
},
"utokyo_pr2_tabletop_manipulation": {
"tasks_col": "language_instruction",
"license": "mit",
"citation_bibtex": dedent(r"""
@misc{oh2023pr2utokyodatasets,
author={Jihoon Oh and Naoaki Kanazawa and Kento Kawaharazuka},
title={X-Embodiment U-Tokyo PR2 Datasets},
year={2023},
url={https://github.com/ojh6404/rlds_dataset_builder},
}""").lstrip(),
},
"utokyo_saytap": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://saytap.github.io/",
"paper": "https://huggingface.co/papers/2306.07580",
"citation_bibtex": dedent(r"""
@article{saytap2023,
author = {Yujin Tang and Wenhao Yu and Jie Tan and Heiga Zen and Aleksandra Faust and
Tatsuya Harada},
title = {SayTap: Language to Quadrupedal Locomotion},
eprint = {arXiv:2306.07580},
url = {https://saytap.github.io},
note = {https://saytap.github.io},
year = {2023}
}""").lstrip(),
},
"utokyo_xarm_bimanual": {
"tasks_col": "language_instruction",
"license": "cc-by-4.0",
"citation_bibtex": dedent(r"""
@misc{matsushima2023weblab,
title={Weblab xArm Dataset},
author={Tatsuya Matsushima and Hiroki Furuta and Yusuke Iwasawa and Yutaka Matsuo},
year={2023},
}""").lstrip(),
},
"utokyo_xarm_pick_and_place": {
"tasks_col": "language_instruction",
"license": "cc-by-4.0",
"citation_bibtex": dedent(r"""
@misc{matsushima2023weblab,
title={Weblab xArm Dataset},
author={Tatsuya Matsushima and Hiroki Furuta and Yusuke Iwasawa and Yutaka Matsuo},
year={2023},
}""").lstrip(),
},
"viola": {
"tasks_col": "language_instruction",
"license": "mit",
"url": "https://ut-austin-rpl.github.io/VIOLA/",
"paper": "https://huggingface.co/papers/2210.11339",
"citation_bibtex": dedent(r"""
@article{zhu2022viola,
title={VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors},
author={Zhu, Yifeng and Joshi, Abhishek and Stone, Peter and Zhu, Yuke},
journal={6th Annual Conference on Robot Learning (CoRL)},
year={2022}
}""").lstrip(),
},
}
# spellchecker:on
def batch_convert():
status = {}
logfile = LOCAL_DIR / "conversion_log.txt"
assert set(DATASETS) == {id_.split("/")[1] for id_ in available_datasets}
for num, (name, kwargs) in enumerate(DATASETS.items()):
repo_id = f"lerobot/{name}"
print(f"\nConverting {repo_id} ({num}/{len(DATASETS)})")
print("---------------------------------------------------------")
try:
convert_dataset(repo_id, LOCAL_DIR, **kwargs)
status = f"{repo_id}: success."
with open(logfile, "a") as file:
file.write(status + "\n")
except Exception:
status = f"{repo_id}: failed\n {traceback.format_exc()}"
with open(logfile, "a") as file:
file.write(status + "\n")
continue
if __name__ == "__main__":
batch_convert()
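As an aside, a minimal sketch of the `dedent(r"""...""").lstrip()` idiom used for every `citation_bibtex` entry in the dictionary above (the BibTeX entry itself is made up):

```python
from textwrap import dedent

# The raw triple-quoted string keeps BibTeX braces and backslashes intact,
# dedent() strips the common leading indentation, and lstrip() drops the
# leading newline so the entry starts at the first brace.
citation = dedent(r"""
    @misc{example2023,
      title={An Example Entry},
      year={2023}
    }""").lstrip()

assert citation.startswith("@misc{example2023,")
```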

View File

@@ -1,687 +0,0 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This script will help you convert any LeRobot dataset already pushed to the hub from codebase version 1.6 to
2.0. You will be required to provide the 'tasks', which is a short but accurate description in plain English
for each of the tasks performed in the dataset. This will allow you to easily train models with task-conditioning.
We support 3 different scenarios for these tasks (see instructions below):
1. Single task dataset: all episodes of your dataset have the same single task.
2. Single task episodes: the episodes of your dataset each contain a single task but they can differ from
one episode to the next.
3. Multi task episodes: episodes of your dataset may each contain several different tasks.
You can also provide a robot config .yaml file (optional) to this script via the option
'--robot-config' so that it writes information about the robot (robot type, motor names) this dataset was
recorded with. For now, only Aloha/Koch type robots are supported with this option.
# 1. Single task dataset
If your dataset contains a single task, you can simply provide it directly via the CLI with the
'--single-task' option.
Examples:
```bash
python -m lerobot.datasets.v2.convert_dataset_v1_to_v2 \
--repo-id lerobot/aloha_sim_insertion_human_image \
--single-task "Insert the peg into the socket." \
--robot-config lerobot/configs/robot/aloha.yaml \
--local-dir data
```
```bash
python -m lerobot.datasets.v2.convert_dataset_v1_to_v2 \
--repo-id aliberts/koch_tutorial \
--single-task "Pick the Lego block and drop it in the box on the right." \
--robot-config lerobot/configs/robot/koch.yaml \
--local-dir data
```
# 2. Single task episodes
If your dataset is a multi-task dataset, you have two options to provide the tasks to this script:
- If your dataset already contains a language instruction column in its parquet file, you can simply provide
this column's name with the '--tasks-col' arg.
Example:
```bash
python -m lerobot.datasets.v2.convert_dataset_v1_to_v2 \
--repo-id lerobot/stanford_kuka_multimodal_dataset \
--tasks-col "language_instruction" \
--local-dir data
```
- If your dataset doesn't contain a language instruction, you should provide the path to a .json file with the
'--tasks-path' arg. This file should have the following structure where keys correspond to each
episode_index in the dataset, and values are the language instruction for that episode.
Example:
```json
{
"0": "Do something",
"1": "Do something else",
"2": "Do something",
"3": "Go there",
...
}
```
# 3. Multi task episodes
If you have multiple tasks per episode, your dataset should contain a language instruction column in its
parquet file, and you must provide this column's name with the '--tasks-col' arg.
Example:
```bash
python -m lerobot.datasets.v2.convert_dataset_v1_to_v2 \
--repo-id lerobot/stanford_kuka_multimodal_dataset \
--tasks-col "language_instruction" \
--local-dir data
```
"""
import argparse
import contextlib
import filecmp
import json
import logging
import math
import shutil
import subprocess
import tempfile
from pathlib import Path
import datasets
import pyarrow.compute as pc
import pyarrow.parquet as pq
import torch
from datasets import Dataset
from huggingface_hub import HfApi
from huggingface_hub.errors import EntryNotFoundError, HfHubHTTPError
from safetensors.torch import load_file
from lerobot.datasets.utils import (
DEFAULT_CHUNK_SIZE,
DEFAULT_PARQUET_PATH,
DEFAULT_VIDEO_PATH,
EPISODES_PATH,
INFO_PATH,
STATS_PATH,
TASKS_PATH,
create_branch,
create_lerobot_dataset_card,
flatten_dict,
get_safe_version,
load_json,
unflatten_dict,
write_json,
write_jsonlines,
)
from lerobot.datasets.video_utils import (
VideoFrame, # noqa: F401
get_image_pixel_channels,
get_video_info,
)
from lerobot.robots import RobotConfig
V16 = "v1.6"
V20 = "v2.0"
GITATTRIBUTES_REF = "aliberts/gitattributes_reference"
V1_VIDEO_FILE = "{video_key}_episode_{episode_index:06d}.mp4"
V1_INFO_PATH = "meta_data/info.json"
V1_STATS_PATH = "meta_data/stats.safetensors"
def parse_robot_config(robot_cfg: RobotConfig) -> tuple[str, dict]:
if robot_cfg.type in ["aloha", "koch"]:
state_names = [
f"{arm}_{motor}" if len(robot_cfg.follower_arms) > 1 else motor
for arm in robot_cfg.follower_arms
for motor in robot_cfg.follower_arms[arm].motors
]
action_names = [
# f"{arm}_{motor}" for arm in ["left", "right"] for motor in robot_cfg["leader_arms"][arm]["motors"]
f"{arm}_{motor}" if len(robot_cfg.leader_arms) > 1 else motor
for arm in robot_cfg.leader_arms
for motor in robot_cfg.leader_arms[arm].motors
]
# elif robot_cfg["robot_type"] == "stretch3": TODO
else:
raise NotImplementedError(
"Please provide robot_config={'robot_type': ..., 'names': ...} directly to convert_dataset()."
)
return {
"robot_type": robot_cfg.type,
"names": {
"observation.state": state_names,
"observation.effort": state_names,
"action": action_names,
},
}
def convert_stats_to_json(v1_dir: Path, v2_dir: Path) -> None:
safetensor_path = v1_dir / V1_STATS_PATH
stats = load_file(safetensor_path)
serialized_stats = {key: value.tolist() for key, value in stats.items()}
serialized_stats = unflatten_dict(serialized_stats)
json_path = v2_dir / STATS_PATH
json_path.parent.mkdir(exist_ok=True, parents=True)
with open(json_path, "w") as f:
json.dump(serialized_stats, f, indent=4)
# Sanity check
with open(json_path) as f:
stats_json = json.load(f)
stats_json = flatten_dict(stats_json)
stats_json = {key: torch.tensor(value) for key, value in stats_json.items()}
for key in stats:
torch.testing.assert_close(stats_json[key], stats[key])
def get_features_from_hf_dataset(
dataset: Dataset, robot_config: RobotConfig | None = None
) -> dict[str, list]:
robot_config = parse_robot_config(robot_config) if robot_config else None
features = {}
for key, ft in dataset.features.items():
if isinstance(ft, datasets.Value):
dtype = ft.dtype
shape = (1,)
names = None
if isinstance(ft, datasets.Sequence):
assert isinstance(ft.feature, datasets.Value)
dtype = ft.feature.dtype
shape = (ft.length,)
motor_names = (
robot_config["names"][key] if robot_config else [f"motor_{i}" for i in range(ft.length)]
)
assert len(motor_names) == shape[0]
names = {"motors": motor_names}
elif isinstance(ft, datasets.Image):
dtype = "image"
image = dataset[0][key] # Assuming first row
channels = get_image_pixel_channels(image)
shape = (image.height, image.width, channels)
names = ["height", "width", "channels"]
elif ft._type == "VideoFrame":
dtype = "video"
shape = None # Add shape later
names = ["height", "width", "channels"]
features[key] = {
"dtype": dtype,
"shape": shape,
"names": names,
}
return features
def add_task_index_by_episodes(dataset: Dataset, tasks_by_episodes: dict) -> tuple[Dataset, list[str]]:
df = dataset.to_pandas()
tasks = list(set(tasks_by_episodes.values()))
tasks_to_task_index = {task: task_idx for task_idx, task in enumerate(tasks)}
episodes_to_task_index = {ep_idx: tasks_to_task_index[task] for ep_idx, task in tasks_by_episodes.items()}
df["task_index"] = df["episode_index"].map(episodes_to_task_index).astype(int)
features = dataset.features
features["task_index"] = datasets.Value(dtype="int64")
dataset = Dataset.from_pandas(df, features=features, split="train")
return dataset, tasks
def add_task_index_from_tasks_col(
dataset: Dataset, tasks_col: str
) -> tuple[Dataset, dict[str, list[str]], list[str]]:
df = dataset.to_pandas()
# HACK: This is to clean some of the instructions in our version of Open X datasets
prefix_to_clean = "tf.Tensor(b'"
suffix_to_clean = "', shape=(), dtype=string)"
df[tasks_col] = df[tasks_col].str.removeprefix(prefix_to_clean).str.removesuffix(suffix_to_clean)
# Create task_index col
tasks_by_episode = df.groupby("episode_index")[tasks_col].unique().apply(lambda x: x.tolist()).to_dict()
tasks = df[tasks_col].unique().tolist()
tasks_to_task_index = {task: idx for idx, task in enumerate(tasks)}
df["task_index"] = df[tasks_col].map(tasks_to_task_index).astype(int)
# Build the dataset back from df
features = dataset.features
features["task_index"] = datasets.Value(dtype="int64")
dataset = Dataset.from_pandas(df, features=features, split="train")
dataset = dataset.remove_columns(tasks_col)
return dataset, tasks, tasks_by_episode
def split_parquet_by_episodes(
dataset: Dataset,
total_episodes: int,
total_chunks: int,
output_dir: Path,
) -> list:
table = dataset.data.table
episode_lengths = []
for ep_chunk in range(total_chunks):
ep_chunk_start = DEFAULT_CHUNK_SIZE * ep_chunk
ep_chunk_end = min(DEFAULT_CHUNK_SIZE * (ep_chunk + 1), total_episodes)
chunk_dir = "/".join(DEFAULT_PARQUET_PATH.split("/")[:-1]).format(episode_chunk=ep_chunk)
(output_dir / chunk_dir).mkdir(parents=True, exist_ok=True)
for ep_idx in range(ep_chunk_start, ep_chunk_end):
ep_table = table.filter(pc.equal(table["episode_index"], ep_idx))
episode_lengths.insert(ep_idx, len(ep_table))
output_file = output_dir / DEFAULT_PARQUET_PATH.format(
episode_chunk=ep_chunk, episode_index=ep_idx
)
pq.write_table(ep_table, output_file)
return episode_lengths
def move_videos(
repo_id: str,
video_keys: list[str],
total_episodes: int,
total_chunks: int,
work_dir: Path,
clean_gittatributes: Path,
branch: str = "main",
) -> None:
"""
HACK: Since HfApi() doesn't provide a way to move files directly in a repo, this function will run git
commands to fetch the git lfs video file references and move them into subdirectories without having to
actually download them.
"""
_lfs_clone(repo_id, work_dir, branch)
videos_moved = False
video_files = [str(f.relative_to(work_dir)) for f in work_dir.glob("videos*/*.mp4")]
if len(video_files) == 0:
video_files = [str(f.relative_to(work_dir)) for f in work_dir.glob("videos*/*/*/*.mp4")]
videos_moved = True # Videos have already been moved
assert len(video_files) == total_episodes * len(video_keys)
lfs_untracked_videos = _get_lfs_untracked_videos(work_dir, video_files)
current_gittatributes = work_dir / ".gitattributes"
if not filecmp.cmp(current_gittatributes, clean_gittatributes, shallow=False):
fix_gitattributes(work_dir, current_gittatributes, clean_gittatributes)
if lfs_untracked_videos:
fix_lfs_video_files_tracking(work_dir, video_files)
if videos_moved:
return
video_dirs = sorted(work_dir.glob("videos*/"))
for ep_chunk in range(total_chunks):
ep_chunk_start = DEFAULT_CHUNK_SIZE * ep_chunk
ep_chunk_end = min(DEFAULT_CHUNK_SIZE * (ep_chunk + 1), total_episodes)
for vid_key in video_keys:
chunk_dir = "/".join(DEFAULT_VIDEO_PATH.split("/")[:-1]).format(
episode_chunk=ep_chunk, video_key=vid_key
)
(work_dir / chunk_dir).mkdir(parents=True, exist_ok=True)
for ep_idx in range(ep_chunk_start, ep_chunk_end):
target_path = DEFAULT_VIDEO_PATH.format(
episode_chunk=ep_chunk, video_key=vid_key, episode_index=ep_idx
)
video_file = V1_VIDEO_FILE.format(video_key=vid_key, episode_index=ep_idx)
if len(video_dirs) == 1:
video_path = video_dirs[0] / video_file
else:
for dir in video_dirs:
if (dir / video_file).is_file():
video_path = dir / video_file
break
video_path.rename(work_dir / target_path)
commit_message = "Move video files into chunk subdirectories"
subprocess.run(["git", "add", "."], cwd=work_dir, check=True)
subprocess.run(["git", "commit", "-m", commit_message], cwd=work_dir, check=True)
subprocess.run(["git", "push"], cwd=work_dir, check=True)
def fix_lfs_video_files_tracking(work_dir: Path, lfs_untracked_videos: list[str]) -> None:
"""
HACK: This function fixes git lfs tracking, which was not properly set up on some repos. In that case,
there's no other option than to download the actual files and re-upload them with lfs tracking.
"""
for i in range(0, len(lfs_untracked_videos), 100):
files = lfs_untracked_videos[i : i + 100]
try:
subprocess.run(["git", "rm", "--cached", *files], cwd=work_dir, capture_output=True, check=True)
except subprocess.CalledProcessError as e:
print("git rm --cached ERROR:")
print(e.stderr)
subprocess.run(["git", "add", *files], cwd=work_dir, check=True)
commit_message = "Track video files with git lfs"
subprocess.run(["git", "commit", "-m", commit_message], cwd=work_dir, check=True)
subprocess.run(["git", "push"], cwd=work_dir, check=True)
def fix_gitattributes(work_dir: Path, current_gittatributes: Path, clean_gittatributes: Path) -> None:
shutil.copyfile(clean_gittatributes, current_gittatributes)
subprocess.run(["git", "add", ".gitattributes"], cwd=work_dir, check=True)
subprocess.run(["git", "commit", "-m", "Fix .gitattributes"], cwd=work_dir, check=True)
subprocess.run(["git", "push"], cwd=work_dir, check=True)
def _lfs_clone(repo_id: str, work_dir: Path, branch: str) -> None:
subprocess.run(["git", "lfs", "install"], cwd=work_dir, check=True)
repo_url = f"https://huggingface.co/datasets/{repo_id}"
env = {"GIT_LFS_SKIP_SMUDGE": "1"} # Prevent downloading LFS files
subprocess.run(
["git", "clone", "--branch", branch, "--single-branch", "--depth", "1", repo_url, str(work_dir)],
check=True,
env=env,
)
def _get_lfs_untracked_videos(work_dir: Path, video_files: list[str]) -> list[str]:
lfs_tracked_files = subprocess.run(
["git", "lfs", "ls-files", "-n"], cwd=work_dir, capture_output=True, text=True, check=True
)
lfs_tracked_files = set(lfs_tracked_files.stdout.splitlines())
return [f for f in video_files if f not in lfs_tracked_files]
def get_videos_info(repo_id: str, local_dir: Path, video_keys: list[str], branch: str) -> dict:
# Assumes first episode
video_files = [
DEFAULT_VIDEO_PATH.format(episode_chunk=0, video_key=vid_key, episode_index=0)
for vid_key in video_keys
]
hub_api = HfApi()
hub_api.snapshot_download(
repo_id=repo_id, repo_type="dataset", local_dir=local_dir, revision=branch, allow_patterns=video_files
)
videos_info_dict = {}
for vid_key, vid_path in zip(video_keys, video_files, strict=True):
videos_info_dict[vid_key] = get_video_info(local_dir / vid_path)
return videos_info_dict
def convert_dataset(
repo_id: str,
local_dir: Path,
single_task: str | None = None,
tasks_path: Path | None = None,
tasks_col: Path | None = None,
robot_config: RobotConfig | None = None,
test_branch: str | None = None,
**card_kwargs,
):
v1 = get_safe_version(repo_id, V16)
v1x_dir = local_dir / V16 / repo_id
v20_dir = local_dir / V20 / repo_id
v1x_dir.mkdir(parents=True, exist_ok=True)
v20_dir.mkdir(parents=True, exist_ok=True)
hub_api = HfApi()
hub_api.snapshot_download(
repo_id=repo_id, repo_type="dataset", revision=v1, local_dir=v1x_dir, ignore_patterns="videos*/"
)
branch = "main"
if test_branch:
branch = test_branch
create_branch(repo_id=repo_id, branch=test_branch, repo_type="dataset")
metadata_v1 = load_json(v1x_dir / V1_INFO_PATH)
dataset = datasets.load_dataset("parquet", data_dir=v1x_dir / "data", split="train")
features = get_features_from_hf_dataset(dataset, robot_config)
video_keys = [key for key, ft in features.items() if ft["dtype"] == "video"]
if single_task and "language_instruction" in dataset.column_names:
logging.warning(
"'single_task' provided but 'language_instruction' tasks_col found. Using 'language_instruction'.",
)
single_task = None
tasks_col = "language_instruction"
# Episodes & chunks
episode_indices = sorted(dataset.unique("episode_index"))
total_episodes = len(episode_indices)
assert episode_indices == list(range(total_episodes))
total_videos = total_episodes * len(video_keys)
total_chunks = total_episodes // DEFAULT_CHUNK_SIZE
if total_episodes % DEFAULT_CHUNK_SIZE != 0:
total_chunks += 1
# Tasks
if single_task:
tasks_by_episodes = dict.fromkeys(episode_indices, single_task)
dataset, tasks = add_task_index_by_episodes(dataset, tasks_by_episodes)
tasks_by_episodes = {ep_idx: [task] for ep_idx, task in tasks_by_episodes.items()}
elif tasks_path:
tasks_by_episodes = load_json(tasks_path)
tasks_by_episodes = {int(ep_idx): task for ep_idx, task in tasks_by_episodes.items()}
dataset, tasks = add_task_index_by_episodes(dataset, tasks_by_episodes)
tasks_by_episodes = {ep_idx: [task] for ep_idx, task in tasks_by_episodes.items()}
elif tasks_col:
dataset, tasks, tasks_by_episodes = add_task_index_from_tasks_col(dataset, tasks_col)
else:
raise ValueError("One of 'single_task', 'tasks_path' or 'tasks_col' must be provided.")
assert set(tasks) == {task for ep_tasks in tasks_by_episodes.values() for task in ep_tasks}
tasks = [{"task_index": task_idx, "task": task} for task_idx, task in enumerate(tasks)]
write_jsonlines(tasks, v20_dir / TASKS_PATH)
features["task_index"] = {
"dtype": "int64",
"shape": (1,),
"names": None,
}
# Videos
if video_keys:
assert metadata_v1.get("video", False)
dataset = dataset.remove_columns(video_keys)
clean_gitattr = Path(
hub_api.hf_hub_download(
repo_id=GITATTRIBUTES_REF, repo_type="dataset", local_dir=local_dir, filename=".gitattributes"
)
).absolute()
with tempfile.TemporaryDirectory() as tmp_video_dir:
move_videos(
repo_id, video_keys, total_episodes, total_chunks, Path(tmp_video_dir), clean_gitattr, branch
)
videos_info = get_videos_info(repo_id, v1x_dir, video_keys=video_keys, branch=branch)
for key in video_keys:
features[key]["shape"] = (
videos_info[key].pop("video.height"),
videos_info[key].pop("video.width"),
videos_info[key].pop("video.channels"),
)
features[key]["video_info"] = videos_info[key]
assert math.isclose(videos_info[key]["video.fps"], metadata_v1["fps"], rel_tol=1e-3)
if "encoding" in metadata_v1:
assert videos_info[key]["video.pix_fmt"] == metadata_v1["encoding"]["pix_fmt"]
else:
assert metadata_v1.get("video", 0) == 0
videos_info = None
# Split data into 1 parquet file by episode
episode_lengths = split_parquet_by_episodes(dataset, total_episodes, total_chunks, v20_dir)
if robot_config is not None:
robot_type = robot_config.type
repo_tags = [robot_type]
else:
robot_type = "unknown"
repo_tags = None
# Episodes
episodes = [
{"episode_index": ep_idx, "tasks": tasks_by_episodes[ep_idx], "length": episode_lengths[ep_idx]}
for ep_idx in episode_indices
]
write_jsonlines(episodes, v20_dir / EPISODES_PATH)
# Assemble metadata v2.0
metadata_v2_0 = {
"codebase_version": V20,
"robot_type": robot_type,
"total_episodes": total_episodes,
"total_frames": len(dataset),
"total_tasks": len(tasks),
"total_videos": total_videos,
"total_chunks": total_chunks,
"chunks_size": DEFAULT_CHUNK_SIZE,
"fps": metadata_v1["fps"],
"splits": {"train": f"0:{total_episodes}"},
"data_path": DEFAULT_PARQUET_PATH,
"video_path": DEFAULT_VIDEO_PATH if video_keys else None,
"features": features,
}
write_json(metadata_v2_0, v20_dir / INFO_PATH)
convert_stats_to_json(v1x_dir, v20_dir)
card = create_lerobot_dataset_card(tags=repo_tags, dataset_info=metadata_v2_0, **card_kwargs)
with contextlib.suppress(EntryNotFoundError, HfHubHTTPError):
hub_api.delete_folder(repo_id=repo_id, path_in_repo="data", repo_type="dataset", revision=branch)
with contextlib.suppress(EntryNotFoundError, HfHubHTTPError):
hub_api.delete_folder(repo_id=repo_id, path_in_repo="meta_data", repo_type="dataset", revision=branch)
with contextlib.suppress(EntryNotFoundError, HfHubHTTPError):
hub_api.delete_folder(repo_id=repo_id, path_in_repo="meta", repo_type="dataset", revision=branch)
hub_api.upload_folder(
repo_id=repo_id,
path_in_repo="data",
folder_path=v20_dir / "data",
repo_type="dataset",
revision=branch,
)
hub_api.upload_folder(
repo_id=repo_id,
path_in_repo="meta",
folder_path=v20_dir / "meta",
repo_type="dataset",
revision=branch,
)
card.push_to_hub(repo_id=repo_id, repo_type="dataset", revision=branch)
if not test_branch:
create_branch(repo_id=repo_id, branch=V20, repo_type="dataset")
def make_robot_config(robot_type: str, **kwargs) -> RobotConfig:
if robot_type == "aloha":
raise NotImplementedError # TODO
elif robot_type == "koch_follower":
from lerobot.robots.koch_follower import KochFollowerConfig
return KochFollowerConfig(**kwargs)
elif robot_type == "so100_follower":
from lerobot.robots.so100_follower import SO100FollowerConfig
return SO100FollowerConfig(**kwargs)
elif robot_type == "stretch":
from lerobot.robots.stretch3 import Stretch3RobotConfig
return Stretch3RobotConfig(**kwargs)
elif robot_type == "lekiwi":
from lerobot.robots.lekiwi import LeKiwiConfig
return LeKiwiConfig(**kwargs)
else:
raise ValueError(f"Robot type '{robot_type}' is not available.")
def main():
parser = argparse.ArgumentParser()
task_args = parser.add_mutually_exclusive_group(required=True)
parser.add_argument(
"--repo-id",
type=str,
required=True,
help="Repository identifier on Hugging Face: a community or a user name `/` the name of the dataset (e.g. `lerobot/pusht`, `cadene/aloha_sim_insertion_human`).",
)
task_args.add_argument(
"--single-task",
type=str,
help="A short but accurate description of the single task performed in the dataset.",
)
task_args.add_argument(
"--tasks-col",
type=str,
help="The name of the column containing language instructions",
)
task_args.add_argument(
"--tasks-path",
type=Path,
help="The path to a .json file containing one language instruction for each episode_index",
)
parser.add_argument(
"--robot",
type=str,
default=None,
help="Robot config used for the dataset during conversion (e.g. 'koch', 'aloha', 'so100', etc.)",
)
parser.add_argument(
"--local-dir",
type=Path,
default=None,
help="Local directory to store the dataset during conversion. Defaults to /tmp/lerobot_dataset_v2",
)
parser.add_argument(
"--license",
type=str,
default="apache-2.0",
help="Repo license. Must be one of https://huggingface.co/docs/hub/repositories-licenses. Defaults to mit.",
)
parser.add_argument(
"--test-branch",
type=str,
default=None,
help="Repo branch to test your conversion first (e.g. 'v2.0.test')",
)
args = parser.parse_args()
if not args.local_dir:
args.local_dir = Path("/tmp/lerobot_dataset_v2")
if args.robot is not None:
robot_config = make_robot_config(args.robot)
del args.robot
convert_dataset(**vars(args), robot_config=robot_config)
if __name__ == "__main__":
main()
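For illustration, a hedged sketch of how the task indexing implemented above behaves for a tiny two-episode dataset (episode indices and task strings are made up):

```python
# Mirrors add_task_index_by_episodes(): each unique task string gets an index,
# and every frame of an episode inherits that episode's task_index.
tasks_by_episodes = {0: "Do something", 1: "Go there"}
tasks = list(set(tasks_by_episodes.values()))
tasks_to_task_index = {task: idx for idx, task in enumerate(tasks)}
episodes_to_task_index = {ep: tasks_to_task_index[task] for ep, task in tasks_by_episodes.items()}

# meta/tasks.jsonl then stores one {"task_index": ..., "task": ...} row per unique task.
tasks_jsonl_rows = [{"task_index": idx, "task": task} for idx, task in enumerate(tasks)]
print(episodes_to_task_index, tasks_jsonl_rows)
```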

View File

@@ -1,87 +0,0 @@
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import traceback
from pathlib import Path
from datasets import get_dataset_config_info
from huggingface_hub import HfApi
from lerobot import available_datasets
from lerobot.datasets.lerobot_dataset import LeRobotDatasetMetadata
from lerobot.datasets.utils import INFO_PATH, write_info
from lerobot.datasets.v21.convert_dataset_v20_to_v21 import V20, SuppressWarnings
LOCAL_DIR = Path("data/")
hub_api = HfApi()
def fix_dataset(repo_id: str) -> str:
if not hub_api.revision_exists(repo_id, V20, repo_type="dataset"):
return f"{repo_id}: skipped (not in {V20})."
dataset_info = get_dataset_config_info(repo_id, "default")
with SuppressWarnings():
lerobot_metadata = LeRobotDatasetMetadata(repo_id, revision=V20, force_cache_sync=True)
meta_features = {key for key, ft in lerobot_metadata.features.items() if ft["dtype"] != "video"}
parquet_features = set(dataset_info.features)
diff_parquet_meta = parquet_features - meta_features
diff_meta_parquet = meta_features - parquet_features
if diff_parquet_meta:
raise ValueError(f"In parquet not in info.json: {parquet_features - meta_features}")
if not diff_meta_parquet:
return f"{repo_id}: skipped (no diff)"
if diff_meta_parquet:
logging.warning(f"In info.json not in parquet: {meta_features - parquet_features}")
assert diff_meta_parquet == {"language_instruction"}
lerobot_metadata.features.pop("language_instruction")
write_info(lerobot_metadata.info, lerobot_metadata.root)
commit_info = hub_api.upload_file(
path_or_fileobj=lerobot_metadata.root / INFO_PATH,
path_in_repo=INFO_PATH,
repo_id=repo_id,
repo_type="dataset",
revision=V20,
commit_message="Remove 'language_instruction'",
create_pr=True,
)
return f"{repo_id}: success - PR: {commit_info.pr_url}"
def batch_fix():
status = {}
LOCAL_DIR.mkdir(parents=True, exist_ok=True)
logfile = LOCAL_DIR / "fix_features_v20.txt"
for num, repo_id in enumerate(available_datasets):
print(f"\nConverting {repo_id} ({num}/{len(available_datasets)})")
print("---------------------------------------------------------")
try:
status = fix_dataset(repo_id)
except Exception:
status = f"{repo_id}: failed\n {traceback.format_exc()}"
logging.info(status)
with open(logfile, "a") as file:
file.write(status + "\n")
if __name__ == "__main__":
batch_fix()
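For illustration, a minimal sketch of the set comparison `fix_dataset()` performs above, with made-up feature names:

```python
# Made-up feature names; mirrors the diff logic in fix_dataset().
meta_features = {"action", "observation.state", "language_instruction"}
parquet_features = {"action", "observation.state"}

diff_parquet_meta = parquet_features - meta_features  # in parquet but not info.json -> error
diff_meta_parquet = meta_features - parquet_features  # in info.json but not parquet -> fixable

assert not diff_parquet_meta
assert diff_meta_parquet == {"language_instruction"}  # the only case this script repairs
```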

View File

@@ -1,54 +0,0 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This script is for internal use to convert all datasets under the 'lerobot' hub user account to v2.1.
"""
import traceback
from pathlib import Path
from huggingface_hub import HfApi
from lerobot import available_datasets
from lerobot.datasets.v21.convert_dataset_v20_to_v21 import V21, convert_dataset
LOCAL_DIR = Path("data/")
def batch_convert():
status = {}
LOCAL_DIR.mkdir(parents=True, exist_ok=True)
logfile = LOCAL_DIR / "conversion_log_v21.txt"
hub_api = HfApi()
for num, repo_id in enumerate(available_datasets):
print(f"\nConverting {repo_id} ({num}/{len(available_datasets)})")
print("---------------------------------------------------------")
try:
if hub_api.revision_exists(repo_id, V21, repo_type="dataset"):
status = f"{repo_id}: success (already in {V21})."
else:
convert_dataset(repo_id)
status = f"{repo_id}: success."
except Exception:
status = f"{repo_id}: failed\n {traceback.format_exc()}"
with open(logfile, "a") as file:
file.write(status + "\n")
if __name__ == "__main__":
batch_convert()

View File

@@ -1,114 +0,0 @@
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This script will help you convert any LeRobot dataset already pushed to the hub from codebase version 2.0 to
2.1. It will:
- Generate per-episode stats and write them in `episodes_stats.jsonl`
- Check consistency between these new stats and the old ones.
- Remove the deprecated `stats.json`.
- Update codebase_version in `info.json`.
- Push this new version to the hub on the 'main' branch and tag it with "v2.1".
Usage:
```bash
python -m lerobot.datasets.v21.convert_dataset_v20_to_v21 \
--repo-id=aliberts/koch_tutorial
```
"""
import argparse
import logging
from huggingface_hub import HfApi
from lerobot.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDataset
from lerobot.datasets.utils import EPISODES_STATS_PATH, STATS_PATH, load_stats, write_info
from lerobot.datasets.v21.convert_stats import check_aggregate_stats, convert_stats
V20 = "v2.0"
V21 = "v2.1"
class SuppressWarnings:
def __enter__(self):
self.previous_level = logging.getLogger().getEffectiveLevel()
logging.getLogger().setLevel(logging.ERROR)
def __exit__(self, exc_type, exc_val, exc_tb):
logging.getLogger().setLevel(self.previous_level)
def convert_dataset(
repo_id: str,
branch: str | None = None,
num_workers: int = 4,
):
with SuppressWarnings():
dataset = LeRobotDataset(repo_id, revision=V20, force_cache_sync=True)
if (dataset.root / EPISODES_STATS_PATH).is_file():
(dataset.root / EPISODES_STATS_PATH).unlink()
convert_stats(dataset, num_workers=num_workers)
ref_stats = load_stats(dataset.root)
check_aggregate_stats(dataset, ref_stats)
dataset.meta.info["codebase_version"] = CODEBASE_VERSION
write_info(dataset.meta.info, dataset.root)
dataset.push_to_hub(branch=branch, tag_version=False, allow_patterns="meta/")
# delete old stats.json file
if (dataset.root / STATS_PATH).is_file():
(dataset.root / STATS_PATH).unlink()
hub_api = HfApi()
if hub_api.file_exists(
repo_id=dataset.repo_id, filename=STATS_PATH, revision=branch, repo_type="dataset"
):
hub_api.delete_file(
path_in_repo=STATS_PATH, repo_id=dataset.repo_id, revision=branch, repo_type="dataset"
)
hub_api.create_tag(repo_id, tag=CODEBASE_VERSION, revision=branch, repo_type="dataset")
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--repo-id",
type=str,
required=True,
help="Repository identifier on Hugging Face: a community or a user name `/` the name of the dataset "
"(e.g. `lerobot/pusht`, `cadene/aloha_sim_insertion_human`).",
)
parser.add_argument(
"--branch",
type=str,
default=None,
help="Repo branch to push your dataset. Defaults to the main branch.",
)
parser.add_argument(
"--num-workers",
type=int,
default=4,
help="Number of workers for parallelizing stats compute. Defaults to 4.",
)
args = parser.parse_args()
convert_dataset(**vars(args))
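For reference, a hedged sketch of calling the converter above from Python instead of the CLI shown in its docstring (the repo id is illustrative):

```python
from lerobot.datasets.v21.convert_dataset_v20_to_v21 import convert_dataset

# Converts a v2.0 dataset to v2.1 in-process, equivalent to the CLI invocation
# in the module docstring above.
convert_dataset("aliberts/koch_tutorial", branch=None, num_workers=8)
```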

View File

@@ -1,99 +0,0 @@
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from concurrent.futures import ThreadPoolExecutor, as_completed
import numpy as np
from tqdm import tqdm
from lerobot.datasets.compute_stats import aggregate_stats, get_feature_stats, sample_indices
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.utils import write_episode_stats
def sample_episode_video_frames(dataset: LeRobotDataset, episode_index: int, ft_key: str) -> np.ndarray:
ep_len = dataset.meta.episodes[episode_index]["length"]
sampled_indices = sample_indices(ep_len)
query_timestamps = dataset._get_query_timestamps(0.0, {ft_key: sampled_indices})
video_frames = dataset._query_videos(query_timestamps, episode_index)
return video_frames[ft_key].numpy()
def convert_episode_stats(dataset: LeRobotDataset, ep_idx: int):
ep_start_idx = dataset.episode_data_index["from"][ep_idx]
ep_end_idx = dataset.episode_data_index["to"][ep_idx]
ep_data = dataset.hf_dataset.select(range(ep_start_idx, ep_end_idx))
ep_stats = {}
for key, ft in dataset.features.items():
if ft["dtype"] == "video":
# We sample only for videos
ep_ft_data = sample_episode_video_frames(dataset, ep_idx, key)
else:
ep_ft_data = np.array(ep_data[key])
axes_to_reduce = (0, 2, 3) if ft["dtype"] in ["image", "video"] else 0
keepdims = True if ft["dtype"] in ["image", "video"] else ep_ft_data.ndim == 1
ep_stats[key] = get_feature_stats(ep_ft_data, axis=axes_to_reduce, keepdims=keepdims)
if ft["dtype"] in ["image", "video"]: # remove batch dim
ep_stats[key] = {
k: v if k == "count" else np.squeeze(v, axis=0) for k, v in ep_stats[key].items()
}
dataset.meta.episodes_stats[ep_idx] = ep_stats
def convert_stats(dataset: LeRobotDataset, num_workers: int = 0):
assert dataset.episodes is None
print("Computing episodes stats")
total_episodes = dataset.meta.total_episodes
if num_workers > 0:
with ThreadPoolExecutor(max_workers=num_workers) as executor:
futures = {
executor.submit(convert_episode_stats, dataset, ep_idx): ep_idx
for ep_idx in range(total_episodes)
}
for future in tqdm(as_completed(futures), total=total_episodes):
future.result()
else:
for ep_idx in tqdm(range(total_episodes)):
convert_episode_stats(dataset, ep_idx)
for ep_idx in tqdm(range(total_episodes)):
write_episode_stats(ep_idx, dataset.meta.episodes_stats[ep_idx], dataset.root)
def check_aggregate_stats(
dataset: LeRobotDataset,
reference_stats: dict[str, dict[str, np.ndarray]],
video_rtol_atol: tuple[float] = (1e-2, 1e-2),
default_rtol_atol: tuple[float] = (5e-6, 6e-5),
):
"""Verifies that the aggregated stats from episodes_stats are close to reference stats."""
agg_stats = aggregate_stats(list(dataset.meta.episodes_stats.values()))
for key, ft in dataset.features.items():
# These values might need some fine-tuning
if ft["dtype"] == "video":
# to account for image sub-sampling
rtol, atol = video_rtol_atol
else:
rtol, atol = default_rtol_atol
for stat, val in agg_stats[key].items():
if key in reference_stats and stat in reference_stats[key]:
err_msg = f"feature='{key}' stats='{stat}'"
np.testing.assert_allclose(
val, reference_stats[key][stat], rtol=rtol, atol=atol, err_msg=err_msg
)
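To make the structure concrete, a hedged sketch of what one entry of `dataset.meta.episodes_stats` produced by `convert_episode_stats()` might look like (feature name, shapes and values are made up):

```python
import numpy as np

# Each non-image feature gets per-dimension min/max/mean/std plus a frame count,
# as returned by get_feature_stats(); image/video features keep a (c, 1, 1) shape.
episode_stats_example = {
    "observation.state": {
        "min": np.array([-1.0, -0.5]),
        "max": np.array([1.0, 0.7]),
        "mean": np.array([0.1, 0.0]),
        "std": np.array([0.4, 0.2]),
        "count": np.array([266]),
    }
}
```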

View File

@@ -0,0 +1,500 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This script will help you convert any LeRobot dataset already pushed to the hub from codebase version 2.1 to
3.0. It will:
- Consolidate the per-episode data parquet files into larger chunked parquet files under `data/`.
- Concatenate the per-episode mp4 files into larger chunked video files under `videos/`.
- Convert the `episodes.jsonl`, `tasks.jsonl` and `episodes_stats.jsonl` metadata into parquet files under `meta/`.
- Aggregate the per-episode stats, update `info.json` for the new layout, and delete the legacy per-episode files from the hub.
- Push this new version to the hub on the 'main' branch and tag it with "v3.0".
Usage:
```bash
python src/lerobot/datasets/v30/convert_dataset_v21_to_v30.py \
--repo-id=lerobot/pusht
```
"""
import argparse
import shutil
from pathlib import Path
from typing import Any
import jsonlines
import pandas as pd
import pyarrow as pa
import tqdm
from datasets import Dataset, Features, Image
from huggingface_hub import HfApi, snapshot_download
from requests import HTTPError
from lerobot.constants import HF_LEROBOT_HOME
from lerobot.datasets.compute_stats import aggregate_stats
from lerobot.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDataset
from lerobot.datasets.utils import (
DEFAULT_CHUNK_SIZE,
DEFAULT_DATA_FILE_SIZE_IN_MB,
DEFAULT_DATA_PATH,
DEFAULT_VIDEO_FILE_SIZE_IN_MB,
DEFAULT_VIDEO_PATH,
LEGACY_EPISODES_PATH,
LEGACY_EPISODES_STATS_PATH,
LEGACY_TASKS_PATH,
cast_stats_to_numpy,
flatten_dict,
get_parquet_file_size_in_mb,
get_parquet_num_frames,
get_video_size_in_mb,
load_info,
update_chunk_file_indices,
write_episodes,
write_info,
write_stats,
write_tasks,
)
from lerobot.datasets.video_utils import concatenate_video_files, get_video_duration_in_s
V21 = "v2.1"
"""
-------------------------
OLD
data/chunk-000/episode_000000.parquet
NEW
data/chunk-000/file_000.parquet
-------------------------
OLD
videos/chunk-000/CAMERA/episode_000000.mp4
NEW
videos/chunk-000/file_000.mp4
-------------------------
OLD
episodes.jsonl
{"episode_index": 1, "tasks": ["Put the blue block in the green bowl"], "length": 266}
NEW
meta/episodes/chunk-000/episodes_000.parquet
episode_index | video_chunk_index | video_file_index | data_chunk_index | data_file_index | tasks | length
-------------------------
OLD
tasks.jsonl
{"task_index": 1, "task": "Put the blue block in the green bowl"}
NEW
meta/tasks/chunk-000/file_000.parquet
task_index | task
-------------------------
OLD
episodes_stats.jsonl
NEW
meta/episodes_stats/chunk-000/file_000.parquet
episode_index | mean | std | min | max
-------------------------
UPDATE
meta/info.json
-------------------------
"""
def load_jsonlines(fpath: Path) -> list[Any]:
with jsonlines.open(fpath, "r") as reader:
return list(reader)
def legacy_load_episodes(local_dir: Path) -> dict:
episodes = load_jsonlines(local_dir / LEGACY_EPISODES_PATH)
return {item["episode_index"]: item for item in sorted(episodes, key=lambda x: x["episode_index"])}
def legacy_load_episodes_stats(local_dir: Path) -> dict:
episodes_stats = load_jsonlines(local_dir / LEGACY_EPISODES_STATS_PATH)
return {
item["episode_index"]: cast_stats_to_numpy(item["stats"])
for item in sorted(episodes_stats, key=lambda x: x["episode_index"])
}
def legacy_load_tasks(local_dir: Path) -> tuple[dict, dict]:
tasks = load_jsonlines(local_dir / LEGACY_TASKS_PATH)
tasks = {item["task_index"]: item["task"] for item in sorted(tasks, key=lambda x: x["task_index"])}
task_to_task_index = {task: task_index for task_index, task in tasks.items()}
return tasks, task_to_task_index
def convert_tasks(root, new_root):
tasks, _ = legacy_load_tasks(root)
task_indices = tasks.keys()
task_strings = tasks.values()
df_tasks = pd.DataFrame({"task_index": task_indices}, index=task_strings)
write_tasks(df_tasks, new_root)
def concat_data_files(paths_to_cat, new_root, chunk_idx, file_idx, image_keys):
# TODO(rcadene): to save RAM use Dataset.from_parquet(file) and concatenate_datasets
dataframes = [pd.read_parquet(file) for file in paths_to_cat]
# Concatenate all DataFrames along rows
concatenated_df = pd.concat(dataframes, ignore_index=True)
path = new_root / DEFAULT_DATA_PATH.format(chunk_index=chunk_idx, file_index=file_idx)
path.parent.mkdir(parents=True, exist_ok=True)
if len(image_keys) > 0:
schema = pa.Schema.from_pandas(concatenated_df)
features = Features.from_arrow_schema(schema)
for key in image_keys:
features[key] = Image()
schema = features.arrow_schema
else:
schema = None
concatenated_df.to_parquet(path, index=False, schema=schema)
def convert_data(root: Path, new_root: Path, data_file_size_in_mb: int):
data_dir = root / "data"
ep_paths = sorted(data_dir.glob("*/*.parquet"))
image_keys = get_image_keys(root)
ep_idx = 0
chunk_idx = 0
file_idx = 0
size_in_mb = 0
num_frames = 0
paths_to_cat = []
episodes_metadata = []
for ep_path in ep_paths:
ep_size_in_mb = get_parquet_file_size_in_mb(ep_path)
ep_num_frames = get_parquet_num_frames(ep_path)
ep_metadata = {
"episode_index": ep_idx,
"data/chunk_index": chunk_idx,
"data/file_index": file_idx,
"dataset_from_index": num_frames,
"dataset_to_index": num_frames + ep_num_frames,
}
size_in_mb += ep_size_in_mb
num_frames += ep_num_frames
episodes_metadata.append(ep_metadata)
ep_idx += 1
if size_in_mb < data_file_size_in_mb:
paths_to_cat.append(ep_path)
continue
if paths_to_cat:
concat_data_files(paths_to_cat, new_root, chunk_idx, file_idx, image_keys)
# Reset for the next file
size_in_mb = ep_size_in_mb
num_frames = ep_num_frames
paths_to_cat = [ep_path]
chunk_idx, file_idx = update_chunk_file_indices(chunk_idx, file_idx, DEFAULT_CHUNK_SIZE)
# Write remaining data if any
if paths_to_cat:
concat_data_files(paths_to_cat, new_root, chunk_idx, file_idx, image_keys)
return episodes_metadata
def get_video_keys(root):
info = load_info(root)
features = info["features"]
video_keys = [key for key, ft in features.items() if ft["dtype"] == "video"]
return video_keys
def get_image_keys(root):
info = load_info(root)
features = info["features"]
image_keys = [key for key, ft in features.items() if ft["dtype"] == "image"]
return image_keys
def convert_videos(root: Path, new_root: Path, video_file_size_in_mb: int):
video_keys = get_video_keys(root)
if len(video_keys) == 0:
return None
video_keys = sorted(video_keys)
eps_metadata_per_cam = []
for camera in video_keys:
eps_metadata = convert_videos_of_camera(root, new_root, camera, video_file_size_in_mb)
eps_metadata_per_cam.append(eps_metadata)
num_eps_per_cam = [len(eps_cam_map) for eps_cam_map in eps_metadata_per_cam]
if len(set(num_eps_per_cam)) != 1:
raise ValueError(f"All cams dont have same number of episodes ({num_eps_per_cam}).")
episods_metadata = []
num_cameras = len(video_keys)
num_episodes = num_eps_per_cam[0]
for ep_idx in range(num_episodes):
# Sanity check
ep_ids = [eps_metadata_per_cam[cam_idx][ep_idx]["episode_index"] for cam_idx in range(num_cameras)]
ep_ids += [ep_idx]
if len(set(ep_ids)) != 1:
raise ValueError(f"All episode indices need to match ({ep_ids}).")
ep_dict = {}
for cam_idx in range(num_cameras):
ep_dict.update(eps_metadata_per_cam[cam_idx][ep_idx])
episods_metadata.append(ep_dict)
return episods_metadata
def convert_videos_of_camera(root: Path, new_root: Path, video_key: str, video_file_size_in_mb: int):
# Access old paths to mp4
videos_dir = root / "videos"
ep_paths = sorted(videos_dir.glob(f"*/{video_key}/*.mp4"))
ep_idx = 0
chunk_idx = 0
file_idx = 0
size_in_mb = 0
duration_in_s = 0.0
paths_to_cat = []
episodes_metadata = []
for ep_path in tqdm.tqdm(ep_paths, desc=f"convert videos of {video_key}"):
ep_size_in_mb = get_video_size_in_mb(ep_path)
ep_duration_in_s = get_video_duration_in_s(ep_path)
# Check if adding this episode would exceed the limit
if size_in_mb + ep_size_in_mb >= video_file_size_in_mb and len(paths_to_cat) > 0:
# Size limit would be exceeded, save current accumulation WITHOUT this episode
concatenate_video_files(
paths_to_cat,
new_root
/ DEFAULT_VIDEO_PATH.format(video_key=video_key, chunk_index=chunk_idx, file_index=file_idx),
)
# Update episodes metadata for the file we just saved
for i, _ in enumerate(paths_to_cat):
past_ep_idx = ep_idx - len(paths_to_cat) + i
episodes_metadata[past_ep_idx][f"videos/{video_key}/chunk_index"] = chunk_idx
episodes_metadata[past_ep_idx][f"videos/{video_key}/file_index"] = file_idx
# Move to next file and start fresh with current episode
chunk_idx, file_idx = update_chunk_file_indices(chunk_idx, file_idx, DEFAULT_CHUNK_SIZE)
size_in_mb = 0
duration_in_s = 0.0
paths_to_cat = []
# Add current episode metadata
ep_metadata = {
"episode_index": ep_idx,
f"videos/{video_key}/chunk_index": chunk_idx, # Will be updated when file is saved
f"videos/{video_key}/file_index": file_idx, # Will be updated when file is saved
f"videos/{video_key}/from_timestamp": duration_in_s,
f"videos/{video_key}/to_timestamp": duration_in_s + ep_duration_in_s,
}
episodes_metadata.append(ep_metadata)
# Add current episode to accumulation
paths_to_cat.append(ep_path)
size_in_mb += ep_size_in_mb
duration_in_s += ep_duration_in_s
ep_idx += 1
# Write remaining videos if any
if paths_to_cat:
concatenate_video_files(
paths_to_cat,
new_root
/ DEFAULT_VIDEO_PATH.format(video_key=video_key, chunk_index=chunk_idx, file_index=file_idx),
)
# Update episodes metadata for the final file
for i, _ in enumerate(paths_to_cat):
past_ep_idx = ep_idx - len(paths_to_cat) + i
episodes_metadata[past_ep_idx][f"videos/{video_key}/chunk_index"] = chunk_idx
episodes_metadata[past_ep_idx][f"videos/{video_key}/file_index"] = file_idx
return episodes_metadata
def generate_episode_metadata_dict(
episodes_legacy_metadata, episodes_metadata, episodes_stats, episodes_videos=None
):
num_episodes = len(episodes_metadata)
episodes_legacy_metadata_vals = list(episodes_legacy_metadata.values())
episodes_stats_vals = list(episodes_stats.values())
episodes_stats_keys = list(episodes_stats.keys())
for i in range(num_episodes):
ep_legacy_metadata = episodes_legacy_metadata_vals[i]
ep_metadata = episodes_metadata[i]
ep_stats = episodes_stats_vals[i]
ep_ids_set = {
ep_legacy_metadata["episode_index"],
ep_metadata["episode_index"],
episodes_stats_keys[i],
}
if episodes_videos is None:
ep_video = {}
else:
ep_video = episodes_videos[i]
ep_ids_set.add(ep_video["episode_index"])
if len(ep_ids_set) != 1:
raise ValueError(f"Number of episodes is not the same ({ep_ids_set}).")
ep_dict = {**ep_metadata, **ep_video, **ep_legacy_metadata, **flatten_dict({"stats": ep_stats})}
ep_dict["meta/episodes/chunk_index"] = 0
ep_dict["meta/episodes/file_index"] = 0
yield ep_dict
def convert_episodes_metadata(root, new_root, episodes_metadata, episodes_video_metadata=None):
episodes_legacy_metadata = legacy_load_episodes(root)
episodes_stats = legacy_load_episodes_stats(root)
num_eps_set = {len(episodes_legacy_metadata), len(episodes_metadata)}
if episodes_video_metadata is not None:
num_eps_set.add(len(episodes_video_metadata))
if len(num_eps_set) != 1:
raise ValueError(f"Number of episodes is not the same ({num_eps_set}).")
ds_episodes = Dataset.from_generator(
lambda: generate_episode_metadata_dict(
episodes_legacy_metadata, episodes_metadata, episodes_stats, episodes_video_metadata
)
)
write_episodes(ds_episodes, new_root)
stats = aggregate_stats(list(episodes_stats.values()))
write_stats(stats, new_root)
def convert_info(root, new_root, data_file_size_in_mb, video_file_size_in_mb):
info = load_info(root)
info["codebase_version"] = "v3.0"
del info["total_chunks"]
del info["total_videos"]
info["data_files_size_in_mb"] = data_file_size_in_mb
info["video_files_size_in_mb"] = video_file_size_in_mb
info["data_path"] = DEFAULT_DATA_PATH
info["video_path"] = DEFAULT_VIDEO_PATH
info["fps"] = float(info["fps"])
for key in info["features"]:
if info["features"][key]["dtype"] == "video":
# already has fps in video_info
continue
info["features"][key]["fps"] = info["fps"]
write_info(info, new_root)
def convert_dataset(
repo_id: str,
branch: str | None = None,
data_file_size_in_mb: int | None = None,
video_file_size_in_mb: int | None = None,
):
root = HF_LEROBOT_HOME / repo_id
old_root = HF_LEROBOT_HOME / f"{repo_id}_old"
new_root = HF_LEROBOT_HOME / f"{repo_id}_v30"
if data_file_size_in_mb is None:
data_file_size_in_mb = DEFAULT_DATA_FILE_SIZE_IN_MB
if video_file_size_in_mb is None:
video_file_size_in_mb = DEFAULT_VIDEO_FILE_SIZE_IN_MB
if old_root.is_dir() and root.is_dir():
shutil.rmtree(str(root))
shutil.move(str(old_root), str(root))
if new_root.is_dir():
shutil.rmtree(new_root)
snapshot_download(
repo_id,
repo_type="dataset",
revision=V21,
local_dir=root,
)
convert_info(root, new_root, data_file_size_in_mb, video_file_size_in_mb)
convert_tasks(root, new_root)
episodes_metadata = convert_data(root, new_root, data_file_size_in_mb)
episodes_videos_metadata = convert_videos(root, new_root, video_file_size_in_mb)
convert_episodes_metadata(root, new_root, episodes_metadata, episodes_videos_metadata)
shutil.move(str(root), str(old_root))
shutil.move(str(new_root), str(root))
hub_api = HfApi()
try:
hub_api.delete_tag(repo_id, tag=CODEBASE_VERSION, repo_type="dataset")
except HTTPError as e:
print(f"tag={CODEBASE_VERSION} probably doesn't exist. Skipping exception ({e})")
pass
hub_api.delete_files(
delete_patterns=["data/chunk*/episode_*", "meta/*.jsonl", "videos/chunk*"],
repo_id=repo_id,
revision=branch,
repo_type="dataset",
)
hub_api.create_tag(repo_id, tag=CODEBASE_VERSION, revision=branch, repo_type="dataset")
LeRobotDataset(repo_id).push_to_hub()
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--repo-id",
type=str,
required=True,
help="Repository identifier on Hugging Face: a community or a user name `/` the name of the dataset "
"(e.g. `lerobot/pusht`, `cadene/aloha_sim_insertion_human`).",
)
parser.add_argument(
"--branch",
type=str,
default=None,
help="Repo branch to push your dataset. Defaults to the main branch.",
)
parser.add_argument(
"--data-file-size-in-mb",
type=int,
default=None,
help="File size in MB. Defaults to 100 for data and 500 for videos.",
)
parser.add_argument(
"--video-file-size-in-mb",
type=int,
default=None,
help="File size in MB. Defaults to 100 for data and 500 for videos.",
)
args = parser.parse_args()
convert_dataset(**vars(args))
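For reference, a minimal sketch of driving the converter programmatically rather than through the CLI above; the repo id is a placeholder and `convert_dataset` is assumed to be importable from this script's module:

```python
# Hedged sketch: convert a v2.1 dataset to v3.0 with the default file-size limits
# (100 MB for data files, 500 MB for video files), then re-tag and push it to the Hub.
convert_dataset(
    repo_id="lerobot/pusht",     # placeholder repo id
    branch=None,                 # None pushes to the main branch
    data_file_size_in_mb=None,   # None falls back to DEFAULT_DATA_FILE_SIZE_IN_MB
    video_file_size_in_mb=None,  # None falls back to DEFAULT_VIDEO_FILE_SIZE_IN_MB
)
```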

View File

@@ -17,6 +17,7 @@ import glob
import importlib
import logging
import shutil
import tempfile
import warnings
from dataclasses import dataclass, field
from pathlib import Path
@@ -263,7 +264,11 @@ def encode_video_frames(
video_path = Path(video_path)
imgs_dir = Path(imgs_dir)
video_path.parent.mkdir(parents=True, exist_ok=overwrite)
if video_path.exists() and not overwrite:
logging.warning(f"Video file already exists: {video_path}. Skipping encoding.")
return
video_path.parent.mkdir(parents=True, exist_ok=True)
# Encoders/pixel formats incompatibility check
if (vcodec == "libsvtav1" or vcodec == "hevc") and pix_fmt == "yuv444p":
@@ -273,9 +278,9 @@ def encode_video_frames(
pix_fmt = "yuv420p"
# Get input frames
template = "frame_" + ("[0-9]" * 6) + ".png"
template = "frame-" + ("[0-9]" * 6) + ".png"
input_list = sorted(
glob.glob(str(imgs_dir / template)), key=lambda x: int(x.split("_")[-1].split(".")[0])
glob.glob(str(imgs_dir / template)), key=lambda x: int(x.split("-")[-1].split(".")[0])
)
# Define video output frame size (assuming all input frames are the same size)
@@ -300,7 +305,7 @@ def encode_video_frames(
# Set logging level
if log_level is not None:
# "While less efficient, it is generally preferable to modify logging with Pythons logging"
# "While less efficient, it is generally preferable to modify logging with Python's logging"
logging.getLogger("libav").setLevel(log_level)
# Create and open output file (overwrite by default)
@@ -331,6 +336,89 @@ def encode_video_frames(
raise OSError(f"Video encoding did not work. File not found: {video_path}.")
def concatenate_video_files(
input_video_paths: list[Path | str], output_video_path: Path, overwrite: bool = True
):
"""
Concatenate multiple video files into a single video file using pyav.
This function takes an ordered list of input video file paths and concatenates them into a single
output video file. It uses FFmpeg's concat demuxer (via PyAV) in stream-copy mode, so the streams
are remuxed without re-encoding.
Args:
input_video_paths: Ordered list of input video file paths to concatenate.
output_video_path: Path to the output video file.
overwrite: Whether to overwrite the output video file if it already exists. Default is True.
Note:
- Creates temporary files (a .ffconcat list and an intermediate output file) that are cleaned up after use.
- Uses FFmpeg's concat demuxer, which requires all input videos to have the same
codec, resolution, and frame rate for proper concatenation.
"""
output_video_path = Path(output_video_path)
if output_video_path.exists() and not overwrite:
logging.warning(f"Video file already exists: {output_video_path}. Skipping concatenation.")
return
output_video_path.parent.mkdir(parents=True, exist_ok=True)
if len(input_video_paths) == 0:
raise FileNotFoundError("No input video paths provided.")
# Create a temporary .ffconcat file to list the input video paths
with tempfile.NamedTemporaryFile(mode="w", suffix=".ffconcat", delete=False) as tmp_concatenate_file:
tmp_concatenate_file.write("ffconcat version 1.0\n")
for input_path in input_video_paths:
tmp_concatenate_file.write(f"file '{str(input_path)}'\n")
tmp_concatenate_file.flush()
tmp_concatenate_path = tmp_concatenate_file.name
# Create input and output containers
input_container = av.open(
tmp_concatenate_path, mode="r", format="concat", options={"safe": "0"}
) # safe = 0 allows absolute paths as well as relative paths
tmp_output_video_path = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False).name
output_container = av.open(
tmp_output_video_path, mode="w", options={"movflags": "faststart"}
) # faststart is to move the metadata to the beginning of the file to speed up loading
# Replicate input streams in output container
stream_map = {}
for input_stream in input_container.streams:
if input_stream.type in ("video", "audio", "subtitle"): # only copy compatible streams
stream_map[input_stream.index] = output_container.add_stream_from_template(
template=input_stream, opaque=True
)
stream_map[
input_stream.index
].time_base = (
input_stream.time_base
) # set the time base to the input stream time base (missing in the codec context)
# Demux + remux packets (no re-encode)
for packet in input_container.demux():
# Skip packets from un-mapped streams
if packet.stream.index not in stream_map:
continue
# Skip demux flushing packets
if packet.dts is None:
continue
output_stream = stream_map[packet.stream.index]
packet.stream = output_stream
output_container.mux(packet)
input_container.close()
output_container.close()
shutil.move(tmp_output_video_path, output_video_path)
Path(tmp_concatenate_path).unlink()
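A short usage sketch of the helper above, assuming `concatenate_video_files` is in scope; the chunk paths are hypothetical and assumed to exist on disk with matching codec, resolution, and frame rate:

```python
from pathlib import Path

# Remux three per-episode mp4 chunks into a single file without re-encoding.
chunks = [Path(f"videos/observation.images.cam_high/chunk_{i:03d}.mp4") for i in range(3)]
concatenate_video_files(chunks, Path("videos/observation.images.cam_high/file_000.mp4"))
```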
@dataclass
class VideoFrame:
# TODO(rcadene, lhoestq): move to Hugging Face `datasets` repo
@@ -454,6 +542,28 @@ def get_image_pixel_channels(image: Image):
raise ValueError("Unknown format")
def get_video_duration_in_s(video_path: Path | str) -> float:
"""
Get the duration of a video file in seconds using PyAV.
Args:
video_path: Path to the video file.
Returns:
Duration of the video in seconds.
"""
with av.open(str(video_path)) as container:
# Get the first video stream
video_stream = container.streams.video[0]
# Calculate duration: stream.duration * stream.time_base gives duration in seconds
if video_stream.duration is not None:
duration = float(video_stream.duration * video_stream.time_base)
else:
# Fallback to container duration if stream duration is not available
duration = float(container.duration / av.time_base)
return duration
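A one-line usage sketch of the duration helper (the path is hypothetical):

```python
duration_s = get_video_duration_in_s("videos/observation.images.cam_high/file_000.mp4")
print(f"concatenated file lasts {duration_s:.2f}s")  # e.g. to sanity-check a concatenation
```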
class VideoEncodingManager:
"""
Context manager that ensures proper video encoding and data cleanup even if exceptions occur.
@@ -487,7 +597,7 @@ class VideoEncodingManager:
f"Encoding remaining {self.dataset.episodes_since_last_encoding} episodes, "
f"from episode {start_ep} to {end_ep - 1}"
)
self.dataset.batch_encode_videos(start_ep, end_ep)
self.dataset._batch_save_episode_video(start_ep, end_ep)
# Clean up episode images if recording was interrupted
if exc_type is not None:

View File

@@ -279,8 +279,8 @@ def record_loop(
if dataset is not None:
action_frame = build_dataset_frame(dataset.features, sent_action, prefix="action")
frame = {**observation_frame, **action_frame}
dataset.add_frame(frame, task=single_task)
frame = {**observation_frame, **action_frame, "task": single_task}
dataset.add_frame(frame)
if display_data:
log_rerun_data(observation, action)
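This call site illustrates the new `add_frame` contract in v3.0: the task string travels inside the frame dictionary instead of being passed as a separate `task=` argument, a pattern repeated in the other call sites updated further below. A hedged sketch, assuming `dataset` is an already-created `LeRobotDataset` whose features match these keys:

```python
import torch

frame = {
    "observation.state": torch.zeros(6, dtype=torch.float32),  # placeholder values
    "action": torch.zeros(6, dtype=torch.float32),
    "task": "pick up the cube",  # v3.0: the task is part of the frame itself
}
dataset.add_frame(frame)  # no task= keyword argument anymore
```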

View File

@@ -93,11 +93,15 @@ def replay(cfg: ReplayConfig):
robot = make_robot_from_config(cfg.robot)
dataset = LeRobotDataset(cfg.dataset.repo_id, root=cfg.dataset.root, episodes=[cfg.dataset.episode])
actions = dataset.hf_dataset.select_columns("action")
# Filter dataset to only include frames from the specified episode since episodes are chunked in dataset V3.0
episode_frames = dataset.hf_dataset.filter(lambda x: x["episode_index"] == cfg.dataset.episode)
actions = episode_frames.select_columns("action")
robot.connect()
log_say("Replaying episode", cfg.play_sounds, blocking=True)
for idx in range(dataset.num_frames):
for idx in range(len(episode_frames)):
start_episode_t = time.perf_counter()
action_array = actions[idx]["action"]

View File

@@ -115,11 +115,11 @@ If you uploaded your dataset to the hub with `--control.push_to_hub=true`, you c
echo ${HF_USER}/aloha_test
```
If you didn't upload with `--control.push_to_hub=false`, you can also visualize it locally with:
If you didn't upload your dataset (i.e. you used `--control.push_to_hub=false`), you can still visualize it locally with [Rerun](https://github.com/rerun-io/rerun):
```bash
python -m lerobot.scripts.visualize_dataset_html \
--repo-id ${HF_USER}/aloha_test
python -m lerobot.scripts.visualize_dataset \
--repo-id ${HF_USER}/aloha_test --episode 0
```
## Replay an episode

View File

@@ -226,7 +226,8 @@ def convert_lerobot_dataset_to_cropper_lerobot_dataset(
value = value.unsqueeze(0)
new_frame[key] = value
new_dataset.add_frame(new_frame, task=task)
new_frame["task"] = task
new_dataset.add_frame(new_frame)
if frame["episode_index"].item() != prev_episode_index:
# Save the episode

View File

@@ -2129,7 +2129,8 @@ def record_dataset(env, policy, cfg):
frame["complementary_info.discrete_penalty"] = torch.tensor(
[info.get("discrete_penalty", 0.0)], dtype=torch.float32
)
dataset.add_frame(frame, task=cfg.task)
frame["task"] = cfg.task
dataset.add_frame(frame)
# Maintain consistent timing
if cfg.fps:

View File

@@ -166,7 +166,8 @@ def train(cfg: TrainPipelineConfig):
if hasattr(cfg.policy, "drop_n_last_frames"):
shuffle = False
sampler = EpisodeAwareSampler(
dataset.episode_data_index,
dataset.meta.episodes["dataset_from_index"],
dataset.meta.episodes["dataset_to_index"],
drop_n_last_frames=cfg.policy.drop_n_last_frames,
shuffle=True,
)

View File

@@ -79,8 +79,8 @@ from lerobot.datasets.lerobot_dataset import LeRobotDataset
class EpisodeSampler(torch.utils.data.Sampler):
def __init__(self, dataset: LeRobotDataset, episode_index: int):
from_idx = dataset.episode_data_index["from"][episode_index].item()
to_idx = dataset.episode_data_index["to"][episode_index].item()
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]
self.frame_ids = range(from_idx, to_idx)
def __iter__(self) -> Iterator:
@@ -283,7 +283,7 @@ def main():
tolerance_s = kwargs.pop("tolerance_s")
logging.info("Loading dataset")
dataset = LeRobotDataset(repo_id, root=root, tolerance_s=tolerance_s)
dataset = LeRobotDataset(repo_id, episodes=[args.episode_index], root=root, tolerance_s=tolerance_s)
visualize_dataset(dataset, **vars(args))
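The sampler changes above rely on the v3.0 episode metadata table: `dataset.meta.episodes` exposes each episode's global frame range through its `dataset_from_index` and `dataset_to_index` columns. A short sketch of pulling out one episode's frames this way (example repo id, dataset assumed to already be in v3.0 format):

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("lerobot/pusht")  # example repo id
episode_index = 0
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]
episode_frames = [dataset[i] for i in range(from_idx, to_idx)]
print(f"episode {episode_index} has {len(episode_frames)} frames")
```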

View File

@@ -1,482 +0,0 @@
#!/usr/bin/env python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Visualize data of **all** frames of any episode of a dataset of type LeRobotDataset.
Note: The last frame of the episode doesn't always correspond to a final state.
That's because our datasets are composed of transition from state to state up to
the antepenultimate state associated to the ultimate action to arrive in the final state.
However, there might not be a transition from a final state to another state.
Note: This script aims to visualize the data used to train the neural networks.
~What you see is what you get~. When visualizing image modality, it is often expected to observe
lossly compression artifacts since these images have been decoded from compressed mp4 videos to
save disk space. The compression factor applied has been tuned to not affect success rate.
Example of usage:
- Visualize data stored on a local machine:
```bash
local$ python -m lerobot.scripts.visualize_dataset_html \
--repo-id lerobot/pusht
local$ open http://localhost:9090
```
- Visualize data stored on a distant machine with a local viewer:
```bash
distant$ python -m lerobot.scripts.visualize_dataset_html \
--repo-id lerobot/pusht
local$ ssh -L 9090:localhost:9090 distant # create a ssh tunnel
local$ open http://localhost:9090
```
- Select episodes to visualize:
```bash
python -m lerobot.scripts.visualize_dataset_html \
--repo-id lerobot/pusht \
--episodes 7 3 5 1 4
```
"""
import argparse
import csv
import json
import logging
import re
import shutil
import tempfile
from io import StringIO
from pathlib import Path
import numpy as np
import pandas as pd
import requests
from flask import Flask, redirect, render_template, request, url_for
from lerobot import available_datasets
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.utils import IterableNamespace
from lerobot.utils.utils import init_logging
def run_server(
dataset: LeRobotDataset | IterableNamespace | None,
episodes: list[int] | None,
host: str,
port: str,
static_folder: Path,
template_folder: Path,
):
app = Flask(__name__, static_folder=static_folder.resolve(), template_folder=template_folder.resolve())
app.config["SEND_FILE_MAX_AGE_DEFAULT"] = 0 # specifying not to cache
@app.route("/")
def hommepage(dataset=dataset):
if dataset:
dataset_namespace, dataset_name = dataset.repo_id.split("/")
return redirect(
url_for(
"show_episode",
dataset_namespace=dataset_namespace,
dataset_name=dataset_name,
episode_id=0,
)
)
dataset_param, episode_param = None, None
all_params = request.args
if "dataset" in all_params:
dataset_param = all_params["dataset"]
if "episode" in all_params:
episode_param = int(all_params["episode"])
if dataset_param:
dataset_namespace, dataset_name = dataset_param.split("/")
return redirect(
url_for(
"show_episode",
dataset_namespace=dataset_namespace,
dataset_name=dataset_name,
episode_id=episode_param if episode_param is not None else 0,
)
)
featured_datasets = [
"lerobot/aloha_static_cups_open",
"lerobot/columbia_cairlab_pusht_real",
"lerobot/taco_play",
]
return render_template(
"visualize_dataset_homepage.html",
featured_datasets=featured_datasets,
lerobot_datasets=available_datasets,
)
@app.route("/<string:dataset_namespace>/<string:dataset_name>")
def show_first_episode(dataset_namespace, dataset_name):
first_episode_id = 0
return redirect(
url_for(
"show_episode",
dataset_namespace=dataset_namespace,
dataset_name=dataset_name,
episode_id=first_episode_id,
)
)
@app.route("/<string:dataset_namespace>/<string:dataset_name>/episode_<int:episode_id>")
def show_episode(dataset_namespace, dataset_name, episode_id, dataset=dataset, episodes=episodes):
repo_id = f"{dataset_namespace}/{dataset_name}"
try:
if dataset is None:
dataset = get_dataset_info(repo_id)
except FileNotFoundError:
return (
"Make sure to convert your LeRobotDataset to v2 & above. See how to convert your dataset at https://github.com/huggingface/lerobot/pull/461",
400,
)
dataset_version = (
str(dataset.meta._version) if isinstance(dataset, LeRobotDataset) else dataset.codebase_version
)
match = re.search(r"v(\d+)\.", dataset_version)
if match:
major_version = int(match.group(1))
if major_version < 2:
return "Make sure to convert your LeRobotDataset to v2 & above."
episode_data_csv_str, columns, ignored_columns = get_episode_data(dataset, episode_id)
dataset_info = {
"repo_id": f"{dataset_namespace}/{dataset_name}",
"num_samples": dataset.num_frames
if isinstance(dataset, LeRobotDataset)
else dataset.total_frames,
"num_episodes": dataset.num_episodes
if isinstance(dataset, LeRobotDataset)
else dataset.total_episodes,
"fps": dataset.fps,
}
if isinstance(dataset, LeRobotDataset):
video_paths = [
dataset.meta.get_video_file_path(episode_id, key) for key in dataset.meta.video_keys
]
videos_info = [
{
"url": url_for("static", filename=str(video_path).replace("\\", "/")),
"filename": video_path.parent.name,
}
for video_path in video_paths
]
tasks = dataset.meta.episodes[episode_id]["tasks"]
else:
video_keys = [key for key, ft in dataset.features.items() if ft["dtype"] == "video"]
videos_info = [
{
"url": f"https://huggingface.co/datasets/{repo_id}/resolve/main/"
+ dataset.video_path.format(
episode_chunk=int(episode_id) // dataset.chunks_size,
video_key=video_key,
episode_index=episode_id,
),
"filename": video_key,
}
for video_key in video_keys
]
response = requests.get(
f"https://huggingface.co/datasets/{repo_id}/resolve/main/meta/episodes.jsonl", timeout=5
)
response.raise_for_status()
# Split into lines and parse each line as JSON
tasks_jsonl = [json.loads(line) for line in response.text.splitlines() if line.strip()]
filtered_tasks_jsonl = [row for row in tasks_jsonl if row["episode_index"] == episode_id]
tasks = filtered_tasks_jsonl[0]["tasks"]
videos_info[0]["language_instruction"] = tasks
if episodes is None:
episodes = list(
range(dataset.num_episodes if isinstance(dataset, LeRobotDataset) else dataset.total_episodes)
)
return render_template(
"visualize_dataset_template.html",
episode_id=episode_id,
episodes=episodes,
dataset_info=dataset_info,
videos_info=videos_info,
episode_data_csv_str=episode_data_csv_str,
columns=columns,
ignored_columns=ignored_columns,
)
app.run(host=host, port=port)
def get_ep_csv_fname(episode_id: int):
ep_csv_fname = f"episode_{episode_id}.csv"
return ep_csv_fname
def get_episode_data(dataset: LeRobotDataset | IterableNamespace, episode_index):
"""Get a csv str containing timeseries data of an episode (e.g. state and action).
This file will be loaded by Dygraph javascript to plot data in real time."""
columns = []
selected_columns = [col for col, ft in dataset.features.items() if ft["dtype"] in ["float32", "int32"]]
selected_columns.remove("timestamp")
ignored_columns = []
for column_name in selected_columns:
shape = dataset.features[column_name]["shape"]
shape_dim = len(shape)
if shape_dim > 1:
selected_columns.remove(column_name)
ignored_columns.append(column_name)
# init header of csv with state and action names
header = ["timestamp"]
for column_name in selected_columns:
dim_state = (
dataset.meta.shapes[column_name][0]
if isinstance(dataset, LeRobotDataset)
else dataset.features[column_name].shape[0]
)
if "names" in dataset.features[column_name] and dataset.features[column_name]["names"]:
column_names = dataset.features[column_name]["names"]
while not isinstance(column_names, list):
column_names = list(column_names.values())[0]
else:
column_names = [f"{column_name}_{i}" for i in range(dim_state)]
columns.append({"key": column_name, "value": column_names})
header += column_names
selected_columns.insert(0, "timestamp")
if isinstance(dataset, LeRobotDataset):
from_idx = dataset.episode_data_index["from"][episode_index]
to_idx = dataset.episode_data_index["to"][episode_index]
data = (
dataset.hf_dataset.select(range(from_idx, to_idx))
.select_columns(selected_columns)
.with_format("pandas")
)
else:
repo_id = dataset.repo_id
url = f"https://huggingface.co/datasets/{repo_id}/resolve/main/" + dataset.data_path.format(
episode_chunk=int(episode_index) // dataset.chunks_size, episode_index=episode_index
)
df = pd.read_parquet(url)
data = df[selected_columns] # Select specific columns
rows = np.hstack(
(
np.expand_dims(data["timestamp"], axis=1),
*[np.vstack(data[col]) for col in selected_columns[1:]],
)
).tolist()
# Convert data to CSV string
csv_buffer = StringIO()
csv_writer = csv.writer(csv_buffer)
# Write header
csv_writer.writerow(header)
# Write data rows
csv_writer.writerows(rows)
csv_string = csv_buffer.getvalue()
return csv_string, columns, ignored_columns
def get_episode_video_paths(dataset: LeRobotDataset, ep_index: int) -> list[str]:
# get first frame of episode (hack to get video_path of the episode)
first_frame_idx = dataset.episode_data_index["from"][ep_index].item()
return [
dataset.hf_dataset.select_columns(key)[first_frame_idx][key]["path"]
for key in dataset.meta.video_keys
]
def get_episode_language_instruction(dataset: LeRobotDataset, ep_index: int) -> list[str]:
# check if the dataset has language instructions
if "language_instruction" not in dataset.features:
return None
# get first frame index
first_frame_idx = dataset.episode_data_index["from"][ep_index].item()
language_instruction = dataset.hf_dataset[first_frame_idx]["language_instruction"]
# TODO (michel-aractingi) hack to get the sentence, some strings in openx are badly stored
# with the tf.tensor appearing in the string
return language_instruction.removeprefix("tf.Tensor(b'").removesuffix("', shape=(), dtype=string)")
def get_dataset_info(repo_id: str) -> IterableNamespace:
response = requests.get(
f"https://huggingface.co/datasets/{repo_id}/resolve/main/meta/info.json", timeout=5
)
response.raise_for_status() # Raises an HTTPError for bad responses
dataset_info = response.json()
dataset_info["repo_id"] = repo_id
return IterableNamespace(dataset_info)
def visualize_dataset_html(
dataset: LeRobotDataset | None,
episodes: list[int] | None = None,
output_dir: Path | None = None,
serve: bool = True,
host: str = "127.0.0.1",
port: int = 9090,
force_override: bool = False,
) -> Path | None:
init_logging()
template_dir = Path(__file__).resolve().parent.parent / "templates"
if output_dir is None:
# Create a temporary directory that will be automatically cleaned up
output_dir = tempfile.mkdtemp(prefix="lerobot_visualize_dataset_")
output_dir = Path(output_dir)
if output_dir.exists():
if force_override:
shutil.rmtree(output_dir)
else:
logging.info(f"Output directory already exists. Loading from it: '{output_dir}'")
output_dir.mkdir(parents=True, exist_ok=True)
static_dir = output_dir / "static"
static_dir.mkdir(parents=True, exist_ok=True)
if dataset is None:
if serve:
run_server(
dataset=None,
episodes=None,
host=host,
port=port,
static_folder=static_dir,
template_folder=template_dir,
)
else:
# Create a symlink from the dataset video folder containing mp4 files to the output directory
# so that the HTTP server can access the mp4 files.
if isinstance(dataset, LeRobotDataset):
ln_videos_dir = static_dir / "videos"
if not ln_videos_dir.exists():
ln_videos_dir.symlink_to((dataset.root / "videos").resolve().as_posix())
if serve:
run_server(dataset, episodes, host, port, static_dir, template_dir)
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--repo-id",
type=str,
default=None,
help="Name of hugging face repositery containing a LeRobotDataset dataset (e.g. `lerobot/pusht` for https://huggingface.co/datasets/lerobot/pusht).",
)
parser.add_argument(
"--root",
type=Path,
default=None,
help="Root directory for a dataset stored locally (e.g. `--root data`). By default, the dataset will be loaded from hugging face cache folder, or downloaded from the hub if available.",
)
parser.add_argument(
"--load-from-hf-hub",
type=int,
default=0,
help="Load videos and parquet files from HF Hub rather than local system.",
)
parser.add_argument(
"--episodes",
type=int,
nargs="*",
default=None,
help="Episode indices to visualize (e.g. `0 1 5 6` to load episodes of index 0, 1, 5 and 6). By default loads all episodes.",
)
parser.add_argument(
"--output-dir",
type=Path,
default=None,
help="Directory path to write html files and kickoff a web server. By default write them to 'outputs/visualize_dataset/REPO_ID'.",
)
parser.add_argument(
"--serve",
type=int,
default=1,
help="Launch web server.",
)
parser.add_argument(
"--host",
type=str,
default="127.0.0.1",
help="Web host used by the http server.",
)
parser.add_argument(
"--port",
type=int,
default=9090,
help="Web port used by the http server.",
)
parser.add_argument(
"--force-override",
type=int,
default=0,
help="Delete the output directory if it exists already.",
)
parser.add_argument(
"--tolerance-s",
type=float,
default=1e-4,
help=(
"Tolerance in seconds used to ensure data timestamps respect the dataset fps value"
"This is argument passed to the constructor of LeRobotDataset and maps to its tolerance_s constructor argument"
"If not given, defaults to 1e-4."
),
)
args = parser.parse_args()
kwargs = vars(args)
repo_id = kwargs.pop("repo_id")
load_from_hf_hub = kwargs.pop("load_from_hf_hub")
root = kwargs.pop("root")
tolerance_s = kwargs.pop("tolerance_s")
dataset = None
if repo_id:
dataset = (
LeRobotDataset(repo_id, root=root, tolerance_s=tolerance_s)
if not load_from_hf_hub
else get_dataset_info(repo_id)
)
visualize_dataset_html(dataset, **vars(args))
if __name__ == "__main__":
main()

View File

@@ -1,68 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Interactive Video Background Page</title>
<script src="https://cdn.tailwindcss.com"></script>
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
</head>
<body class="h-screen overflow-hidden font-mono text-white" x-data="{
inputValue: '',
navigateToDataset() {
const trimmedValue = this.inputValue.trim();
if (trimmedValue) {
window.location.href = `/${trimmedValue}`;
}
}
}">
<div class="fixed inset-0 w-full h-full overflow-hidden">
<video class="absolute min-w-full min-h-full w-auto h-auto top-1/2 left-1/2 transform -translate-x-1/2 -translate-y-1/2" autoplay muted loop>
<source src="https://huggingface.co/datasets/cadene/koch_bimanual_folding/resolve/v1.6/videos/observation.images.phone_episode_000037.mp4" type="video/mp4">
Your browser does not support HTML5 video.
</video>
</div>
<div class="fixed inset-0 bg-black bg-opacity-80"></div>
<div class="relative z-10 flex flex-col items-center justify-center h-screen">
<div class="text-center mb-8">
<h1 class="text-4xl font-bold mb-4">LeRobot Dataset Visualizer</h1>
<a href="https://x.com/RemiCadene/status/1825455895561859185" target="_blank" rel="noopener noreferrer" class="underline">create & train your own robots</a>
<p class="text-xl mb-4"></p>
<div class="text-left inline-block">
<h3 class="font-semibold mb-2 mt-4">Example Datasets:</h3>
<ul class="list-disc list-inside">
{% for dataset in featured_datasets %}
<li><a href="/{{ dataset }}" class="text-blue-300 hover:text-blue-100 hover:underline">{{ dataset }}</a></li>
{% endfor %}
</ul>
</div>
</div>
<div class="flex w-full max-w-lg px-4 mb-4">
<input
type="text"
x-model="inputValue"
@keyup.enter="navigateToDataset"
placeholder="enter dataset id (ex: lerobot/droid_100)"
class="flex-grow px-4 py-2 rounded-l bg-white bg-opacity-20 text-white placeholder-gray-300 focus:outline-none focus:ring-2 focus:ring-blue-300"
>
<button
@click="navigateToDataset"
class="px-4 py-2 bg-blue-500 text-white rounded-r hover:bg-blue-600 focus:outline-none focus:ring-2 focus:ring-blue-300"
>
Go
</button>
</div>
<details class="mt-4 max-w-full px-4">
<summary>More example datasets</summary>
<ul class="list-disc list-inside max-h-28 overflow-y-auto break-all">
{% for dataset in lerobot_datasets %}
<li><a href="/{{ dataset }}" class="text-blue-300 hover:text-blue-100 hover:underline">{{ dataset }}</a></li>
{% endfor %}
</ul>
</details>
</div>
</body>
</html>

View File

@@ -1,546 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- # TODO(rcadene, mishig25): store the js files locally -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/alpinejs/3.13.5/cdn.min.js" defer></script>
<script src="https://cdn.jsdelivr.net/npm/dygraphs@2.2.1/dist/dygraph.min.js" type="text/javascript"></script>
<script src="https://cdn.tailwindcss.com"></script>
<title>{{ dataset_info.repo_id }} episode {{ episode_id }}</title>
</head>
<!-- Use [Alpin.js](https://alpinejs.dev), a lightweight and easy to learn JS framework -->
<!-- Use [tailwindcss](https://tailwindcss.com/), CSS classes for styling html -->
<!-- Use [dygraphs](https://dygraphs.com/), a lightweight JS charting library -->
<body class="flex flex-col md:flex-row h-screen max-h-screen bg-slate-950 text-gray-200" x-data="createAlpineData()">
<!-- Sidebar -->
<div x-ref="sidebar" class="bg-slate-900 p-5 break-words overflow-y-auto shrink-0 md:shrink md:w-60 md:max-h-screen">
<a href="https://github.com/huggingface/lerobot" target="_blank" class="hidden md:block">
<img src="https://github.com/huggingface/lerobot/raw/main/media/lerobot-logo-thumbnail.png">
</a>
<a href="https://huggingface.co/datasets/{{ dataset_info.repo_id }}" target="_blank">
<h1 class="mb-4 text-xl font-semibold">{{ dataset_info.repo_id }}</h1>
</a>
<ul>
<li>
Number of samples/frames: {{ dataset_info.num_samples }}
</li>
<li>
Number of episodes: {{ dataset_info.num_episodes }}
</li>
<li>
Frames per second: {{ dataset_info.fps }}
</li>
</ul>
<p>Episodes:</p>
<!-- episodes menu for medium & large screens -->
<div class="ml-2 hidden md:block" x-data="episodePagination">
<ul>
<template x-for="episode in paginatedEpisodes" :key="episode">
<li class="font-mono text-sm mt-0.5">
<a :href="'episode_' + episode"
:class="{'underline': true, 'font-bold -ml-1': episode == {{ episode_id }}}"
x-text="'Episode ' + episode"></a>
</li>
</template>
</ul>
<div class="flex items-center mt-3 text-xs" x-show="totalPages > 1">
<button @click="prevPage()"
class="px-2 py-1 bg-slate-800 rounded mr-2"
:class="{'opacity-50 cursor-not-allowed': page === 1}"
:disabled="page === 1">
&laquo; Prev
</button>
<span class="font-mono mr-2" x-text="` ${page} / ${totalPages}`"></span>
<button @click="nextPage()"
class="px-2 py-1 bg-slate-800 rounded"
:class="{'opacity-50 cursor-not-allowed': page === totalPages}"
:disabled="page === totalPages">
Next &raquo;
</button>
</div>
</div>
<!-- episodes menu for small screens -->
<div class="flex overflow-x-auto md:hidden" x-data="episodePagination">
<button @click="prevPage()"
class="px-2 bg-slate-800 rounded mr-2"
:class="{'opacity-50 cursor-not-allowed': page === 1}"
:disabled="page === 1">&laquo;</button>
<div class="flex">
<template x-for="(episode, index) in paginatedEpisodes" :key="episode">
<p class="font-mono text-sm mt-0.5 px-2"
:class="{
'font-bold': episode == {{ episode_id }},
'border-r': index !== paginatedEpisodes.length - 1
}">
<a :href="'episode_' + episode" x-text="episode"></a>
</p>
</template>
</div>
<button @click="nextPage()"
class="px-2 bg-slate-800 rounded ml-2"
:class="{'opacity-50 cursor-not-allowed': page === totalPages}"
:disabled="page === totalPages">&raquo; </button>
</div>
</div>
<!-- Toggle sidebar button -->
<button class="flex items-center opacity-50 hover:opacity-100 mx-1 hidden md:block"
@click="() => ($refs.sidebar.classList.toggle('hidden'))" title="Toggle sidebar">
<div class="bg-slate-500 w-2 h-10 rounded-full"></div>
</button>
<!-- Content -->
<div class="max-h-screen flex flex-col gap-4 overflow-y-auto md:flex-1">
<h1 class="text-xl font-bold mt-4 font-mono">
Episode {{ episode_id }}
</h1>
<!-- Error message -->
<div class="font-medium text-orange-700 hidden" :class="{ 'hidden': !videoCodecError }">
<p>Videos could NOT play because <a href="https://en.wikipedia.org/wiki/AV1" target="_blank" class="underline">AV1</a> decoding is not available on your browser.</p>
<ul class="list-decimal list-inside">
<li>If iPhone: <span class="italic">It is supported with A17 chip or higher.</span></li>
<li>If Mac with Safari: <span class="italic">It is supported on most browsers except Safari with M1 chip or higher and on Safari with M3 chip or higher.</span></li>
<li>Other: <span class="italic">Contact the maintainers on LeRobot discord channel:</span> <a href="https://discord.com/invite/s3KuuzsPFb" target="_blank" class="underline">https://discord.com/invite/s3KuuzsPFb</a></li>
</ul>
</div>
<!-- Videos -->
<div class="max-w-32 relative text-sm mb-4 select-none"
@click.outside="isVideosDropdownOpen = false">
<div
@click="isVideosDropdownOpen = !isVideosDropdownOpen"
class="p-2 border border-slate-500 rounded flex justify-between items-center cursor-pointer"
>
<span class="truncate">filter videos</span>
<div class="transition-transform" :class="{ 'rotate-180': isVideosDropdownOpen }">🔽</div>
</div>
<div x-show="isVideosDropdownOpen"
class="absolute mt-1 border border-slate-500 rounded shadow-lg z-10">
<div>
<template x-for="option in videosKeys" :key="option">
<div
@click="videosKeysSelected = videosKeysSelected.includes(option) ? videosKeysSelected.filter(v => v !== option) : [...videosKeysSelected, option]"
class="p-2 cursor-pointer bg-slate-900"
:class="{ 'bg-slate-700': videosKeysSelected.includes(option) }"
x-text="option"
></div>
</template>
</div>
</div>
</div>
<div class="flex flex-wrap gap-x-2 gap-y-6">
{% for video_info in videos_info %}
<div x-show="!videoCodecError && videosKeysSelected.includes('{{ video_info.filename }}')" class="max-w-96 relative">
<p class="absolute inset-x-0 -top-4 text-sm text-gray-300 bg-gray-800 px-2 rounded-t-xl truncate">{{ video_info.filename }}</p>
<video muted loop type="video/mp4" class="object-contain w-full h-full" @canplaythrough="videoCanPlay" @timeupdate="() => {
if (video.duration) {
const time = video.currentTime;
const pc = (100 / video.duration) * time;
$refs.slider.value = pc;
dygraphTime = time;
dygraphIndex = Math.floor(pc * dygraph.numRows() / 100);
dygraph.setSelection(dygraphIndex, undefined, true, true);
$refs.timer.textContent = formatTime(time) + ' / ' + formatTime(video.duration);
updateTimeQuery(time.toFixed(2));
}
}" @ended="() => {
$refs.btnPlay.classList.remove('hidden');
$refs.btnPause.classList.add('hidden');
}"
@loadedmetadata="() => ($refs.timer.textContent = formatTime(0) + ' / ' + formatTime(video.duration))">
<source src="{{ video_info.url }}">
Your browser does not support the video tag.
</video>
</div>
{% endfor %}
</div>
<!-- Language instruction -->
{% if videos_info[0].language_instruction %}
<p class="font-medium mt-2">
Language Instruction: <span class="italic">{{ videos_info[0].language_instruction }}</span>
</p>
{% endif %}
<!-- Shortcuts info -->
<div class="text-sm hidden md:block">
Hotkeys: <span class="font-mono">Space</span> to pause/unpause, <span class="font-mono">Arrow Down</span> to go to next episode, <span class="font-mono">Arrow Up</span> to go to previous episode.
</div>
<!-- Controllers -->
<div class="flex gap-1 text-3xl items-center">
<button x-ref="btnPlay" class="-rotate-90" class="-rotate-90" title="Play. Toggle with Space" @click="() => {
videos.forEach(video => video.play());
$refs.btnPlay.classList.toggle('hidden');
$refs.btnPause.classList.toggle('hidden');
}">🔽</button>
<button x-ref="btnPause" class="hidden" title="Pause. Toggle with Space" @click="() => {
videos.forEach(video => video.pause());
$refs.btnPlay.classList.toggle('hidden');
$refs.btnPause.classList.toggle('hidden');
}">⏸️</button>
<button title="Jump backward 5 seconds"
@click="() => (videos.forEach(video => (video.currentTime -= 5)))"></button>
<button title="Jump forward 5 seconds"
@click="() => (videos.forEach(video => (video.currentTime += 5)))"></button>
<button title="Rewind from start"
@click="() => (videos.forEach(video => (video.currentTime = 0.0)))">↩️</button>
<input x-ref="slider" max="100" min="0" step="1" type="range" value="0" class="w-80 mx-2" @input="() => {
const sliderValue = $refs.slider.value;
videos.forEach(video => {
const time = (video.duration * sliderValue) / 100;
video.currentTime = time;
});
}" />
<div x-ref="timer" class="font-mono text-sm border border-slate-500 rounded-lg px-1 py-0.5 shrink-0">0:00 /
0:00
</div>
</div>
<!-- Graph -->
<div class="flex gap-2 mb-4 flex-wrap">
<div>
<div id="graph" @mouseleave="() => {
dygraph.setSelection(dygraphIndex, undefined, true, true);
dygraphTime = video.currentTime;
}">
</div>
<p x-ref="graphTimer" class="font-mono ml-14 mt-4"
x-init="$watch('dygraphTime', value => ($refs.graphTimer.innerText = `Time: ${dygraphTime.toFixed(2)}s`))">
Time: 0.00s
</p>
</div>
<div>
<table class="text-sm border-collapse border border-slate-700" x-show="currentFrameData">
<thead>
<tr>
<th></th>
<template x-for="(_, colIndex) in Array.from({length: columns.length}, (_, index) => index)">
<th class="border border-slate-700">
<div class="flex gap-x-2 justify-between px-2">
<input type="checkbox" :checked="isColumnChecked(colIndex)"
@change="toggleColumn(colIndex)">
<p x-text="`${columns[colIndex].key}`"></p>
</div>
</th>
</template>
</tr>
</thead>
<tbody>
<template x-for="(row, rowIndex) in rows">
<tr class="odd:bg-gray-800 even:bg-gray-900">
<td class="border border-slate-700">
<div class="flex gap-x-2 max-w-64 font-semibold px-1 break-all">
<input type="checkbox" :checked="isRowChecked(rowIndex)"
@change="toggleRow(rowIndex)">
</div>
</td>
<template x-for="(cell, colIndex) in row">
<td x-show="cell" class="border border-slate-700">
<div class="flex gap-x-2 justify-between px-2" :class="{ 'hidden': cell.isNull }">
<div class="flex gap-x-2">
<input type="checkbox" x-model="cell.checked" @change="updateTableValues()">
<span x-text="`${!cell.isNull ? cell.label : null}`"></span>
</div>
<span class="w-14 text-right" x-text="`${!cell.isNull ? (typeof cell.value === 'number' ? cell.value.toFixed(2) : cell.value) : null}`"
:style="`color: ${cell.color}`"></span>
</div>
</td>
</template>
</tr>
</template>
</tbody>
</table>
<div id="labels" class="hidden">
</div>
{% if ignored_columns|length > 0 %}
<div class="m-2 text-orange-700 max-w-96">
Columns {{ ignored_columns }} are NOT shown since the visualizer currently does not support 2D or 3D data.
</div>
{% endif %}
</div>
</div>
</div>
<script>
const parentOrigin = "https://huggingface.co";
const searchParams = new URLSearchParams();
searchParams.set("dataset", "{{ dataset_info.repo_id }}");
searchParams.set("episode", "{{ episode_id }}");
window.parent.postMessage({ queryString: searchParams.toString() }, parentOrigin);
</script>
<script>
function createAlpineData() {
return {
// state
dygraph: null,
currentFrameData: null,
checked: [],
dygraphTime: 0.0,
dygraphIndex: 0,
videos: null,
video: null,
colors: null,
nVideos: {{ videos_info | length }},
nVideoReadyToPlay: 0,
videoCodecError: false,
isVideosDropdownOpen: false,
videosKeys: {{ videos_info | map(attribute='filename') | list | tojson }},
videosKeysSelected: [],
columns: {{ columns | tojson }},
// alpine initialization
init() {
// check if videos can play
const dummyVideo = document.createElement('video');
const canPlayVideos = dummyVideo.canPlayType('video/mp4; codecs="av01.0.05M.08"'); // codec source: https://huggingface.co/blog/video-encoding#results
if(!canPlayVideos){
this.videoCodecError = true;
}
this.videosKeysSelected = this.videosKeys.map(opt => opt)
// process CSV data
const csvDataStr = {{ episode_data_csv_str|tojson|safe }};
// Create a Blob with the CSV data
const blob = new Blob([csvDataStr], { type: 'text/csv;charset=utf-8;' });
// Create a URL for the Blob
const csvUrl = URL.createObjectURL(blob);
// process CSV data
this.videos = document.querySelectorAll('video');
this.video = this.videos[0];
this.dygraph = new Dygraph(document.getElementById("graph"), csvUrl, {
pixelsPerPoint: 0.01,
legend: 'always',
labelsDiv: document.getElementById('labels'),
labelsKMB: true,
strokeWidth: 1.5,
pointClickCallback: (event, point) => {
this.dygraphTime = point.xval;
this.updateTableValues(this.dygraphTime);
},
highlightCallback: (event, x, points, row, seriesName) => {
this.dygraphTime = x;
this.updateTableValues(this.dygraphTime);
},
drawCallback: (dygraph, is_initial) => {
if (is_initial) {
// dygraph initialization
this.dygraph.setSelection(this.dygraphIndex, undefined, true, true);
this.colors = this.dygraph.getColors();
this.checked = Array(this.colors.length).fill(true);
const colors = [];
let lightness = 30; // const LIGHTNESS = [30, 65, 85]; // state_lightness, action_lightness, pred_action_lightness
for(const column of this.columns){
const nValues = column.value.length;
for (let hue = 0; hue < 360; hue += parseInt(360/nValues)) {
const color = `hsl(${hue}, 100%, ${lightness}%)`;
colors.push(color);
}
lightness += 35;
}
this.dygraph.updateOptions({ colors });
this.colors = colors;
this.updateTableValues();
let url = new URL(window.location.href);
let params = new URLSearchParams(url.search);
let time = params.get("t");
if(time){
time = parseFloat(time);
this.videos.forEach(video => (video.currentTime = time));
}
}
},
});
},
//#region Table Data
// turn dygraph's 1D data (at a given time t) to 2D data that whose columns names are defined in this.columnNames.
// 2d data view is used to create html table element.
get rows() {
if (!this.currentFrameData) {
return [];
}
const rows = [];
const nRows = Math.max(...this.columns.map(column => column.value.length));
let rowIndex = 0;
while(rowIndex < nRows){
const row = [];
// number of states may NOT match number of actions. In this case, we null-pad the 2D array to make a fully rectangular 2d array
const nullCell = { isNull: true };
// row consists of [state value, action value]
let idx = rowIndex;
for(const column of this.columns){
const nColumn = column.value.length;
row.push(rowIndex < nColumn ? this.currentFrameData[idx] : nullCell);
idx += nColumn; // because this.currentFrameData = [state0, state1, ..., stateN, action0, action1, ..., actionN]
}
rowIndex += 1;
rows.push(row);
}
return rows;
},
isRowChecked(rowIndex) {
return this.rows[rowIndex].every(cell => cell && (cell.isNull || cell.checked));
},
isColumnChecked(colIndex) {
return this.rows.every(row => row[colIndex] && (row[colIndex].isNull || row[colIndex].checked));
},
toggleRow(rowIndex) {
const newState = !this.isRowChecked(rowIndex);
this.rows[rowIndex].forEach(cell => {
if (cell && !cell.isNull) cell.checked = newState;
});
this.updateTableValues();
},
toggleColumn(colIndex) {
const newState = !this.isColumnChecked(colIndex);
this.rows.forEach(row => {
if (row[colIndex] && !row[colIndex].isNull) row[colIndex].checked = newState;
});
this.updateTableValues();
},
// given time t, update the values in the html table with "data[t]"
updateTableValues(time) {
if (!this.colors) {
return;
}
let pc = (100 / this.video.duration) * (time === undefined ? this.video.currentTime : time);
if (isNaN(pc)) pc = 0;
const index = Math.floor(pc * this.dygraph.numRows() / 100);
// slice(1) to remove the timestamp point that we do not need
const labels = this.dygraph.getLabels().slice(1);
const values = this.dygraph.rawData_[index].slice(1);
const checkedNew = this.currentFrameData ? this.currentFrameData.map(cell => cell.checked) : Array(
this.colors.length).fill(true);
this.currentFrameData = labels.map((label, idx) => ({
label,
value: values[idx],
color: this.colors[idx],
checked: checkedNew[idx],
}));
const shouldUpdateVisibility = !this.checked.every((value, index) => value === checkedNew[index]);
if (shouldUpdateVisibility) {
this.checked = checkedNew;
this.dygraph.setVisibility(this.checked);
}
},
//#endregion
updateTimeQuery(time) {
let url = new URL(window.location.href);
let params = new URLSearchParams(url.search);
params.set("t", time);
url.search = params.toString();
window.history.replaceState({}, '', url.toString());
},
formatTime(time) {
var hours = Math.floor(time / 3600);
var minutes = Math.floor((time % 3600) / 60);
var seconds = Math.floor(time % 60);
return (hours > 0 ? hours + ':' : '') + (minutes < 10 ? '0' + minutes : minutes) + ':' + (seconds <
10 ?
'0' + seconds : seconds);
},
videoCanPlay() {
this.nVideoReadyToPlay += 1;
if(this.nVideoReadyToPlay == this.nVideos) {
// start autoplay all videos in sync
this.$refs.btnPlay.click();
}
}
};
}
document.addEventListener('alpine:init', () => {
// Episode pagination component
Alpine.data('episodePagination', () => ({
episodes: {{ episodes }},
pageSize: 100,
page: 1,
init() {
// Find which page contains the current episode_id
const currentEpisodeId = {{ episode_id }};
const episodeIndex = this.episodes.indexOf(currentEpisodeId);
if (episodeIndex !== -1) {
this.page = Math.floor(episodeIndex / this.pageSize) + 1;
}
},
get totalPages() {
return Math.ceil(this.episodes.length / this.pageSize);
},
get paginatedEpisodes() {
const start = (this.page - 1) * this.pageSize;
const end = start + this.pageSize;
return this.episodes.slice(start, end);
},
nextPage() {
if (this.page < this.totalPages) {
this.page++;
}
},
prevPage() {
if (this.page > 1) {
this.page--;
}
}
}));
});
</script>
<script>
window.addEventListener('keydown', (e) => {
// Use the space bar to play and pause, instead of default action (e.g. scrolling)
const { keyCode, key } = e;
if (keyCode === 32 || key === ' ') {
e.preventDefault();
const btnPause = document.querySelector('[x-ref="btnPause"]');
const btnPlay = document.querySelector('[x-ref="btnPlay"]');
btnPause.classList.contains('hidden') ? btnPlay.click() : btnPause.click();
} else if (key === 'ArrowDown' || key === 'ArrowUp') {
const episodes = {{ episodes }}; // Access episodes directly from the Jinja template
const nextEpisodeId = key === 'ArrowDown' ? {{ episode_id }} + 1 : {{ episode_id }} - 1;
const lowestEpisodeId = episodes.at(0);
const highestEpisodeId = episodes.at(-1);
if (nextEpisodeId >= lowestEpisodeId && nextEpisodeId <= highestEpisodeId) {
window.location.href = `./episode_${nextEpisodeId}`;
}
}
});
</script>
</body>
</html>

View File

@@ -565,10 +565,7 @@ class ReplayBuffer:
lerobot_dataset.start_image_writer(num_processes=0, num_threads=3)
# Convert transitions into episodes and frames
episode_index = 0
lerobot_dataset.episode_buffer = lerobot_dataset.create_episode_buffer(episode_index=episode_index)
frame_idx_in_episode = 0
for idx in range(self.size):
actual_idx = (self.position - self.size + idx) % self.capacity
@@ -582,6 +579,7 @@ class ReplayBuffer:
frame_dict["action"] = self.actions[actual_idx].cpu()
frame_dict["next.reward"] = torch.tensor([self.rewards[actual_idx]], dtype=torch.float32).cpu()
frame_dict["next.done"] = torch.tensor([self.dones[actual_idx]], dtype=torch.bool).cpu()
frame_dict["task"] = task_name
# Add complementary_info if available
if self.has_complementary_info:
@@ -597,19 +595,11 @@ class ReplayBuffer:
frame_dict[f"complementary_info.{key}"] = val
# Add to the dataset's buffer
lerobot_dataset.add_frame(frame_dict, task=task_name)
# Move to next frame
frame_idx_in_episode += 1
lerobot_dataset.add_frame(frame_dict)
# If we reached an episode boundary, call save_episode, reset counters
if self.dones[actual_idx] or self.truncateds[actual_idx]:
lerobot_dataset.save_episode()
episode_index += 1
frame_idx_in_episode = 0
lerobot_dataset.episode_buffer = lerobot_dataset.create_episode_buffer(
episode_index=episode_index
)
# Save any remaining frames in the buffer
if lerobot_dataset.episode_buffer["size"] > 0:

View File

@@ -274,6 +274,16 @@ def move_cursor_up(lines):
print(f"\033[{lines}A", end="")
def get_elapsed_time_in_days_hours_minutes_seconds(elapsed_time_s: float):
days = int(elapsed_time_s // (24 * 3600))
elapsed_time_s %= 24 * 3600
hours = int(elapsed_time_s // 3600)
elapsed_time_s %= 3600
minutes = int(elapsed_time_s // 60)
seconds = elapsed_time_s % 60
return days, hours, minutes, seconds
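A quick illustrative check of the helper's return contract (integer days, hours, and minutes, float seconds), assuming it is in scope:

```python
days, hours, minutes, seconds = get_elapsed_time_in_days_hours_minutes_seconds(93784.5)
assert (days, hours, minutes, seconds) == (1, 2, 3, 4.5)  # 1 day, 2 h, 3 min, 4.5 s
```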
class TimerManager:
"""
Lightweight utility to measure elapsed time.

View File

@@ -47,38 +47,22 @@ def save_dataset_to_safetensors(output_dir, repo_id="lerobot/pusht"):
)
# save 2 first frames of first episode
i = dataset.episode_data_index["from"][0].item()
i = dataset.meta.episodes["dataset_from_index"][0]
save_file(dataset[i], repo_dir / f"frame_{i}.safetensors")
save_file(dataset[i + 1], repo_dir / f"frame_{i + 1}.safetensors")
# save 2 frames at the middle of first episode
i = int((dataset.episode_data_index["to"][0].item() - dataset.episode_data_index["from"][0].item()) / 2)
i = int(
(dataset.meta.episodes["dataset_to_index"][0] - dataset.meta.episodes["dataset_from_index"][0]) / 2
)
save_file(dataset[i], repo_dir / f"frame_{i}.safetensors")
save_file(dataset[i + 1], repo_dir / f"frame_{i + 1}.safetensors")
# save 2 last frames of first episode
i = dataset.episode_data_index["to"][0].item()
i = dataset.meta.episodes["dataset_to_index"][0]
save_file(dataset[i - 2], repo_dir / f"frame_{i - 2}.safetensors")
save_file(dataset[i - 1], repo_dir / f"frame_{i - 1}.safetensors")
# TODO(rcadene): Enable testing on second and last episode
# We currently can't because our test dataset only contains the first episode
# # save 2 first frames of second episode
# i = dataset.episode_data_index["from"][1].item()
# save_file(dataset[i], repo_dir / f"frame_{i}.safetensors")
# save_file(dataset[i + 1], repo_dir / f"frame_{i+1}.safetensors")
# # save 2 last frames of second episode
# i = dataset.episode_data_index["to"][1].item()
# save_file(dataset[i - 2], repo_dir / f"frame_{i-2}.safetensors")
# save_file(dataset[i - 1], repo_dir / f"frame_{i-1}.safetensors")
# # save 2 last frames of last episode
# i = dataset.episode_data_index["to"][-1].item()
# save_file(dataset[i - 2], repo_dir / f"frame_{i-2}.safetensors")
# save_file(dataset[i - 1], repo_dir / f"frame_{i-1}.safetensors")
if __name__ == "__main__":
for dataset in [

View File

@@ -0,0 +1,292 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from unittest.mock import patch
import torch
from lerobot.datasets.aggregate import aggregate_datasets
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from tests.fixtures.constants import DUMMY_REPO_ID
def assert_episode_and_frame_counts(aggr_ds, expected_episodes, expected_frames):
"""Test that total number of episodes and frames are correctly aggregated."""
assert aggr_ds.num_episodes == expected_episodes, (
f"Expected {expected_episodes} episodes, got {aggr_ds.num_episodes}"
)
assert aggr_ds.num_frames == expected_frames, (
f"Expected {expected_frames} frames, got {aggr_ds.num_frames}"
)
def assert_dataset_content_integrity(aggr_ds, ds_0, ds_1):
"""Test that the content of both datasets is preserved correctly in the aggregated dataset."""
keys_to_ignore = ["episode_index", "index", "timestamp"]
# Test first part of dataset corresponds to ds_0, check first item (index 0) matches ds_0[0]
aggr_first_item = aggr_ds[0]
ds_0_first_item = ds_0[0]
# Compare all keys except episode_index and index which should be updated
for key in ds_0_first_item:
if key not in keys_to_ignore:
# Handle both tensor and non-tensor data
if torch.is_tensor(aggr_first_item[key]) and torch.is_tensor(ds_0_first_item[key]):
assert torch.allclose(aggr_first_item[key], ds_0_first_item[key], atol=1e-6), (
f"First item key '{key}' doesn't match between aggregated and ds_0"
)
else:
assert aggr_first_item[key] == ds_0_first_item[key], (
f"First item key '{key}' doesn't match between aggregated and ds_0"
)
# Check last item of ds_0 part (index len(ds_0)-1) matches ds_0[-1]
aggr_ds_0_last_item = aggr_ds[len(ds_0) - 1]
ds_0_last_item = ds_0[-1]
for key in ds_0_last_item:
if key not in keys_to_ignore:
# Handle both tensor and non-tensor data
if torch.is_tensor(aggr_ds_0_last_item[key]) and torch.is_tensor(ds_0_last_item[key]):
assert torch.allclose(aggr_ds_0_last_item[key], ds_0_last_item[key], atol=1e-6), (
f"Last ds_0 item key '{key}' doesn't match between aggregated and ds_0"
)
else:
assert aggr_ds_0_last_item[key] == ds_0_last_item[key], (
f"Last ds_0 item key '{key}' doesn't match between aggregated and ds_0"
)
# Test second part of dataset corresponds to ds_1
# Check first item of ds_1 part (index len(ds_0)) matches ds_1[0]
aggr_ds_1_first_item = aggr_ds[len(ds_0)]
ds_1_first_item = ds_1[0]
for key in ds_1_first_item:
if key not in keys_to_ignore:
# Handle both tensor and non-tensor data
if torch.is_tensor(aggr_ds_1_first_item[key]) and torch.is_tensor(ds_1_first_item[key]):
assert torch.allclose(aggr_ds_1_first_item[key], ds_1_first_item[key], atol=1e-6), (
f"First ds_1 item key '{key}' doesn't match between aggregated and ds_1"
)
else:
assert aggr_ds_1_first_item[key] == ds_1_first_item[key], (
f"First ds_1 item key '{key}' doesn't match between aggregated and ds_1"
)
# Check last item matches ds_1[-1]
aggr_last_item = aggr_ds[-1]
ds_1_last_item = ds_1[-1]
for key in ds_1_last_item:
if key not in keys_to_ignore:
# Handle both tensor and non-tensor data
if torch.is_tensor(aggr_last_item[key]) and torch.is_tensor(ds_1_last_item[key]):
assert torch.allclose(aggr_last_item[key], ds_1_last_item[key], atol=1e-6), (
f"Last item key '{key}' doesn't match between aggregated and ds_1"
)
else:
assert aggr_last_item[key] == ds_1_last_item[key], (
f"Last item key '{key}' doesn't match between aggregated and ds_1"
)
def assert_metadata_consistency(aggr_ds, ds_0, ds_1):
"""Test that metadata is correctly aggregated."""
# Test basic info
assert aggr_ds.fps == ds_0.fps == ds_1.fps, "FPS should be the same across all datasets"
assert aggr_ds.meta.info["robot_type"] == ds_0.meta.info["robot_type"] == ds_1.meta.info["robot_type"], (
"Robot type should be the same"
)
# Test features are the same
assert aggr_ds.features == ds_0.features == ds_1.features, "Features should be the same"
# Test tasks aggregation
expected_tasks = set(ds_0.meta.tasks.index) | set(ds_1.meta.tasks.index)
actual_tasks = set(aggr_ds.meta.tasks.index)
assert actual_tasks == expected_tasks, f"Expected tasks {expected_tasks}, got {actual_tasks}"
def assert_episode_indices_updated_correctly(aggr_ds, ds_0, ds_1):
"""Test that episode indices are correctly updated after aggregation."""
# ds_0 episodes should have episode_index 0 to ds_0.num_episodes-1
for i in range(len(ds_0)):
assert aggr_ds[i]["episode_index"] < ds_0.num_episodes, (
f"Episode index {aggr_ds[i]['episode_index']} at position {i} should be < {ds_0.num_episodes}"
)
def ds1_episodes_condition(ep_idx):
return (ep_idx >= ds_0.num_episodes) and (ep_idx < ds_0.num_episodes + ds_1.num_episodes)
# ds_1 episodes should have episode_index ds_0.num_episodes to total_episodes-1
for i in range(len(ds_0), len(ds_0) + len(ds_1)):
expected_min_episode_idx = ds_0.num_episodes
assert ds1_episodes_condition(aggr_ds[i]["episode_index"]), (
f"Episode index {aggr_ds[i]['episode_index']} at position {i} should be >= {expected_min_episode_idx}"
)
def assert_video_frames_integrity(aggr_ds, ds_0, ds_1):
"""Test that video frames are correctly preserved and frame indices are updated."""
def visual_frames_equal(frame1, frame2):
return torch.allclose(frame1, frame2)
video_keys = list(
filter(
lambda key: aggr_ds.meta.info["features"][key]["dtype"] == "video",
aggr_ds.meta.info["features"].keys(),
)
)
# Test the section corresponding to the first dataset (ds_0)
for i in range(len(ds_0)):
assert aggr_ds[i]["index"] == i, (
f"Frame index at position {i} should be {i}, but got {aggr_ds[i]['index']}"
)
for key in video_keys:
assert visual_frames_equal(aggr_ds[i][key], ds_0[i][key]), (
f"Visual frames at position {i} should be equal between aggregated and ds_0"
)
# Test the section corresponding to the second dataset (ds_1)
for i in range(len(ds_0), len(ds_0) + len(ds_1)):
# The frame index in the aggregated dataset should also match its position.
assert aggr_ds[i]["index"] == i, (
f"Frame index at position {i} should be {i}, but got {aggr_ds[i]['index']}"
)
for key in video_keys:
assert visual_frames_equal(aggr_ds[i][key], ds_1[i - len(ds_0)][key]), (
f"Visual frames at position {i} should be equal between aggregated and ds_1"
)
def assert_dataset_iteration_works(aggr_ds):
"""Test that we can iterate through the entire dataset without errors."""
for _ in aggr_ds:
pass
def test_aggregate_datasets(tmp_path, lerobot_dataset_factory):
"""Test basic aggregation functionality with standard parameters."""
ds_0_num_frames = 400
ds_1_num_frames = 800
ds_0_num_episodes = 10
ds_1_num_episodes = 25
# Create two datasets with different number of frames and episodes
ds_0 = lerobot_dataset_factory(
root=tmp_path / "test_0",
repo_id=f"{DUMMY_REPO_ID}_0",
total_episodes=ds_0_num_episodes,
total_frames=ds_0_num_frames,
)
ds_1 = lerobot_dataset_factory(
root=tmp_path / "test_1",
repo_id=f"{DUMMY_REPO_ID}_1",
total_episodes=ds_1_num_episodes,
total_frames=ds_1_num_frames,
)
aggregate_datasets(
repo_ids=[ds_0.repo_id, ds_1.repo_id],
roots=[ds_0.root, ds_1.root],
aggr_repo_id=f"{DUMMY_REPO_ID}_aggr",
aggr_root=tmp_path / "test_aggr",
)
# Mock the revision to prevent Hub calls during dataset loading
with (
patch("lerobot.datasets.lerobot_dataset.get_safe_version") as mock_get_safe_version,
patch("lerobot.datasets.lerobot_dataset.snapshot_download") as mock_snapshot_download,
):
mock_get_safe_version.return_value = "v3.0"
mock_snapshot_download.return_value = str(tmp_path / "test_aggr")
aggr_ds = LeRobotDataset(f"{DUMMY_REPO_ID}_aggr", root=tmp_path / "test_aggr")
# Run all assertion functions
expected_total_episodes = ds_0.num_episodes + ds_1.num_episodes
expected_total_frames = ds_0.num_frames + ds_1.num_frames
assert_episode_and_frame_counts(aggr_ds, expected_total_episodes, expected_total_frames)
assert_dataset_content_integrity(aggr_ds, ds_0, ds_1)
assert_metadata_consistency(aggr_ds, ds_0, ds_1)
assert_episode_indices_updated_correctly(aggr_ds, ds_0, ds_1)
assert_video_frames_integrity(aggr_ds, ds_0, ds_1)
assert_dataset_iteration_works(aggr_ds)
def test_aggregate_with_low_threshold(tmp_path, lerobot_dataset_factory):
"""Test aggregation with small file size limits to force file rotation/sharding."""
ds_0_num_episodes = ds_1_num_episodes = 10
ds_0_num_frames = ds_1_num_frames = 400
ds_0 = lerobot_dataset_factory(
root=tmp_path / "small_0",
repo_id=f"{DUMMY_REPO_ID}_small_0",
total_episodes=ds_0_num_episodes,
total_frames=ds_0_num_frames,
)
ds_1 = lerobot_dataset_factory(
root=tmp_path / "small_1",
repo_id=f"{DUMMY_REPO_ID}_small_1",
total_episodes=ds_1_num_episodes,
total_frames=ds_1_num_frames,
)
# Use the new configurable parameters to force file rotation
aggregate_datasets(
repo_ids=[ds_0.repo_id, ds_1.repo_id],
roots=[ds_0.root, ds_1.root],
aggr_repo_id=f"{DUMMY_REPO_ID}_small_aggr",
aggr_root=tmp_path / "small_aggr",
# Tiny file size to trigger new file instantiation
data_files_size_in_mb=0.01,
video_files_size_in_mb=0.1,
)
# Mock the revision to prevent Hub calls during dataset loading
with (
patch("lerobot.datasets.lerobot_dataset.get_safe_version") as mock_get_safe_version,
patch("lerobot.datasets.lerobot_dataset.snapshot_download") as mock_snapshot_download,
):
mock_get_safe_version.return_value = "v3.0"
mock_snapshot_download.return_value = str(tmp_path / "small_aggr")
aggr_ds = LeRobotDataset(f"{DUMMY_REPO_ID}_small_aggr", root=tmp_path / "small_aggr")
# Verify aggregation worked correctly despite file size constraints
expected_total_episodes = ds_0_num_episodes + ds_1_num_episodes
expected_total_frames = ds_0_num_frames + ds_1_num_frames
assert_episode_and_frame_counts(aggr_ds, expected_total_episodes, expected_total_frames)
assert_dataset_content_integrity(aggr_ds, ds_0, ds_1)
assert_metadata_consistency(aggr_ds, ds_0, ds_1)
assert_episode_indices_updated_correctly(aggr_ds, ds_0, ds_1)
assert_video_frames_integrity(aggr_ds, ds_0, ds_1)
assert_dataset_iteration_works(aggr_ds)
# Check that multiple files were actually created due to small size limits
data_dir = tmp_path / "small_aggr" / "data"
video_dir = tmp_path / "small_aggr" / "videos"
if data_dir.exists():
parquet_files = list(data_dir.rglob("*.parquet"))
assert len(parquet_files) > 1, "Small file size limits should create multiple parquet files"
if video_dir.exists():
video_files = list(video_dir.rglob("*.mp4"))
assert len(video_files) > 1, "Small file size limits should create multiple video files"
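# Illustrative sketch of the size-based rotation rule the tiny limits above are meant to
# trigger (an assumption about the writer's behavior, not code from this module): a new
# parquet/mp4 file is started once appending would exceed the configured size limit.
def _needs_new_file(current_size_mb: float, incoming_size_mb: float, limit_mb: float) -> bool:
    return current_size_mb + incoming_size_mb > limit_mb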

View File

@@ -13,10 +13,8 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import logging
import re
from copy import deepcopy
from itertools import chain
from pathlib import Path
@@ -37,13 +35,19 @@ from lerobot.datasets.lerobot_dataset import (
MultiLeRobotDataset,
)
from lerobot.datasets.utils import (
DEFAULT_CHUNK_SIZE,
DEFAULT_DATA_FILE_SIZE_IN_MB,
DEFAULT_VIDEO_FILE_SIZE_IN_MB,
create_branch,
flatten_dict,
unflatten_dict,
get_hf_features_from_features,
hf_transform_to_torch,
hw_to_dataset_features,
)
from lerobot.envs.factory import make_env_config
from lerobot.policies.factory import make_policy_config
from lerobot.robots import make_robot_from_config
from tests.fixtures.constants import DUMMY_CHW, DUMMY_HWC, DUMMY_REPO_ID
from tests.mocks.mock_robot import MockRobotConfig
from tests.utils import require_x86_64_kernel
@@ -69,12 +73,17 @@ def test_same_attributes_defined(tmp_path, lerobot_dataset_factory):
objects have the same sets of attributes defined.
"""
# Instantiate both ways
features = {"state": {"dtype": "float32", "shape": (1,), "names": None}}
robot = make_robot_from_config(MockRobotConfig())
action_features = hw_to_dataset_features(robot.action_features, "action", True)
obs_features = hw_to_dataset_features(robot.observation_features, "observation", True)
dataset_features = {**action_features, **obs_features}
root_create = tmp_path / "create"
dataset_create = LeRobotDataset.create(repo_id=DUMMY_REPO_ID, fps=30, features=features, root=root_create)
dataset_create = LeRobotDataset.create(
repo_id=DUMMY_REPO_ID, fps=30, features=dataset_features, root=root_create
)
root_init = tmp_path / "init"
dataset_init = lerobot_dataset_factory(root=root_init)
dataset_init = lerobot_dataset_factory(root=root_init, total_episodes=1, total_frames=1)
init_attr = set(vars(dataset_init).keys())
create_attr = set(vars(dataset_create).keys())
@@ -99,13 +108,41 @@ def test_dataset_initialization(tmp_path, lerobot_dataset_factory):
assert dataset.num_frames == len(dataset)
# TODO(rcadene, aliberts): do not run LeRobotDataset.create, instead refactor LeRobotDatasetMetadata.create
# and test the small resulting function that validates the features
def test_dataset_feature_with_forward_slash_raises_error():
# make sure dir does not exist
from lerobot.constants import HF_LEROBOT_HOME
dataset_dir = HF_LEROBOT_HOME / "lerobot/test/with/slash"
# make sure it does not exist
if dataset_dir.exists():
dataset_dir.rmdir()
with pytest.raises(ValueError):
LeRobotDataset.create(
repo_id="lerobot/test/with/slash",
fps=30,
features={"a/b": {"dtype": "float32", "shape": 2, "names": None}},
)
def test_add_frame_missing_task(tmp_path, empty_lerobot_dataset_factory):
features = {"state": {"dtype": "float32", "shape": (1,), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
with pytest.raises(
ValueError, match="Feature mismatch in `frame` dictionary:\nMissing features: {'task'}\n"
):
dataset.add_frame({"state": torch.randn(1)})
def test_add_frame_missing_feature(tmp_path, empty_lerobot_dataset_factory):
features = {"state": {"dtype": "float32", "shape": (1,), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
with pytest.raises(
ValueError, match="Feature mismatch in `frame` dictionary:\nMissing features: {'state'}\n"
):
dataset.add_frame({"wrong_feature": torch.randn(1)}, task="Dummy task")
dataset.add_frame({"task": "Dummy task"})
def test_add_frame_extra_feature(tmp_path, empty_lerobot_dataset_factory):
@@ -114,7 +151,7 @@ def test_add_frame_extra_feature(tmp_path, empty_lerobot_dataset_factory):
with pytest.raises(
ValueError, match="Feature mismatch in `frame` dictionary:\nExtra features: {'extra'}\n"
):
dataset.add_frame({"state": torch.randn(1), "extra": "dummy_extra"}, task="Dummy task")
dataset.add_frame({"state": torch.randn(1), "task": "Dummy task", "extra": "dummy_extra"})
def test_add_frame_wrong_type(tmp_path, empty_lerobot_dataset_factory):
@@ -123,7 +160,7 @@ def test_add_frame_wrong_type(tmp_path, empty_lerobot_dataset_factory):
with pytest.raises(
ValueError, match="The feature 'state' of dtype 'float16' is not of the expected dtype 'float32'.\n"
):
dataset.add_frame({"state": torch.randn(1, dtype=torch.float16)}, task="Dummy task")
dataset.add_frame({"state": torch.randn(1, dtype=torch.float16), "task": "Dummy task"})
def test_add_frame_wrong_shape(tmp_path, empty_lerobot_dataset_factory):
@@ -133,7 +170,7 @@ def test_add_frame_wrong_shape(tmp_path, empty_lerobot_dataset_factory):
ValueError,
match=re.escape("The feature 'state' of shape '(1,)' does not have the expected shape '(2,)'.\n"),
):
dataset.add_frame({"state": torch.randn(1)}, task="Dummy task")
dataset.add_frame({"state": torch.randn(1), "task": "Dummy task"})
def test_add_frame_wrong_shape_python_float(tmp_path, empty_lerobot_dataset_factory):
@@ -145,7 +182,7 @@ def test_add_frame_wrong_shape_python_float(tmp_path, empty_lerobot_dataset_fact
"The feature 'state' is not a 'np.ndarray'. Expected type is 'float32', but type '<class 'float'>' provided instead.\n"
),
):
dataset.add_frame({"state": 1.0}, task="Dummy task")
dataset.add_frame({"state": 1.0, "task": "Dummy task"})
def test_add_frame_wrong_shape_torch_ndim_0(tmp_path, empty_lerobot_dataset_factory):
@@ -155,7 +192,7 @@ def test_add_frame_wrong_shape_torch_ndim_0(tmp_path, empty_lerobot_dataset_fact
ValueError,
match=re.escape("The feature 'state' of shape '()' does not have the expected shape '(1,)'.\n"),
):
dataset.add_frame({"state": torch.tensor(1.0)}, task="Dummy task")
dataset.add_frame({"state": torch.tensor(1.0), "task": "Dummy task"})
def test_add_frame_wrong_shape_numpy_ndim_0(tmp_path, empty_lerobot_dataset_factory):
@@ -167,13 +204,13 @@ def test_add_frame_wrong_shape_numpy_ndim_0(tmp_path, empty_lerobot_dataset_fact
"The feature 'state' is not a 'np.ndarray'. Expected type is 'float32', but type '<class 'numpy.float32'>' provided instead.\n"
),
):
dataset.add_frame({"state": np.float32(1.0)}, task="Dummy task")
dataset.add_frame({"state": np.float32(1.0), "task": "Dummy task"})
def test_add_frame(tmp_path, empty_lerobot_dataset_factory):
features = {"state": {"dtype": "float32", "shape": (1,), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
dataset.add_frame({"state": torch.randn(1)}, task="Dummy task")
dataset.add_frame({"state": torch.randn(1), "task": "Dummy task"})
dataset.save_episode()
assert len(dataset) == 1
@@ -185,7 +222,7 @@ def test_add_frame(tmp_path, empty_lerobot_dataset_factory):
def test_add_frame_state_1d(tmp_path, empty_lerobot_dataset_factory):
features = {"state": {"dtype": "float32", "shape": (2,), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
dataset.add_frame({"state": torch.randn(2)}, task="Dummy task")
dataset.add_frame({"state": torch.randn(2), "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["state"].shape == torch.Size([2])
@@ -194,7 +231,7 @@ def test_add_frame_state_1d(tmp_path, empty_lerobot_dataset_factory):
def test_add_frame_state_2d(tmp_path, empty_lerobot_dataset_factory):
features = {"state": {"dtype": "float32", "shape": (2, 4), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
dataset.add_frame({"state": torch.randn(2, 4)}, task="Dummy task")
dataset.add_frame({"state": torch.randn(2, 4), "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["state"].shape == torch.Size([2, 4])
@@ -203,7 +240,7 @@ def test_add_frame_state_2d(tmp_path, empty_lerobot_dataset_factory):
def test_add_frame_state_3d(tmp_path, empty_lerobot_dataset_factory):
features = {"state": {"dtype": "float32", "shape": (2, 4, 3), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
dataset.add_frame({"state": torch.randn(2, 4, 3)}, task="Dummy task")
dataset.add_frame({"state": torch.randn(2, 4, 3), "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["state"].shape == torch.Size([2, 4, 3])
@@ -212,7 +249,7 @@ def test_add_frame_state_3d(tmp_path, empty_lerobot_dataset_factory):
def test_add_frame_state_4d(tmp_path, empty_lerobot_dataset_factory):
features = {"state": {"dtype": "float32", "shape": (2, 4, 3, 5), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
dataset.add_frame({"state": torch.randn(2, 4, 3, 5)}, task="Dummy task")
dataset.add_frame({"state": torch.randn(2, 4, 3, 5), "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["state"].shape == torch.Size([2, 4, 3, 5])
@@ -221,7 +258,7 @@ def test_add_frame_state_4d(tmp_path, empty_lerobot_dataset_factory):
def test_add_frame_state_5d(tmp_path, empty_lerobot_dataset_factory):
features = {"state": {"dtype": "float32", "shape": (2, 4, 3, 5, 1), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
dataset.add_frame({"state": torch.randn(2, 4, 3, 5, 1)}, task="Dummy task")
dataset.add_frame({"state": torch.randn(2, 4, 3, 5, 1), "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["state"].shape == torch.Size([2, 4, 3, 5, 1])
@@ -230,7 +267,7 @@ def test_add_frame_state_5d(tmp_path, empty_lerobot_dataset_factory):
def test_add_frame_state_numpy(tmp_path, empty_lerobot_dataset_factory):
features = {"state": {"dtype": "float32", "shape": (1,), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
dataset.add_frame({"state": np.array([1], dtype=np.float32)}, task="Dummy task")
dataset.add_frame({"state": np.array([1], dtype=np.float32), "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["state"].ndim == 0
@@ -239,7 +276,7 @@ def test_add_frame_state_numpy(tmp_path, empty_lerobot_dataset_factory):
def test_add_frame_string(tmp_path, empty_lerobot_dataset_factory):
features = {"caption": {"dtype": "string", "shape": (1,), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
dataset.add_frame({"caption": "Dummy caption"}, task="Dummy task")
dataset.add_frame({"caption": "Dummy caption", "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["caption"] == "Dummy caption"
@@ -254,7 +291,7 @@ def test_add_frame_image_wrong_shape(image_dataset):
),
):
c, h, w = DUMMY_CHW
dataset.add_frame({"image": torch.randn(c, w, h)}, task="Dummy task")
dataset.add_frame({"image": torch.randn(c, w, h), "task": "Dummy task"})
def test_add_frame_image_wrong_range(image_dataset):
@@ -267,14 +304,14 @@ def test_add_frame_image_wrong_range(image_dataset):
Hence the image won't be saved on disk and save_episode will raise `FileNotFoundError`.
"""
dataset = image_dataset
dataset.add_frame({"image": np.random.rand(*DUMMY_CHW) * 255}, task="Dummy task")
dataset.add_frame({"image": np.random.rand(*DUMMY_CHW) * 255, "task": "Dummy task"})
with pytest.raises(FileNotFoundError):
dataset.save_episode()
def test_add_frame_image(image_dataset):
dataset = image_dataset
dataset.add_frame({"image": np.random.rand(*DUMMY_CHW)}, task="Dummy task")
dataset.add_frame({"image": np.random.rand(*DUMMY_CHW), "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["image"].shape == torch.Size(DUMMY_CHW)
@@ -282,7 +319,7 @@ def test_add_frame_image(image_dataset):
def test_add_frame_image_h_w_c(image_dataset):
dataset = image_dataset
dataset.add_frame({"image": np.random.rand(*DUMMY_HWC)}, task="Dummy task")
dataset.add_frame({"image": np.random.rand(*DUMMY_HWC), "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["image"].shape == torch.Size(DUMMY_CHW)
@@ -291,7 +328,7 @@ def test_add_frame_image_h_w_c(image_dataset):
def test_add_frame_image_uint8(image_dataset):
dataset = image_dataset
image = np.random.randint(0, 256, DUMMY_HWC, dtype=np.uint8)
dataset.add_frame({"image": image}, task="Dummy task")
dataset.add_frame({"image": image, "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["image"].shape == torch.Size(DUMMY_CHW)
@@ -300,7 +337,7 @@ def test_add_frame_image_uint8(image_dataset):
def test_add_frame_image_pil(image_dataset):
dataset = image_dataset
image = np.random.randint(0, 256, DUMMY_HWC, dtype=np.uint8)
dataset.add_frame({"image": Image.fromarray(image)}, task="Dummy task")
dataset.add_frame({"image": Image.fromarray(image), "task": "Dummy task"})
dataset.save_episode()
assert dataset[0]["image"].shape == torch.Size(DUMMY_CHW)
@@ -319,6 +356,13 @@ def test_image_array_to_pil_image_wrong_range_float_0_255():
# - [ ] test push_to_hub
# - [ ] test smaller methods
# TODO(rcadene):
# - [ ] fix code so that old test_factory + backward pass
# - [ ] write new unit tests to test save_episode + getitem
# - [ ] save_episode : case where new dataset, concatenate same file, write new file (meta/episodes, data, videos)
# - [ ]
# - [ ] remove old tests
@pytest.mark.parametrize(
"env_name, repo_id, policy_name",
@@ -338,9 +382,8 @@ def test_factory(env_name, repo_id, policy_name):
# TODO(rcadene, aliberts): remove dataset download
dataset=DatasetConfig(repo_id=repo_id, episodes=[0]),
env=make_env_config(env_name),
policy=make_policy_config(policy_name, push_to_hub=False),
policy=make_policy_config(policy_name),
)
cfg.validate()
dataset = make_dataset(cfg)
delta_timestamps = dataset.delta_timestamps
@@ -427,30 +470,6 @@ def test_multidataset_frames():
assert torch.equal(sub_dataset_item[k], dataset_item[k])
# TODO(aliberts): Move to more appropriate location
def test_flatten_unflatten_dict():
d = {
"obs": {
"min": 0,
"max": 1,
"mean": 2,
"std": 3,
},
"action": {
"min": 4,
"max": 5,
"mean": 6,
"std": 7,
},
}
original_d = deepcopy(d)
d = unflatten_dict(flatten_dict(d))
# test equality between nested dicts
assert json.dumps(original_d, sort_keys=True) == json.dumps(d, sort_keys=True), f"{original_d} != {d}"
@pytest.mark.parametrize(
"repo_id",
[
@@ -497,38 +516,22 @@ def test_backward_compatibility(repo_id):
)
# test2 first frames of first episode
i = dataset.episode_data_index["from"][0].item()
i = dataset.meta.episodes[0]["dataset_from_index"]
load_and_compare(i)
load_and_compare(i + 1)
# test 2 frames at the middle of first episode
i = int((dataset.episode_data_index["to"][0].item() - dataset.episode_data_index["from"][0].item()) / 2)
i = int(
(dataset.meta.episodes[0]["dataset_to_index"] - dataset.meta.episodes[0]["dataset_from_index"]) / 2
)
load_and_compare(i)
load_and_compare(i + 1)
# test 2 last frames of first episode
i = dataset.episode_data_index["to"][0].item()
i = dataset.meta.episodes[0]["dataset_to_index"]
load_and_compare(i - 2)
load_and_compare(i - 1)
# TODO(rcadene): Enable testing on second and last episode
# We currently can't because our test dataset only contains the first episode
# # test 2 first frames of second episode
# i = dataset.episode_data_index["from"][1].item()
# load_and_compare(i)
# load_and_compare(i + 1)
# # test 2 last frames of second episode
# i = dataset.episode_data_index["to"][1].item()
# load_and_compare(i - 2)
# load_and_compare(i - 1)
# # test 2 last frames of last episode
# i = dataset.episode_data_index["to"][-1].item()
# load_and_compare(i - 2)
# load_and_compare(i - 1)
@pytest.mark.skip("Requires internet access")
def test_create_branch():
@@ -556,18 +559,499 @@ def test_create_branch():
api.delete_repo(repo_id, repo_type=repo_type)
def test_dataset_feature_with_forward_slash_raises_error():
# make sure dir does not exist
from lerobot.constants import HF_LEROBOT_HOME
def test_check_cached_episodes_sufficient(tmp_path, lerobot_dataset_factory):
"""Test the _check_cached_episodes_sufficient method of LeRobotDataset."""
# Create a dataset with 5 episodes (0-4)
dataset = lerobot_dataset_factory(
root=tmp_path / "test",
total_episodes=5,
total_frames=200,
use_videos=False,
)
dataset_dir = HF_LEROBOT_HOME / "lerobot/test/with/slash"
# make sure does not exist
if dataset_dir.exists():
dataset_dir.rmdir()
# Test hf_dataset is None
dataset.hf_dataset = None
assert dataset._check_cached_episodes_sufficient() is False
with pytest.raises(ValueError):
LeRobotDataset.create(
repo_id="lerobot/test/with/slash",
fps=30,
features={"a/b": {"dtype": "float32", "shape": 2, "names": None}},
# Test hf_dataset is empty
import datasets
empty_features = get_hf_features_from_features(dataset.features)
dataset.hf_dataset = datasets.Dataset.from_dict(
{key: [] for key in empty_features}, features=empty_features
)
dataset.hf_dataset.set_transform(hf_transform_to_torch)
assert dataset._check_cached_episodes_sufficient() is False
# Restore the original dataset for remaining tests
dataset.hf_dataset = dataset.load_hf_dataset()
# Test all episodes requested (self.episodes = None) and all are available
dataset.episodes = None
assert dataset._check_cached_episodes_sufficient() is True
# Test specific episodes requested that are all available
dataset.episodes = [0, 2, 4]
assert dataset._check_cached_episodes_sufficient() is True
# Test requesting episodes that don't exist in the cached dataset
# Create a dataset with only episodes 0, 1, 2
limited_dataset = lerobot_dataset_factory(
root=tmp_path / "limited",
total_episodes=3,
total_frames=120,
use_videos=False,
)
# Request episodes that include non-existent ones
limited_dataset.episodes = [0, 1, 2, 3, 4]
assert limited_dataset._check_cached_episodes_sufficient() is False
# Test creating a dataset with sparse episodes (e.g., only episodes 0, 2, 4)
# First create the full dataset structure
sparse_dataset = lerobot_dataset_factory(
root=tmp_path / "sparse",
total_episodes=5,
total_frames=200,
use_videos=False,
)
# Manually filter hf_dataset to only include episodes 0, 2, 4
episode_indices = sparse_dataset.hf_dataset["episode_index"]
mask = torch.zeros(len(episode_indices), dtype=torch.bool)
for ep in [0, 2, 4]:
mask |= torch.tensor(episode_indices) == ep
# Create a filtered dataset
filtered_data = {}
# Find image keys by checking features
image_keys = [key for key, ft in sparse_dataset.features.items() if ft.get("dtype") == "image"]
for key in sparse_dataset.hf_dataset.column_names:
values = sparse_dataset.hf_dataset[key]
# Filter values based on mask
filtered_values = [val for i, val in enumerate(values) if mask[i]]
# Convert float32 image tensors back to uint8 numpy arrays for HuggingFace dataset
if key in image_keys and len(filtered_values) > 0:
# Convert torch tensors (float32, [0, 1], CHW) back to numpy arrays (uint8, [0, 255], HWC)
filtered_values = [
(val.permute(1, 2, 0).numpy() * 255).astype(np.uint8) for val in filtered_values
]
filtered_data[key] = filtered_values
sparse_dataset.hf_dataset = datasets.Dataset.from_dict(
filtered_data, features=get_hf_features_from_features(sparse_dataset.features)
)
sparse_dataset.hf_dataset.set_transform(hf_transform_to_torch)
# Test requesting all episodes when only some are cached
sparse_dataset.episodes = None
assert sparse_dataset._check_cached_episodes_sufficient() is False
# Test requesting only the available episodes
sparse_dataset.episodes = [0, 2, 4]
assert sparse_dataset._check_cached_episodes_sufficient() is True
# Test requesting a mix of available and unavailable episodes
sparse_dataset.episodes = [0, 1, 2]
assert sparse_dataset._check_cached_episodes_sufficient() is False
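# Conceptually, the scenarios above reduce to a set inclusion between requested and
# cached episode indices; a minimal sketch of that rule (an assumption about the
# method's intent, not its actual implementation):
def _cached_is_sufficient(requested, cached_episode_indices, total_episodes):
    wanted = set(range(total_episodes)) if requested is None else set(requested)
    return wanted <= set(cached_episode_indices)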
def test_update_chunk_settings(tmp_path, empty_lerobot_dataset_factory):
"""Test the update_chunk_settings functionality for both LeRobotDataset and LeRobotDatasetMetadata."""
features = {
"observation.state": {
"dtype": "float32",
"shape": (6,),
"names": ["shoulder_pan", "shoulder_lift", "elbow", "wrist_1", "wrist_2", "wrist_3"],
},
"action": {
"dtype": "float32",
"shape": (6,),
"names": ["shoulder_pan", "shoulder_lift", "elbow", "wrist_1", "wrist_2", "wrist_3"],
},
}
# Create dataset with default chunk settings
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features)
# Test initial default values
initial_settings = dataset.meta.get_chunk_settings()
assert initial_settings["chunks_size"] == DEFAULT_CHUNK_SIZE
assert initial_settings["data_files_size_in_mb"] == DEFAULT_DATA_FILE_SIZE_IN_MB
assert initial_settings["video_files_size_in_mb"] == DEFAULT_VIDEO_FILE_SIZE_IN_MB
# Test updating all settings at once
new_chunks_size = 2000
new_data_size = 200
new_video_size = 1000
dataset.meta.update_chunk_settings(
chunks_size=new_chunks_size,
data_files_size_in_mb=new_data_size,
video_files_size_in_mb=new_video_size,
)
# Verify settings were updated
updated_settings = dataset.meta.get_chunk_settings()
assert updated_settings["chunks_size"] == new_chunks_size
assert updated_settings["data_files_size_in_mb"] == new_data_size
assert updated_settings["video_files_size_in_mb"] == new_video_size
# Test updating individual settings
dataset.meta.update_chunk_settings(chunks_size=1500)
settings_after_partial = dataset.meta.get_chunk_settings()
assert settings_after_partial["chunks_size"] == 1500
assert settings_after_partial["data_files_size_in_mb"] == new_data_size
assert settings_after_partial["video_files_size_in_mb"] == new_video_size
# Test updating only data file size
dataset.meta.update_chunk_settings(data_files_size_in_mb=150)
settings_after_data = dataset.meta.get_chunk_settings()
assert settings_after_data["chunks_size"] == 1500
assert settings_after_data["data_files_size_in_mb"] == 150
assert settings_after_data["video_files_size_in_mb"] == new_video_size
# Test updating only video file size
dataset.meta.update_chunk_settings(video_files_size_in_mb=800)
settings_after_video = dataset.meta.get_chunk_settings()
assert settings_after_video["chunks_size"] == 1500
assert settings_after_video["data_files_size_in_mb"] == 150
assert settings_after_video["video_files_size_in_mb"] == 800
# Test that settings persist in the info file
info_path = dataset.root / "meta" / "info.json"
assert info_path.exists()
# Verify the underlying metadata properties
assert dataset.meta.chunks_size == 1500
assert dataset.meta.data_files_size_in_mb == 150
assert dataset.meta.video_files_size_in_mb == 800
# Test error handling for invalid values
with pytest.raises(ValueError, match="chunks_size must be positive"):
dataset.meta.update_chunk_settings(chunks_size=0)
with pytest.raises(ValueError, match="chunks_size must be positive"):
dataset.meta.update_chunk_settings(chunks_size=-100)
with pytest.raises(ValueError, match="data_files_size_in_mb must be positive"):
dataset.meta.update_chunk_settings(data_files_size_in_mb=0)
with pytest.raises(ValueError, match="data_files_size_in_mb must be positive"):
dataset.meta.update_chunk_settings(data_files_size_in_mb=-50)
with pytest.raises(ValueError, match="video_files_size_in_mb must be positive"):
dataset.meta.update_chunk_settings(video_files_size_in_mb=0)
with pytest.raises(ValueError, match="video_files_size_in_mb must be positive"):
dataset.meta.update_chunk_settings(video_files_size_in_mb=-200)
# Test calling with None values (should not change anything)
settings_before_none = dataset.meta.get_chunk_settings()
dataset.meta.update_chunk_settings(
chunks_size=None, data_files_size_in_mb=None, video_files_size_in_mb=None
)
settings_after_none = dataset.meta.get_chunk_settings()
assert settings_before_none == settings_after_none
# Test metadata direct access
meta_settings = dataset.meta.get_chunk_settings()
assert meta_settings == dataset.meta.get_chunk_settings()
# Test updating via metadata directly
dataset.meta.update_chunk_settings(chunks_size=3000)
assert dataset.meta.get_chunk_settings()["chunks_size"] == 3000
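# Typical usage sketch of the API exercised above (values are arbitrary examples):
# settings are adjusted once after creation and persisted to meta/info.json, so later
# save_episode() calls shard files at the requested sizes.
#   dataset.meta.update_chunk_settings(data_files_size_in_mb=200, video_files_size_in_mb=1000)
#   assert dataset.meta.get_chunk_settings()["data_files_size_in_mb"] == 200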
def test_update_chunk_settings_video_dataset(tmp_path):
"""Test update_chunk_settings with a video dataset to ensure video-specific logic works."""
features = {
"observation.images.cam": {
"dtype": "video",
"shape": (480, 640, 3),
"names": ["height", "width", "channels"],
},
"action": {"dtype": "float32", "shape": (6,), "names": ["j1", "j2", "j3", "j4", "j5", "j6"]},
}
# Create video dataset
dataset = LeRobotDataset.create(
repo_id=DUMMY_REPO_ID, fps=30, features=features, root=tmp_path / "video_test", use_videos=True
)
# Test that video-specific settings work
original_video_size = dataset.meta.get_chunk_settings()["video_files_size_in_mb"]
new_video_size = original_video_size * 2
dataset.meta.update_chunk_settings(video_files_size_in_mb=new_video_size)
assert dataset.meta.get_chunk_settings()["video_files_size_in_mb"] == new_video_size
assert dataset.meta.video_files_size_in_mb == new_video_size
def test_episode_index_distribution(tmp_path, empty_lerobot_dataset_factory):
"""Test that all frames have correct episode indices across multiple episodes."""
features = {"state": {"dtype": "float32", "shape": (2,), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features, use_videos=False)
# Create 3 episodes with different lengths
num_episodes = 3
frames_per_episode = [10, 15, 8]
for episode_idx in range(num_episodes):
for _ in range(frames_per_episode[episode_idx]):
dataset.add_frame({"state": torch.randn(2), "task": f"task_{episode_idx}"})
dataset.save_episode()
# Load the dataset and check episode indices
loaded_dataset = LeRobotDataset(dataset.repo_id, root=dataset.root)
# Check specific frames across episode boundaries
cumulative = 0
for ep_idx, ep_length in enumerate(frames_per_episode):
# Check start, middle, and end of each episode
start_frame = cumulative
middle_frame = cumulative + ep_length // 2
end_frame = cumulative + ep_length - 1
for frame_idx in [start_frame, middle_frame, end_frame]:
frame_data = loaded_dataset[frame_idx]
actual_ep_idx = frame_data["episode_index"].item()
assert actual_ep_idx == ep_idx, (
f"Frame {frame_idx} has episode_index {actual_ep_idx}, should be {ep_idx}"
)
cumulative += ep_length
# Check episode index distribution
all_episode_indices = [loaded_dataset[i]["episode_index"].item() for i in range(len(loaded_dataset))]
from collections import Counter
distribution = Counter(all_episode_indices)
expected_dist = {i: frames_per_episode[i] for i in range(num_episodes)}
assert dict(distribution) == expected_dist, (
f"Episode distribution {dict(distribution)} != expected {expected_dist}"
)
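# Worked expectation for the distribution check above (illustrative): with
# frames_per_episode == [10, 15, 8], the Counter should equal {0: 10, 1: 15, 2: 8},
# and global indices 9 and 10 straddle the boundary between episode 0 and episode 1.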
def test_multi_episode_metadata_consistency(tmp_path, empty_lerobot_dataset_factory):
"""Test episode metadata consistency across multiple episodes."""
features = {
"state": {"dtype": "float32", "shape": (3,), "names": ["x", "y", "z"]},
"action": {"dtype": "float32", "shape": (2,), "names": ["v", "w"]},
}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features, use_videos=False)
num_episodes = 4
frames_per_episode = [20, 35, 10, 25]
tasks = ["pick", "place", "pick", "place"]
for episode_idx in range(num_episodes):
for _ in range(frames_per_episode[episode_idx]):
dataset.add_frame({"state": torch.randn(3), "action": torch.randn(2), "task": tasks[episode_idx]})
dataset.save_episode()
# Load and validate episode metadata
loaded_dataset = LeRobotDataset(dataset.repo_id, root=dataset.root)
assert loaded_dataset.meta.total_episodes == num_episodes
assert loaded_dataset.meta.total_frames == sum(frames_per_episode)
cumulative_frames = 0
for episode_idx in range(num_episodes):
episode_metadata = loaded_dataset.meta.episodes[episode_idx]
# Check basic episode properties
assert episode_metadata["episode_index"] == episode_idx
assert episode_metadata["length"] == frames_per_episode[episode_idx]
assert episode_metadata["tasks"] == [tasks[episode_idx]]
# Check dataset indices
expected_from = cumulative_frames
expected_to = cumulative_frames + frames_per_episode[episode_idx]
assert episode_metadata["dataset_from_index"] == expected_from
assert episode_metadata["dataset_to_index"] == expected_to
cumulative_frames += frames_per_episode[episode_idx]
def test_data_consistency_across_episodes(tmp_path, empty_lerobot_dataset_factory):
"""Test that episodes have no gaps or overlaps in their data indices."""
features = {"state": {"dtype": "float32", "shape": (1,), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features, use_videos=False)
num_episodes = 5
frames_per_episode = [12, 8, 20, 15, 5]
for episode_idx in range(num_episodes):
for _ in range(frames_per_episode[episode_idx]):
dataset.add_frame({"state": torch.randn(1), "task": "consistency_test"})
dataset.save_episode()
loaded_dataset = LeRobotDataset(dataset.repo_id, root=dataset.root)
# Check data consistency - no gaps or overlaps
cumulative_check = 0
for episode_idx in range(num_episodes):
episode_metadata = loaded_dataset.meta.episodes[episode_idx]
from_idx = episode_metadata["dataset_from_index"]
to_idx = episode_metadata["dataset_to_index"]
# Check that episode starts exactly where previous ended
assert from_idx == cumulative_check, (
f"Episode {episode_idx} starts at {from_idx}, expected {cumulative_check}"
)
# Check that episode length matches expected
actual_length = to_idx - from_idx
expected_length = frames_per_episode[episode_idx]
assert actual_length == expected_length, (
f"Episode {episode_idx} length {actual_length} != expected {expected_length}"
)
cumulative_check = to_idx
# Final check: last episode should end at total frames
expected_total_frames = sum(frames_per_episode)
assert cumulative_check == expected_total_frames, (
f"Final frame count {cumulative_check} != expected {expected_total_frames}"
)
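# The gap/overlap property verified above can be stated as three invariants
# (a restatement of the checks, not additional test code):
#   episodes[0]["dataset_from_index"] == 0
#   episodes[i]["dataset_from_index"] == episodes[i - 1]["dataset_to_index"]  for i >= 1
#   episodes[num_episodes - 1]["dataset_to_index"] == total_frames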
def test_statistics_metadata_validation(tmp_path, empty_lerobot_dataset_factory):
"""Test that statistics are properly computed and stored for all features."""
features = {
"state": {"dtype": "float32", "shape": (2,), "names": ["pos", "vel"]},
"action": {"dtype": "float32", "shape": (1,), "names": ["force"]},
}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features, use_videos=False)
# Create controlled data to verify statistics
num_episodes = 2
frames_per_episode = [10, 10]
# Use deterministic data for predictable statistics
torch.manual_seed(42)
for episode_idx in range(num_episodes):
for frame_idx in range(frames_per_episode[episode_idx]):
state_data = torch.tensor([frame_idx * 0.1, frame_idx * 0.2], dtype=torch.float32)
action_data = torch.tensor([frame_idx * 0.05], dtype=torch.float32)
dataset.add_frame({"state": state_data, "action": action_data, "task": "stats_test"})
dataset.save_episode()
loaded_dataset = LeRobotDataset(dataset.repo_id, root=dataset.root)
# Check that statistics exist for all features
assert loaded_dataset.meta.stats is not None, "No statistics found"
for feature_name in features.keys():
assert feature_name in loaded_dataset.meta.stats, f"No statistics for feature '{feature_name}'"
feature_stats = loaded_dataset.meta.stats[feature_name]
expected_stats = ["min", "max", "mean", "std", "count"]
for stat_key in expected_stats:
assert stat_key in feature_stats, f"Missing '{stat_key}' statistic for '{feature_name}'"
stat_value = feature_stats[stat_key]
# Basic sanity checks
if stat_key == "count":
assert stat_value == sum(frames_per_episode), f"Wrong count for '{feature_name}'"
elif stat_key in ["min", "max", "mean", "std"]:
# Check that statistics are reasonable (not NaN, proper shapes)
if hasattr(stat_value, "shape"):
expected_shape = features[feature_name]["shape"]
assert stat_value.shape == expected_shape or len(stat_value) == expected_shape[0], (
f"Wrong shape for {stat_key} of '{feature_name}'"
)
# Check no NaN values
if hasattr(stat_value, "__iter__"):
assert not any(np.isnan(v) for v in stat_value), f"NaN in {stat_key} for '{feature_name}'"
else:
assert not np.isnan(stat_value), f"NaN in {stat_key} for '{feature_name}'"
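# Worked check on the deterministic data above (illustrative): the first component of
# "state" takes the values frame_idx * 0.1 for frame_idx in 0..9, repeated over the two
# episodes, so its min is 0.0, its max is 0.9, its mean is 0.45, and count == 20.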
def test_episode_boundary_integrity(tmp_path, empty_lerobot_dataset_factory):
"""Test frame indices and episode transitions at episode boundaries."""
features = {"state": {"dtype": "float32", "shape": (1,), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features, use_videos=False)
num_episodes = 3
frames_per_episode = [7, 12, 5]
for episode_idx in range(num_episodes):
for frame_idx in range(frames_per_episode[episode_idx]):
dataset.add_frame({"state": torch.tensor([float(frame_idx)]), "task": f"episode_{episode_idx}"})
dataset.save_episode()
loaded_dataset = LeRobotDataset(dataset.repo_id, root=dataset.root)
# Test episode boundaries
cumulative = 0
for ep_idx, ep_length in enumerate(frames_per_episode):
if ep_idx > 0:
# Check last frame of previous episode
prev_frame = loaded_dataset[cumulative - 1]
assert prev_frame["episode_index"].item() == ep_idx - 1
# Check first frame of current episode
if cumulative < len(loaded_dataset):
curr_frame = loaded_dataset[cumulative]
assert curr_frame["episode_index"].item() == ep_idx
# Check frame_index within episode
for i in range(ep_length):
if cumulative + i < len(loaded_dataset):
frame = loaded_dataset[cumulative + i]
assert frame["frame_index"].item() == i, f"Frame {cumulative + i} has wrong frame_index"
assert frame["episode_index"].item() == ep_idx, (
f"Frame {cumulative + i} has wrong episode_index"
)
cumulative += ep_length
def test_task_indexing_and_validation(tmp_path, empty_lerobot_dataset_factory):
"""Test that tasks are properly indexed and retrievable."""
features = {"state": {"dtype": "float32", "shape": (1,), "names": None}}
dataset = empty_lerobot_dataset_factory(root=tmp_path / "test", features=features, use_videos=False)
# Use multiple tasks, including repeated ones
tasks = ["pick", "place", "pick", "navigate", "place"]
unique_tasks = list(set(tasks)) # ["pick", "place", "navigate"]
frames_per_episode = [5, 8, 3, 10, 6]
for episode_idx, task in enumerate(tasks):
for _ in range(frames_per_episode[episode_idx]):
dataset.add_frame({"state": torch.randn(1), "task": task})
dataset.save_episode()
loaded_dataset = LeRobotDataset(dataset.repo_id, root=dataset.root)
# Check that all unique tasks are in the tasks metadata
stored_tasks = set(loaded_dataset.meta.tasks.index)
assert stored_tasks == set(unique_tasks), f"Stored tasks {stored_tasks} != expected {set(unique_tasks)}"
# Check that task indices are consistent
cumulative = 0
for episode_idx, expected_task in enumerate(tasks):
episode_metadata = loaded_dataset.meta.episodes[episode_idx]
assert episode_metadata["tasks"] == [expected_task]
# Check frames in this episode have correct task
for i in range(frames_per_episode[episode_idx]):
frame = loaded_dataset[cumulative + i]
assert frame["task"] == expected_task, f"Frame {cumulative + i} has wrong task"
# Check task_index consistency
expected_task_index = loaded_dataset.meta.get_task_index(expected_task)
assert frame["task_index"].item() == expected_task_index
cumulative += frames_per_episode[episode_idx]
# Check total number of tasks
assert loaded_dataset.meta.total_tasks == len(unique_tasks)
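# Illustrative view of the task metadata relied on above (an assumption about layout,
# consistent with the tasks_factory fixture further below): meta.tasks is indexed by the
# task string and carries a "task_index" column, e.g.
#                task_index
#   pick                  0
#   place                 1
#   navigate              2
# so meta.get_task_index("place") would return 1 (actual values depend on the order in
# which tasks are first added).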

View File

@@ -11,83 +11,15 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from itertools import accumulate
import datasets
import numpy as np
import pyarrow.compute as pc
import pytest
import torch
from lerobot.datasets.utils import (
check_delta_timestamps,
check_timestamps_sync,
get_delta_indices,
)
from tests.fixtures.constants import DUMMY_MOTOR_FEATURES
def calculate_total_episode(
hf_dataset: datasets.Dataset, raise_if_not_contiguous: bool = True
) -> dict[str, torch.Tensor]:
episode_indices = sorted(hf_dataset.unique("episode_index"))
total_episodes = len(episode_indices)
if raise_if_not_contiguous and episode_indices != list(range(total_episodes)):
raise ValueError("episode_index values are not sorted and contiguous.")
return total_episodes
def calculate_episode_data_index(hf_dataset: datasets.Dataset) -> dict[str, np.ndarray]:
episode_lengths = []
table = hf_dataset.data.table
total_episodes = calculate_total_episode(hf_dataset)
for ep_idx in range(total_episodes):
ep_table = table.filter(pc.equal(table["episode_index"], ep_idx))
episode_lengths.insert(ep_idx, len(ep_table))
cumulative_lengths = list(accumulate(episode_lengths))
return {
"from": np.array([0] + cumulative_lengths[:-1], dtype=np.int64),
"to": np.array(cumulative_lengths, dtype=np.int64),
}
@pytest.fixture(scope="module")
def synced_timestamps_factory(hf_dataset_factory):
def _create_synced_timestamps(fps: int = 30) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
hf_dataset = hf_dataset_factory(fps=fps)
timestamps = torch.stack(hf_dataset["timestamp"]).numpy()
episode_indices = torch.stack(hf_dataset["episode_index"]).numpy()
episode_data_index = calculate_episode_data_index(hf_dataset)
return timestamps, episode_indices, episode_data_index
return _create_synced_timestamps
@pytest.fixture(scope="module")
def unsynced_timestamps_factory(synced_timestamps_factory):
def _create_unsynced_timestamps(
fps: int = 30, tolerance_s: float = 1e-4
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
timestamps, episode_indices, episode_data_index = synced_timestamps_factory(fps=fps)
timestamps[30] += tolerance_s * 1.1 # Modify a single timestamp just outside tolerance
return timestamps, episode_indices, episode_data_index
return _create_unsynced_timestamps
@pytest.fixture(scope="module")
def slightly_off_timestamps_factory(synced_timestamps_factory):
def _create_slightly_off_timestamps(
fps: int = 30, tolerance_s: float = 1e-4
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
timestamps, episode_indices, episode_data_index = synced_timestamps_factory(fps=fps)
timestamps[30] += tolerance_s * 0.9 # Modify a single timestamp just inside tolerance
return timestamps, episode_indices, episode_data_index
return _create_slightly_off_timestamps
@pytest.fixture(scope="module")
def valid_delta_timestamps_factory():
def _create_valid_delta_timestamps(
@@ -136,78 +68,6 @@ def delta_indices_factory():
return _delta_indices
def test_check_timestamps_sync_synced(synced_timestamps_factory):
fps = 30
tolerance_s = 1e-4
timestamps, ep_idx, ep_data_index = synced_timestamps_factory(fps)
result = check_timestamps_sync(
timestamps=timestamps,
episode_indices=ep_idx,
episode_data_index=ep_data_index,
fps=fps,
tolerance_s=tolerance_s,
)
assert result is True
def test_check_timestamps_sync_unsynced(unsynced_timestamps_factory):
fps = 30
tolerance_s = 1e-4
timestamps, ep_idx, ep_data_index = unsynced_timestamps_factory(fps, tolerance_s)
with pytest.raises(ValueError):
check_timestamps_sync(
timestamps=timestamps,
episode_indices=ep_idx,
episode_data_index=ep_data_index,
fps=fps,
tolerance_s=tolerance_s,
)
def test_check_timestamps_sync_unsynced_no_exception(unsynced_timestamps_factory):
fps = 30
tolerance_s = 1e-4
timestamps, ep_idx, ep_data_index = unsynced_timestamps_factory(fps, tolerance_s)
result = check_timestamps_sync(
timestamps=timestamps,
episode_indices=ep_idx,
episode_data_index=ep_data_index,
fps=fps,
tolerance_s=tolerance_s,
raise_value_error=False,
)
assert result is False
def test_check_timestamps_sync_slightly_off(slightly_off_timestamps_factory):
fps = 30
tolerance_s = 1e-4
timestamps, ep_idx, ep_data_index = slightly_off_timestamps_factory(fps, tolerance_s)
result = check_timestamps_sync(
timestamps=timestamps,
episode_indices=ep_idx,
episode_data_index=ep_data_index,
fps=fps,
tolerance_s=tolerance_s,
)
assert result is True
def test_check_timestamps_sync_single_timestamp():
fps = 30
tolerance_s = 1e-4
timestamps, ep_idx = np.array([0.0]), np.array([0])
episode_data_index = {"to": np.array([1]), "from": np.array([0])}
result = check_timestamps_sync(
timestamps=timestamps,
episode_indices=ep_idx,
episode_data_index=episode_data_index,
fps=fps,
tolerance_s=tolerance_s,
)
assert result is True
def test_check_delta_timestamps_valid(valid_delta_timestamps_factory):
fps = 30
tolerance_s = 1e-4

View File

@@ -32,7 +32,7 @@ def test_drop_n_first_frames():
)
dataset.set_transform(hf_transform_to_torch)
episode_data_index = calculate_episode_data_index(dataset)
sampler = EpisodeAwareSampler(episode_data_index, drop_n_first_frames=1)
sampler = EpisodeAwareSampler(episode_data_index["from"], episode_data_index["to"], drop_n_first_frames=1)
assert sampler.indices == [1, 4, 5]
assert len(sampler) == 3
assert list(sampler) == [1, 4, 5]
@@ -48,7 +48,7 @@ def test_drop_n_last_frames():
)
dataset.set_transform(hf_transform_to_torch)
episode_data_index = calculate_episode_data_index(dataset)
sampler = EpisodeAwareSampler(episode_data_index, drop_n_last_frames=1)
sampler = EpisodeAwareSampler(episode_data_index["from"], episode_data_index["to"], drop_n_last_frames=1)
assert sampler.indices == [0, 3, 4]
assert len(sampler) == 3
assert list(sampler) == [0, 3, 4]
@@ -64,7 +64,9 @@ def test_episode_indices_to_use():
)
dataset.set_transform(hf_transform_to_torch)
episode_data_index = calculate_episode_data_index(dataset)
sampler = EpisodeAwareSampler(episode_data_index, episode_indices_to_use=[0, 2])
sampler = EpisodeAwareSampler(
episode_data_index["from"], episode_data_index["to"], episode_indices_to_use=[0, 2]
)
assert sampler.indices == [0, 1, 3, 4, 5]
assert len(sampler) == 5
assert list(sampler) == [0, 1, 3, 4, 5]
@@ -80,11 +82,11 @@ def test_shuffle():
)
dataset.set_transform(hf_transform_to_torch)
episode_data_index = calculate_episode_data_index(dataset)
sampler = EpisodeAwareSampler(episode_data_index, shuffle=False)
sampler = EpisodeAwareSampler(episode_data_index["from"], episode_data_index["to"], shuffle=False)
assert sampler.indices == [0, 1, 2, 3, 4, 5]
assert len(sampler) == 6
assert list(sampler) == [0, 1, 2, 3, 4, 5]
sampler = EpisodeAwareSampler(episode_data_index, shuffle=True)
sampler = EpisodeAwareSampler(episode_data_index["from"], episode_data_index["to"], shuffle=True)
assert sampler.indices == [0, 1, 2, 3, 4, 5]
assert len(sampler) == 6
assert set(sampler) == {0, 1, 2, 3, 4, 5}

View File

@@ -14,12 +14,20 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import json
from copy import deepcopy
import torch
from datasets import Dataset
from huggingface_hub import DatasetCard
from lerobot.datasets.push_dataset_to_hub.utils import calculate_episode_data_index
from lerobot.datasets.utils import create_lerobot_dataset_card, hf_transform_to_torch
from lerobot.datasets.utils import (
create_lerobot_dataset_card,
flatten_dict,
hf_transform_to_torch,
unflatten_dict,
)
def test_default_parameters():
@@ -53,3 +61,26 @@ def test_calculate_episode_data_index():
episode_data_index = calculate_episode_data_index(dataset)
assert torch.equal(episode_data_index["from"], torch.tensor([0, 2, 3]))
assert torch.equal(episode_data_index["to"], torch.tensor([2, 3, 6]))
def test_flatten_unflatten_dict():
d = {
"obs": {
"min": 0,
"max": 1,
"mean": 2,
"std": 3,
},
"action": {
"min": 4,
"max": 5,
"mean": 6,
"std": 7,
},
}
original_d = deepcopy(d)
d = unflatten_dict(flatten_dict(d))
# test equality between nested dicts
assert json.dumps(original_d, sort_keys=True) == json.dumps(d, sort_keys=True), f"{original_d} != {d}"
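# Illustrative behavior (an assumption consistent with the "stats/action/max"-style
# column names used by episodes_factory below): flatten_dict joins nested keys with "/",
# so flatten_dict(d) here yields {"obs/min": 0, ..., "action/std": 7} and
# unflatten_dict inverts it exactly, which is what the JSON round-trip asserts.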

View File

@@ -29,8 +29,8 @@ DUMMY_MOTOR_FEATURES = {
},
}
DUMMY_CAMERA_FEATURES = {
"laptop": {"shape": (480, 640, 3), "names": ["height", "width", "channels"], "info": None},
"phone": {"shape": (480, 640, 3), "names": ["height", "width", "channels"], "info": None},
"laptop": {"shape": (64, 96, 3), "names": ["height", "width", "channels"], "info": None},
"phone": {"shape": (64, 96, 3), "names": ["height", "width", "channels"], "info": None},
}
DEFAULT_FPS = 30
DUMMY_VIDEO_INFO = {

View File

@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import shutil
from functools import partial
from pathlib import Path
from typing import Protocol
@@ -19,19 +20,25 @@ from unittest.mock import patch
import datasets
import numpy as np
import pandas as pd
import PIL.Image
import pytest
import torch
from datasets import Dataset
from lerobot.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDataset, LeRobotDatasetMetadata
from lerobot.datasets.utils import (
DEFAULT_CHUNK_SIZE,
DEFAULT_DATA_FILE_SIZE_IN_MB,
DEFAULT_DATA_PATH,
DEFAULT_FEATURES,
DEFAULT_PARQUET_PATH,
DEFAULT_VIDEO_FILE_SIZE_IN_MB,
DEFAULT_VIDEO_PATH,
flatten_dict,
get_hf_features_from_features,
hf_transform_to_torch,
)
from lerobot.datasets.video_utils import encode_video_frames
from tests.fixtures.constants import (
DEFAULT_FPS,
DUMMY_CAMERA_FEATURES,
@@ -46,10 +53,9 @@ class LeRobotDatasetFactory(Protocol):
def __call__(self, *args, **kwargs) -> LeRobotDataset: ...
def get_task_index(task_dicts: dict, task: str) -> int:
tasks = {d["task_index"]: d["task"] for d in task_dicts.values()}
task_to_task_index = {task: task_idx for task_idx, task in tasks.items()}
return task_to_task_index[task]
def get_task_index(tasks: pd.DataFrame, task: str) -> int:
task_idx = tasks.loc[task].task_index.item()
return task_idx
@pytest.fixture(scope="session")
@@ -62,15 +68,49 @@ def img_tensor_factory():
@pytest.fixture(scope="session")
def img_array_factory():
def _create_img_array(height=100, width=100, channels=3, dtype=np.uint8) -> np.ndarray:
if np.issubdtype(dtype, np.unsignedinteger):
# Int array in [0, 255] range
img_array = np.random.randint(0, 256, size=(height, width, channels), dtype=dtype)
elif np.issubdtype(dtype, np.floating):
# Float array in [0, 1] range
img_array = np.random.rand(height, width, channels).astype(dtype)
def _create_img_array(height=100, width=100, channels=3, dtype=np.uint8, content=None) -> np.ndarray:
if content is None:
# Original random noise behavior
if np.issubdtype(dtype, np.unsignedinteger):
# Int array in [0, 255] range
img_array = np.random.randint(0, 256, size=(height, width, channels), dtype=dtype)
elif np.issubdtype(dtype, np.floating):
# Float array in [0, 1] range
img_array = np.random.rand(height, width, channels).astype(dtype)
else:
raise ValueError(dtype)
else:
raise ValueError(dtype)
# Create image with text content using OpenCV
import cv2
# Create white background
img_array = np.ones((height, width, channels), dtype=np.uint8) * 255
# Font settings
font = cv2.FONT_HERSHEY_SIMPLEX
font_scale = max(0.5, height / 200) # Scale font with image size
font_color = (0, 0, 0) # Black text
thickness = max(1, int(height / 100))
# Get text size to center it
text_size = cv2.getTextSize(content, font, font_scale, thickness)[0]
text_x = (width - text_size[0]) // 2
text_y = (height + text_size[1]) // 2
# Put text on image
cv2.putText(img_array, content, (text_x, text_y), font, font_scale, font_color, thickness)
# Handle single channel case
if channels == 1:
img_array = cv2.cvtColor(img_array, cv2.COLOR_BGR2GRAY)
img_array = img_array[:, :, np.newaxis]
# Convert to target dtype
if np.issubdtype(dtype, np.floating):
img_array = img_array.astype(dtype) / 255.0
else:
img_array = img_array.astype(dtype)
return img_array
return _create_img_array
@@ -117,9 +157,10 @@ def info_factory(features_factory):
total_frames: int = 0,
total_tasks: int = 0,
total_videos: int = 0,
total_chunks: int = 0,
chunks_size: int = DEFAULT_CHUNK_SIZE,
data_path: str = DEFAULT_PARQUET_PATH,
data_files_size_in_mb: float = DEFAULT_DATA_FILE_SIZE_IN_MB,
video_files_size_in_mb: float = DEFAULT_VIDEO_FILE_SIZE_IN_MB,
data_path: str = DEFAULT_DATA_PATH,
video_path: str = DEFAULT_VIDEO_PATH,
motor_features: dict = DUMMY_MOTOR_FEATURES,
camera_features: dict = DUMMY_CAMERA_FEATURES,
@@ -133,8 +174,9 @@ def info_factory(features_factory):
"total_frames": total_frames,
"total_tasks": total_tasks,
"total_videos": total_videos,
"total_chunks": total_chunks,
"chunks_size": chunks_size,
"data_files_size_in_mb": data_files_size_in_mb,
"video_files_size_in_mb": video_files_size_in_mb,
"fps": fps,
"splits": {},
"data_path": data_path,
@@ -175,41 +217,26 @@ def stats_factory():
return _create_stats
@pytest.fixture(scope="session")
def episodes_stats_factory(stats_factory):
def _create_episodes_stats(
features: dict[str],
total_episodes: int = 3,
) -> dict:
episodes_stats = {}
for episode_index in range(total_episodes):
episodes_stats[episode_index] = {
"episode_index": episode_index,
"stats": stats_factory(features),
}
return episodes_stats
return _create_episodes_stats
@pytest.fixture(scope="session")
def tasks_factory():
def _create_tasks(total_tasks: int = 3) -> int:
tasks = {}
for task_index in range(total_tasks):
task_dict = {"task_index": task_index, "task": f"Perform action {task_index}."}
tasks[task_index] = task_dict
return tasks
def _create_tasks(total_tasks: int = 3) -> pd.DataFrame:
ids = list(range(total_tasks))
tasks = [f"Perform action {i}." for i in ids]
df = pd.DataFrame({"task_index": ids}, index=tasks)
return df
return _create_tasks
@pytest.fixture(scope="session")
def episodes_factory(tasks_factory):
def episodes_factory(tasks_factory, stats_factory):
def _create_episodes(
features: dict[str],
fps: int = DEFAULT_FPS,
total_episodes: int = 3,
total_frames: int = 400,
tasks: dict | None = None,
video_keys: list[str] | None = None,
tasks: pd.DataFrame | None = None,
multi_task: bool = False,
):
if total_episodes <= 0 or total_frames <= 0:
@@ -217,66 +244,142 @@ def episodes_factory(tasks_factory):
if total_frames < total_episodes:
raise ValueError("total_length must be greater than or equal to num_episodes.")
if not tasks:
if tasks is None:
min_tasks = 2 if multi_task else 1
total_tasks = random.randint(min_tasks, total_episodes)
tasks = tasks_factory(total_tasks)
if total_episodes < len(tasks) and not multi_task:
num_tasks_available = len(tasks)
if total_episodes < num_tasks_available and not multi_task:
raise ValueError("The number of tasks should be less than the number of episodes.")
# Generate random lengths that sum up to total_length
lengths = np.random.multinomial(total_frames, [1 / total_episodes] * total_episodes).tolist()
tasks_list = [task_dict["task"] for task_dict in tasks.values()]
num_tasks_available = len(tasks_list)
# Create empty dictionaries with all keys
d = {
"episode_index": [],
"meta/episodes/chunk_index": [],
"meta/episodes/file_index": [],
"data/chunk_index": [],
"data/file_index": [],
"dataset_from_index": [],
"dataset_to_index": [],
"tasks": [],
"length": [],
}
if video_keys is not None:
for video_key in video_keys:
d[f"videos/{video_key}/chunk_index"] = []
d[f"videos/{video_key}/file_index"] = []
d[f"videos/{video_key}/from_timestamp"] = []
d[f"videos/{video_key}/to_timestamp"] = []
episodes = {}
remaining_tasks = tasks_list.copy()
for stats_key in flatten_dict({"stats": stats_factory(features)}):
d[stats_key] = []
num_frames = 0
remaining_tasks = list(tasks.index)
for ep_idx in range(total_episodes):
num_tasks_in_episode = random.randint(1, min(3, num_tasks_available)) if multi_task else 1
tasks_to_sample = remaining_tasks if remaining_tasks else tasks_list
tasks_to_sample = remaining_tasks if len(remaining_tasks) > 0 else list(tasks.index)
episode_tasks = random.sample(tasks_to_sample, min(num_tasks_in_episode, len(tasks_to_sample)))
if remaining_tasks:
for task in episode_tasks:
remaining_tasks.remove(task)
episodes[ep_idx] = {
"episode_index": ep_idx,
"tasks": episode_tasks,
"length": lengths[ep_idx],
}
d["episode_index"].append(ep_idx)
# TODO(rcadene): remove heuristic of only one file
d["meta/episodes/chunk_index"].append(0)
d["meta/episodes/file_index"].append(0)
d["data/chunk_index"].append(0)
d["data/file_index"].append(0)
d["dataset_from_index"].append(num_frames)
d["dataset_to_index"].append(num_frames + lengths[ep_idx])
d["tasks"].append(episode_tasks)
d["length"].append(lengths[ep_idx])
return episodes
if video_keys is not None:
for video_key in video_keys:
d[f"videos/{video_key}/chunk_index"].append(0)
d[f"videos/{video_key}/file_index"].append(0)
d[f"videos/{video_key}/from_timestamp"].append(num_frames / fps)
d[f"videos/{video_key}/to_timestamp"].append((num_frames + lengths[ep_idx]) / fps)
# Add stats columns like "stats/action/max"
for stats_key, stats in flatten_dict({"stats": stats_factory(features)}).items():
d[stats_key].append(stats)
num_frames += lengths[ep_idx]
return Dataset.from_dict(d)
return _create_episodes
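# Illustrative content of one produced row (example values): with lengths [5, 7, 4],
# row 1 carries episode_index=1, dataset_from_index=5, dataset_to_index=12, with the
# meta/episodes, data (and, for video keys, videos/<key>) chunk_index/file_index all 0
# per the single-file heuristic above, plus the flattened "stats/<feature>/<stat>"
# columns and per-key from_timestamp/to_timestamp computed from fps.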
@pytest.fixture(scope="session")
def create_videos(info_factory, img_array_factory):
def _create_video_directory(
root: Path,
info: dict | None = None,
total_episodes: int = 3,
total_frames: int = 150,
total_tasks: int = 1,
):
if info is None:
info = info_factory(
total_episodes=total_episodes, total_frames=total_frames, total_tasks=total_tasks
)
video_feats = {key: feats for key, feats in info["features"].items() if feats["dtype"] == "video"}
for key, ft in video_feats.items():
# create and save images with identifiable content
tmp_dir = root / "tmp_images"
tmp_dir.mkdir(parents=True, exist_ok=True)
for frame_index in range(info["total_frames"]):
content = f"{key}-{frame_index}"
img = img_array_factory(height=ft["shape"][0], width=ft["shape"][1], content=content)
pil_img = PIL.Image.fromarray(img)
path = tmp_dir / f"frame-{frame_index:06d}.png"
pil_img.save(path)
video_path = root / DEFAULT_VIDEO_PATH.format(video_key=key, chunk_index=0, file_index=0)
video_path.parent.mkdir(parents=True, exist_ok=True)
# Use the global fps from info, not video-specific fps which might not exist
encode_video_frames(tmp_dir, video_path, fps=info["fps"])
shutil.rmtree(tmp_dir)
return _create_video_directory
@pytest.fixture(scope="session")
def hf_dataset_factory(features_factory, tasks_factory, episodes_factory, img_array_factory):
def _create_hf_dataset(
features: dict | None = None,
tasks: list[dict] | None = None,
episodes: list[dict] | None = None,
tasks: pd.DataFrame | None = None,
episodes: datasets.Dataset | None = None,
fps: int = DEFAULT_FPS,
) -> datasets.Dataset:
if not tasks:
if tasks is None:
tasks = tasks_factory()
if not episodes:
episodes = episodes_factory()
if not features:
if features is None:
features = features_factory()
if episodes is None:
episodes = episodes_factory(features, fps)
timestamp_col = np.array([], dtype=np.float32)
frame_index_col = np.array([], dtype=np.int64)
episode_index_col = np.array([], dtype=np.int64)
task_index = np.array([], dtype=np.int64)
for ep_dict in episodes.values():
for ep_dict in episodes:
timestamp_col = np.concatenate((timestamp_col, np.arange(ep_dict["length"]) / fps))
frame_index_col = np.concatenate((frame_index_col, np.arange(ep_dict["length"], dtype=int)))
episode_index_col = np.concatenate(
(episode_index_col, np.full(ep_dict["length"], ep_dict["episode_index"], dtype=int))
)
# Slightly incorrect, but for simplicity, we assign to all frames the first task defined in the episode metadata.
# TODO(rcadene): assign the tasks of the episode per chunks of frames
ep_task_index = get_task_index(tasks, ep_dict["tasks"][0])
task_index = np.concatenate((task_index, np.full(ep_dict["length"], ep_task_index, dtype=int)))
@@ -286,8 +389,8 @@ def hf_dataset_factory(features_factory, tasks_factory, episodes_factory, img_ar
for key, ft in features.items():
if ft["dtype"] == "image":
robot_cols[key] = [
img_array_factory(height=ft["shapes"][1], width=ft["shapes"][0])
for _ in range(len(index_col))
img_array_factory(height=ft["shape"][1], width=ft["shape"][0], content=f"{key}-{i}")
for i in range(len(index_col))
]
elif ft["shape"][0] > 1 and ft["dtype"] != "video":
robot_cols[key] = np.random.random((len(index_col), ft["shape"][0])).astype(ft["dtype"])
@@ -314,7 +417,6 @@ def hf_dataset_factory(features_factory, tasks_factory, episodes_factory, img_ar
def lerobot_dataset_metadata_factory(
info_factory,
stats_factory,
episodes_stats_factory,
tasks_factory,
episodes_factory,
mock_snapshot_download_factory,
@@ -324,29 +426,29 @@ def lerobot_dataset_metadata_factory(
repo_id: str = DUMMY_REPO_ID,
info: dict | None = None,
stats: dict | None = None,
episodes_stats: list[dict] | None = None,
tasks: list[dict] | None = None,
episodes: list[dict] | None = None,
tasks: pd.DataFrame | None = None,
episodes: datasets.Dataset | None = None,
) -> LeRobotDatasetMetadata:
if not info:
if info is None:
info = info_factory()
if not stats:
if stats is None:
stats = stats_factory(features=info["features"])
if not episodes_stats:
episodes_stats = episodes_stats_factory(
features=info["features"], total_episodes=info["total_episodes"]
)
if not tasks:
if tasks is None:
tasks = tasks_factory(total_tasks=info["total_tasks"])
if not episodes:
if episodes is None:
video_keys = [key for key, ft in info["features"].items() if ft["dtype"] == "video"]
episodes = episodes_factory(
total_episodes=info["total_episodes"], total_frames=info["total_frames"], tasks=tasks
features=info["features"],
fps=info["fps"],
total_episodes=info["total_episodes"],
total_frames=info["total_frames"],
video_keys=video_keys,
tasks=tasks,
)
mock_snapshot_download = mock_snapshot_download_factory(
info=info,
stats=stats,
episodes_stats=episodes_stats,
tasks=tasks,
episodes=episodes,
)
@@ -366,7 +468,6 @@ def lerobot_dataset_metadata_factory(
def lerobot_dataset_factory(
info_factory,
stats_factory,
episodes_stats_factory,
tasks_factory,
episodes_factory,
hf_dataset_factory,
@@ -380,50 +481,63 @@ def lerobot_dataset_factory(
total_frames: int = 150,
total_tasks: int = 1,
multi_task: bool = False,
use_videos: bool = True,
info: dict | None = None,
stats: dict | None = None,
episodes_stats: list[dict] | None = None,
tasks: list[dict] | None = None,
episode_dicts: list[dict] | None = None,
tasks: pd.DataFrame | None = None,
episodes_metadata: datasets.Dataset | None = None,
hf_dataset: datasets.Dataset | None = None,
data_files_size_in_mb: float = DEFAULT_DATA_FILE_SIZE_IN_MB,
chunks_size: int = DEFAULT_CHUNK_SIZE,
**kwargs,
) -> LeRobotDataset:
if not info:
# Instantiate objects
if info is None:
info = info_factory(
total_episodes=total_episodes, total_frames=total_frames, total_tasks=total_tasks
total_episodes=total_episodes,
total_frames=total_frames,
total_tasks=total_tasks,
use_videos=use_videos,
data_files_size_in_mb=data_files_size_in_mb,
chunks_size=chunks_size,
)
if not stats:
if stats is None:
stats = stats_factory(features=info["features"])
if not episodes_stats:
episodes_stats = episodes_stats_factory(features=info["features"], total_episodes=total_episodes)
if not tasks:
if tasks is None:
tasks = tasks_factory(total_tasks=info["total_tasks"])
if not episode_dicts:
episode_dicts = episodes_factory(
if episodes_metadata is None:
video_keys = [key for key, ft in info["features"].items() if ft["dtype"] == "video"]
episodes_metadata = episodes_factory(
features=info["features"],
fps=info["fps"],
total_episodes=info["total_episodes"],
total_frames=info["total_frames"],
video_keys=video_keys,
tasks=tasks,
multi_task=multi_task,
)
if not hf_dataset:
hf_dataset = hf_dataset_factory(tasks=tasks, episodes=episode_dicts, fps=info["fps"])
if hf_dataset is None:
hf_dataset = hf_dataset_factory(
features=info["features"], tasks=tasks, episodes=episodes_metadata, fps=info["fps"]
)
# Write data on disk
mock_snapshot_download = mock_snapshot_download_factory(
info=info,
stats=stats,
episodes_stats=episodes_stats,
tasks=tasks,
episodes=episode_dicts,
episodes=episodes_metadata,
hf_dataset=hf_dataset,
data_files_size_in_mb=data_files_size_in_mb,
chunks_size=chunks_size,
)
mock_metadata = lerobot_dataset_metadata_factory(
root=root,
repo_id=repo_id,
info=info,
stats=stats,
episodes_stats=episodes_stats,
tasks=tasks,
episodes=episode_dicts,
episodes=episodes_metadata,
)
with (
patch("lerobot.datasets.lerobot_dataset.LeRobotDatasetMetadata") as mock_metadata_patch,


@@ -11,137 +11,166 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import logging
from pathlib import Path
import datasets
import jsonlines
import pyarrow.compute as pc
import pyarrow.parquet as pq
import numpy as np
import pandas as pd
import pytest
from datasets import Dataset
from lerobot.datasets.utils import (
EPISODES_PATH,
EPISODES_STATS_PATH,
INFO_PATH,
STATS_PATH,
TASKS_PATH,
DEFAULT_CHUNK_SIZE,
DEFAULT_DATA_FILE_SIZE_IN_MB,
DEFAULT_DATA_PATH,
get_hf_dataset_size_in_mb,
update_chunk_file_indices,
write_episodes,
write_info,
write_stats,
write_tasks,
)
def write_hf_dataset(
hf_dataset: Dataset,
local_dir: Path,
data_file_size_mb: float | None = None,
chunk_size: int | None = None,
):
"""
Writes a Hugging Face Dataset to one or more Parquet files in a structured directory format.
If the dataset size is within `data_file_size_mb`, it is saved as a single file.
Otherwise, the dataset is split into multiple smaller Parquet files, each kept under that size limit.
The file and chunk indices are managed to organize the output files in a hierarchical structure,
e.g., `data/chunk-000/file-000.parquet`, `data/chunk-000/file-001.parquet`, etc.
This function ensures that episodes are not split across multiple files.
Args:
hf_dataset (Dataset): The Hugging Face Dataset to be written to disk.
local_dir (Path): The root directory where the dataset files will be stored.
data_file_size_mb (float, optional): Maximum size of each Parquet data file, in MB. Defaults to DEFAULT_DATA_FILE_SIZE_IN_MB.
chunk_size (int, optional): Maximum number of files in a chunk folder before a new one is created. Defaults to DEFAULT_CHUNK_SIZE.
"""
if data_file_size_mb is None:
data_file_size_mb = DEFAULT_DATA_FILE_SIZE_IN_MB
if chunk_size is None:
chunk_size = DEFAULT_CHUNK_SIZE
dataset_size_in_mb = get_hf_dataset_size_in_mb(hf_dataset)
if dataset_size_in_mb <= data_file_size_mb:
# If the dataset is small enough, write it to a single file.
path = local_dir / DEFAULT_DATA_PATH.format(chunk_index=0, file_index=0)
path.parent.mkdir(parents=True, exist_ok=True)
hf_dataset.to_parquet(path)
return
# If the dataset is too large, split it into smaller chunks, keeping episodes whole.
episode_indices = np.array(hf_dataset["episode_index"])
episode_boundaries = np.where(np.diff(episode_indices) != 0)[0] + 1
episode_starts = np.concatenate(([0], episode_boundaries))
episode_ends = np.concatenate((episode_boundaries, [len(hf_dataset)]))
num_episodes = len(episode_starts)
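# For example, episode_index values [0, 0, 0, 1, 1, 2] give boundaries [3, 5], hence
# episode_starts = [0, 3, 5] and episode_ends = [3, 5, 6] (end row indices are exclusive).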
current_episode_idx = 0
chunk_idx, file_idx = 0, 0
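# Greedily grow each shard episode by episode, stopping before the shard would exceed
# data_file_size_mb, so that no episode is ever split across parquet files.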
while current_episode_idx < num_episodes:
shard_start_row = episode_starts[current_episode_idx]
shard_end_row = episode_ends[current_episode_idx]
next_episode_to_try_idx = current_episode_idx + 1
while next_episode_to_try_idx < num_episodes:
potential_shard_end_row = episode_ends[next_episode_to_try_idx]
dataset_shard_candidate = hf_dataset.select(range(shard_start_row, potential_shard_end_row))
shard_size_mb = get_hf_dataset_size_in_mb(dataset_shard_candidate)
if shard_size_mb > data_file_size_mb:
break
else:
shard_end_row = potential_shard_end_row
next_episode_to_try_idx += 1
dataset_shard = hf_dataset.select(range(shard_start_row, shard_end_row))
if (
shard_start_row == episode_starts[current_episode_idx]
and shard_end_row == episode_ends[current_episode_idx]
):
shard_size_mb = get_hf_dataset_size_in_mb(dataset_shard)
if shard_size_mb > data_file_size_mb:
logging.warning(
f"Episode with index {hf_dataset[shard_start_row.item()]['episode_index']} has size {shard_size_mb:.2f}MB, "
f"which is larger than data_file_size_mb ({data_file_size_mb}MB). "
"Writing it to a separate shard anyway to preserve episode integrity."
)
# Define the path for the current shard and ensure the directory exists.
path = local_dir / DEFAULT_DATA_PATH.format(chunk_index=chunk_idx, file_index=file_idx)
path.parent.mkdir(parents=True, exist_ok=True)
# Write the shard to a Parquet file.
dataset_shard.to_parquet(path)
# Update chunk and file indices for the next iteration.
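# (update_chunk_file_indices is expected to increment file_idx and start a new chunk folder
# once chunk_size files exist in the current one, per the chunk_size argument above.)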
chunk_idx, file_idx = update_chunk_file_indices(chunk_idx, file_idx, chunk_size)
current_episode_idx = next_episode_to_try_idx
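# Illustrative sketch (not part of the fixtures): how write_hf_dataset might be exercised on a tiny
# in-memory dataset. The helper name, row contents and the 0.01 MB limit below are hypothetical,
# chosen only to illustrate the sharding behaviour.
def _demo_write_hf_dataset(tmp_dir: Path) -> None:
    toy = Dataset.from_dict(
        {
            "episode_index": [0, 0, 0, 1, 1, 2],
            "frame_index": [0, 1, 2, 0, 1, 0],
            "action": [[0.0, 0.0]] * 6,
        }
    )
    # Depending on how the shard size compares to the limit, this writes one or more
    # data/chunk-000/file-XXX.parquet files; whole episodes are never split across files.
    write_hf_dataset(toy, tmp_dir, data_file_size_mb=0.01, chunk_size=2)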
@pytest.fixture(scope="session")
def info_path(info_factory):
def _create_info_json_file(dir: Path, info: dict | None = None) -> Path:
if not info:
def create_info(info_factory):
def _create_info(dir: Path, info: dict | None = None):
if info is None:
info = info_factory()
fpath = dir / INFO_PATH
fpath.parent.mkdir(parents=True, exist_ok=True)
with open(fpath, "w") as f:
json.dump(info, f, indent=4, ensure_ascii=False)
return fpath
write_info(info, dir)
return _create_info_json_file
return _create_info
@pytest.fixture(scope="session")
def stats_path(stats_factory):
def _create_stats_json_file(dir: Path, stats: dict | None = None) -> Path:
if not stats:
def create_stats(stats_factory):
def _create_stats(dir: Path, stats: dict | None = None):
if stats is None:
stats = stats_factory()
fpath = dir / STATS_PATH
fpath.parent.mkdir(parents=True, exist_ok=True)
with open(fpath, "w") as f:
json.dump(stats, f, indent=4, ensure_ascii=False)
return fpath
write_stats(stats, dir)
return _create_stats_json_file
return _create_stats
@pytest.fixture(scope="session")
def episodes_stats_path(episodes_stats_factory):
def _create_episodes_stats_jsonl_file(dir: Path, episodes_stats: list[dict] | None = None) -> Path:
if not episodes_stats:
episodes_stats = episodes_stats_factory()
fpath = dir / EPISODES_STATS_PATH
fpath.parent.mkdir(parents=True, exist_ok=True)
with jsonlines.open(fpath, "w") as writer:
writer.write_all(episodes_stats.values())
return fpath
return _create_episodes_stats_jsonl_file
@pytest.fixture(scope="session")
def tasks_path(tasks_factory):
def _create_tasks_jsonl_file(dir: Path, tasks: list | None = None) -> Path:
if not tasks:
def create_tasks(tasks_factory):
def _create_tasks(dir: Path, tasks: pd.DataFrame | None = None):
if tasks is None:
tasks = tasks_factory()
fpath = dir / TASKS_PATH
fpath.parent.mkdir(parents=True, exist_ok=True)
with jsonlines.open(fpath, "w") as writer:
writer.write_all(tasks.values())
return fpath
write_tasks(tasks, dir)
return _create_tasks_jsonl_file
return _create_tasks
@pytest.fixture(scope="session")
def episode_path(episodes_factory):
def _create_episodes_jsonl_file(dir: Path, episodes: list | None = None) -> Path:
if not episodes:
def create_episodes(episodes_factory):
def _create_episodes(dir: Path, episodes: datasets.Dataset | None = None):
if episodes is None:
# TODO(rcadene): add features, fps as arguments
episodes = episodes_factory()
fpath = dir / EPISODES_PATH
fpath.parent.mkdir(parents=True, exist_ok=True)
with jsonlines.open(fpath, "w") as writer:
writer.write_all(episodes.values())
return fpath
write_episodes(episodes, dir)
return _create_episodes_jsonl_file
return _create_episodes
@pytest.fixture(scope="session")
def single_episode_parquet_path(hf_dataset_factory, info_factory):
def _create_single_episode_parquet(
dir: Path, ep_idx: int = 0, hf_dataset: datasets.Dataset | None = None, info: dict | None = None
) -> Path:
if not info:
info = info_factory()
def create_hf_dataset(hf_dataset_factory):
def _create_hf_dataset(
dir: Path,
hf_dataset: datasets.Dataset | None = None,
data_file_size_in_mb: float | None = None,
chunk_size: int | None = None,
):
if hf_dataset is None:
hf_dataset = hf_dataset_factory()
write_hf_dataset(hf_dataset, dir, data_file_size_in_mb, chunk_size)
data_path = info["data_path"]
chunks_size = info["chunks_size"]
ep_chunk = ep_idx // chunks_size
fpath = dir / data_path.format(episode_chunk=ep_chunk, episode_index=ep_idx)
fpath.parent.mkdir(parents=True, exist_ok=True)
table = hf_dataset.data.table
ep_table = table.filter(pc.equal(table["episode_index"], ep_idx))
pq.write_table(ep_table, fpath)
return fpath
return _create_single_episode_parquet
@pytest.fixture(scope="session")
def multi_episode_parquet_path(hf_dataset_factory, info_factory):
def _create_multi_episode_parquet(
dir: Path, hf_dataset: datasets.Dataset | None = None, info: dict | None = None
) -> Path:
if not info:
info = info_factory()
if hf_dataset is None:
hf_dataset = hf_dataset_factory()
data_path = info["data_path"]
chunks_size = info["chunks_size"]
total_episodes = info["total_episodes"]
for ep_idx in range(total_episodes):
ep_chunk = ep_idx // chunks_size
fpath = dir / data_path.format(episode_chunk=ep_chunk, episode_index=ep_idx)
fpath.parent.mkdir(parents=True, exist_ok=True)
table = hf_dataset.data.table
ep_table = table.filter(pc.equal(table["episode_index"], ep_idx))
pq.write_table(ep_table, fpath)
return dir / "data"
return _create_multi_episode_parquet
return _create_hf_dataset

tests/fixtures/hub.py

@@ -14,15 +14,19 @@
from pathlib import Path
import datasets
import pandas as pd
import pytest
from huggingface_hub.utils import filter_repo_objects
from lerobot.datasets.utils import (
EPISODES_PATH,
EPISODES_STATS_PATH,
DEFAULT_CHUNK_SIZE,
DEFAULT_DATA_FILE_SIZE_IN_MB,
DEFAULT_DATA_PATH,
DEFAULT_EPISODES_PATH,
DEFAULT_TASKS_PATH,
DEFAULT_VIDEO_PATH,
INFO_PATH,
STATS_PATH,
TASKS_PATH,
)
from tests.fixtures.constants import LEROBOT_TEST_DIR
@@ -30,17 +34,16 @@ from tests.fixtures.constants import LEROBOT_TEST_DIR
@pytest.fixture(scope="session")
def mock_snapshot_download_factory(
info_factory,
info_path,
create_info,
stats_factory,
stats_path,
episodes_stats_factory,
episodes_stats_path,
create_stats,
tasks_factory,
tasks_path,
create_tasks,
episodes_factory,
episode_path,
single_episode_parquet_path,
create_episodes,
hf_dataset_factory,
create_hf_dataset,
create_videos,
):
"""
This factory allows patching snapshot_download such that when called, it will create expected files rather
@@ -50,82 +53,93 @@ def mock_snapshot_download_factory(
def _mock_snapshot_download_func(
info: dict | None = None,
stats: dict | None = None,
episodes_stats: list[dict] | None = None,
tasks: list[dict] | None = None,
episodes: list[dict] | None = None,
tasks: pd.DataFrame | None = None,
episodes: datasets.Dataset | None = None,
hf_dataset: datasets.Dataset | None = None,
data_files_size_in_mb: float = DEFAULT_DATA_FILE_SIZE_IN_MB,
chunks_size: int = DEFAULT_CHUNK_SIZE,
):
if not info:
info = info_factory()
if not stats:
if info is None:
info = info_factory(data_files_size_in_mb=data_files_size_in_mb, chunks_size=chunks_size)
if stats is None:
stats = stats_factory(features=info["features"])
if not episodes_stats:
episodes_stats = episodes_stats_factory(
features=info["features"], total_episodes=info["total_episodes"]
)
if not tasks:
if tasks is None:
tasks = tasks_factory(total_tasks=info["total_tasks"])
if not episodes:
if episodes is None:
episodes = episodes_factory(
total_episodes=info["total_episodes"], total_frames=info["total_frames"], tasks=tasks
features=info["features"],
fps=info["fps"],
total_episodes=info["total_episodes"],
total_frames=info["total_frames"],
tasks=tasks,
)
if not hf_dataset:
if hf_dataset is None:
hf_dataset = hf_dataset_factory(tasks=tasks, episodes=episodes, fps=info["fps"])
def _extract_episode_index_from_path(fpath: str) -> int:
path = Path(fpath)
if path.suffix == ".parquet" and path.stem.startswith("episode_"):
episode_index = int(path.stem[len("episode_") :]) # 'episode_000000' -> 0
return episode_index
else:
return None
def _mock_snapshot_download(
repo_id: str,
repo_id: str,  # TODO(rcadene): shouldn't repo_id be used here?
local_dir: str | Path | None = None,
allow_patterns: str | list[str] | None = None,
ignore_patterns: str | list[str] | None = None,
*args,
**kwargs,
) -> str:
if not local_dir:
if local_dir is None:
local_dir = LEROBOT_TEST_DIR
# List all possible files
all_files = []
meta_files = [INFO_PATH, STATS_PATH, EPISODES_STATS_PATH, TASKS_PATH, EPISODES_PATH]
all_files.extend(meta_files)
all_files = [
INFO_PATH,
STATS_PATH,
# TODO(rcadene): remove the naive chunk 0 / file 0 assumption?
DEFAULT_TASKS_PATH.format(chunk_index=0, file_index=0),
DEFAULT_EPISODES_PATH.format(chunk_index=0, file_index=0),
DEFAULT_DATA_PATH.format(chunk_index=0, file_index=0),
]
data_files = []
for episode_dict in episodes.values():
ep_idx = episode_dict["episode_index"]
ep_chunk = ep_idx // info["chunks_size"]
data_path = info["data_path"].format(episode_chunk=ep_chunk, episode_index=ep_idx)
data_files.append(data_path)
all_files.extend(data_files)
video_keys = [key for key, feats in info["features"].items() if feats["dtype"] == "video"]
for key in video_keys:
all_files.append(DEFAULT_VIDEO_PATH.format(video_key=key, chunk_index=0, file_index=0))
allowed_files = filter_repo_objects(
all_files, allow_patterns=allow_patterns, ignore_patterns=ignore_patterns
)
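# filter_repo_objects applies the same allow/ignore glob semantics as snapshot_download,
# so only the artifact types actually requested get created below.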
# Create allowed files
request_info = False
request_tasks = False
request_episodes = False
request_stats = False
request_data = False
request_videos = False
for rel_path in allowed_files:
if rel_path.startswith("data/"):
episode_index = _extract_episode_index_from_path(rel_path)
if episode_index is not None:
_ = single_episode_parquet_path(local_dir, episode_index, hf_dataset, info)
if rel_path == INFO_PATH:
_ = info_path(local_dir, info)
elif rel_path == STATS_PATH:
_ = stats_path(local_dir, stats)
elif rel_path == EPISODES_STATS_PATH:
_ = episodes_stats_path(local_dir, episodes_stats)
elif rel_path == TASKS_PATH:
_ = tasks_path(local_dir, tasks)
elif rel_path == EPISODES_PATH:
_ = episode_path(local_dir, episodes)
if rel_path.startswith("meta/info.json"):
request_info = True
elif rel_path.startswith("meta/stats"):
request_stats = True
elif rel_path.startswith("meta/tasks"):
request_tasks = True
elif rel_path.startswith("meta/episodes"):
request_episodes = True
elif rel_path.startswith("data/"):
request_data = True
elif rel_path.startswith("videos/"):
request_videos = True
else:
pass
raise ValueError(f"{rel_path} not supported.")
if request_info:
create_info(local_dir, info)
if request_stats:
create_stats(local_dir, stats)
if request_tasks:
create_tasks(local_dir, tasks)
if request_episodes:
create_episodes(local_dir, episodes)
if request_data:
create_hf_dataset(local_dir, hf_dataset, data_files_size_in_mb, chunks_size)
if request_videos:
create_videos(root=local_dir, info=info)
return str(local_dir)
return _mock_snapshot_download


@@ -71,7 +71,11 @@ def dummy_dataset_metadata(lerobot_dataset_metadata_factory, info_factory, tmp_p
},
}
info = info_factory(
total_episodes=1, total_frames=1, camera_features=camera_features, motor_features=motor_features
total_episodes=1,
total_frames=1,
total_tasks=1,
camera_features=camera_features,
motor_features=motor_features,
)
ds_meta = lerobot_dataset_metadata_factory(root=tmp_path / "init", info=info)
return ds_meta
@@ -140,7 +144,6 @@ def test_policy(ds_repo_id, env_name, env_kwargs, policy_name, policy_kwargs):
Note: We test various combinations of policy and dataset. The combinations are by no means exhaustive,
and for now we add tests as we see fit.
"""
train_cfg = TrainPipelineConfig(
# TODO(rcadene, aliberts): remove dataset download
dataset=DatasetConfig(repo_id=ds_repo_id, episodes=[0]),


@@ -14,6 +14,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from unittest.mock import patch
from lerobot.calibrate import CalibrateConfig, calibrate
from lerobot.record import DatasetRecordConfig, RecordConfig, record
from lerobot.replay import DatasetReplayConfig, ReplayConfig, replay
@@ -67,7 +69,14 @@ def test_record_and_resume(tmp_path):
assert dataset.meta.total_tasks == 1
cfg.resume = True
dataset = record(cfg)
# Mock the revision to prevent Hub calls during resume
with (
patch("lerobot.datasets.lerobot_dataset.get_safe_version") as mock_get_safe_version,
patch("lerobot.datasets.lerobot_dataset.snapshot_download") as mock_snapshot_download,
):
mock_get_safe_version.return_value = "v3.0"
mock_snapshot_download.return_value = str(tmp_path / "record")
dataset = record(cfg)
assert dataset.meta.total_episodes == dataset.num_episodes == 2
assert dataset.meta.total_frames == dataset.num_frames == 6
@@ -103,4 +112,12 @@ def test_record_and_replay(tmp_path):
)
record(record_cfg)
replay(replay_cfg)
# Mock the revision to prevent Hub calls during replay
with (
patch("lerobot.datasets.lerobot_dataset.get_safe_version") as mock_get_safe_version,
patch("lerobot.datasets.lerobot_dataset.snapshot_download") as mock_snapshot_download,
):
mock_get_safe_version.return_value = "v3.0"
mock_snapshot_download.return_value = str(tmp_path / "record_and_replay")
replay(replay_cfg)


@@ -384,7 +384,7 @@ def test_to_lerobot_dataset(tmp_path):
elif feature == "next.done":
assert torch.equal(value, buffer.dones[i])
elif feature == "observation.image":
# Tenssor -> numpy is not precise, so we have some diff there
# Tensor -> numpy is not precise, so we have some diff there
# TODO: Check and fix it
torch.testing.assert_close(value, buffer.states["observation.image"][i], rtol=0.3, atol=0.003)
elif feature == "observation.state":