Release cleanup (#132)

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Cadene <re.cadene@gmail.com>
2024-05-06 03:03:14 +02:00
parent 6eaffbef1d
commit f5e76393eb
19 changed files with 312 additions and 237 deletions
--- a/lerobot/common/datasets/_video_benchmark/README.md
+++ b/lerobot/common/datasets/_video_benchmark/README.md
@@ -37,16 +37,16 @@ How to decode videos?
 ## Variables

 **Image content**
-We don't expect the same optimal settings for a dataset of images from a simulation, or from real-world in an appartment, or in a factory, or outdoor, etc. Hence, we run this bechmark on two datasets: `pusht` (simulation) and `umi` (real-world outdoor).
+We don't expect the same optimal settings for a dataset of images from a simulation, or from real-world in an apartment, or in a factory, or outdoor, etc. Hence, we run this benchmark on two datasets: `pusht` (simulation) and `umi` (real-world outdoor).

 **Requested timestamps**
-In this benchmark, we focus on the loading time of random access, so we are not interested about sequentially loading all frames of a video like in a movie. However, the number of consecutive timestamps requested and their spacing can greatly affect the `load_time_factor`. In fact, it is expected to get faster loading time by decoding a large number of consecutive frames from a video, than to load the same data from individual images. To reflect our robotics use case, we consider a few settings:
+In this benchmark, we focus on the loading time of random access, so we are not interested in sequentially loading all frames of a video like in a movie. However, the number of consecutive timestamps requested and their spacing can greatly affect the `load_time_factor`. In fact, it is expected to get faster loading time by decoding a large number of consecutive frames from a video, than to load the same data from individual images. To reflect our robotics use case, we consider a few settings:
 - `single_frame`: 1 frame,
 - `2_frames`: 2 consecutive frames (e.g. `[t, t + 1 / fps]`),
 - `2_frames_4_space`: 2 consecutive frames with 4 frames of spacing (e.g `[t, t + 4 / fps]`),

 **Data augmentations**
-We might revisit this benchmark and find better settings if we train our policies with various data augmentations to make them more robusts (e.g. robust to color changes, compression, etc.).
+We might revisit this benchmark and find better settings if we train our policies with various data augmentations to make them more robust (e.g. robust to color changes, compression, etc.).


 ## Results
--- a/lerobot/common/datasets/lerobot_dataset.py
+++ b/lerobot/common/datasets/lerobot_dataset.py
@@ -47,6 +47,7 @@ class LeRobotDataset(torch.utils.data.Dataset):

    @property
    def fps(self) -> int:
+        """Frames per second used during data collection."""
        return self.info["fps"]

    @property
@@ -61,15 +62,22 @@ class LeRobotDataset(torch.utils.data.Dataset):
        return self.hf_dataset.features

    @property
-    def image_keys(self) -> list[str]:
-        image_keys = []
+    def camera_keys(self) -> list[str]:
+        """Keys to access image and video stream from cameras."""
+        keys = []
        for key, feats in self.hf_dataset.features.items():
-            if isinstance(feats, datasets.Image):
-                image_keys.append(key)
-        return image_keys + self.video_frame_keys
+            if isinstance(feats, (datasets.Image, VideoFrame)):
+                keys.append(key)
+        return keys

    @property
-    def video_frame_keys(self):
+    def video_frame_keys(self) -> list[str]:
+        """Keys to access video frames that need to be decoded into images.
+
+        Note: It is empty if the dataset contains images only,
+        equal to `self.camera_keys` if the dataset contains videos only,
+        or a subset of `self.camera_keys` for a mixed image/video dataset.
+        """
        video_frame_keys = []
        for key, feats in self.hf_dataset.features.items():
            if isinstance(feats, VideoFrame):
@@ -78,10 +86,12 @@ class LeRobotDataset(torch.utils.data.Dataset):

    @property
    def num_samples(self) -> int:
+        """Number of samples/frames."""
        return len(self.hf_dataset)

    @property
    def num_episodes(self) -> int:
+        """Number of episodes."""
        return len(self.hf_dataset.unique("episode_index"))

    @property
@@ -121,6 +131,22 @@ class LeRobotDataset(torch.utils.data.Dataset):

        return item

+    def __repr__(self):
+        return (
+            f"{self.__class__.__name__}(\n"
+            f"  Repository ID: '{self.repo_id}',\n"
+            f"  Version: '{self.version}',\n"
+            f"  Split: '{self.split}',\n"
+            f"  Number of Samples: {self.num_samples},\n"
+            f"  Number of Episodes: {self.num_episodes},\n"
+            f"  Type: {'video (.mp4)' if self.video else 'image (.png)'},\n"
+            f"  Recorded Frames per Second: {self.fps},\n"
+            f"  Camera Keys: {self.camera_keys},\n"
+            f"  Video Frame Keys: {self.video_frame_keys if self.video else 'N/A'},\n"
+            f"  Transformations: {self.transform},\n"
+            f")"
+        )
+
    @classmethod
    def from_preloaded(
        cls,
--- a/lerobot/common/datasets/push_dataset_to_hub/aloha_hdf5_format.py
+++ b/lerobot/common/datasets/push_dataset_to_hub/aloha_hdf5_format.py
@@ -142,12 +142,12 @@ def load_from_raw(raw_dir, out_dir, fps, video, debug):
 def to_hf_dataset(data_dict, video) -> Dataset:
    features = {}

-    image_keys = [key for key in data_dict if "observation.images." in key]
-    for image_key in image_keys:
+    keys = [key for key in data_dict if "observation.images." in key]
+    for key in keys:
        if video:
-            features[image_key] = VideoFrame()
+            features[key] = VideoFrame()
        else:
-            features[image_key] = Image()
+            features[key] = Image()

    features["observation.state"] = Sequence(
        length=data_dict["observation.state"].shape[1], feature=Value(dtype="float32", id=None)