Add docs for LeRobot Image transforms (#1972)
* Remove unused scripts, add docs for image transforms and add example
* fix(examples): move train_policy.py under examples, remove outdated readme parts
* remove script that's copied to the train folder
* remove outdated links to examples and example tests
@@ -8,6 +8,7 @@ This docs will guide you to:

- Record a dataset and push it to the Hub
- Load datasets for training with `LeRobotDataset`
- Stream datasets without downloading using `StreamingLeRobotDataset`
- Apply image transforms for data augmentation during training
- Migrate existing `v2.1` datasets to `v3.0`

## What’s new in `v3`

@@ -150,6 +151,117 @@ dataset = StreamingLeRobotDataset(repo_id) # streams directly from the Hub

</figure>
</div>

## Image transforms

Image transforms are data augmentations applied to camera frames during training to improve model robustness and generalization. LeRobot supports various transforms including brightness, contrast, saturation, hue, and sharpness adjustments.

### Using transforms during dataset creation/recording

Currently, transforms are applied during **training time only**, not during recording. When you create or record a dataset, the raw images are stored without transforms. This allows you to experiment with different augmentations later without re-recording data.
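
For instance, because the stored frames stay untouched, you can load the same dataset twice, raw and augmented, and compare the results before settling on a recipe. Below is a minimal sketch using the classes introduced in the next section; the repo ID is a placeholder, and `meta.camera_keys` is assumed to list the dataset's camera streams:

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.transforms import ImageTransforms, ImageTransformsConfig

repo_id = "your-username/your-dataset"  # placeholder

# Raw frames exactly as they were recorded
raw_dataset = LeRobotDataset(repo_id)

# Same stored frames, augmented on the fly at load time
augmented_dataset = LeRobotDataset(
    repo_id,
    image_transforms=ImageTransforms(ImageTransformsConfig(enable=True)),
)

camera_key = raw_dataset.meta.camera_keys[0]  # assumes at least one camera stream
raw_frame = raw_dataset[0][camera_key]
augmented_frame = augmented_dataset[0][camera_key]  # differs only by the augmentation
```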

### Adding transforms to existing datasets (API)

Use the `image_transforms` parameter when loading a dataset for training:

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.transforms import ImageTransforms, ImageTransformsConfig, ImageTransformConfig

# Option 1: Use default transform configuration (disabled by default)
transforms_config = ImageTransformsConfig(
    enable=True,           # Enable transforms
    max_num_transforms=3,  # Apply up to 3 transforms per frame
    random_order=False,    # Apply in standard order
)
transforms = ImageTransforms(transforms_config)

dataset = LeRobotDataset(
    repo_id="your-username/your-dataset",
    image_transforms=transforms,
)

# Option 2: Create custom transform configuration
custom_transforms_config = ImageTransformsConfig(
    enable=True,
    max_num_transforms=2,
    random_order=True,
    tfs={
        "brightness": ImageTransformConfig(
            weight=1.0,
            type="ColorJitter",
            kwargs={"brightness": (0.7, 1.3)},  # Adjust brightness range
        ),
        "contrast": ImageTransformConfig(
            weight=2.0,  # Higher weight = more likely to be selected
            type="ColorJitter",
            kwargs={"contrast": (0.8, 1.2)},
        ),
        "sharpness": ImageTransformConfig(
            weight=0.5,  # Lower weight = less likely to be selected
            type="SharpnessJitter",
            kwargs={"sharpness": (0.3, 2.0)},
        ),
    },
)

dataset = LeRobotDataset(
    repo_id="your-username/your-dataset",
    image_transforms=ImageTransforms(custom_transforms_config),
)

# Option 3: Use pure torchvision transforms
from torchvision.transforms import v2

torchvision_transforms = v2.Compose([
    v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    v2.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),
])

dataset = LeRobotDataset(
    repo_id="your-username/your-dataset",
    image_transforms=torchvision_transforms,
)
```
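
Whichever option you choose, the resulting dataset plugs into a standard PyTorch `DataLoader` for training; a minimal sketch (batch size and worker count are arbitrary):

```python
import torch
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.transforms import ImageTransforms, ImageTransformsConfig

dataset = LeRobotDataset(
    "your-username/your-dataset",  # placeholder repo ID
    image_transforms=ImageTransforms(ImageTransformsConfig(enable=True)),
)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

for batch in dataloader:
    # Camera tensors in the batch already have the image transforms applied.
    break
```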

### Available transform types

LeRobot provides several transform types:

- **`ColorJitter`**: Adjusts brightness, contrast, saturation, and hue
- **`SharpnessJitter`**: Randomly adjusts image sharpness
- **`Identity`**: No transformation (useful for testing)

You can also use any `torchvision.transforms.v2` transform by passing it directly to the `image_transforms` parameter.
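
Since the parameter simply needs a callable, you can also mix LeRobot's own transform classes into a torchvision pipeline. A hedged sketch, assuming `SharpnessJitter` is exposed by `lerobot.datasets.transforms` and accepts a `(min, max)` range:

```python
from torchvision.transforms import v2

from lerobot.datasets.lerobot_dataset import LeRobotDataset
# Assumption: SharpnessJitter lives next to ImageTransforms in this module.
from lerobot.datasets.transforms import SharpnessJitter

mixed_transforms = v2.Compose([
    v2.ColorJitter(brightness=(0.9, 1.1)),  # mild brightness jitter
    SharpnessJitter(sharpness=(0.5, 1.5)),  # LeRobot's sharpness augmentation
])

dataset = LeRobotDataset(
    "your-username/your-dataset",  # placeholder repo ID
    image_transforms=mixed_transforms,
)
```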

### Configuration options

- **`enable`**: Enable/disable transforms (default: `False`)
- **`max_num_transforms`**: Maximum number of transforms applied per frame (default: `3`)
- **`random_order`**: Apply transforms in random order vs. standard order (default: `False`)
- **`weight`**: Relative sampling weight for each transform; higher means more likely to be selected, and weights that do not sum to 1 are normalized (see the sketch after this list)
- **`kwargs`**: Transform-specific parameters (e.g., brightness range)
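
To make the normalization concrete, here is a quick sketch of the arithmetic using the weights from the custom configuration above (the library's actual sampling code may differ in detail):

```python
# Normalizing relative weights into selection probabilities
# (values taken from the Option 2 configuration above).
weights = {"brightness": 1.0, "contrast": 2.0, "sharpness": 0.5}

total = sum(weights.values())  # 3.5
probabilities = {name: w / total for name, w in weights.items()}

print(probabilities)
# {'brightness': 0.2857..., 'contrast': 0.5714..., 'sharpness': 0.1428...}
```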

### Visualizing transforms

Use the visualization script to preview how transforms affect your data:

```bash
python -m lerobot.scripts.visualize_image_transforms \
    --repo-id=your-username/your-dataset \
    --output-dir=./transform_examples \
    --n-examples=5
```

This saves example images showing the effect of each transform, helping you tune parameters.

### Best practices

- **Start conservative**: Begin with small ranges (e.g., brightness 0.9-1.1) and increase gradually; see the example after this list
- **Test first**: Use the visualization script to ensure transforms look reasonable
- **Monitor training**: Overly aggressive augmentations can hurt performance
- **Match your domain**: If your robot operates in varying lighting, use brightness/contrast transforms
- **Combine wisely**: Applying too many transforms simultaneously can make training unstable
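
A minimal sketch of such a conservative starting point, using the same configuration classes as above (the ranges are illustrative, not recommended defaults):

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.transforms import ImageTransforms, ImageTransformsConfig, ImageTransformConfig

# Conservative starting point: narrow ranges, at most one transform per frame.
# Tune the ranges with the visualization script before widening them.
conservative_config = ImageTransformsConfig(
    enable=True,
    max_num_transforms=1,
    random_order=False,
    tfs={
        "brightness": ImageTransformConfig(
            weight=1.0,
            type="ColorJitter",
            kwargs={"brightness": (0.9, 1.1)},
        ),
        "contrast": ImageTransformConfig(
            weight=1.0,
            type="ColorJitter",
            kwargs={"contrast": (0.9, 1.1)},
        ),
    },
)

dataset = LeRobotDataset(
    repo_id="your-username/your-dataset",
    image_transforms=ImageTransforms(conservative_config),
)
```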

## Migrate `v2.1` → `v3.0`

A converter aggregates per-episode files into larger shards and writes episode offsets/metadata. Convert your dataset using the instructions below.