* incremental parquet writing
* add .finalise() and a backup __del__ for stopping writers
* fix missing import
* precommit fixes added back the use of embed images
* added lazy loading for hf_Dataset to avoid frequently reloading the dataset during recording
* fix bug in video timestamps
* Added proper closing of parquet file before reading
* Added rigorous testing to validate the consistency of the metadata after creation of a new dataset
* fix bug in episode index during clear_episode_buffer
* fix(empty concat): check for empty paths list before data files concatenation
* fix(v3.0 message): updating v3.0 backward compatibility message.
* added fixes for the resume logic
* answering Copilot review
* reverting some changes and style nits
* removed unused functions
* fix chunk_id and file_id when resuming
* - fix parquet loading when resuming
- add test to verify the parquet file integrity when resuming so that data files are not overwritten
* added general function get_file_size_in_mb and removed the one for video
* fix table size value when resuming
* Remove unnecessary reloading of the parquet file when resuming record.
Write to a new parquet file when resuming record
* added back reading parquet file for image datasets only
* - respond to Qlhoest comments
- Use pyarrow's `from_pydict` function
- Add buffer for episode metadata to write to the parquet file in batches to improve efficiency
- Remove the use of `to_parquet_with_hf_images`
* fix(dataset_tools) with the new logic using proper finalize
bug in finding the latest path of the metadata that was pointing to the data files
added check for the metadata size in case the metadata buffer was not written yet
* nit in flush_metadata_buffer
* fix(lerobot_dataset) return the right dataset len when a subset of the dataset is requested
---------
Co-authored-by: Harsimrat Sandhawalia <hs.sandhawalia@gmail.com>
* feat(dataset-tools): add dataset utilities and example script
- Introduced dataset tools for LeRobotDataset, including functions for deleting episodes, splitting datasets, adding/removing features, and merging datasets.
- Added an example script demonstrating the usage of these utilities.
- Implemented comprehensive tests for all new functionalities to ensure reliability and correctness.
* style fixes
* move example to dataset dir
* missing license
* fixes mostly path
* clean comments
* move tests to functions instead of class based
* - fix video editing: decode, delete frames and re-encode video
- copy unchanged video and parquet files to avoid recreating the entire dataset
* Fortify tooling tests
* Fix type issue resulting from saving numpy arrays with shape 3,1,1
* added lerobot_edit_dataset
* - revert changes in examples
- remove hardcoded split names
* update comment
* fix comment
add lerobot-edit-dataset shortcut
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>
* style nit after copilot review
* fix: bug in dataset root when editing the dataset in place (without setting new_repo_id)
* Fix bug in aggregate.py when accumulating video timestamps; add tests to fortify aggregate videos
* Added missing output repo id
* migrate delete episode to using pyav instead of decoding, writing frames to disk and encoding again.
Co-authored-by: Caroline Pascal <caroline8.pascal@gmail.com>
* added modified suffix in case repo_id is not set in delete_episode
* adding docs for dataset tools
* bump av version and add back time_base assignment
* linter
* modified push_to_hub logic in lerobot_edit_dataset
* fix(progress bar): fixing the progress bar issue in dataset tools
* chore(concatenate): removing no longer needed concatenate_datasets usage
* fix(file sizes forwarding): forwarding files and chunk sizes in metadata info when splitting and aggregating datasets
* style fix
* refactor(aggregate): Fix video indexing and timestamp bugs in dataset merging
There were three critical bugs in aggregate.py that prevented correct dataset merging:
1. Video file indices: Changed from += to = assignment to correctly reference
merged video files
2. Video timestamps: Implemented per-source-file offset tracking to maintain
continuous timestamps when merging split datasets (was causing non-monotonic
timestamp warnings)
3. File rotation offsets: Store timestamp offsets after rotation decision to
prevent out-of-bounds frame access (was causing "Invalid frame index" errors
with small file size limits)
Changes:
- Updated update_meta_data() to apply per-source-file timestamp offsets
- Updated aggregate_videos() to track offsets correctly during file rotation
- Added get_video_duration_in_s import for duration calculation
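The ordering described in items 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the code in aggregate.py: `SourceVideo`, `plan_offsets`, and `shift_timestamps` are hypothetical names, and the size-based rotation rule is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class SourceVideo:
    """Hypothetical stand-in for one source video file."""
    name: str
    duration_s: float
    size_mb: float


def plan_offsets(sources, max_file_size_mb):
    """Map each source file to (destination file index, timestamp offset).

    The offset is stored only after the rotation decision, so a source that
    opens a new destination file gets offset 0.0 instead of the running
    duration of the previous destination file.
    """
    file_idx = 0
    current_size_mb = 0.0
    current_duration_s = 0.0
    plan = {}
    for src in sources:
        # Decide rotation first: open a new file if this source would not fit.
        if current_size_mb > 0 and current_size_mb + src.size_mb > max_file_size_mb:
            file_idx += 1
            current_size_mb = 0.0
            current_duration_s = 0.0
        # Only now record the index (assigned, not accumulated) and the offset.
        plan[src.name] = (file_idx, current_duration_s)
        current_size_mb += src.size_mb
        current_duration_s += src.duration_s
    return plan


def shift_timestamps(timestamps, offset_s):
    """Apply a per-source-file offset to keep merged timestamps continuous."""
    return [t + offset_s for t in timestamps]


sources = [
    SourceVideo("a.mp4", duration_s=10.0, size_mb=40.0),
    SourceVideo("b.mp4", duration_s=5.0, size_mb=40.0),
    SourceVideo("c.mp4", duration_s=8.0, size_mb=40.0),
]
plan = plan_offsets(sources, max_file_size_mb=100.0)
```

With a 100 MB limit, the third source no longer fits and starts a new destination file with a zero offset, which is the case the rotation fix targets.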
* Improved docs for split dataset and added a check for the possible case that the split size results in zero episodes
* chore(docs): update merge documentation details
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
---------
Co-authored-by: CarolinePascal <caroline8.pascal@gmail.com>
Co-authored-by: Jack Vial <vialjack@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* feat(devices): add lazy loading for 3rd party robots cameras and teleoperators
Co-authored-by: Darko Lukić <lukicdarkoo@gmail.com>
* feat(devices): load device class based on assumptions in naming
* docs(devices): instructions for using 3rd party devices
* docs: address review feedback
* chore(docs): add example for 3rd party devices
---------
Co-authored-by: Darko Lukić <lukicdarkoo@gmail.com>
* Add pre and post processing to async inference and update docs
* precommit fix typo
* fix tests
* refactor(async): no None branching for processors in _predict_action_chunk
---------
Co-authored-by: Steven Palma <steven.palma@huggingface.co>
* fix bug in `augment_dataset_quantile_stats.py` that was not detecting the image features because we were looping over hf_dataset. Now we loop over the dataset itself
* Update src/lerobot/datasets/v30/augment_dataset_quantile_stats.py
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>
---------
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* initial commit
* change device in test
* do detailed import
* adhere to python 3.11 syntax
* fix autodocstring
* additionally
* do same in other files
* add model. prefix to all keys in state dict
* use dummy stats
* add pi05
* also shorten action_steps
* fix test
* all tests pass! and fix tokenizer max length between 05 and 0
* remove test
* fix transformer dependency
* fix test
* split pi0 and pi05 policy in separate files
* fix test
* fix push to hub test
* add some comments, license and readme
* remove warning in config
* add pi05 to factory
* remove check
* rename action_horizon to chunk_size
* clean up padding of state and action (more in line with lerobot pi0)
* add openpi image transforms for training and add more flexibility to _preprocess_images similar to lerobot pi0
* fix key match from pytorch state dict (similar keys to openpi implementation now)
* also for pi05
* update to python 3.11
* revert to openpi transformer replace python 3.11
* fix(modeling pi0): nit warning message
* use safeauto_docstring
* fix: remove unused param
* fix from pretrained
* add preprocess tests
* also compile forward method
* Do not add model prefix to normalization
* use same name for action and state dim as lerobot pi0 and remove fixed image keys
* load from pretrained_path
* temp: hardcode base model
* fix override self.pretrained_path = None overwrite
* rename to loss
* remove additional image augmentations, lerobot dataset already does this
* Add docs
* put tests in test folder
* Add test to instantiate all base models
* go back to python 3.10
* update docs
* adapt docs pi05
* change docs: finetune base model options
* minor docs fixes and dependencies
* remove todo
* cast float64 to float32 for mps
* skip if no transformers
* fix tests
* add new models to modelcard
* add back init
* fix circular input
* feat: only run pi test on GPU
* remove require_nightly_gpu
* replace decorator test_pi0_openpi
* rename action_dim, state_dim to max_action_dim, max_state_dim
* fix doc and constants
* cleanup tests
* fix from pretrained
* fix tests
* add comment pi0 pi05 tests, add image features to pi0 pi05 hub tests
* fix, state is included in language not in flow head
* Move test to specific folder
* and paligemma task with newline
* remove add_special_tokens, not needed
* feedback pr
* Remove previous pi0 and rename pi0_openpi and pi05_openpi
* Add Quantile stats to LeRobotDataset (#1985)
* - Add RunningQuantileStats class for efficient histogram-based quantile computation
- Integrate quantile parameters (compute_quantiles, quantiles) into LeRobotDataset
- Support quantile computation during episode collection and aggregation
- Add comprehensive function-based test suite (24 tests) for quantile functionality
- Maintain full backward compatibility with existing stats computation
- Enable configurable quantiles (default: [0.01, 0.99]) for robust normalization
* style fixes, make quantiles computation by default to new datasets
* fix tests
* - Added DEFAULT_QUANTILES=[0.01, 0.10, 0.50, 0.90, 0.99] to be computed for each feature instead of being chosen by the user
- Fortified tests.
* - add helper functions to reshape stats
- add missing test for quantiles
* - Add QUANTILE normalization mode to normalize the data with the 1st and 99th percentiles.
- Add QUANTILE10 normalization mode to normalize the data with the 10th and 90th percentiles.
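As a rough sketch of what the two modes compute, assuming the [-1, 1] target range that later commits in this log settle on: the function names and the `eps` guard are illustrative assumptions, and plain floats stand in for tensors.

```python
def quantile_normalize(x, q_low, q_high, eps=1e-8):
    """Map q_low to -1 and q_high to +1. Values outside the range are not clamped."""
    # eps guards against a zero-width quantile range (an assumption of this sketch).
    return 2.0 * (x - q_low) / max(q_high - q_low, eps) - 1.0


def quantile_unnormalize(y, q_low, q_high, eps=1e-8):
    """Inverse of quantile_normalize."""
    return (y + 1.0) / 2.0 * max(q_high - q_low, eps) + q_low
```

For QUANTILE, `q_low` and `q_high` would be the 1st and 99th percentiles; for QUANTILE10, the 10th and 90th.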
* style fixes
* Added missing license
* Simplify compute_stats
* - added script `augment_dataset_quantile_stats.py` to add quantile stats to existing v3 datasets that don't have quantiles
- modified quantile computation: instead of using the bin edge as the value, interpolate within the bin
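The in-bin interpolation can be illustrated with a small pure-Python sketch. `histogram_quantile` is a hypothetical helper, not the `RunningQuantileStats` implementation, and it assumes samples are spread uniformly within each bin.

```python
def histogram_quantile(counts, edges, q):
    """Estimate the q-quantile from histogram counts and bin edges.

    counts[i] is the number of samples in the bin [edges[i], edges[i + 1]).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    target = q * total  # rank of the requested quantile
    cumulative = 0.0
    for i, count in enumerate(counts):
        if count > 0 and cumulative + count >= target:
            # Interpolate inside the bin instead of returning its edge.
            fraction = (target - cumulative) / count
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        cumulative += count
    return edges[-1]
```

Returning `edges[i]` or `edges[i + 1]` directly would snap every estimate to a bin boundary; the interpolated value moves smoothly with `q`.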
* rename pi0/pi05 files
* Remove open pi patch and use custom transformer branch for now
* renaming
* fix
* Revert "fix"
This reverts commit 1ea65730ac2cbca6e5869df734fbd4392561b3c6.
* fix naming
* feat(pi0/pi0.5): add pipeline (#2009)
* feat(processor): convert openpi model with processor
* TODO: Make tests work
* fix(modeling_pi0openpi): update attention mask value and time scaling; improve task handling in tests
- Changed the attention mask value from `self.config.attention_mask_value` to a fixed value of `-2.3819763e38`.
- Updated time scaling in the `sample_noise` method to use a constant factor of `0.999` and an offset of `0.001`.
- Enhanced task handling in tests to ensure proper formatting and batch size consistency.
- Cleaned up commented-out test code for clarity.
* refactor(pi0): rename PI0OpenPIConfig and PI0OpenPIPolicy to PI0Config and PI0Policy
- Updated imports and references throughout the codebase to reflect the new naming convention.
- Introduced a new processor file for PI0 to handle pre-processing and post-processing steps.
- Adjusted tests to utilize the renamed classes, ensuring consistency and functionality.
- Enhanced clarity and maintainability by removing outdated naming conventions.
* refactor(pi05): rename PI0OpenPIPolicy to PI0Policy and update configuration
- Renamed `PI0OpenPIPolicy` to `PI0Policy` for consistency with naming conventions.
- Updated the `PI05OpenPIConfig` to include a new `tokenizer_max_length` attribute and changed the normalization mode for state from `MEAN_STD` to `QUANTILES`.
- Simplified model initialization in `PI05OpenPIPolicy` by removing unused `dataset_stats` parameter.
- Added a new processor class for `Pi05PrepareStateTokenizerProcessorStep` with `@dataclass` for improved readability.
- Introduced a test script to compare the integration of the PI0OpenPI policy with the original implementation, ensuring local testing compatibility.
* refactor(pi05): update imports and rename configuration classes
- Changed imports to reflect the new naming convention for PI05 configuration and policy classes.
- Renamed `PI05OpenPIConfig` to `PI05Config` and `PI05OpenPIPolicy` to `PI05Policy` for consistency.
- Introduced a new processor file for PI05, implementing pre-processing and post-processing steps.
- Updated tests to utilize the renamed classes, ensuring functionality and consistency across the codebase.
* update(pi05): increase tokenizer_max_length for improved processing
- Changed the `tokenizer_max_length` from 48 to 200 to enhance the model's capability in handling longer sequences.
- This adjustment aims to improve the overall performance and flexibility of the PI05 configuration.
* add default for state (max_state_dim)
* correct naming
* fix import
* cleanup code
* remove unused test
* use quantiles for action
* move to device
* remove discrete state assert
* fix pi05 test
* move pi05 to device
* use base models in comparison tests
* small renames for tests
* change number of tokens pi05 test
* fix openpi tokenization in test
* fix hub test
* fix test
* assert lerobot vs openpi tests
---------
Co-authored-by: Pepijn <pepijn@huggingface.co>
* add headers
* add back previously removed imports
* update if statement load processor with dataset stats
* remove to avoid circular import
* inject dataset stats for pretrained models
* check normalization before applying
* add link to quantile augment script
* fix(policies): transformers import for ci in PI0 & PI05 (#2039)
* fix(policies): transformers import for ci in PI0
* fix(policies): transformers import for ci in PI05
* test(processor): fix expected raise when normalization types are missing (#2040)
* switch normalization order pipeline for pi05
* Fix/quantiles script (#2064)
* refactor augment stats with quantiles script
add parallelization for faster processing
shift the quantile normalization between -1 1
* fix replay buffer tests
* fix comment
* overwrite the pipeline normalization features with the policy features
* remove double normalization overwrite
* cleanup from pretrained
* remove typo
* also set norm_map
* fix(augment_quantiles) images incorrectly divided by 255
* clamp quantiles
* link to lerobot base models
* rename tests
* incorporate PR feedback
* update docstring for RunningQuantileStats
* update doc links
* Revert "clamp quantiles"
This reverts commit 172207471c8f2cb62958e9a9e6a0535ba3ff67d4.
* fix self.paligemma
* fix tests related to quantiles that were scaled to [0,1], the new range is [-1, 1]
* fix libero doc and use different transformer branch
* use fix branch instead of feat
* update results libero
* add new line
* fix formatting
* precommit
* update results libero
* update libero doc
* update title
* final changes
* add quantiles to test
* run pre commit
---------
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Steven Palma <steven.palma@huggingface.co>
* Fix configs.py None MyPy error
* Use img_tensor instead of img in utils.py
* Add type assertion in factory.py
* Resolve merge conflict
* Uncomment envs module for mypy checks in pyproject.toml
---------
Signed-off-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
* feat(mypy): enable type checking for envs module and configure mypy settings in pyproject.toml
* Add mypy configuration to check only the envs module.
* Exclude examples, benchmarks, and tests from type checking.
* Set ignore_missing_imports to true and follow_imports to skip.
* chore: comment out mypy configuration in pyproject.toml and pre-commit-config.yaml
* Comment out mypy settings to disable type checking for the envs module.
* Update pre-commit configuration to reflect changes in mypy settings.
* feat(policies): add noise parameter to action prediction methods
- Introduced `ActionSelectKwargs` TypedDict for better type hinting.
- Updated `predict_action_chunk` and `select_action` methods in `PreTrainedPolicy` and its subclasses to accept a `noise` parameter.
- Modified `generate_actions` and `conditional_sample` methods in `DiffusionModel` to utilize the new noise parameter for action generation.
* refactor(policies): make ActionSelectKwargs TypedDict fields optional
- Updated `ActionSelectKwargs` to inherit with `total=False`, allowing for optional fields.
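A minimal sketch of the `total=False` pattern, assuming a list of floats as a stand-in for the noise tensor. The `select_action` body here is a toy illustration, not the policy code.

```python
from typing import Optional, TypedDict


class ActionSelectKwargs(TypedDict, total=False):
    # total=False makes every key optional, so callers may omit `noise`.
    noise: Optional[list[float]]


def select_action(observation: list[float], **kwargs) -> list[float]:
    """Toy policy: add the supplied noise to the observation, or none at all."""
    noise = kwargs.get("noise")
    if noise is None:
        noise = [0.0] * len(observation)
    return [o + n for o, n in zip(observation, noise)]
```

Passing the noise explicitly makes sampling reproducible from the caller's side, while existing call sites that pass nothing keep working.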
Revert "feat(normalization): add validation for empty features in NormalizerProcessorStep and UnnormalizerProcessorStep (#2087)"
This reverts commit f173265354.
* fix return type
* improve apply with vectorized op
* Update src/lerobot/datasets/aggregate.py
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
* chore: replace hard-coded 'action' values with constants throughout all the source code
* chore(tests): replace hard-coded action values with constants throughout all the test code
* chore: replace hard-coded OBS values with constants throughout all the source code
* chore(tests): replace hard-coded OBS values with constants throughout all the test code