lerobot

Author	SHA1	Message	Date
AdilZouitine	0959694bab	Refactor SACPolicy and learner server for improved replay buffer management - Updated SACPolicy to create critic heads using a list comprehension for better readability. - Simplified the saving and loading of models using `save_model` and `load_model` functions from the safetensors library. - Introduced `initialize_offline_replay_buffer` function in the learner server to streamline offline dataset handling and replay buffer initialization. - Enhanced logging for dataset loading processes to improve traceability during training.	2025-04-18 15:06:52 +02:00
Michel Aractingi	7b01e16439	Add end effector action space to hil-serl (#861) Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-04-18 15:06:52 +02:00
AdilZouitine	66816fd871	Enhance SAC configuration and policy with gradient clipping and temperature management - Introduced `grad_clip_norm` parameter in SAC configuration for gradient clipping - Updated SACPolicy to store temperature as an instance variable for consistent usage - Modified loss calculations in SACPolicy to utilize the instance temperature - Enhanced MLP and CriticHead to support a customizable final activation function - Implemented gradient clipping in the learner server during training steps for both actor and critic - Added tracking for gradient norms in training information	2025-04-18 15:06:52 +02:00
pre-commit-ci[bot]	599326508f	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-04-18 15:06:52 +02:00
AdilZouitine	2f04d0d2b9	Add custom save and load methods for SAC policy - Implement `_save_pretrained` method to handle TensorDict state saving - Add `_from_pretrained` class method for loading SAC policy from files - Create utility function `find_and_copy_params` to handle parameter copying	2025-04-18 15:06:52 +02:00
AdilZouitine	e002c5ec56	Remove torch.no_grad decorator and optimize next action prediction in SAC policy - Removed `@torch.no_grad` decorator from Unnormalize forward method - Added TODO comment for optimizing next action prediction in SAC policy - Minor formatting adjustment in NaN assertion for log standard deviation Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>	2025-04-18 15:06:52 +02:00
Eugene Mironov	b6a2200983	[HIL-SERL] Migrate threading to multiprocessing (#759) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-04-18 15:06:52 +02:00
pre-commit-ci[bot]	85fe8a3f4e	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-04-18 15:06:51 +02:00
AdilZouitine	bb69cb3c8c	Add storage device configuration for SAC policy and replay buffer - Introduce `storage_device` parameter in SAC configuration and training settings - Update learner server to use configurable storage device for replay buffer - Reduce online buffer capacity in ManiSkill configuration - Modify replay buffer initialization to support custom storage device	2025-04-18 15:04:58 +02:00
Michel Aractingi	d3b84ecd6f	Added caching function in the learner_server and modeling sac in order to limit the number of forward passes through the pretrained encoder when it's frozen. Added tensordict dependencies. Updated the version of torch and torchvision. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:58 +02:00
Eugene Mironov	e1d55c7a44	[Port HIL-SERL] Adjust Actor-Learner architecture & clean up dependency management for HIL-SERL (#722)	2025-04-18 15:04:56 +02:00
AdilZouitine	85242cac67	Refactor SAC policy with performance optimizations and multi-camera support - Introduced Ensemble and CriticHead classes for more efficient critic network handling - Added support for multiple camera inputs in observation encoder - Optimized image encoding by batching image processing - Updated configuration for ManiSkill environment with reduced image size and action scaling - Compiled critic networks for improved performance - Simplified normalization and ensemble handling in critic networks Co-authored-by: michel-aractingi <michel.aractingi@gmail.com>	2025-04-18 15:04:44 +02:00
Michel Aractingi	0d88a5ee09	- Fixed big issue in the loading of the policy parameters sent by the learner to the actor -- pass only the actor to the `update_policy_parameters` and remove `strict=False` - Fixed big issue in the normalization of the actions in the `forward` function of the critic -- remove the `torch.no_grad` decorator in `normalize.py` in the normalization function - Fixed performance issue to boost the optimization frequency by setting the storage device to be the same as the device of learning. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:44 +02:00
AdilZouitine	c85f88fb62	Improve wandb logging and custom step tracking in logger - Modify logger to support multiple custom step keys - Update logging method to handle custom step keys more flexibly - Enhance logging of optimization step and frequency Co-authored-by: michel-aractingi <michel.aractingi@gmail.com>	2025-04-18 15:04:44 +02:00
Michel Aractingi	2ac25b02e2	nit Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:43 +02:00
Michel Aractingi	140e30e386	Changed the init_final value to center the starting mean and std of the policy Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:43 +02:00
Michel Aractingi	5195f40fd3	Hardcoded some normalization parameters. TODO: refactor. Added masking actions on the level of the intervention actions and offline dataset. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:43 +02:00
Michel Aractingi	98c6557869	Fix log_alpha in modeling_sac: change to nn.Parameter. Added pretrained vision model in policy. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:43 +02:00
Michel Aractingi	5d6879d93a	Added possibility to record and replay delta actions during teleoperation rather than absolute actions. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:42 +02:00
Eugene Mironov	3a07301365	[Port HIL-SERL] Add resnet-10 as default encoder for HIL-SERL (#696) Co-authored-by: Khalil Meftah <kmeftah.khalil@gmail.com> Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com> Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co> Co-authored-by: Ke Wang <superwk1017@gmail.com>	2025-04-18 15:04:13 +02:00
Michel Aractingi	f1af97dc9c	- Added JointMaskingActionSpace wrapper in `gym_manipulator` in order to select which joints will be controlled. For example, we can disable the gripper actions for some tasks. - Added NaN detection mechanisms in the actor, learner and gym_manipulator for the case where we encounter NaNs in the loop. - Changed the non-blocking in the `.to(device)` functions to only work for the case of cuda because they were causing NaNs when running the policy on mps. - Added some joint clipping and limits in the env, robot and policy configs. TODO: clean this part and make the limits in one config file only. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:13 +02:00
Michel Aractingi	9784d8a47f	Several fixes to move the actor_server and learner_server code from the maniskill environment to the real robot environment. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:13 +02:00
Michel Aractingi	12c13e320e	- Added `lerobot/scripts/server/gym_manipulator.py` that contains all the necessary wrappers to run a gym-style env around the real robot. - Added `lerobot/scripts/server/find_joint_limits.py` to test the min and max angles of the motion you wish the robot to explore during RL training. - Added logic in `manipulator.py` to limit the maximum possible joint angles to allow motion within a predefined joint position range. The limits are specified in the yaml config for each robot. Check out the so100.yaml. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:13 +02:00
Michel Aractingi	d2c41b35db	- Refactor observation encoder in `modeling_sac.py` - added `torch.compile` to the actor and learner servers. - organized imports in `train_sac.py` - optimized the parameters push by not sending the frozen pre-trained encoder. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:13 +02:00
Yoel	bc7b6d3daf	[Port HIL-SERL] Add HF vision encoder option in SAC (#651) Added support for a custom pretrained vision encoder to the modeling sac implementation. Great job @ChorntonYoel!	2025-04-18 15:04:13 +02:00
Michel Aractingi	aebea08a99	Added support for checkpointing the policy. We can save and load the policy state dict, optimizers state, optimization step and interaction step. Added functions for converting the replay buffer from and to LeRobotDataset. When we want to save the replay buffer, we convert it first to LeRobotDataset format and save it locally and vice-versa. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:13 +02:00
Michel Aractingi	8cd44ae163	- Added additional logging information in wandb around the timings of the policy loop and optimization loop. - Optimized critic design that improves the performance of the learner loop by a factor of 2. - Cleaned the code and fixed style issues. - Completed the config with actor_learner_config field that contains host-ip and port elements that are necessary for the actor-learner servers. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:13 +02:00
Michel Aractingi	2ae657f568	FREEDOM, added back the optimization loop code in `learner_server.py`. Ran experiment with pushcube env from maniskill. The learning seems to work. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:13 +02:00
Michel Aractingi	508f5d1407	Added server directory in `lerobot/scripts` that contains scripts and the protobuf message types to split training into two processes, acting and learning. The actor rolls out the policy and collects interaction data while the learner receives the data, trains the policy and sends the updated parameters to the actor. The two scripts are run simultaneously. Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>	2025-04-18 15:04:13 +02:00
AdilZouitine	c8b1132846	Stable version of rlpd + drq	2025-04-18 15:04:10 +02:00
AdilZouitine	ef777993cd	Add type annotations and restructure SACConfig class fields	2025-04-18 15:03:51 +02:00
Adil Zouitine	760d60ad4b	Change SAC policy implementation with configuration and modeling classes	2025-04-18 15:03:51 +02:00
Adil Zouitine	875c0271b7	SAC works	2025-04-18 15:03:51 +02:00
Adil Zouitine	57344bfde5	[WIP] correct sac implementation	2025-04-18 15:03:51 +02:00
Adil Zouitine	46827fb002	Add rlpd tricks	2025-04-18 15:03:51 +02:00
Adil Zouitine	2fd78879f6	SAC works	2025-04-18 15:03:51 +02:00
Adil Zouitine	e8449e9630	remove breakpoint	2025-04-18 15:03:51 +02:00
Adil Zouitine	a0e2be8b92	[WIP] correct sac implementation	2025-04-18 15:03:51 +02:00
Michel Aractingi	181727c0fe	Extend reward classifier for multiple camera views (#626)	2025-04-18 15:03:50 +02:00
Eugene Mironov	d1d6ffd23c	[Port HIL_SERL] Final fixes for the Reward Classifier (#598)	2025-04-18 15:03:01 +02:00
Michel Aractingi	e5801f467f	added temporary fix for missing task_index key in online environment	2025-04-18 15:03:01 +02:00
Michel Aractingi	c6ca9523de	split encoder for critic and actor	2025-04-18 15:03:01 +02:00
Michel Aractingi	642e3a3274	style fixes	2025-04-18 15:03:01 +02:00
KeWang1017	146148c48c	Refactor SAC configuration and policy for improved action sampling and stability - Updated SACConfig to replace standard deviation parameterization with log_std_min and log_std_max for better control over action distributions. - Modified SACPolicy to streamline action selection and log probability calculations, enhancing stochastic behavior. - Removed deprecated TanhMultivariateNormalDiag class to simplify the codebase and improve maintainability. These changes aim to enhance the robustness and performance of the SAC implementation during training and inference.	2025-04-18 15:03:01 +02:00
KeWang1017	8f15835daa	Refine SAC configuration and policy for enhanced performance - Updated standard deviation parameterization in SACConfig to 'softplus' with defined min and max values for improved stability. - Modified action sampling in SACPolicy to use reparameterized sampling, ensuring better gradient flow and log probability calculations. - Cleaned up log probability calculations in TanhMultivariateNormalDiag for clarity and efficiency. - Increased evaluation frequency in YAML configuration to 50000 for more efficient training cycles. These changes aim to enhance the robustness and performance of the SAC implementation during training and inference.	2025-04-18 15:03:01 +02:00
KeWang1017	022bd65125	Refactor SACPolicy for improved action sampling and standard deviation handling - Updated action selection to use distribution sampling and log probabilities for better stochastic behavior. - Enhanced standard deviation clamping to prevent extreme values, ensuring stability in policy outputs. - Cleaned up code by removing unnecessary comments and improving readability. These changes aim to refine the SAC implementation, enhancing its robustness and performance during training and inference.	2025-04-18 15:03:01 +02:00
KeWang1017	63d8c96514	trying to get sac running	2025-04-18 15:03:01 +02:00
Michel Aractingi	4624a836e5	Added normalization schemes and style checks	2025-04-18 15:03:01 +02:00
Michel Aractingi	ad7eea132d	added optimizer and sac to factory.py	2025-04-18 15:02:59 +02:00
Eugene Mironov	22a1899ff4	[HIL-SERL PORT] Fix linter issues (#588)	2025-04-18 15:02:44 +02:00

447 Commits