Commit Graph

930 Commits

Author SHA1 Message Date
AdilZouitine
7a3d8756b4 Refactor input and output normalization handling in SACPolicy for improved clarity and efficiency. Consolidate encoder initialization logic and remove redundant else statements. 2025-04-17 16:05:11 +00:00
AdilZouitine
dc1548fe1a Fix init temp
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-16 16:46:37 +02:00
AdilZouitine
23c9441d5f Update log_std_min type to float in PolicyConfig for consistency 2025-04-16 16:46:37 +02:00
AdilZouitine
870e3efb92 fix caching
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-16 16:46:37 +02:00
AdilZouitine
bfd48a8b70 Handle caching
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-16 16:46:37 +02:00
AdilZouitine
5dc7ff6d3c change the tanh distribution to match HIL-SERL
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-16 16:46:37 +02:00
AdilZouitine
ee4ebeac9b match target entropy to HIL-SERL
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-16 16:46:37 +02:00
AdilZouitine
fe7b47f459 stick to HIL-SERL nn architecture
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-16 16:46:37 +02:00
AdilZouitine
044ca3b039 Refactor modeling_sac and parameter handling for clarity and reusability.
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-16 16:46:37 +02:00
AdilZouitine
bc36c69b71 fix encoder training 2025-04-16 16:46:37 +02:00
pre-commit-ci[bot]
2b9b05f1ba [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-16 16:46:37 +02:00
Michel Aractingi
9eec7b8bb0 General fixes in code, removed delta action, fixed grasp penalty, added logic to put gripper reward in info 2025-04-16 16:46:37 +02:00
pre-commit-ci[bot]
a80a9cf379 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-16 16:46:37 +02:00
AdilZouitine
7a42af835e fix caching; dataset stats are now optional 2025-04-16 16:46:37 +02:00
AdilZouitine
9751328783 Add rounding for safety 2025-04-16 16:46:37 +02:00
pre-commit-ci[bot]
7225bc74a3 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-16 16:46:37 +02:00
AdilZouitine
03b1644bf7 fix sign issue 2025-04-16 16:46:37 +02:00
AdilZouitine
9b6e5a383f Refactor complementary_info handling in ReplayBuffer 2025-04-16 16:46:37 +02:00
AdilZouitine
86466b025f Handle gripper penalty 2025-04-16 16:46:37 +02:00
AdilZouitine
54745f111d fix caching 2025-04-16 16:46:37 +02:00
pre-commit-ci[bot]
82584cca78 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-16 16:46:37 +02:00
AdilZouitine
d3a8c2c247 fix indentation issue 2025-04-16 16:46:37 +02:00
AdilZouitine
74c11c4a75 Enhance SAC configuration and replay buffer with asynchronous prefetching support
- Added async_prefetch parameter to SACConfig for improved buffer management.
- Implemented get_iterator method in ReplayBuffer to support asynchronous prefetching of batches.
- Updated learner_server to utilize the new iterator for online and offline sampling, enhancing training efficiency.
2025-04-16 16:46:37 +02:00
AdilZouitine
2d932b710c Enhance SACPolicy to support shared encoder and optimize action selection
- Cached encoder output in select_action method to reduce redundant computations.
- Updated action selection and grasp critic calls to utilize cached encoder features when available.
2025-04-16 16:46:37 +02:00
AdilZouitine
a54baceabb Enhance SACPolicy and learner server for improved grasp critic integration
- Updated SACPolicy to conditionally compute grasp critic losses based on the presence of discrete actions.
- Refactored the forward method to handle grasp critic model selection and loss computation more clearly.
- Adjusted learner server to utilize optimized parameters for grasp critic during training.
- Improved action handling in the ManiskillMockGripperWrapper to accommodate both tuple and single action inputs.
2025-04-16 16:46:37 +02:00
AdilZouitine
077d18b439 Refactor SACPolicy for improved readability and action dimension handling
- Cleaned up code formatting for better readability, including consistent spacing and removal of unnecessary blank lines.
- Consolidated continuous action dimension calculation to enhance clarity and maintainability.
- Simplified loss return statements in the forward method to improve code structure.
- Ensured grasp critic parameters are included conditionally based on configuration settings.
2025-04-16 16:46:37 +02:00
AdilZouitine
c6cd1475a7 Add mock gripper support and enhance SAC policy action handling
- Introduced mock_gripper parameter in ManiskillEnvConfig to enable gripper simulation.
- Added ManiskillMockGripperWrapper to adjust action space for environments with discrete actions.
- Updated SACPolicy to compute continuous action dimensions correctly, ensuring compatibility with the new gripper setup.
- Refactored action handling in the training loop to accommodate the changes in action dimensions.
2025-04-16 16:46:37 +02:00
AdilZouitine
e35ee47b07 Refactor SAC policy and training loop to enhance discrete action support
- Updated SACPolicy to conditionally compute losses for grasp critic based on num_discrete_actions.
- Simplified forward method to return loss outputs as a dictionary for better clarity.
- Adjusted learner_server to handle both main and grasp critic losses during training.
- Ensured optimizers are created conditionally for grasp critic based on configuration settings.
2025-04-16 16:46:37 +02:00
AdilZouitine
c3f2487026 Refactor SAC configuration and policy to support discrete actions
- Removed GraspCriticNetworkConfig class and integrated its parameters into SACConfig.
- Added num_discrete_actions parameter to SACConfig for better action handling.
- Updated SACPolicy to conditionally create grasp critic networks based on num_discrete_actions.
- Enhanced grasp critic forward pass to handle discrete actions and compute losses accordingly.
2025-04-16 16:46:37 +02:00
Michel Aractingi
c621077b62 Added Gripper quantization wrapper and grasp penalty
removed complementary info from buffer and learner server
removed get_gripper_action function
added gripper parameters to `common/envs/configs.py`
2025-04-16 16:46:37 +02:00
pre-commit-ci[bot]
f5cfd9fd48 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-16 16:46:37 +02:00
s1lent4gnt
22da1739b1 Add grasp critic to the training loop
- Integrated the grasp critic gradient update into the training loop in learner_server
- Added Adam optimizer and configured grasp critic learning rate in configuration_sac
- Added target critic networks update after the critic gradient step
2025-04-16 16:46:37 +02:00
s1lent4gnt
d38d5f988d Add get_gripper_action method to GamepadController 2025-04-16 16:46:37 +02:00
s1lent4gnt
8d1936ffe0 Add gripper penalty wrapper 2025-04-16 16:46:37 +02:00
s1lent4gnt
cef944e1b1 Add complementary info in the replay buffer
- Added complementary info in the add method
- Added complementary info in the sample method
2025-04-16 16:46:37 +02:00
s1lent4gnt
384eb2cd07 Add grasp critic
- Implemented grasp critic to evaluate gripper actions
- Added corresponding config parameters for tuning
2025-04-16 16:46:37 +02:00
pre-commit-ci[bot]
0f706ce543 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-03-31 13:59:32 +00:00
AdilZouitine
026ad463a9 Fix convergence of SAC: multiple torch.compile calls on the same model caused divergence 2025-03-31 13:54:21 +00:00
AdilZouitine
8494634d48 Fix cuda graph break 2025-03-31 07:59:56 +00:00
s1lent4gnt
66c3672738 Fix: Prevent Invalid next_state References When optimize_memory=True (#918) 2025-03-31 09:43:40 +02:00
pre-commit-ci[bot]
c05e4835d0 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-03-28 17:20:39 +00:00
Michel Aractingi
808cf63221 Added support for controlling the gripper through the gamepad's pygame interface
Minor modifications in gym_manipulator to quantize the gripper actions
Clamped the observations after F.resize in the ConvertToLeRobotObservation wrapper: due to a bug in F.resize, images were returned with values exceeding the maximum of 1.0
2025-03-28 17:18:48 +00:00
AdilZouitine
0150139668 Refactor SACPolicy for improved type annotations and readability
- Enhanced type annotations for variables in the `SACPolicy` class to improve code clarity.
- Updated method calls to use keyword arguments for better readability.
- Streamlined the extraction of batch components, ensuring consistent typing across the class methods.
2025-03-28 17:18:48 +00:00
AdilZouitine
b3ad63cf6e Refactor SACPolicy and learner_server for improved clarity and functionality
- Updated the `forward` method in `SACPolicy` to handle loss computation for actor, critic, and temperature models.
- Replaced direct calls to `compute_loss_*` methods with a unified `forward` method in `learner_server`.
- Enhanced batch processing by consolidating input parameters into a single dictionary for better readability and maintainability.
- Removed redundant code and improved documentation for clarity.
2025-03-28 17:18:48 +00:00
AdilZouitine
8b02e81bb5 Refactor actor_server.py for improved structure and logging
- Consolidated logging initialization and enhanced logging for actor processes.
- Streamlined the handling of gRPC connections and process management.
- Improved readability by organizing core algorithm functions and communication functions.
- Added detailed comments and documentation for clarity.
- Ensured proper queue management and shutdown handling for actor processes.
2025-03-28 17:18:48 +00:00
AdilZouitine
dcce446a66 Refactor learner_server.py for improved structure and clarity
- Removed unused imports and streamlined the code structure.
- Consolidated logging initialization and enhanced logging for training processes.
- Improved handling of training state loading and resume logic.
- Refactored transition and interaction message processing for better readability and maintainability.
- Added detailed comments and documentation for clarity.
2025-03-28 17:18:48 +00:00
AdilZouitine
82a6b69e0e Refactor imports in modeling_sac.py for improved organization
- Rearranged import statements for better readability.
- Removed unused imports and streamlined the code structure.
2025-03-28 17:18:48 +00:00
AdilZouitine
6f7024242a Refactor SACConfig properties for improved readability
- Simplified the `image_features` property to directly iterate over `input_features`.
- Removed unused imports and unnecessary code related to main execution, enhancing clarity and maintainability.
2025-03-28 17:18:48 +00:00
AdilZouitine
3c56ad33c3 fix 2025-03-28 17:18:48 +00:00
AdilZouitine
49baa1ff49 Enhance logging for actor and learner servers
- Implemented process-specific logging for actor and learner servers to improve traceability.
- Created a dedicated logs directory and ensured it exists before logging.
- Initialized logging with explicit log files for each process, including actor transitions, interactions, and policy.
- Updated the actor CLI to validate configuration and set up logging accordingly.
2025-03-28 17:18:48 +00:00