Enhance SAC configuration and policy with new parameters and subsampling logic

- Added `num_subsample_critics`, `critic_target_update_weight`, and `utd_ratio` to SACConfig.
- Implemented target entropy calculation in SACPolicy if not provided.
- Introduced subsampling of critics to prevent overfitting during updates.
- Updated temperature loss calculation to use the new target entropy.
- Added comments for future UTD update implementation.

These changes improve the flexibility and performance of the SAC implementation.
2024-12-17 15:58:04 +00:00
parent dbadaae28b
commit a5228a0dfe
2 changed files with 20 additions and 4 deletions
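
For orientation, here is a minimal sketch of how the three ideas named in the commit message are commonly implemented in SAC-style code. It is not taken from this commit. The function names (`default_target_entropy`, `subsampled_min_q`, `soft_update`) are hypothetical, and the `-dim(A)` target entropy is the usual heuristic, so the actual SACPolicy code may scale it differently. Plain Python lists stand in for tensors to keep the sketch dependency-free.

```python
import random


def default_target_entropy(action_dim):
    """Fallback when no target entropy is configured.

    Assumption: the common SAC heuristic of -dim(A); the commit may differ.
    """
    return -float(action_dim)


def subsampled_min_q(q_values, num_subsample_critics=None, rng=random):
    """Element-wise minimum over a random subset of critics.

    q_values holds one list of per-sample Q estimates per critic.
    With num_subsample_critics=None, every critic is used.
    """
    if num_subsample_critics is not None:
        # Draw distinct critic indices without replacement.
        indices = rng.sample(range(len(q_values)), num_subsample_critics)
        q_values = [q_values[i] for i in indices]
    # zip(*...) regroups the estimates per sample across the chosen critics.
    return [min(per_sample) for per_sample in zip(*q_values)]


def soft_update(target_params, online_params, weight=0.005):
    """Polyak averaging: move each target parameter toward the online one."""
    return [
        (1.0 - weight) * target + weight * online
        for target, online in zip(target_params, online_params)
    ]
```

In this sketch, `weight` plays the role of `critic_target_update_weight`. `utd_ratio` would set how many critic updates run per environment step, which the commit leaves as a comment for future work.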
--- a/lerobot/common/policies/sac/configuration_sac.py
+++ b/lerobot/common/policies/sac/configuration_sac.py
@@ -23,8 +23,11 @@ class SACConfig:
    discount = 0.99
    temperature_init = 1.0
    num_critics = 2
+    num_subsample_critics = None
    critic_lr = 3e-4
    actor_lr = 3e-4
+    critic_target_update_weight = 0.005
+    utd_ratio = 2
    critic_network_kwargs = {
            "hidden_dims": [256, 256],
            "activate_final": True,