Enhance SAC configuration and policy with new parameters and subsampling logic

- Added `num_subsample_critics`, `critic_target_update_weight`, and `utd_ratio` to SACConfig.
- Implemented target entropy calculation in SACPolicy if not provided.
- Introduced subsampling of critics to prevent overfitting during updates.
- Updated temperature loss calculation to use the new target entropy.
- Added comments marking where the update-to-data (UTD) ratio updates will be implemented.
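
As a rough illustration of the target entropy fallback, critic subsampling, and temperature loss listed above, here is a minimal pure-Python sketch. It is not the code from this commit: the helper names are hypothetical, the `-action_dim` fallback is the common SAC heuristic and only assumed here, and plain lists stand in for tensors.

```python
import math
import random


def default_target_entropy(action_dim, target_entropy=None):
    """Return the configured target entropy, or fall back to -action_dim.

    The -action_dim fallback is an assumed heuristic, not taken from the diff.
    """
    if target_entropy is not None:
        return target_entropy
    return -float(action_dim)


def subsampled_min_q(q_values, num_subsample_critics=None, rng=random):
    """Take the minimum over a random subset of per-critic Q estimates.

    q_values holds one estimate per critic for a single transition.
    With num_subsample_critics=None, every critic is used.
    """
    if num_subsample_critics is None:
        return min(q_values)
    # Sample critic indices without replacement, then take the min over them.
    indices = rng.sample(range(len(q_values)), num_subsample_critics)
    return min(q_values[i] for i in indices)


def temperature_loss(log_alpha, log_probs, target_entropy):
    """Batch mean of alpha * (-log_prob - target_entropy)."""
    alpha = math.exp(log_alpha)
    return sum(alpha * (-lp - target_entropy) for lp in log_probs) / len(log_probs)
```

With `num_subsample_critics = None`, as in the new config default, the sketch falls back to the minimum over all critics, so behaviour is unchanged unless the option is set.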

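Together, these changes make the SAC implementation more configurable: the number of critics sampled per update, the target-network update weight, and the UTD ratio are now set in SACConfig.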
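
For context, `critic_target_update_weight` and `utd_ratio` could plausibly be used as in the sketch below. This is an assumption for illustration only: the commit adds just comments for the UTD logic, the function names are hypothetical, and lists of floats stand in for network parameters.

```python
def soft_update(target_params, online_params, weight=0.005):
    """Polyak-average the online parameters into the target parameters."""
    return [
        weight * p + (1.0 - weight) * t
        for t, p in zip(target_params, online_params)
    ]


def run_updates(target_params, online_params, critic_step, utd_ratio=2, weight=0.005):
    """Run utd_ratio critic updates, each followed by a soft target update.

    critic_step is a hypothetical callable that returns updated online parameters.
    """
    for _ in range(utd_ratio):
        online_params = critic_step(online_params)
        target_params = soft_update(target_params, online_params, weight)
    return target_params, online_params
```

A small weight such as 0.005 moves the target critics only slightly toward the online critics on each step, which keeps the bootstrapped targets stable.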
KeWang1017
2024-12-17 15:58:04 +00:00
committed by AdilZouitine
parent dbadaae28b
commit a5228a0dfe
2 changed files with 20 additions and 4 deletions

@@ -23,8 +23,11 @@ class SACConfig:
     discount = 0.99
     temperature_init = 1.0
     num_critics = 2
+    num_subsample_critics = None
     critic_lr = 3e-4
     actor_lr = 3e-4
+    critic_target_update_weight = 0.005
+    utd_ratio = 2
     critic_network_kwargs = {
         "hidden_dims": [256, 256],
         "activate_final": True,