Enhance SAC configuration and policy with gradient clipping and temperature management

- Introduced `grad_clip_norm` parameter in SAC configuration for gradient clipping
- Updated SACPolicy to store temperature as an instance variable for consistent usage
- Modified loss calculations in SACPolicy to utilize the instance temperature
- Enhanced MLP and CriticHead to support a customizable final activation function
- Implemented gradient clipping in the learner server during training steps for both actor and critic
- Added tracking for gradient norms in training information
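The clipping and norm tracking described above can be sketched in plain Python. In the learner this would be done with PyTorch's `torch.nn.utils.clip_grad_norm_` (which scales gradients in place and returns the pre-clip global norm); the function below mimics that behavior on a flat list of floats so the logic is visible without a torch dependency. The name `clip_grad_norm` and the epsilon guard are illustrative, not the repository's actual code.

```python
import math

def clip_grad_norm(grads, max_norm):
    """Scale gradients in place so their global L2 norm is at most max_norm.

    Returns the norm measured *before* clipping, which is the value
    worth logging in the training info dict.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Small epsilon avoids division by zero when all gradients vanish.
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for i, g in enumerate(grads):
            grads[i] = g * clip_coef
    return total_norm
```

With `grad_clip_norm = 40.0`, a gradient vector of norm 50 is rescaled to norm 40, while gradients already under the limit pass through untouched; either way the returned pre-clip norm is what lands in the training information.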
Author: AdilZouitine
Date: 2025-03-17 10:50:28 +00:00
Parent: dee154a1a5
Commit: 7960f2c3c1
3 changed files with 60 additions and 9 deletions


@@ -84,10 +84,12 @@ class SACConfig:
     latent_dim: int = 256
     target_entropy: float | None = None
     use_backup_entropy: bool = True
+    grad_clip_norm: float = 40.0
     critic_network_kwargs: dict[str, Any] = field(
         default_factory=lambda: {
             "hidden_dims": [256, 256],
             "activate_final": True,
+            "final_activation": None,
         }
     )
     actor_network_kwargs: dict[str, Any] = field(
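How `activate_final` and the new `final_activation` kwarg could interact in the MLP builder can be sketched as follows. The sketch returns `(kind, arg)` layer descriptors instead of `nn.Module`s so it runs without torch; the function name, the descriptor format, and the ReLU default are assumptions for illustration, not the repository's implementation.

```python
def build_mlp_layers(input_dim, hidden_dims, activate_final=True, final_activation=None):
    """Sketch: thread a customizable final activation through an MLP builder.

    Hidden layers always get the default activation; the last layer gets one
    only if activate_final is True, and final_activation (when given)
    overrides the default there.
    """
    layers = []
    dims = [input_dim] + list(hidden_dims)
    for i in range(len(hidden_dims)):
        layers.append(("linear", (dims[i], dims[i + 1])))
        is_last = i == len(hidden_dims) - 1
        if not is_last or activate_final:
            # Only the final layer consults final_activation.
            act = final_activation if (is_last and final_activation is not None) else "relu"
            layers.append(("activation", act))
    return layers
```

Under this scheme the config default `"final_activation": None` keeps the previous behavior (default activation on the last layer when `activate_final` is true), while a critic head can pass e.g. a bounded activation for its output without changing the hidden layers.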