Refine SAC configuration and policy for enhanced performance

- Switched the standard deviation parameterization in SACConfig to 'softplus', with explicit min and max bounds, for improved numerical stability.
- Changed action sampling in SACPolicy to reparameterized sampling, so gradients flow through the sampled actions and their log probabilities.
- Cleaned up log probability calculations in TanhMultivariateNormalDiag for clarity and efficiency.
- Raised eval_freq in the YAML configuration from 10000 to 50000, so evaluation runs less often and more of each training cycle goes to training.
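The three policy-side changes above fit together as one pipeline: a softplus-bounded std, a reparameterized Gaussian draw squashed by tanh, and a log-probability corrected for the tanh. The following is a minimal stdlib sketch of that pipeline, not the commit's PyTorch code. It assumes the std is obtained by clamping softplus(raw) to [STD_MIN, STD_MAX]; the bound values and every function name here are hypothetical, since the actual SACConfig fields and TanhMultivariateNormalDiag internals are not shown.

```python
import math
import random

# Hypothetical bounds: the commit adds a min and max for the std,
# but the real SACConfig values are not shown here.
STD_MIN = 1e-5
STD_MAX = 5.0


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def std_from_raw(raw: float, std_min: float = STD_MIN, std_max: float = STD_MAX) -> float:
    """Map an unconstrained network output to a positive std, clamped to [std_min, std_max] (assumed rule)."""
    return min(max(softplus(raw), std_min), std_max)


def log_normal(x: float, mean: float, std: float) -> float:
    """Log-density of a 1-D Gaussian N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def tanh_log_det(u: float) -> float:
    """log(1 - tanh(u)^2), computed stably as 2 * (log 2 - u - softplus(-2u))."""
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))


def sample_action(means, raw_stds, rng: random.Random):
    """Reparameterized draw a = tanh(mean + std * eps) and its log-probability.

    Dimensions are independent (diagonal Gaussian), so per-dimension log-probs add.
    """
    actions = []
    log_prob = 0.0
    for mean, raw in zip(means, raw_stds):
        std = std_from_raw(raw)
        eps = rng.gauss(0.0, 1.0)      # noise drawn independently of the parameters
        u = mean + std * eps           # deterministic in (mean, std) given eps
        actions.append(math.tanh(u))   # squash into (-1, 1)
        # Change of variables: subtract log|da/du| = log(1 - tanh(u)^2).
        log_prob += log_normal(u, mean, std) - tanh_log_det(u)
    return actions, log_prob


actions, log_prob = sample_action([0.0, 1.0], [0.0, -1.0], random.Random(0))
print(actions, log_prob)
```

Writing u as mean + std * eps is the reparameterization trick: because the randomness lives entirely in eps, a gradient with respect to mean and std can pass through the sampled action. In PyTorch this is what `torch.distributions.Normal(...).rsample()` provides, whereas `sample()` returns a value detached from the graph. The stable form of log(1 - tanh(u)^2) matters because `tanh(u)` rounds to exactly 1.0 for large u, which would make the naive formula take log(0).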

These changes aim to enhance the robustness and performance of the SAC implementation during training and inference.
KeWang1017
2024-12-28 22:11:34 +00:00
committed by AdilZouitine
parent 0ecf40d396
commit 70e3b9248c
3 changed files with 31 additions and 39 deletions


@@ -19,7 +19,7 @@ training:
   grad_clip_norm: 10.0
   lr: 3e-4
-  eval_freq: 10000
+  eval_freq: 50000
   log_freq: 500
   save_freq: 50000
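Since eval_freq is the number of training steps between evaluations, raising it from 10000 to 50000 cuts the number of evaluations by a factor of five. The sketch below counts them, assuming evaluation fires at every multiple of eval_freq. The run length TOTAL_STEPS is a hypothetical value, not one taken from this configuration.

```python
def eval_steps(total_steps: int, eval_freq: int) -> list[int]:
    """Steps at which evaluation would run, assuming it fires at every multiple of eval_freq."""
    return list(range(eval_freq, total_steps + 1, eval_freq))


TOTAL_STEPS = 500_000  # hypothetical run length

old = eval_steps(TOTAL_STEPS, 10_000)  # before this commit
new = eval_steps(TOTAL_STEPS, 50_000)  # after this commit
print(len(old), len(new))  # → 50 10
```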