lerobot/lerobot/common
Ke-Wang1017 f99e670976 Refactor SACPolicy and configuration for improved training dynamics
- Introduced target critic networks in SACPolicy to enhance stability during training.
- Updated TD target calculation to incorporate entropy adjustments, improving robustness.
- Increased the online buffer capacity in the configuration from 10,000 to 40,000 transitions.
- Set the learning rates for the critic, actor, and temperature to 3e-4.

These changes aim to make the SAC implementation more stable and robust during training and inference.
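
The target-critic and entropy-adjusted TD target changes listed above can be sketched roughly as follows. This is a minimal illustration of the standard SAC formulation, not the code from this commit: the function names `sac_td_target` and `soft_update`, the plain-float inputs, and the default values for `alpha`, `gamma`, and `tau` are all assumptions made for the example.

```python
# Hypothetical sketch of a standard SAC TD target; names and defaults are
# illustrative assumptions, not taken from SACPolicy.

def sac_td_target(reward, done, next_q_values, next_log_prob,
                  alpha=0.2, gamma=0.99):
    """Entropy-adjusted TD target for one transition.

    next_q_values: Q-values of the *target* critics at (s', a'), with a'
                   sampled from the current policy.
    next_log_prob: log pi(a' | s') for that sampled action.
    """
    # Take the minimum over target critics to reduce overestimation.
    min_next_q = min(next_q_values)
    # Subtract the temperature-weighted log-probability (entropy bonus).
    soft_value = min_next_q - alpha * next_log_prob
    # Do not bootstrap past terminal transitions (done == 1.0).
    return reward + gamma * (1.0 - done) * soft_value


def soft_update(target_params, online_params, tau=0.005):
    """Polyak-average the online critic parameters into the target critic."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


if __name__ == "__main__":
    y = sac_td_target(reward=1.0, done=0.0, next_q_values=[2.0, 3.0],
                      next_log_prob=-1.0, alpha=0.5, gamma=0.9)
    print(y)
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5))
```

Because the target critics are only moved a small step toward the online critics on each update, the bootstrap target changes slowly, which is the usual motivation for adding them.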
2025-01-06 10:14:34 +00:00
2024-12-17 02:42:53 +07:00