- Added a `JointMaskingActionSpace` wrapper in `gym_manipulator` to select which joints are controlled; for example, gripper actions can be disabled for tasks that do not need them (see the first sketch after this list).

- Added NaN detection mechanisms in the actor, learner and gym_manipulator so that NaNs encountered anywhere in the loop are caught and logged.
- Changed `non_blocking` in the `.to(device)` calls to be enabled only on CUDA, because non-blocking transfers were causing NaNs when running the policy on MPS (see the second sketch below).
- Added joint clipping and limits in the env, robot and policy configs (see the third sketch below). TODO: clean this part up and keep the limits in a single config file.
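
A minimal sketch of what such a joint-masking wrapper could look like. The class name follows the commit, but the gymnasium `ActionWrapper` base, the constructor arguments and the fill-value behaviour are assumptions made for illustration, not the actual `gym_manipulator` implementation:

```python
import gymnasium as gym
import numpy as np


class JointMaskingActionSpace(gym.ActionWrapper):
    """Expose only a subset of the joints to the policy (illustrative sketch).

    `mask` is a boolean array over the full action vector; masked-out joints
    (e.g. the gripper) are held at `fill_value` instead of being commanded.
    """

    def __init__(self, env: gym.Env, mask, fill_value: float = 0.0):
        super().__init__(env)
        self.mask = np.asarray(mask, dtype=bool)
        self.fill_value = fill_value
        # The policy only ever sees the unmasked joints.
        low = env.action_space.low[self.mask]
        high = env.action_space.high[self.mask]
        self.action_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def action(self, action):
        # Rebuild the full action vector before it is sent to the robot.
        full_action = np.full(self.mask.shape, self.fill_value, dtype=np.float32)
        full_action[self.mask] = action
        return full_action
```

With a 6-DoF arm plus gripper, `JointMaskingActionSpace(env, mask=[True] * 6 + [False])` would hide the gripper action from the policy.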
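
For the `.to(device)` change, a hedged illustration of the pattern: enable `non_blocking` only when the target is a CUDA device. The helper name `to_device` is made up here; the commit applies the same idea inline wherever tensors are moved:

```python
import torch


def to_device(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Non-blocking copies only make sense for CUDA transfers (ideally from
    # pinned host memory); on MPS they were producing NaNs, so keep the copy
    # blocking on every other device.
    return tensor.to(device, non_blocking=(device.type == "cuda"))
```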
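
And a sketch of the kind of joint clipping mentioned in the last bullet; the argument names `joint_limits_min` / `joint_limits_max` are hypothetical placeholders, since the actual limits currently live across the env, robot and policy configs:

```python
import numpy as np


def clip_to_joint_limits(action: np.ndarray, joint_limits_min, joint_limits_max) -> np.ndarray:
    # Clamp each commanded joint value to its configured limit.
    return np.clip(action, joint_limits_min, joint_limits_max)
```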

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Michel Aractingi
2025-02-11 11:34:46 +01:00
committed by AdilZouitine
parent 3cb43f801c
commit c623824139
9 changed files with 161 additions and 31 deletions


@@ -278,12 +278,23 @@ def learner_push_parameters(
torch.save(params_dict, buf)
params_bytes = buf.getvalue()
-    # Push them to the Actors "SendParameters" method
+    # Push them to the Actor's "SendParameters" method
logging.info("[LEARNER] Publishing parameters to the Actor")
response = actor_stub.SendParameters(hilserl_pb2.Parameters(parameter_bytes=params_bytes)) # noqa: F841
time.sleep(seconds_between_pushes)
+
+
+def check_nan_in_transition(observations: torch.Tensor, actions: torch.Tensor, next_state: torch.Tensor):
+    for k in observations:
+        if torch.isnan(observations[k]).any():
+            logging.error(f"observations[{k}] contains NaN values")
+    for k in next_state:
+        if torch.isnan(next_state[k]).any():
+            logging.error(f"next_state[{k}] contains NaN values")
+    if torch.isnan(actions).any():
+        logging.error("actions contains NaN values")
def add_actor_information_and_train(
cfg,
device: str,
@@ -372,6 +383,7 @@ def add_actor_information_and_train(
observations = batch["state"]
next_observations = batch["next_state"]
done = batch["done"]
+check_nan_in_transition(observations=observations, actions=actions, next_state=next_observations)
with policy_lock:
    loss_critic = policy.compute_loss_critic(
@@ -399,6 +411,8 @@ def add_actor_information_and_train(
next_observations = batch["next_state"]
done = batch["done"]
+assert_and_breakpoint(observations=observations, actions=actions, next_state=next_observations)
with policy_lock:
    loss_critic = policy.compute_loss_critic(
        observations=observations,
@@ -497,8 +511,8 @@ def make_optimizers_and_scheduler(cfg, policy: nn.Module):
It also initializes a learning rate scheduler, though currently, it is set to `None`.
**NOTE:**
-    - If the encoder is shared, its parameters are excluded from the actors optimization process.
-    - The policys log temperature (`log_alpha`) is wrapped in a list to ensure proper optimization as a standalone tensor.
+    - If the encoder is shared, its parameters are excluded from the actor's optimization process.
+    - The policy's log temperature (`log_alpha`) is wrapped in a list to ensure proper optimization as a standalone tensor.
Args:
cfg: Configuration object containing hyperparameters.