- Added a `JointMaskingActionSpace` wrapper in `gym_manipulator` to select which joints are controlled; for example, gripper actions can be disabled for tasks that do not need them (see the first sketch after this list).

- Added NaN detection mechanisms in the actor, learner and gym_manipulator so that NaNs encountered anywhere in the loop are caught and logged.
- Changed `non_blocking` in the `.to(device)` calls to be enabled only on CUDA, because non-blocking transfers were causing NaNs when running the policy on MPS (see the second sketch below).
- Added joint clipping and limits in the env, robot and policy configs (see the third sketch below). TODO: clean this part up and keep the limits in a single config file.
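
A minimal sketch of what such a joint-masking wrapper could look like. The class name follows the commit, but the gymnasium `ActionWrapper` base, the constructor arguments and the fill-value behaviour are assumptions made for illustration, not the actual `gym_manipulator` implementation:

```python
import gymnasium as gym
import numpy as np


class JointMaskingActionSpace(gym.ActionWrapper):
    """Expose only a subset of the joints to the policy (illustrative sketch).

    `mask` is a boolean array over the full action vector; masked-out joints
    (e.g. the gripper) are held at `fill_value` instead of being commanded.
    """

    def __init__(self, env: gym.Env, mask, fill_value: float = 0.0):
        super().__init__(env)
        self.mask = np.asarray(mask, dtype=bool)
        self.fill_value = fill_value
        # The policy only ever sees the unmasked joints.
        low = env.action_space.low[self.mask]
        high = env.action_space.high[self.mask]
        self.action_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def action(self, action):
        # Rebuild the full action vector before it is sent to the robot.
        full_action = np.full(self.mask.shape, self.fill_value, dtype=np.float32)
        full_action[self.mask] = action
        return full_action
```

With a 6-DoF arm plus gripper, `JointMaskingActionSpace(env, mask=[True] * 6 + [False])` would hide the gripper action from the policy.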
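
For the `.to(device)` change, a hedged illustration of the pattern: enable `non_blocking` only when the target is a CUDA device. The helper name `to_device` is made up here; the commit applies the same idea inline wherever tensors are moved:

```python
import torch


def to_device(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Non-blocking copies only make sense for CUDA transfers (ideally from
    # pinned host memory); on MPS they were producing NaNs, so keep the copy
    # blocking on every other device.
    return tensor.to(device, non_blocking=(device.type == "cuda"))
```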
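
And a sketch of the kind of joint clipping mentioned in the last bullet; the argument names `joint_limits_min` / `joint_limits_max` are hypothetical placeholders, since the actual limits currently live across the env, robot and policy configs:

```python
import numpy as np


def clip_to_joint_limits(action: np.ndarray, joint_limits_min, joint_limits_max) -> np.ndarray:
    # Clamp each commanded joint value to its configured limit.
    return np.clip(action, joint_limits_min, joint_limits_max)
```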

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Michel Aractingi
2025-02-11 11:34:46 +01:00
committed by AdilZouitine
parent 3cb43f801c
commit c623824139
9 changed files with 161 additions and 31 deletions


@@ -278,12 +278,23 @@ def learner_push_parameters(
torch.save(params_dict, buf)
params_bytes = buf.getvalue()
-    # Push them to the Actors "SendParameters" method
+    # Push them to the Actor's "SendParameters" method
logging.info("[LEARNER] Publishing parameters to the Actor")
response = actor_stub.SendParameters(hilserl_pb2.Parameters(parameter_bytes=params_bytes)) # noqa: F841
time.sleep(seconds_between_pushes)
+
+
+def check_nan_in_transition(observations: torch.Tensor, actions: torch.Tensor, next_state: torch.Tensor):
+    for k in observations:
+        if torch.isnan(observations[k]).any():
+            logging.error(f"observations[{k}] contains NaN values")
+    for k in next_state:
+        if torch.isnan(next_state[k]).any():
+            logging.error(f"next_state[{k}] contains NaN values")
+    if torch.isnan(actions).any():
+        logging.error("actions contains NaN values")
def add_actor_information_and_train(
cfg,
device: str,
@@ -372,6 +383,7 @@ def add_actor_information_and_train(
observations = batch["state"]
next_observations = batch["next_state"]
done = batch["done"]
+check_nan_in_transition(observations=observations, actions=actions, next_state=next_observations)
with policy_lock:
    loss_critic = policy.compute_loss_critic(
@@ -399,6 +411,8 @@ def add_actor_information_and_train(
next_observations = batch["next_state"]
done = batch["done"]
+assert_and_breakpoint(observations=observations, actions=actions, next_state=next_observations)
with policy_lock:
    loss_critic = policy.compute_loss_critic(
        observations=observations,
@@ -497,8 +511,8 @@ def make_optimizers_and_scheduler(cfg, policy: nn.Module):
It also initializes a learning rate scheduler, though currently, it is set to `None`.
**NOTE:**
-    - If the encoder is shared, its parameters are excluded from the actors optimization process.
-    - The policys log temperature (`log_alpha`) is wrapped in a list to ensure proper optimization as a standalone tensor.
+    - If the encoder is shared, its parameters are excluded from the actor's optimization process.
+    - The policy's log temperature (`log_alpha`) is wrapped in a list to ensure proper optimization as a standalone tensor.
Args:
cfg: Configuration object containing hyperparameters.