Compare commits

tdmpc23 ... user/rcade (4 commits)

| Author | SHA1 | Date |
|---|---|---|
|  | f996a13f70 |  |
|  | 743ebfa7c1 |  |
|  | 2c45660d77 |  |
|  | 9dd4414c6e |  |
README.md (10 lines changed)

@@ -55,9 +55,9 @@
 <table>
   <tr>
-    <td><img src="http://remicadene.com/assets/gif/aloha_act.gif" width="100%" alt="ACT policy on ALOHA env"/></td>
-    <td><img src="http://remicadene.com/assets/gif/simxarm_tdmpc.gif" width="100%" alt="TDMPC policy on SimXArm env"/></td>
-    <td><img src="http://remicadene.com/assets/gif/pusht_diffusion.gif" width="100%" alt="Diffusion policy on PushT env"/></td>
+    <td><img src="media/gym/aloha_act.gif" width="100%" alt="ACT policy on ALOHA env"/></td>
+    <td><img src="media/gym/simxarm_tdmpc.gif" width="100%" alt="TDMPC policy on SimXArm env"/></td>
+    <td><img src="media/gym/pusht_diffusion.gif" width="100%" alt="Diffusion policy on PushT env"/></td>
   </tr>
   <tr>
     <td align="center">ACT policy on ALOHA env</td>

@@ -144,7 +144,7 @@ wandb login
 ### Visualize datasets

-Check out [example 1](./examples/1_load_lerobot_dataset.py) that illustrates how to use our dataset class which automatically download data from the Hugging Face hub.
+Check out [example 1](./examples/1_load_lerobot_dataset.py) that illustrates how to use our dataset class which automatically downloads data from the Hugging Face hub.

 You can also locally visualize episodes from a dataset on the hub by executing our script from the command line:
 ```bash

@@ -280,7 +280,7 @@ To use wandb for logging training and evaluation curves, make sure you've run `wandb login`
 wandb.enable=true
 ```

-A link to the wandb logs for the run will also show up in yellow in your terminal. Here is an example of what they look like in your browser. Please also check [here](https://github.com/huggingface/lerobot/blob/main/examples/4_train_policy_with_script.md#typical-logs-and-metrics) for the explaination of some commonly used metrics in logs.
+A link to the wandb logs for the run will also show up in yellow in your terminal. Here is an example of what they look like in your browser. Please also check [here](https://github.com/huggingface/lerobot/blob/main/examples/4_train_policy_with_script.md#typical-logs-and-metrics) for the explanation of some commonly used metrics in logs.

 
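For context on example 1 referenced in the README hunks above: a minimal usage sketch, assuming the `LeRobotDataset` import path of this era of the repo; the repo id `lerobot/pusht` is only an illustration, not part of the diff.

```python
# Minimal sketch (assumed API): the dataset class fetches data from the
# Hugging Face hub on first use and caches it locally.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("lerobot/pusht")  # illustrative repo id

print(len(dataset))       # number of frames in the dataset
print(dataset[0].keys())  # per-frame dict: observation/action tensors, timestamps, ...
```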
@@ -123,22 +123,22 @@ Follow step 4 of the [assembly video](https://www.youtube.com/watch?v=FioA2oeFZ5I)

 Next, you'll need to calibrate your SO-100 robot to ensure that the leader and follower arms have the same position values when they are in the same physical position. This calibration is essential because it allows a neural network trained on one SO-100 robot to work on another.

-**Auto-calibration of follower arm**
-Follow step 5 of the [assembly video](https://www.youtube.com/watch?v=FioA2oeFZ5I) which illustrates the auto-calibration of the follower arm. You first need to manually move your follower arm to this initial position:
+**Manual calibration of follower arm**
+/!\ Contrarily to step 6 of the [assembly video](https://www.youtube.com/watch?v=FioA2oeFZ5I) which illustrates the auto calibration, we will actually do manual calibration of follower for now.

-<div style="text-align:center;">
-    <img src="../media/so100/follower_initial.webp?raw=true" alt="SO-100 follower arm initial position" title="SO-100 follower arm initial position" width="50%">
-</div>
+You will need to move the follower arm to these positions sequentially:
+
+| 1. Zero position | 2. Rotated position | 3. Rest position |
+|---|---|---|
+| <img src="../media/so100/follower_zero.webp?raw=true" alt="SO-100 follower arm zero position" title="SO-100 follower arm zero position" style="width:100%;"> | <img src="../media/so100/follower_rotated.webp?raw=true" alt="SO-100 follower arm rotated position" title="SO-100 follower arm rotated position" style="width:100%;"> | <img src="../media/so100/follower_rest.webp?raw=true" alt="SO-100 follower arm rest position" title="SO-100 follower arm rest position" style="width:100%;"> |

-Then run this script to launch auto-calibration:
+Make sure both arms are connected and run this script to launch manual calibration:
 ```bash
 python lerobot/scripts/control_robot.py calibrate \
     --robot-path lerobot/configs/robot/so100.yaml \
     --robot-overrides '~cameras' --arms main_follower
 ```

-Note: You can't run auto-calibration for the leader arm, since we removed the gears. Thus, you will need to manually calibrate the leader arm. It's less precise than auto-calibration, but precision is not as critical for the leader arm.
-
 **Manual calibration of leader arm**
 Follow step 6 of the [assembly video](https://www.youtube.com/watch?v=FioA2oeFZ5I) which illustrates the manual calibration. You will need to move the leader arm to these positions sequentially:
@@ -123,22 +123,22 @@ Follow step 4 of the [assembly video](https://www.youtube.com/watch?v=DA91NJOtMic)

 Next, you'll need to calibrate your Moss v1 robot to ensure that the leader and follower arms have the same position values when they are in the same physical position. This calibration is essential because it allows a neural network trained on one Moss v1 robot to work on another.

-**Auto-calibration of follower arm**
-Follow step 5 of the [assembly video](https://www.youtube.com/watch?v=DA91NJOtMic) which illustrates the auto-calibration of the follower arm. You first need to manually move your follower arm to this initial position:
+**Manual calibration of follower arm**
+/!\ Contrarily to step 6 of the [assembly video](https://www.youtube.com/watch?v=DA91NJOtMic) which illustrates the auto calibration, we will actually do manual calibration of follower for now.

-<div style="text-align:center;">
-    <img src="../media/moss/follower_initial.webp?raw=true" alt="Moss v1 follower arm initial position" title="Moss v1 follower arm initial position" width="50%">
-</div>
+You will need to move the follower arm to these positions sequentially:
+
+| 1. Zero position | 2. Rotated position | 3. Rest position |
+|---|---|---|
+| <img src="../media/moss/follower_zero.webp?raw=true" alt="Moss v1 follower arm zero position" title="Moss v1 follower arm zero position" style="width:100%;"> | <img src="../media/moss/follower_rotated.webp?raw=true" alt="Moss v1 follower arm rotated position" title="Moss v1 follower arm rotated position" style="width:100%;"> | <img src="../media/moss/follower_rest.webp?raw=true" alt="Moss v1 follower arm rest position" title="Moss v1 follower arm rest position" style="width:100%;"> |

-Then run this script to launch auto-calibration:
+Make sure both arms are connected and run this script to launch manual calibration:
 ```bash
 python lerobot/scripts/control_robot.py calibrate \
     --robot-path lerobot/configs/robot/moss.yaml \
     --robot-overrides '~cameras' --arms main_follower
 ```

-Note: You can't run auto-calibration for the leader arm, since we removed the gears. Thus, you will need to manually calibrate the leader arm. It's less precise than auto-calibration, but precision is not as critical for the leader arm.
-
 **Manual calibration of leader arm**
 Follow step 6 of the [assembly video](https://www.youtube.com/watch?v=DA91NJOtMic) which illustrates the manual calibration. You will need to move the leader arm to these positions sequentially:
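Both calibration hunks (SO-100 and Moss v1) rest on the same mechanism: record raw motor positions at known poses, then rescale every subsequent reading onto a shared nominal range so that leader and follower report matching values. A minimal sketch of the LINEAR rescale used by the example scripts later in this diff; the start/end tick values here are hypothetical:

```python
# Sketch of the LINEAR calibration rescale (same formula as apply_calibration
# in examples/test.py below); start/end raw tick values are made up.
def rescale_linear(raw: float, start_pos: float, end_pos: float) -> float:
    """Map a raw motor reading onto the nominal [0, 100] % range."""
    return (raw - start_pos) / (end_pos - start_pos) * 100

print(rescale_linear(1500, 1500, 2500))  # 0.0   -> at the recorded start pose
print(rescale_linear(2000, 1500, 2500))  # 50.0  -> halfway
print(rescale_linear(2500, 1500, 2500))  # 100.0 -> at the recorded end pose
```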
examples/test.py (new file, +681 lines)
@@ -0,0 +1,681 @@

```python
import threading
import time
from typing import Callable

import cv2
import numpy as np
import serial

# from qai_hub_models.models.mediapipe_hand.app import MediaPipeHandApp
# from qai_hub_models.models.mediapipe_hand.model import (
#     MediaPipeHand,
# )
# from qai_hub_models.utils.image_processing import (
#     app_to_net_image_inputs,
# )
from lerobot.common.robot_devices.motors.feetech import (
    CalibrationMode,
    FeetechMotorsBus,
)

# Accepted bounds (in %) for a linearly calibrated joint, with margin for imprecision
LOWER_BOUND_LINEAR = -100
UPPER_BOUND_LINEAR = 200


class HomonculusGlove:
    def __init__(self):
        self.serial_port = "/dev/tty.usbmodem21401"
        self.baud_rate = 115200
        self.serial = serial.Serial(self.serial_port, self.baud_rate, timeout=1)
        # Background thread keeps `last_d` updated with the latest glove sample
        self.thread = threading.Thread(target=self.async_read)
        self.thread.start()
        self.last_d = {
            "thumb_0": 100,
            "thumb_1": 100,
            "thumb_2": 100,
            "thumb_3": 100,
            "index_0": 100,
            "index_1": 100,
            "index_2": 100,
            "middle_0": 100,
            "middle_1": 100,
            "middle_2": 100,
            "ring_0": 100,
            "ring_1": 100,
            "ring_2": 100,
            "pinky_0": 100,
            "pinky_1": 100,
            "pinky_2": 100,
            "battery_voltage": 100,
        }
        self.calibration = None

    @property
    def joint_names(self):
        return list(self.last_d.keys())

    def read(self, motor_names: list[str] | None = None):
        if motor_names is None:
            motor_names = self.joint_names

        values = np.array([self.last_d[k] for k in motor_names])

        print(motor_names)
        print(values)

        if self.calibration is not None:
            values = self.apply_calibration(values, motor_names)
            print(values)
        return values

    def async_read(self):
        while True:
            if self.serial.in_waiting > 0:
                self.serial.flush()
                vals = self.serial.readline().decode("utf-8").strip()
                vals = vals.split(" ")
                if len(vals) != 17:
                    # Skip malformed or partial lines
                    continue
                vals = [int(val) for val in vals]

                d = {
                    "thumb_0": vals[0],
                    "thumb_1": vals[1],
                    "thumb_2": vals[2],
                    "thumb_3": vals[3],
                    "index_0": vals[4],
                    "index_1": vals[5],
                    "index_2": vals[6],
                    "middle_0": vals[7],
                    "middle_1": vals[8],
                    "middle_2": vals[9],
                    "ring_0": vals[10],
                    "ring_1": vals[11],
                    "ring_2": vals[12],
                    "pinky_0": vals[13],
                    "pinky_1": vals[14],
                    "pinky_2": vals[15],
                    "battery_voltage": vals[16],
                }
                self.last_d = d
                # print(d.values())

    def run_calibration(self):
        print("\nMove arm to open position")
        input("Press Enter to continue...")
        open_pos_list = []
        for _ in range(300):
            open_pos = self.read()
            open_pos_list.append(open_pos)
            time.sleep(0.01)
        open_pos = np.array(open_pos_list)
        max_open_pos = open_pos.max(axis=0)
        min_open_pos = open_pos.min(axis=0)

        print(f"{max_open_pos=}")
        print(f"{min_open_pos=}")

        print("\nMove arm to closed position")
        input("Press Enter to continue...")
        closed_pos_list = []
        for _ in range(300):
            closed_pos = self.read()
            closed_pos_list.append(closed_pos)
            time.sleep(0.01)
        closed_pos = np.array(closed_pos_list)
        max_closed_pos = closed_pos.max(axis=0)
        closed_pos[closed_pos < 1000] = 60000
        min_closed_pos = closed_pos.min(axis=0)

        print(f"{max_closed_pos=}")
        print(f"{min_closed_pos=}")

        open_pos = np.array([max_open_pos, max_closed_pos]).max(axis=0)
        closed_pos = np.array([min_open_pos, min_closed_pos]).min(axis=0)

        # INVERSION: these joints read in the opposite direction, so swap their
        # open/closed endpoints before building the linear calibration.
        for i, jname in enumerate(self.joint_names):
            if jname in ["thumb_0", "thumb_3", "index_2", "middle_2", "ring_2", "pinky_0", "pinky_2"]:
                tmp_pos = open_pos[i]
                open_pos[i] = closed_pos[i]
                closed_pos[i] = tmp_pos

        print()
        print(f"{open_pos=}")
        print(f"{closed_pos=}")

        homing_offset = [0] * len(self.joint_names)
        drive_mode = [0] * len(self.joint_names)
        calib_modes = [CalibrationMode.LINEAR.name] * len(self.joint_names)

        calib_dict = {
            "homing_offset": homing_offset,
            "drive_mode": drive_mode,
            "start_pos": open_pos,
            "end_pos": closed_pos,
            "calib_mode": calib_modes,
            "motor_names": self.joint_names,
        }
        # return calib_dict
        self.set_calibration(calib_dict)

    def set_calibration(self, calibration: dict[str, list]):
        self.calibration = calibration

    def apply_calibration(self, values: np.ndarray | list, motor_names: list[str] | None):
        """Convert from the unsigned int32 joint position range [0, 2**32[ to the universal float32 nominal degree range ]-180.0, 180.0[ with a "zero position" at 0 degrees.

        Note: We say "nominal degree range" since the motors can take values outside this range. For instance, 190 degrees, if the motor rotates more than half a turn from the zero position. However, most motors can't rotate more than 180 degrees and will stay in this range.

        Joint values are originally in [0, 2**32[ (unsigned int32). Each motor is expected to complete a full rotation when given a goal position that is + or - its resolution. For instance, the Feetech XL330-M077 has a resolution of 4096, so from any position in its original range, say 56734, it completes a full rotation clockwise by moving to 60830, or anticlockwise by moving to 52638. The position in the original range is arbitrary and might change a lot between motors. To harmonize between motors of the same model, different robots, or even models of different brands, we propose to work in the centered nominal degree range ]-180, 180[.
        """
        if motor_names is None:
            # Fixed: was `self.motor_names`, which doesn't exist on this class
            motor_names = self.joint_names

        # Convert from unsigned int32 original range [0, 2**32] to signed float32 range
        values = values.astype(np.float32)

        for i, name in enumerate(motor_names):
            calib_idx = self.calibration["motor_names"].index(name)
            calib_mode = self.calibration["calib_mode"][calib_idx]

            if CalibrationMode[calib_mode] == CalibrationMode.LINEAR:
                start_pos = self.calibration["start_pos"][calib_idx]
                end_pos = self.calibration["end_pos"][calib_idx]

                # Rescale the present position to a nominal range [0, 100] %,
                # useful for joints with linear motions like the Aloha gripper
                values[i] = (values[i] - start_pos) / (end_pos - start_pos) * 100

                if (values[i] < LOWER_BOUND_LINEAR) or (values[i] > UPPER_BOUND_LINEAR):
                    if name == "pinky_1" and (values[i] < LOWER_BOUND_LINEAR):
                        values[i] = end_pos
                    else:
                        msg = (
                            f"Wrong motor position range detected for {name}. "
                            f"Expected to be in nominal range of [0, 100] % (a full linear translation), "
                            f"with a maximum range of [{LOWER_BOUND_LINEAR}, {UPPER_BOUND_LINEAR}] % to account for some imprecision during calibration, "
                            f"but present value is {values[i]} %. "
                            "This might be due to a cable connection issue creating an artificial jump in motor values. "
                            "You need to recalibrate by running: `python lerobot/scripts/control_robot.py calibrate`"
                        )
                        print(msg)
                        # raise JointOutOfRangeError(msg)

        return values

    # def revert_calibration(self, values: np.ndarray | list, motor_names: list[str] | None):
    #     """Inverse of `apply_calibration`."""
    #     if motor_names is None:
    #         motor_names = self.motor_names

    #     for i, name in enumerate(motor_names):
    #         calib_idx = self.calibration["motor_names"].index(name)
    #         calib_mode = self.calibration["calib_mode"][calib_idx]

    #         if CalibrationMode[calib_mode] == CalibrationMode.LINEAR:
    #             start_pos = self.calibration["start_pos"][calib_idx]
    #             end_pos = self.calibration["end_pos"][calib_idx]

    #             # Convert from nominal linear range of [0, 100] % to
    #             # actual motor range of values which can be arbitrary.
    #             values[i] = values[i] / 100 * (end_pos - start_pos) + start_pos

    #     values = np.round(values).astype(np.int32)
    #     return values


class HopeJuniorRobot:
    def __init__(self):
        self.arm_bus = FeetechMotorsBus(
            port="/dev/tty.usbmodem58760429571",
            motors={
                # "motor1": (2, "sts3250"),
                # "motor2": (1, "scs0009"),
                "shoulder_pitch": [1, "sts3250"],
                "shoulder_yaw": [2, "sts3215"],  # TODO: sts3250
                "shoulder_roll": [3, "sts3215"],  # TODO: sts3250
                "elbow_flex": [4, "sts3250"],
                "wrist_roll": [5, "sts3215"],
                "wrist_yaw": [6, "sts3215"],
                "wrist_pitch": [7, "sts3215"],
            },
            protocol_version=0,
        )
        self.hand_bus = FeetechMotorsBus(
            port="/dev/tty.usbmodem585A0077581",
            motors={
                "thumb_basel_rotation": [30, "scs0009"],
                "thumb_flexor": [27, "scs0009"],
                "thumb_pinky_side": [26, "scs0009"],
                "thumb_thumb_side": [28, "scs0009"],
                "index_flexor": [25, "scs0009"],
                "index_pinky_side": [31, "scs0009"],
                "index_thumb_side": [32, "scs0009"],
                "middle_flexor": [24, "scs0009"],
                "middle_pinky_side": [33, "scs0009"],
                "middle_thumb_side": [34, "scs0009"],
                "ring_flexor": [21, "scs0009"],
                "ring_pinky_side": [36, "scs0009"],
                "ring_thumb_side": [35, "scs0009"],
                "pinky_flexor": [23, "scs0009"],
                "pinky_pinky_side": [38, "scs0009"],
                "pinky_thumb_side": [37, "scs0009"],
            },
            protocol_version=1,
            group_sync_read=False,
        )

    def get_hand_calibration(self):
        homing_offset = [0] * len(self.hand_bus.motor_names)
        drive_mode = [0] * len(self.hand_bus.motor_names)

        start_pos = [
            500,
            900,
            1000,
            0,
            100,
            250,
            750,
            100,
            400,
            150,
            100,
            120,
            980,
            100,
            950,
            750,
        ]

        end_pos = [
            500 - 250,
            900 - 300,
            1000 - 550,
            0 + 550,
            1000,
            250 + 700,
            750 - 700,
            1000,
            400 + 700,
            150 + 700,
            1000,
            120 + 700,
            980 - 700,
            1000,
            950 - 700,
            750 - 700,
        ]

        calib_modes = [CalibrationMode.LINEAR.name] * len(self.hand_bus.motor_names)

        calib_dict = {
            "homing_offset": homing_offset,
            "drive_mode": drive_mode,
            "start_pos": start_pos,
            "end_pos": end_pos,
            "calib_mode": calib_modes,
            "motor_names": self.hand_bus.motor_names,
        }
        return calib_dict

    def connect(self):
        self.arm_bus.connect()
        self.hand_bus.connect()


ESCAPE_KEY_ID = 27


def capture_and_display_processed_frames(
    frame_processor: Callable[[np.ndarray], np.ndarray],
    window_display_name: str,
    cap_device: int = 0,
) -> None:
    """
    Capture frames from the given input camera device, run them through
    the frame processor, and display the outputs in a window with the given name.

    User should press Esc to exit.

    Inputs:
        frame_processor: Callable[[np.ndarray], np.ndarray]
            Processes frames.
            Input and output are numpy arrays of shape (H W C) with BGR channel layout and dtype uint8 / byte.
        window_display_name: str
            Name of the window used to display frames.
        cap_device: int
            Identifier for the camera to use to capture frames.
    """
    cv2.namedWindow(window_display_name)
    capture = cv2.VideoCapture(cap_device)
    if not capture.isOpened():
        raise ValueError("Unable to open video capture.")

    frame_count = 0
    has_frame, frame = capture.read()
    while has_frame:
        assert isinstance(frame, np.ndarray)

        frame_count = frame_count + 1
        # mirror frame
        frame = np.ascontiguousarray(frame[:, ::-1, ::-1])

        # process & show frame
        processed_frame = frame_processor(frame)
        cv2.imshow(window_display_name, processed_frame[:, :, ::-1])

        has_frame, frame = capture.read()
        key = cv2.waitKey(1)
        if key == ESCAPE_KEY_ID:
            break

    capture.release()


def main():
    robot = HopeJuniorRobot()
    robot.connect()

    # robot.hand_bus.calibration = None

    # breakpoint()
    # print(robot.arm_bus.read("Present_Position"))
    robot.arm_bus.write("Torque_Enable", 1)
    robot.arm_bus.write("Acceleration", 20)
    robot.arm_bus.read("Acceleration")

    calibration = robot.get_hand_calibration()
    robot.hand_bus.write("Goal_Position", calibration["start_pos"])
    # robot.hand_bus.write("Goal_Position", calibration["end_pos"][:4], robot.hand_bus.motor_names[:4])
    robot.hand_bus.set_calibration(calibration)
    lol = 1

    # # print(motors_bus.write("Goal_Position", 500))
    # print(robot.hand_bus.read("Present_Position"))
    # # pos = hand_bus.read("Present_Position")
    # # hand_bus.write("Goal_Position", pos[0]+20, hand_bus.motor_names[0])
    # # hand_bus.write("Goal_Position", pos[i]+delta, hand_bus.motor_names[i])
    # robot.hand_bus.read("Acceleration")
    # robot.hand_bus.write("Acceleration", 10)

    # sleep = 1
    # # robot.hand_bus.write(
    # #     "Goal_Position", [glove.last_d['index_2']-1500,300,300], ["index_pinky_side", "index_flexor", "index_thumb_side"]
    # # )
    # # time.sleep(sleep)
    # time.sleep(sleep)
    # robot.hand_bus.write(
    #     "Goal_Position", [100, 100, 100], ["index_flexor", "index_pinky_side", "index_thumb_side"]
    # )
    # time.sleep(sleep)
    # robot.hand_bus.write(
    #     "Goal_Position", [100, 0, 0], ["middle_flexor", "middle_pinky_side", "middle_thumb_side"]
    # )
    # time.sleep(sleep)
    # robot.hand_bus.write(
    #     "Goal_Position", [200, 200, 0], ["ring_flexor", "ring_pinky_side", "ring_thumb_side"]
    # )
    # time.sleep(sleep)
    # robot.hand_bus.write(
    #     "Goal_Position", [200, 100, 600], ["pinky_flexor", "pinky_pinky_side", "pinky_thumb_side"]
    # )
    # time.sleep(sleep)

    # breakpoint()

    glove = HomonculusGlove()
    glove.run_calibration()
    # while True:
    #     joint_names = ["index_1", "index_2"]
    #     joint_values = glove.read(joint_names)
    #     print(joint_values)

    input()
    while True:
        joint_names = []
        joint_names += ["thumb_0", "thumb_2", "thumb_3"]
        joint_names += ["index_1", "index_2"]
        joint_names += ["middle_1", "middle_2"]
        joint_names += ["ring_1", "ring_2"]
        joint_names += ["pinky_1", "pinky_2"]
        joint_values = glove.read(joint_names)
        joint_values = joint_values.round().astype(int)
        joint_dict = {k: v for k, v in zip(joint_names, joint_values, strict=False)}

        motor_values = []
        motor_names = []
        motor_names += ["thumb_basel_rotation", "thumb_flexor", "thumb_pinky_side", "thumb_thumb_side"]
        motor_values += [
            joint_dict["thumb_3"],
            joint_dict["thumb_0"],
            joint_dict["thumb_2"],
            joint_dict["thumb_2"],
        ]
        motor_names += ["index_flexor", "index_pinky_side", "index_thumb_side"]
        motor_values += [joint_dict["index_2"], joint_dict["index_1"], joint_dict["index_1"]]
        motor_names += ["middle_flexor", "middle_pinky_side", "middle_thumb_side"]
        motor_values += [joint_dict["middle_2"], joint_dict["middle_1"], joint_dict["middle_1"]]
        motor_names += ["ring_flexor", "ring_pinky_side", "ring_thumb_side"]
        motor_values += [joint_dict["ring_2"], joint_dict["ring_1"], joint_dict["ring_1"]]
        motor_names += ["pinky_flexor", "pinky_pinky_side", "pinky_thumb_side"]

        motor_values += [joint_dict["pinky_2"], joint_dict["pinky_1"], joint_dict["pinky_1"]]

        motor_values = np.array(motor_values)
        motor_values = np.clip(motor_values, 0, 100)

        robot.hand_bus.write("Goal_Position", motor_values, motor_names)
        time.sleep(0.02)

    # Note: everything below is unreachable while the loop above runs; kept as scratch code.
    while True:
        # print(glove.read()['index_2']-1500)
        glove_index_flexor = glove.read()["index_2"] - 1500
        glove_index_subflexor = glove.read()["index_1"] - 1500
        glove_index_side = glove.read()["index_0"] - 2100

        vals = [glove_index_flexor, 1000 - (glove_index_subflexor), glove_index_subflexor]

        keys = ["index_flexor", "index_pinky_side", "index_thumb_side"]

        glove_middle_flexor = glove.read()["middle_2"] - 1500
        glove_middle_subflexor = 1000 - (glove.read()["middle_1"] - 1700)
        vals += [glove_middle_flexor, glove_middle_subflexor, glove_middle_subflexor - 200]
        keys += ["middle_flexor", "middle_pinky_side", "middle_thumb_side"]

        glove_ring_flexor = glove.read()["ring_2"] - 1300
        print(glove_ring_flexor)
        glove_ring_subflexor = glove.read()["ring_1"] - 1100

        vals += [glove_ring_flexor, 1000 - glove_ring_subflexor, glove_ring_subflexor]
        keys += ["ring_flexor", "ring_pinky_side", "ring_thumb_side"]

        glove_pinky_flexor = glove.read()["pinky_2"] - 1500
        glove_pinky_subflexor = glove.read()["pinky_1"] - 1300
        vals += [300 + glove_pinky_flexor, max(1000 - glove_pinky_subflexor - 100, 0), glove_pinky_subflexor]
        keys += ["pinky_flexor", "pinky_pinky_side", "pinky_thumb_side"]

        robot.hand_bus.write("Goal_Position", vals, keys)
        time.sleep(0.1)

    time.sleep(3)

    def move_arm(loop=10):
        sleep = 1
        for i in range(loop):
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1195])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 2195])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1457, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 2357, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 974, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 2674, 1957, 1695])
            time.sleep(sleep + 2)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 1632, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 1369, 1632, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 1330, 2069, 2032, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [2381, 2030, 2069, 2032, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1681, 2030, 2069, 2032, 1874, 1957, 1695])
            time.sleep(sleep)
            robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
            time.sleep(sleep)

    def move_hand(loop=10):
        sleep = 0.5
        for i in range(loop):
            robot.hand_bus.write(
                "Goal_Position",
                [500, 1000, 0, 1000],
                ["thumb_basel_rotation", "thumb_flexor", "thumb_pinky_side", "thumb_thumb_side"],
            )
            time.sleep(sleep)
            robot.hand_bus.write(
                "Goal_Position", [100, 100, 100], ["index_flexor", "index_pinky_side", "index_thumb_side"]
            )
            time.sleep(sleep)
            robot.hand_bus.write(
                "Goal_Position", [100, 1000, 150], ["middle_flexor", "middle_pinky_side", "middle_thumb_side"]
            )
            time.sleep(sleep)
            robot.hand_bus.write(
                "Goal_Position", [200, 200, 0], ["ring_flexor", "ring_pinky_side", "ring_thumb_side"]
            )
            time.sleep(sleep)
            robot.hand_bus.write(
                "Goal_Position", [200, 100, 700], ["pinky_flexor", "pinky_pinky_side", "pinky_thumb_side"]
            )
            time.sleep(sleep)

            robot.hand_bus.write(
                "Goal_Position",
                [500, 1000 - 250, 0 + 300, 1000 - 200],
                ["thumb_basel_rotation", "thumb_flexor", "thumb_pinky_side", "thumb_thumb_side"],
            )
            time.sleep(sleep)
            robot.hand_bus.write(
                "Goal_Position",
                [100 + 450, 100 + 400, 100 + 400],
                ["index_flexor", "index_pinky_side", "index_thumb_side"],
            )
            time.sleep(sleep)
            robot.hand_bus.write(
                "Goal_Position",
                [100 + 350, 1000 - 450, 150 + 450],
                ["middle_flexor", "middle_pinky_side", "middle_thumb_side"],
            )
            time.sleep(sleep)
            robot.hand_bus.write(
                "Goal_Position",
                [200 + 650, 200 + 350, 0 + 350],
                ["ring_flexor", "ring_pinky_side", "ring_thumb_side"],
            )
            time.sleep(sleep)
            robot.hand_bus.write(
                "Goal_Position",
                [200 + 450, 100 + 400, 700 - 400],
                ["pinky_flexor", "pinky_pinky_side", "pinky_thumb_side"],
            )
            time.sleep(sleep)

    move_hand(3)

    move_arm(1)

    from concurrent.futures import ThreadPoolExecutor

    with ThreadPoolExecutor() as executor:
        executor.submit(move_arm)
        executor.submit(move_hand)

    # initial position
    for i in range(3):
        robot.hand_bus.write(
            "Goal_Position", [500, 1000, 0, 1000, 100, 950, 100, 100, 1000, 150, 200, 200, 0, 200, 100, 700]
        )
        time.sleep(1)

    # for i in range(3):
    #     robot.hand_bus.write("Goal_Position", [500, 1000-150, 0+250, 1000-150,
    #                                            100+300, 950-250, 100+250,
    #                                            100+200, 1000-300, 150+300,
    #                                            200+500, 200+200, 0+200,
    #                                            200+300, 100+200, 700-200])
    #     time.sleep(1)

    # camera = 0
    # score_threshold = 0.95
    # iou_threshold = 0.3

    # app = MediaPipeHandApp(MediaPipeHand.from_pretrained(), score_threshold, iou_threshold)

    # def frame_processor(frame: np.ndarray) -> np.ndarray:
    #     # Input Prep
    #     NHWC_int_numpy_frames, NCHW_fp32_torch_frames = app_to_net_image_inputs(frame)

    #     # Run Bounding Box & Keypoint Detector
    #     batched_selected_boxes, batched_selected_keypoints = app._run_box_detector(NCHW_fp32_torch_frames)

    #     # The region of interest (bounding box of 4 (x, y) corners).
    #     # list[torch.Tensor(shape=[Num Boxes, 4, 2])],
    #     # where 2 == (x, y)
    #     #
    #     # A list element will be None if there is no selected ROI.
    #     batched_roi_4corners = app._compute_object_roi(batched_selected_boxes, batched_selected_keypoints)

    #     # selected landmarks for the ROI (if any)
    #     # list[torch.Tensor(shape=[Num Selected Landmarks, K, 3])],
    #     # where K == number of landmark keypoints, 3 == (x, y, confidence)
    #     #
    #     # A list element will be None if there is no ROI.
    #     landmarks_out = app._run_landmark_detector(NHWC_int_numpy_frames, batched_roi_4corners)

    #     app._draw_predictions(
    #         NHWC_int_numpy_frames,
    #         batched_selected_boxes,
    #         batched_selected_keypoints,
    #         batched_roi_4corners,
    #         *landmarks_out,
    #     )

    #     return NHWC_int_numpy_frames[0]

    # capture_and_display_processed_frames(frame_processor, "QAIHM Mediapipe Hand Demo", camera)


if __name__ == "__main__":
    main()
```
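The glove wire format above is inferred entirely from `async_read`: the firmware emits one text line per sample, 17 space-separated integers (16 flex readings plus a battery voltage). A self-contained sketch of that parse, with a made-up sample line:

```python
# Stand-alone parse of one glove line, mirroring HomonculusGlove.async_read.
JOINTS = [
    "thumb_0", "thumb_1", "thumb_2", "thumb_3",
    "index_0", "index_1", "index_2",
    "middle_0", "middle_1", "middle_2",
    "ring_0", "ring_1", "ring_2",
    "pinky_0", "pinky_1", "pinky_2",
    "battery_voltage",
]


def parse_glove_line(line: str) -> dict[str, int] | None:
    vals = line.strip().split(" ")
    if len(vals) != len(JOINTS):  # drop malformed or partial lines, as the script does
        return None
    return dict(zip(JOINTS, (int(v) for v in vals)))


# Made-up sample line with 17 integers:
print(parse_glove_line("1500 1600 1700 1800 2100 2000 1900 1500 1600 1700 1300 1200 1100 1400 1300 1500 3700"))
```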
examples/test2.py (new file, +133 lines)
@@ -0,0 +1,133 @@

```python
#!/usr/bin/env python
#
# ********* Ping Example *********
#
#
# Available SCServo model on this example : All models using Protocol SCS
# This example is tested with a SCServo(STS/SMS/SCS), and an URT
# Be sure that SCServo(STS/SMS/SCS) properties are already set as %% ID : 1 / Baudnum : 6 (Baudrate : 1000000)
#

import os

if os.name == "nt":
    import msvcrt

    def getch():
        return msvcrt.getch().decode()
else:
    import sys
    import termios
    import tty

    fd = sys.stdin.fileno()
    old_settings = termios.tcgetattr(fd)

    def getch():
        try:
            tty.setraw(sys.stdin.fileno())
            ch = sys.stdin.read(1)
        finally:
            termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
        return ch


from scservo_sdk import *  # Uses SCServo SDK library

# Default setting
SCS_ID = 1  # SCServo ID : 1
BAUDRATE = 1000000  # SCServo default baudrate : 1000000
DEVICENAME = "/dev/tty.usbserial-2130"  # Check which port is being used on your controller
# ex) Windows: "COM1"  Linux: "/dev/ttyUSB0"  Mac: "/dev/tty.usbserial-*"

protocol_end = 1  # SCServo bit end(STS/SMS=0, SCS=1)

# Initialize PortHandler instance
# Set the port path
# Get methods and members of PortHandlerLinux or PortHandlerWindows
portHandler = PortHandler(DEVICENAME)

# Initialize PacketHandler instance
# Get methods and members of Protocol
packetHandler = PacketHandler(protocol_end)

# Open port
if portHandler.openPort():
    print("Succeeded to open the port")
else:
    print("Failed to open the port")
    print("Press any key to terminate...")
    getch()
    quit()


# Set port baudrate
if portHandler.setBaudRate(BAUDRATE):
    print("Succeeded to change the baudrate")
else:
    print("Failed to change the baudrate")
    print("Press any key to terminate...")
    getch()
    quit()

# Try to ping the SCServo
# Get SCServo model number
scs_model_number, scs_comm_result, scs_error = packetHandler.ping(portHandler, SCS_ID)
if scs_comm_result != COMM_SUCCESS:
    print("%s" % packetHandler.getTxRxResult(scs_comm_result))
elif scs_error != 0:
    print("%s" % packetHandler.getRxPacketError(scs_error))
else:
    print("[ID:%03d] ping Succeeded. SCServo model number : %d" % (SCS_ID, scs_model_number))


ADDR_SCS_PRESENT_POSITION = 56
scs_present_position, scs_comm_result, scs_error = packetHandler.read2ByteTxRx(
    portHandler, SCS_ID, ADDR_SCS_PRESENT_POSITION
)
if scs_comm_result != COMM_SUCCESS:
    print(packetHandler.getTxRxResult(scs_comm_result))
elif scs_error != 0:
    print(packetHandler.getRxPacketError(scs_error))

breakpoint()
scs_present_position = SCS_LOWORD(scs_present_position)
# scs_present_speed = SCS_HIWORD(scs_present_position_speed)
# print("[ID:%03d] PresPos:%03d PresSpd:%03d" % (SCS_ID, scs_present_position, SCS_TOHOST(scs_present_speed, 15)))
print("[ID:%03d] PresPos:%03d" % (SCS_ID, scs_present_position))

groupSyncRead = GroupSyncRead(portHandler, packetHandler, ADDR_SCS_PRESENT_POSITION, 2)

scs_addparam_result = groupSyncRead.addParam(SCS_ID)
if not scs_addparam_result:
    print("[ID:%03d] groupSyncRead addparam failed" % SCS_ID)
    quit()

# Syncread present position
scs_comm_result = groupSyncRead.txRxPacket()
if scs_comm_result != COMM_SUCCESS:
    print("%s" % packetHandler.getTxRxResult(scs_comm_result))

# Check if groupsyncread data of SCServo#1 is available
scs_getdata_result = groupSyncRead.isAvailable(SCS_ID, ADDR_SCS_PRESENT_POSITION, 2)
if scs_getdata_result:
    # Get SCServo#1 present position value
    scs_present_position = groupSyncRead.getData(SCS_ID, ADDR_SCS_PRESENT_POSITION, 2)
else:
    scs_present_position = 0
    print("[ID:%03d] groupSyncRead getdata failed" % SCS_ID)

# # Check if groupsyncread data of SCServo#2 is available
# scs_getdata_result = groupSyncRead.isAvailable(SCS2_ID, ADDR_SCS_PRESENT_POSITION, 2)
# if scs_getdata_result == True:
#     # Get SCServo#2 present position value
#     scs2_present_position_speed = groupSyncRead.getData(SCS2_ID, ADDR_SCS_PRESENT_POSITION, 2)
# else:
#     print("[ID:%03d] groupSyncRead getdata failed" % SCS2_ID)

scs_present_position = SCS_LOWORD(scs_present_position)
print("[ID:%03d] PresPos:%03d" % (SCS_ID, scs_present_position))


# Close port
portHandler.closePort()
```
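`SCS_LOWORD` in the script keeps only the low 16 bits of a read, since present position on these servos is a 2-byte register. A sketch of the equivalent masking; the SDK's actual helpers may be defined slightly differently:

```python
# Assumed equivalents of the scservo_sdk SCS_LOWORD/SCS_HIWORD helpers.
def scs_loword(value: int) -> int:
    return value & 0xFFFF  # low 16 bits, e.g. present position


def scs_hiword(value: int) -> int:
    return (value >> 16) & 0xFFFF  # high 16 bits, e.g. present speed when packed


raw = (258 << 16) | 2048  # made-up packed read: speed=258, position=2048
print(scs_loword(raw))  # 2048
print(scs_hiword(raw))  # 258
```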
examples/test3.py (new file, +45 lines)
@@ -0,0 +1,45 @@

```python
import serial


class HomonculusGlove:
    def __init__(self):
        self.serial_port = "/dev/tty.usbmodem1101"
        self.baud_rate = 115200
        self.serial = serial.Serial(self.serial_port, self.baud_rate, timeout=1)

    def read(self):
        # Block until a full line of joint values arrives on the serial port
        while True:
            if self.serial.in_waiting > 0:
                vals = self.serial.readline().decode("utf-8").strip()
                vals = vals.split(" ")
                vals = [int(val) for val in vals]

                d = {
                    "thumb_0": vals[0],
                    "thumb_1": vals[1],
                    "thumb_2": vals[2],
                    "thumb_3": vals[3],
                    "index_0": vals[4],
                    "index_1": vals[5],
                    "index_2": vals[6],
                    "middle_0": vals[7],
                    "middle_1": vals[8],
                    "middle_2": vals[9],
                    "ring_0": vals[10],
                    "ring_1": vals[11],
                    "ring_2": vals[12],
                    "pinky_0": vals[13],
                    "pinky_1": vals[14],
                    "pinky_2": vals[15],
                }
                return d

    # if ser.in_waiting > 0:
    #     line = ser.readline().decode('utf-8').strip()
    #     print(line)


if __name__ == "__main__":
    glove = HomonculusGlove()
    d = glove.read()
    lol = 1
```
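This minimal variant blocks inside `read()` until a full line arrives; examples/test.py moves the same loop onto a background thread instead. A small middle-ground sketch, polling with a deadline so a dead serial link fails loudly rather than hanging (the deadline value is arbitrary):

```python
import time

import serial


def read_line_with_deadline(port: serial.Serial, deadline_s: float = 2.0) -> str:
    """Poll like test3.py's read(), but give up after deadline_s seconds."""
    t0 = time.monotonic()
    while time.monotonic() - t0 < deadline_s:
        if port.in_waiting > 0:
            return port.readline().decode("utf-8").strip()
        time.sleep(0.001)  # yield instead of busy-spinning
    raise TimeoutError(f"no glove data within {deadline_s} s")
```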
693
examples/test4.py
Normal file
@@ -0,0 +1,693 @@
|
||||
import threading
|
||||
import time
|
||||
from typing import Callable
|
||||
|
||||
import cv2
|
||||
import numpy as np
|
||||
|
||||
# from qai_hub_models.models.mediapipe_hand.app import MediaPipeHandApp
|
||||
# from qai_hub_models.models.mediapipe_hand.model import (
|
||||
# MediaPipeHand,
|
||||
# )
|
||||
# from qai_hub_models.utils.image_processing import (
|
||||
# app_to_net_image_inputs,
|
||||
# )
|
||||
from lerobot.common.robot_devices.motors.feetech import (
|
||||
CalibrationMode,
|
||||
FeetechMotorsBus,
|
||||
)
|
||||
|
||||
LOWER_BOUND_LINEAR = -100
|
||||
UPPER_BOUND_LINEAR = 200
|
||||
|
||||
import serial
|
||||
|
||||
|
||||
class HomonculusGlove:
|
||||
def __init__(self):
|
||||
self.serial_port = "/dev/tty.usbmodem1401"
|
||||
self.baud_rate = 115200
|
||||
self.serial = serial.Serial(self.serial_port, self.baud_rate, timeout=1)
|
||||
self.thread = threading.Thread(target=self.async_read)
|
||||
self.thread.start()
|
||||
self.last_d = {
|
||||
"thumb_0": 100,
|
||||
"thumb_1": 100,
|
||||
"thumb_2": 100,
|
||||
"thumb_3": 100,
|
||||
"index_0": 100,
|
||||
"index_1": 100,
|
||||
"index_2": 100,
|
||||
"middle_0": 100,
|
||||
"middle_1": 100,
|
||||
"middle_2": 100,
|
||||
"ring_0": 100,
|
||||
"ring_1": 100,
|
||||
"ring_2": 100,
|
||||
"pinky_0": 100,
|
||||
"pinky_1": 100,
|
||||
"pinky_2": 100,
|
||||
"battery_voltage": 100,
|
||||
}
|
||||
self.calibration = None
|
||||
|
||||
@property
|
||||
def joint_names(self):
|
||||
return list(self.last_d.keys())
|
||||
|
||||
def read(self, motor_names: list[str] | None = None):
|
||||
if motor_names is None:
|
||||
motor_names = self.joint_names
|
||||
|
||||
values = np.array([self.last_d[k] for k in motor_names])
|
||||
|
||||
print(motor_names)
|
||||
print(values)
|
||||
|
||||
if self.calibration is not None:
|
||||
values = self.apply_calibration(values, motor_names)
|
||||
print(values)
|
||||
return values
|
||||
|
||||
def async_read(self):
|
||||
while True:
|
||||
if self.serial.in_waiting > 0:
|
||||
self.serial.flush()
|
||||
vals = self.serial.readline().decode("utf-8").strip()
|
||||
vals = vals.split(" ")
|
||||
if len(vals) != 17:
|
||||
continue
|
||||
vals = [int(val) for val in vals]
|
||||
|
||||
d = {
|
||||
"thumb_0": vals[0],
|
||||
"thumb_1": vals[1],
|
||||
"thumb_2": vals[2],
|
||||
"thumb_3": vals[3],
|
||||
"index_0": vals[4],
|
||||
"index_1": vals[5],
|
||||
"index_2": vals[6],
|
||||
"middle_0": vals[7],
|
||||
"middle_1": vals[8],
|
||||
"middle_2": vals[9],
|
||||
"ring_0": vals[10],
|
||||
"ring_1": vals[11],
|
||||
"ring_2": vals[12],
|
||||
"pinky_0": vals[13],
|
||||
"pinky_1": vals[14],
|
||||
"pinky_2": vals[15],
|
||||
"battery_voltage": vals[16],
|
||||
}
|
||||
self.last_d = d
|
||||
# print(d.values())
|
||||
|
||||
def run_calibration(self):
|
||||
print("\nMove arm to open position")
|
||||
input("Press Enter to continue...")
|
||||
open_pos_list = []
|
||||
for _ in range(300):
|
||||
open_pos = self.read()
|
||||
open_pos_list.append(open_pos)
|
||||
time.sleep(0.01)
|
||||
open_pos = np.array(open_pos_list)
|
||||
max_open_pos = open_pos.max(axis=0)
|
||||
min_open_pos = open_pos.min(axis=0)
|
||||
|
||||
print(f"{max_open_pos=}")
|
||||
print(f"{min_open_pos=}")
|
||||
|
||||
print("\nMove arm to closed position")
|
||||
input("Press Enter to continue...")
|
||||
closed_pos_list = []
|
||||
for _ in range(300):
|
||||
closed_pos = self.read()
|
||||
closed_pos_list.append(closed_pos)
|
||||
time.sleep(0.01)
|
||||
closed_pos = np.array(closed_pos_list)
|
||||
max_closed_pos = closed_pos.max(axis=0)
|
||||
closed_pos[closed_pos < 1000] = 60000
|
||||
min_closed_pos = closed_pos.min(axis=0)
|
||||
|
||||
print(f"{max_closed_pos=}")
|
||||
print(f"{min_closed_pos=}")
|
||||
|
||||
open_pos = np.array([max_open_pos, max_closed_pos]).max(axis=0)
|
||||
closed_pos = np.array([min_open_pos, min_closed_pos]).min(axis=0)
|
||||
|
||||
# INVERTION
|
||||
# INVERTION
|
||||
# INVERTION
|
||||
# INVERTION
|
||||
# INVERTION
|
||||
# INVERTION
|
||||
# INVERTION
|
||||
for i, jname in enumerate(self.joint_names):
|
||||
if jname in [
|
||||
"thumb_0",
|
||||
"thumb_3",
|
||||
"index_2",
|
||||
"middle_2",
|
||||
"ring_2",
|
||||
"pinky_0",
|
||||
"pinky_2",
|
||||
"index_0",
|
||||
]:
|
||||
tmp_pos = open_pos[i]
|
||||
open_pos[i] = closed_pos[i]
|
||||
closed_pos[i] = tmp_pos
|
||||
|
||||
print()
|
||||
print(f"{open_pos=}")
|
||||
print(f"{closed_pos=}")
|
||||
|
||||
homing_offset = [0] * len(self.joint_names)
|
||||
drive_mode = [0] * len(self.joint_names)
|
||||
calib_modes = [CalibrationMode.LINEAR.name] * len(self.joint_names)
|
||||
|
||||
calib_dict = {
|
||||
"homing_offset": homing_offset,
|
||||
"drive_mode": drive_mode,
|
||||
"start_pos": open_pos,
|
||||
"end_pos": closed_pos,
|
||||
"calib_mode": calib_modes,
|
||||
"motor_names": self.joint_names,
|
||||
}
|
||||
# return calib_dict
|
||||
self.set_calibration(calib_dict)
|
||||
|
||||
def set_calibration(self, calibration: dict[str, list]):
|
||||
self.calibration = calibration
|
||||
|
||||
def apply_calibration(self, values: np.ndarray | list, motor_names: list[str] | None):
|
||||
"""Convert from unsigned int32 joint position range [0, 2**32[ to the universal float32 nominal degree range ]-180.0, 180.0[ with
|
||||
a "zero position" at 0 degree.
|
||||
|
||||
Note: We say "nominal degree range" since the motors can take values outside this range. For instance, 190 degrees, if the motor
|
||||
rotate more than a half a turn from the zero position. However, most motors can't rotate more than 180 degrees and will stay in this range.
|
||||
|
||||
Joints values are original in [0, 2**32[ (unsigned int32). Each motor are expected to complete a full rotation
|
||||
when given a goal position that is + or - their resolution. For instance, feetech xl330-m077 have a resolution of 4096, and
|
||||
at any position in their original range, let's say the position 56734, they complete a full rotation clockwise by moving to 60830,
|
||||
or anticlockwise by moving to 52638. The position in the original range is arbitrary and might change a lot between each motor.
|
||||
To harmonize between motors of the same model, different robots, or even models of different brands, we propose to work
|
||||
in the centered nominal degree range ]-180, 180[.
|
||||
"""
|
||||
if motor_names is None:
|
||||
motor_names = self.motor_names
|
||||
|
||||
# Convert from unsigned int32 original range [0, 2**32] to signed float32 range
|
||||
values = values.astype(np.float32)
|
||||
|
||||
for i, name in enumerate(motor_names):
|
||||
calib_idx = self.calibration["motor_names"].index(name)
|
||||
calib_mode = self.calibration["calib_mode"][calib_idx]
|
||||
|
||||
if CalibrationMode[calib_mode] == CalibrationMode.LINEAR:
|
||||
start_pos = self.calibration["start_pos"][calib_idx]
|
||||
end_pos = self.calibration["end_pos"][calib_idx]
|
||||
|
||||
# Rescale the present position to a nominal range [0, 100] %,
|
||||
# useful for joints with linear motions like Aloha gripper
|
||||
values[i] = (values[i] - start_pos) / (end_pos - start_pos) * 100
|
||||
|
||||
if (values[i] < LOWER_BOUND_LINEAR) or (values[i] > UPPER_BOUND_LINEAR):
|
||||
if name == "pinky_1" and (values[i] < LOWER_BOUND_LINEAR):
|
||||
values[i] = end_pos
|
||||
else:
|
||||
msg = (
|
||||
f"Wrong motor position range detected for {name}. "
|
||||
f"Expected to be in nominal range of [0, 100] % (a full linear translation), "
|
||||
f"with a maximum range of [{LOWER_BOUND_LINEAR}, {UPPER_BOUND_LINEAR}] % to account for some imprecision during calibration, "
|
||||
f"but present value is {values[i]} %. "
|
||||
"This might be due to a cable connection issue creating an artificial jump in motor values. "
|
||||
"You need to recalibrate by running: `python lerobot/scripts/control_robot.py calibrate`"
|
||||
)
|
||||
print(msg)
|
||||
# raise JointOutOfRangeError(msg)
|
||||
|
||||
return values
|
||||
|
||||
# def revert_calibration(self, values: np.ndarray | list, motor_names: list[str] | None):
|
||||
# """Inverse of `apply_calibration`."""
|
||||
# if motor_names is None:
|
||||
# motor_names = self.motor_names
|
||||
|
||||
# for i, name in enumerate(motor_names):
|
||||
# calib_idx = self.calibration["motor_names"].index(name)
|
||||
# calib_mode = self.calibration["calib_mode"][calib_idx]
|
||||
|
||||
# if CalibrationMode[calib_mode] == CalibrationMode.LINEAR:
|
||||
# start_pos = self.calibration["start_pos"][calib_idx]
|
||||
# end_pos = self.calibration["end_pos"][calib_idx]
|
||||
|
||||
# # Convert from nominal lnear range of [0, 100] % to
|
||||
# # actual motor range of values which can be arbitrary.
|
||||
# values[i] = values[i] / 100 * (end_pos - start_pos) + start_pos
|
||||
|
||||
# values = np.round(values).astype(np.int32)
|
||||
# return values
|
||||
|
||||
|
||||
class HopeJuniorRobot:
|
||||
def __init__(self):
|
||||
self.arm_bus = FeetechMotorsBus(
|
||||
port="/dev/tty.usbmodem58760429571",
|
||||
motors={
|
||||
# "motor1": (2, "sts3250"),
|
||||
# "motor2": (1, "scs0009"),
|
||||
"shoulder_pitch": [1, "sts3250"],
|
||||
"shoulder_yaw": [2, "sts3215"], # TODO: sts3250
|
||||
"shoulder_roll": [3, "sts3215"], # TODO: sts3250
|
||||
"elbow_flex": [4, "sts3250"],
|
||||
"wrist_roll": [5, "sts3215"],
|
||||
"wrist_yaw": [6, "sts3215"],
|
||||
"wrist_pitch": [7, "sts3215"],
|
||||
},
|
||||
protocol_version=0,
|
||||
)
|
||||
self.hand_bus = FeetechMotorsBus(
|
||||
port="/dev/tty.usbmodem585A0077581",
|
||||
motors={
|
||||
"thumb_basel_rotation": [30, "scs0009"],
|
||||
"thumb_flexor": [27, "scs0009"],
|
||||
"thumb_pinky_side": [26, "scs0009"],
|
||||
"thumb_thumb_side": [28, "scs0009"],
|
||||
"index_flexor": [25, "scs0009"],
|
||||
"index_pinky_side": [31, "scs0009"],
|
||||
"index_thumb_side": [32, "scs0009"],
|
||||
"middle_flexor": [24, "scs0009"],
|
||||
"middle_pinky_side": [33, "scs0009"],
|
||||
"middle_thumb_side": [34, "scs0009"],
|
||||
"ring_flexor": [21, "scs0009"],
|
||||
"ring_pinky_side": [36, "scs0009"],
|
||||
"ring_thumb_side": [35, "scs0009"],
|
||||
"pinky_flexor": [23, "scs0009"],
|
||||
"pinky_pinky_side": [38, "scs0009"],
|
||||
"pinky_thumb_side": [37, "scs0009"],
|
||||
},
|
||||
protocol_version=1,
|
||||
group_sync_read=False,
|
||||
)
|
||||
|
||||
def get_hand_calibration(self):
|
||||
homing_offset = [0] * len(self.hand_bus.motor_names)
|
||||
drive_mode = [0] * len(self.hand_bus.motor_names)
|
||||
|
||||
start_pos = [
|
||||
500,
|
||||
900,
|
||||
1000,
|
||||
0,
|
||||
100,
|
||||
250,
|
||||
750,
|
||||
100,
|
||||
400,
|
||||
150,
|
||||
100,
|
||||
120,
|
||||
980,
|
||||
100,
|
||||
950,
|
||||
750,
|
||||
]
|
||||
|
||||
end_pos = [
|
||||
500 - 250,
|
||||
900 - 300,
|
||||
1000 - 550,
|
||||
0 + 550,
|
||||
1000,
|
||||
start_pos[5] + 500,
|
||||
start_pos[6] - 500,
|
||||
1000,
|
||||
400 + 700,
|
||||
150 + 700,
|
||||
1000,
|
||||
120 + 700,
|
||||
980 - 700,
|
||||
1000,
|
||||
950 - 700,
|
||||
750 - 700,
|
||||
]
|
||||
|
||||
calib_modes = [CalibrationMode.LINEAR.name] * len(self.hand_bus.motor_names)
|
||||
|
||||
calib_dict = {
|
||||
"homing_offset": homing_offset,
|
||||
"drive_mode": drive_mode,
|
||||
"start_pos": start_pos,
|
||||
"end_pos": end_pos,
|
||||
"calib_mode": calib_modes,
|
||||
"motor_names": self.hand_bus.motor_names,
|
||||
}
|
||||
return calib_dict
|
||||
|
||||
def connect(self):
|
||||
self.arm_bus.connect()
|
||||
self.hand_bus.connect()
|
||||
|
||||
|
||||
ESCAPE_KEY_ID = 27
|
||||
|
||||
|
||||
def capture_and_display_processed_frames(
|
||||
frame_processor: Callable[[np.ndarray], np.ndarray],
|
||||
window_display_name: str,
|
||||
cap_device: int = 0,
|
||||
) -> None:
|
||||
"""
|
||||
Capture frames from the given input camera device, run them through
|
||||
the frame processor, and display the outputs in a window with the given name.
|
||||
|
||||
User should press Esc to exit.
|
||||
|
||||
Inputs:
|
||||
frame_processor: Callable[[np.ndarray], np.ndarray]
|
||||
Processes frames.
|
||||
Input and output are numpy arrays of shape (H W C) with BGR channel layout and dtype uint8 / byte.
|
||||
window_display_name: str
|
||||
Name of the window used to display frames.
|
||||
cap_device: int
|
||||
Identifier for the camera to use to capture frames.
|
||||
"""
|
||||
cv2.namedWindow(window_display_name)
|
||||
capture = cv2.VideoCapture(cap_device)
|
||||
if not capture.isOpened():
|
||||
raise ValueError("Unable to open video capture.")
|
||||
|
||||
frame_count = 0
|
||||
has_frame, frame = capture.read()
|
||||
while has_frame:
|
||||
assert isinstance(frame, np.ndarray)
|
||||
|
||||
frame_count = frame_count + 1
|
||||
# mirror frame
|
||||
frame = np.ascontiguousarray(frame[:, ::-1, ::-1])
|
||||
|
||||
# process & show frame
|
||||
processed_frame = frame_processor(frame)
|
||||
cv2.imshow(window_display_name, processed_frame[:, :, ::-1])
|
||||
|
||||
has_frame, frame = capture.read()
|
||||
key = cv2.waitKey(1)
|
||||
if key == ESCAPE_KEY_ID:
|
||||
break
|
||||
|
||||
capture.release()
|
||||
|
||||
|
||||
def main():
|
||||
robot = HopeJuniorRobot()
|
||||
robot.connect()
|
||||
|
||||
# robot.hand_bus.calibration = None
|
||||
|
||||
# breakpoint()
|
||||
# print(robot.arm_bus.read("Present_Position"))
|
||||
robot.arm_bus.write("Torque_Enable", 1)
|
||||
robot.arm_bus.write("Acceleration", 20)
|
||||
robot.arm_bus.read("Acceleration")
|
||||
|
||||
calibration = robot.get_hand_calibration()
|
||||
robot.hand_bus.write("Goal_Position", calibration["start_pos"])
|
||||
# robot.hand_bus.write("Goal_Position", calibration["end_pos"][:4], robot.hand_bus.motor_names[:4])
|
||||
robot.hand_bus.set_calibration(calibration)
|
||||
lol = 1
|
||||
|
||||
# # print(motors_bus.write("Goal_Position", 500))
|
||||
# print(robot.hand_bus.read("Present_Position"))
|
||||
# # pos = hand_bus.read("Present_Position")
|
||||
# # hand_bus.write("Goal_Position", pos[0]+20, hand_bus.motor_names[0])
|
||||
# # hand_bus.write("Goal_Position", pos[i]+delta, hand_bus.motor_names[i])
|
||||
# robot.hand_bus.read("Acceleration")
|
||||
# robot.hand_bus.write("Acceleration", 10)
|
||||
|
||||
# sleep = 1
|
||||
# # robot.hand_bus.write(
|
||||
# # "Goal_Position", [glove.last_d['index_2']-1500,300,300], ["index_pinky_side", "index_flexor", "index_thumb_side"]
|
||||
# # )
|
||||
# #time.sleep(sleep)
|
||||
# time.sleep(sleep)
|
||||
# robot.hand_bus.write(
|
||||
# "Goal_Position", [100, 100, 100], ["index_flexor", "index_pinky_side", "index_thumb_side"]
|
||||
# )
|
||||
# time.sleep(sleep)
|
||||
# robot.hand_bus.write(
|
||||
# "Goal_Position", [100, 0, 0], ["middle_flexor", "middle_pinky_side", "middle_thumb_side"]
|
||||
# )
|
||||
# time.sleep(sleep)
|
||||
# robot.hand_bus.write(
|
||||
# "Goal_Position", [200, 200, 0], ["ring_flexor", "ring_pinky_side", "ring_thumb_side"]
|
||||
# )
|
||||
# time.sleep(sleep)
|
||||
# robot.hand_bus.write(
|
||||
# "Goal_Position", [200, 100, 600], ["pinky_flexor", "pinky_pinky_side", "pinky_thumb_side"]
|
||||
# )
|
||||
# time.sleep(sleep)
|
||||
|
||||
# breakpoint()
|
||||
|
||||
glove = HomonculusGlove()
|
||||
glove.run_calibration()
|
||||
# while True:
|
||||
# joint_names = ["index_1", "index_2"]
|
||||
# joint_values = glove.read(joint_names)
|
||||
# print(joint_values)
|
||||
|
||||
input()
|
||||
while True:
|
||||
joint_names = []
|
||||
# joint_names += ["thumb_0", "thumb_2", "thumb_3"]
|
||||
joint_names += ["index_0", "index_1"]
|
||||
# joint_names += ["middle_1", "middle_2"]
|
||||
# joint_names += ["ring_1", "ring_2"]
|
||||
# joint_names += ["pinky_0", "pinky_2"]
|
||||
joint_values = glove.read(joint_names)
|
||||
joint_values = joint_values.round().astype(int)
|
||||
joint_dict = {k: v for k, v in zip(joint_names, joint_values, strict=False)}
|
||||
|
||||
motor_values = []
|
||||
motor_names = []
|
||||
# motor_names += ["thumb_basel_rotation", "thumb_flexor", "thumb_pinky_side", "thumb_thumb_side"]
|
||||
# motor_values += [joint_dict["thumb_3"], joint_dict["thumb_0"], joint_dict["thumb_2"], joint_dict["thumb_2"]]
|
||||
motor_names += ["index_pinky_side", "index_thumb_side"]
|
||||
# if joint_dict["index_0"] -2100 > 0:
|
||||
splayamount = 0.5
|
||||
motor_values += [
|
||||
(100 - joint_dict["index_0"]) * splayamount + joint_dict["index_1"] * (1 - splayamount),
|
||||
(joint_dict["index_0"]) * splayamount + joint_dict["index_1"] * (1 - splayamount),
|
||||
]
|
||||
# else:
|
||||
# motor_values += [100-joint_dict["index_0"], joint_dict["index_0"]]
|
||||
|
||||
# motor_names += ["middle_flexor", "middle_pinky_side", "middle_thumb_side"]
|
||||
# motor_values += [joint_dict["middle_2"], joint_dict["middle_1"], joint_dict["middle_1"]]
|
||||
# motor_names += ["ring_flexor", "ring_pinky_side", "ring_thumb_side"]
|
||||
# motor_values += [joint_dict["ring_2"], joint_dict["ring_1"], joint_dict["ring_1"]]
|
||||
# motor_names += ["pinky_flexor", "pinky_pinky_side", "pinky_thumb_side"]
|
||||
|
||||
# motor_values += [joint_dict["pinky_2"], joint_dict["pinky_0"], joint_dict["pinky_0"]]
|
||||
|
||||
motor_values = np.array(motor_values)
|
||||
motor_values = np.clip(motor_values, 0, 100)
|
||||
|
||||
robot.hand_bus.write("Goal_Position", motor_values, motor_names)
|
||||
time.sleep(0.02)
|
||||
|
||||
while True:
    # print(glove.read()['index_2'] - 1500)
    glove_index_flexor = glove.read()["index_2"] - 1500
    glove_index_subflexor = glove.read()["index_1"] - 1500
    glove_index_side = glove.read()["index_0"] - 2100

    vals = [glove_index_flexor, 1000 - (glove_index_subflexor), glove_index_subflexor]
    keys = ["index_flexor", "index_pinky_side", "index_thumb_side"]

    glove_middle_flexor = glove.read()["middle_2"] - 1500
    glove_middle_subflexor = 1000 - (glove.read()["middle_1"] - 1700)
    vals += [glove_middle_flexor, glove_middle_subflexor, glove_middle_subflexor - 200]
    keys += ["middle_flexor", "middle_pinky_side", "middle_thumb_side"]

    glove_ring_flexor = glove.read()["ring_2"] - 1300
    print(glove_ring_flexor)
    glove_ring_subflexor = glove.read()["ring_1"] - 1100

    vals += [glove_ring_flexor, 1000 - glove_ring_subflexor, glove_ring_subflexor]
    keys += ["ring_flexor", "ring_pinky_side", "ring_thumb_side"]

    glove_pinky_flexor = glove.read()["pinky_2"] - 1500
    glove_pinky_subflexor = glove.read()["pinky_1"] - 1300
    vals += [300 + glove_pinky_flexor, max(1000 - glove_pinky_subflexor - 100, 0), glove_pinky_subflexor]
    keys += ["pinky_flexor", "pinky_pinky_side", "pinky_thumb_side"]

    robot.hand_bus.write("Goal_Position", vals, keys)
    time.sleep(0.1)

time.sleep(3)

def move_arm(loop=10):
    sleep = 1
    for i in range(loop):
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1195])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 2195])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1457, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 2357, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 974, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 2674, 1957, 1695])
        time.sleep(sleep + 2)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 1632, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 1369, 1632, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 1330, 2069, 2032, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [2381, 2030, 2069, 2032, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1681, 2030, 2069, 2032, 1874, 1957, 1695])
        time.sleep(sleep)
        robot.arm_bus.write("Goal_Position", [1981, 2030, 2069, 2032, 1874, 1957, 1695])
        time.sleep(sleep)

def move_hand(loop=10):
    sleep = 0.5
    for i in range(loop):
        robot.hand_bus.write(
            "Goal_Position",
            [500, 1000, 0, 1000],
            ["thumb_basel_rotation", "thumb_flexor", "thumb_pinky_side", "thumb_thumb_side"],
        )
        time.sleep(sleep)
        robot.hand_bus.write(
            "Goal_Position", [100, 100, 100], ["index_flexor", "index_pinky_side", "index_thumb_side"]
        )
        time.sleep(sleep)
        robot.hand_bus.write(
            "Goal_Position", [100, 1000, 150], ["middle_flexor", "middle_pinky_side", "middle_thumb_side"]
        )
        time.sleep(sleep)
        robot.hand_bus.write(
            "Goal_Position", [200, 200, 0], ["ring_flexor", "ring_pinky_side", "ring_thumb_side"]
        )
        time.sleep(sleep)
        robot.hand_bus.write(
            "Goal_Position", [200, 100, 700], ["pinky_flexor", "pinky_pinky_side", "pinky_thumb_side"]
        )
        time.sleep(sleep)

        robot.hand_bus.write(
            "Goal_Position",
            [500, 1000 - 250, 0 + 300, 1000 - 200],
            ["thumb_basel_rotation", "thumb_flexor", "thumb_pinky_side", "thumb_thumb_side"],
        )
        time.sleep(sleep)
        robot.hand_bus.write(
            "Goal_Position",
            [100 + 450, 100 + 400, 100 + 400],
            ["index_flexor", "index_pinky_side", "index_thumb_side"],
        )
        time.sleep(sleep)
        robot.hand_bus.write(
            "Goal_Position",
            [100 + 350, 1000 - 450, 150 + 450],
            ["middle_flexor", "middle_pinky_side", "middle_thumb_side"],
        )
        time.sleep(sleep)
        robot.hand_bus.write(
            "Goal_Position",
            [200 + 650, 200 + 350, 0 + 350],
            ["ring_flexor", "ring_pinky_side", "ring_thumb_side"],
        )
        time.sleep(sleep)
        robot.hand_bus.write(
            "Goal_Position",
            [200 + 450, 100 + 400, 700 - 400],
            ["pinky_flexor", "pinky_pinky_side", "pinky_thumb_side"],
        )
        time.sleep(sleep)

move_hand(3)

move_arm(1)

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as executor:
    executor.submit(move_arm)
    executor.submit(move_hand)

# Initial position.
for i in range(3):
    robot.hand_bus.write(
        "Goal_Position", [500, 1000, 0, 1000, 100, 950, 100, 100, 1000, 150, 200, 200, 0, 200, 100, 700]
    )
    time.sleep(1)

# for i in range(3):
#     robot.hand_bus.write("Goal_Position", [500, 1000-150, 0+250, 1000-150,
#                                            100+300, 950-250, 100+250,
#                                            100+200, 1000-300, 150+300,
#                                            200+500, 200+200, 0+200,
#                                            200+300, 100+200, 700-200])
#     time.sleep(1)

# camera = 0
# score_threshold = 0.95
# iou_threshold = 0.3

# app = MediaPipeHandApp(MediaPipeHand.from_pretrained(), score_threshold, iou_threshold)

# def frame_processor(frame: np.ndarray) -> np.ndarray:
#     # Input prep.
#     NHWC_int_numpy_frames, NCHW_fp32_torch_frames = app_to_net_image_inputs(frame)

#     # Run bounding box & keypoint detector.
#     batched_selected_boxes, batched_selected_keypoints = app._run_box_detector(NCHW_fp32_torch_frames)

#     # The region of interest (bounding box of 4 (x, y) corners).
#     # list[torch.Tensor(shape=[Num Boxes, 4, 2])],
#     # where 2 == (x, y)
#     #
#     # A list element will be None if there is no selected ROI.
#     batched_roi_4corners = app._compute_object_roi(batched_selected_boxes, batched_selected_keypoints)

#     # Selected landmarks for the ROI (if any).
#     # list[torch.Tensor(shape=[Num Selected Landmarks, K, 3])],
#     # where K == number of landmark keypoints, 3 == (x, y, confidence)
#     #
#     # A list element will be None if there is no ROI.
#     landmarks_out = app._run_landmark_detector(NHWC_int_numpy_frames, batched_roi_4corners)

#     app._draw_predictions(
#         NHWC_int_numpy_frames,
#         batched_selected_boxes,
#         batched_selected_keypoints,
#         batched_roi_4corners,
#         *landmarks_out,
#     )

#     return NHWC_int_numpy_frames[0]

# capture_and_display_processed_frames(frame_processor, "QAIHM Mediapipe Hand Demo", camera)


if __name__ == "__main__":
    main()
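
The teleop loop above maps the two calibrated index-finger glove joints onto the two splay motors by linearly blending a splay term (`index_0`) with a flexion term (`index_1`), then clipping to the motor range. A minimal standalone sketch of that arithmetic; the function name and the 0-100 input range are illustrative assumptions, not part of the script:

```python
import numpy as np


def blend_index_motors(index_0: float, index_1: float, splay: float = 0.5) -> np.ndarray:
    """Blend splay (index_0) and flexion (index_1) into the two index side motors.

    Mirrors the arithmetic of the teleop loop above: one motor gets the mirrored
    splay term, the other the raw splay term, both mixed with the flexion value.
    """
    pinky_side = (100 - index_0) * splay + index_1 * (1 - splay)
    thumb_side = index_0 * splay + index_1 * (1 - splay)
    return np.clip(np.array([pinky_side, thumb_side]), 0, 100)


# Hypothetical calibrated glove readings in [0, 100]:
print(blend_index_motors(index_0=30.0, index_1=80.0))  # -> [75. 55.]
```
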
@@ -67,7 +67,6 @@ class DiffusionConfig:
        use_group_norm: Whether to replace batch normalization with group normalization in the backbone.
            The group sizes are set to be about 16 (to be precise, feature_dim // 16).
        spatial_softmax_num_keypoints: Number of keypoints for SpatialSoftmax.
        use_separate_rgb_encoder_per_camera: Whether to use a separate RGB encoder for each camera view.
        down_dims: Feature dimension for each stage of temporal downsampling in the diffusion modeling Unet.
            You may provide a variable number of dimensions, therefore also controlling the degree of
            downsampling.
@@ -131,7 +130,6 @@ class DiffusionConfig:
    pretrained_backbone_weights: str | None = None
    use_group_norm: bool = True
    spatial_softmax_num_keypoints: int = 32
    use_separate_rgb_encoder_per_camera: bool = False
    # Unet.
    down_dims: tuple[int, ...] = (512, 1024, 2048)
    kernel_size: int = 5

@@ -182,13 +182,8 @@ class DiffusionModel(nn.Module):
        self._use_env_state = False
        if num_images > 0:
            self._use_images = True
            if self.config.use_separate_rgb_encoder_per_camera:
                encoders = [DiffusionRgbEncoder(config) for _ in range(num_images)]
                self.rgb_encoder = nn.ModuleList(encoders)
                global_cond_dim += encoders[0].feature_dim * num_images
            else:
                self.rgb_encoder = DiffusionRgbEncoder(config)
                global_cond_dim += self.rgb_encoder.feature_dim * num_images
            self.rgb_encoder = DiffusionRgbEncoder(config)
            global_cond_dim += self.rgb_encoder.feature_dim * num_images
        if "observation.environment_state" in config.input_shapes:
            self._use_env_state = True
            global_cond_dim += config.input_shapes["observation.environment_state"][0]
@@ -244,32 +239,16 @@ class DiffusionModel(nn.Module):
        """Encode image features and concatenate them all together along with the state vector."""
        batch_size, n_obs_steps = batch["observation.state"].shape[:2]
        global_cond_feats = [batch["observation.state"]]
        # Extract image features.
        # Extract image feature (first combine batch, sequence, and camera index dims).
        if self._use_images:
            if self.config.use_separate_rgb_encoder_per_camera:
                # Combine batch and sequence dims while rearranging to make the camera index dimension first.
                images_per_camera = einops.rearrange(batch["observation.images"], "b s n ... -> n (b s) ...")
                img_features_list = torch.cat(
                    [
                        encoder(images)
                        for encoder, images in zip(self.rgb_encoder, images_per_camera, strict=True)
                    ]
                )
                # Separate batch and sequence dims back out. The camera index dim gets absorbed into the
                # feature dim (effectively concatenating the camera features).
                img_features = einops.rearrange(
                    img_features_list, "(n b s) ... -> b s (n ...)", b=batch_size, s=n_obs_steps
                )
            else:
                # Combine batch, sequence, and "which camera" dims before passing to shared encoder.
                img_features = self.rgb_encoder(
                    einops.rearrange(batch["observation.images"], "b s n ... -> (b s n) ...")
                )
                # Separate batch dim and sequence dim back out. The camera index dim gets absorbed into the
                # feature dim (effectively concatenating the camera features).
                img_features = einops.rearrange(
                    img_features, "(b s n) ... -> b s (n ...)", b=batch_size, s=n_obs_steps
                )
            img_features = self.rgb_encoder(
                einops.rearrange(batch["observation.images"], "b s n ... -> (b s n) ...")
            )
            # Separate batch dim and sequence dim back out. The camera index dim gets absorbed into the
            # feature dim (effectively concatenating the camera features).
            img_features = einops.rearrange(
                img_features, "(b s n) ... -> b s (n ...)", b=batch_size, s=n_obs_steps
            )
            global_cond_feats.append(img_features)

        if self._use_env_state:

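
The per-camera branch above hinges on an einops round trip: flatten batch and sequence dims with the camera index first, encode each camera's frames with its own encoder, then fold the camera index into the feature dimension. A minimal shape-level sketch of that round trip, with the encoder stubbed out as a flatten-plus-linear layer and all sizes hypothetical:

```python
import einops
import torch
import torch.nn as nn

b, s, n, c, h, w = 2, 2, 3, 3, 8, 8  # batch, obs steps, cameras, CHW
images = torch.randn(b, s, n, c, h, w)
encoders = nn.ModuleList(
    [nn.Sequential(nn.Flatten(), nn.Linear(c * h * w, 16)) for _ in range(n)]
)

# Camera index first, batch and sequence flattened together: (n, b*s, c, h, w).
per_cam = einops.rearrange(images, "b s n ... -> n (b s) ...")
feats = torch.cat([enc(x) for enc, x in zip(encoders, per_cam, strict=True)])

# Fold the camera dim into the feature dim: (n*b*s, 16) -> (b, s, n*16).
feats = einops.rearrange(feats, "(n b s) ... -> b s (n ...)", b=b, s=s)
print(feats.shape)  # torch.Size([2, 2, 48])
```
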
@@ -51,13 +51,6 @@ def get_policy_and_config_classes(name: str) -> tuple[Policy, object]:
        from lerobot.common.policies.tdmpc.modeling_tdmpc import TDMPCPolicy

        return TDMPCPolicy, TDMPCConfig

    elif name == "tdmpc2":
        from lerobot.common.policies.tdmpc2.configuration_tdmpc2 import TDMPC2Config
        from lerobot.common.policies.tdmpc2.modeling_tdmpc2 import TDMPC2Policy

        return TDMPC2Policy, TDMPC2Config

    elif name == "diffusion":
        from lerobot.common.policies.diffusion.configuration_diffusion import DiffusionConfig
        from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy

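
With the `tdmpc2` branch registered, the factory resolves policy and config classes by name. A hedged usage sketch, assuming the function lives in `lerobot.common.policies.factory` as in upstream LeRobot (note that the TDMPC2 policy constructor in this branch moves the model to `cuda:0`, so a GPU is assumed):

```python
from lerobot.common.policies.factory import get_policy_and_config_classes

policy_cls, config_cls = get_policy_and_config_classes("tdmpc2")
policy = policy_cls(config_cls())  # default config; dataset stats omitted here
```
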
@@ -1,193 +0,0 @@
#!/usr/bin/env python

# Copyright 2024 Nicklas Hansen, Xiaolong Wang, Hao Su,
# and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass, field


@dataclass
class TDMPC2Config:
    """Configuration class for TDMPC2Policy.

    Defaults are configured for training with xarm_lift_medium_replay providing proprioceptive and single
    camera observations.

    The parameters you will most likely need to change are the ones which depend on the environment / sensors.
    Those are: `input_shapes`, `output_shapes`, and perhaps `max_random_shift_ratio`.

    Args:
        n_action_repeats: The number of times to repeat the action returned by the planning. (hint: Google
            action repeats in Q-learning or ask your favorite chatbot)
        horizon: Horizon for model predictive control.
        n_action_steps: Number of action steps to take from the plan given by model predictive control. This
            is an alternative to using action repeats. If this is set to more than 1, then we require
            `n_action_repeats == 1`, `use_mpc == True` and `n_action_steps <= horizon`. Note that this
            approach of using multiple steps from the plan is not in the original implementation.
        input_shapes: A dictionary defining the shapes of the input data for the policy. The key represents
            the input data name, and the value is a list indicating the dimensions of the corresponding data.
            For example, "observation.image" refers to an input from a camera with dimensions [3, 96, 96],
            indicating it has three color channels and 96x96 resolution. Importantly, `input_shapes` doesn't
            include batch dimension or temporal dimension.
        output_shapes: A dictionary defining the shapes of the output data for the policy. The key represents
            the output data name, and the value is a list indicating the dimensions of the corresponding data.
            For example, "action" refers to an output shape of [14], indicating 14-dimensional actions.
            Importantly, `output_shapes` doesn't include batch dimension or temporal dimension.
        input_normalization_modes: A dictionary with key representing the modality (e.g. "observation.state"),
            and the value specifies the normalization mode to apply. The two available modes are "mean_std"
            which subtracts the mean and divides by the standard deviation, and "min_max" which rescales to a
            [-1, 1] range. Note that here this defaults to None meaning inputs are not normalized. This is to
            match the original implementation.
        output_normalization_modes: Similar dictionary as `input_normalization_modes`, but to unnormalize to
            the original scale. Note that this is also used for normalizing the training targets. NOTE:
            Clipping to [-1, +1] is used during MPPI/CEM. Therefore, it is recommended that you stick with
            "min_max" normalization mode here.
        image_encoder_hidden_dim: Number of channels for the convolutional layers used for image encoding.
        state_encoder_hidden_dim: Hidden dimension for MLP used for state vector encoding.
        latent_dim: Observation's latent embedding dimension.
        q_ensemble_size: Number of Q function estimators to use in an ensemble for uncertainty estimation.
        mlp_dim: Hidden dimension of MLPs used for modelling the dynamics encoder, reward function, policy
            (π), Q ensemble, and V.
        discount: Discount factor (γ) to use for the reinforcement learning formalism.
        use_mpc: Whether to use model predictive control. The alternative is to just sample the policy model
            (π) for each step.
        cem_iterations: Number of iterations for the MPPI/CEM loop in MPC.
        max_std: Maximum standard deviation for actions sampled from the gaussian PDF in CEM.
        min_std: Minimum standard deviation for noise applied to actions sampled from the policy model (π).
            Doubles up as the minimum standard deviation for actions sampled from the gaussian PDF in CEM.
        n_gaussian_samples: Number of samples to draw from the gaussian distribution every CEM iteration. Must
            be greater than zero.
        n_pi_samples: Number of samples to draw from the policy / world model rollout every CEM iteration. Can
            be zero.
        n_elites: The number of elite samples to use for updating the gaussian parameters every CEM iteration.
        elite_weighting_temperature: The temperature to use for softmax weighting (by trajectory value) of the
            elites, when updating the gaussian parameters for CEM.
        max_random_shift_ratio: Maximum random shift (as a proportion of the image size) to apply to the
            image(s) for training-time augmentation. If set to 0, no such augmentation is applied. Note that
            the input images are assumed to be square for this augmentation.
        reward_coeff: Loss weighting coefficient for the reward regression loss.
        value_coeff: Loss weighting coefficient for both the state-action value (Q) TD loss, and the state
            value (V) expectile regression loss.
        consistency_coeff: Loss weighting coefficient for the consistency loss.
        temporal_decay_coeff: Exponential decay coefficient for decaying the loss coefficient for future time-
            steps. Hint: each loss computation involves `horizon` steps worth of actions starting from the
            current time step.
        target_model_momentum: Momentum (α) used for EMA updates of the target models. Updates are calculated
            as ϕ ← αϕ + (1-α)θ where ϕ are the parameters of the target model and θ are the parameters of the
            model being trained.
    """

    # Input / output structure.
    n_action_repeats: int = 1
    horizon: int = 3
    n_action_steps: int = 1

    input_shapes: dict[str, list[int]] = field(
        default_factory=lambda: {
            "observation.image": [3, 84, 84],
            "observation.state": [4],
        }
    )
    output_shapes: dict[str, list[int]] = field(
        default_factory=lambda: {
            "action": [4],
        }
    )

    # Normalization / Unnormalization
    input_normalization_modes: dict[str, str] | None = None
    output_normalization_modes: dict[str, str] = field(
        default_factory=lambda: {"action": "min_max"},
    )

    # Architecture / modeling.
    # Neural networks.
    image_encoder_hidden_dim: int = 32
    state_encoder_hidden_dim: int = 256
    latent_dim: int = 512
    q_ensemble_size: int = 5
    num_enc_layers: int = 2
    mlp_dim: int = 512
    # Reinforcement learning.
    discount: float = 0.9
    simnorm_dim: int = 8
    dropout: float = 0.01

    # Actor.
    log_std_min: float = -10
    log_std_max: float = 2

    # Critic.
    num_bins: int = 101
    vmin: int = -10
    vmax: int = +10

    # Inference.
    use_mpc: bool = True
    cem_iterations: int = 6
    max_std: float = 2.0
    min_std: float = 0.05
    n_gaussian_samples: int = 512
    n_pi_samples: int = 24
    n_elites: int = 64
    elite_weighting_temperature: float = 0.5

    # Training and loss computation.
    max_random_shift_ratio: float = 0.0476
    # Loss coefficients.
    reward_coeff: float = 0.1
    value_coeff: float = 0.1
    consistency_coeff: float = 20.0
    entropy_coef: float = 1e-4
    temporal_decay_coeff: float = 0.5
    # Target model. NOTE (michel_aractingi): this is equivalent to
    # 1 - target_model_momentum of our TD-MPC1 implementation because
    # of the use of `torch.lerp`.
    target_model_momentum: float = 0.01

    def __post_init__(self):
        """Input validation (not exhaustive)."""
        # There should only be one image key.
        image_keys = {k for k in self.input_shapes if k.startswith("observation.image")}
        if len(image_keys) > 1:
            raise ValueError(
                f"{self.__class__.__name__} handles at most one image for now. Got image keys {image_keys}."
            )
        if len(image_keys) > 0:
            image_key = next(iter(image_keys))
            if self.input_shapes[image_key][-2] != self.input_shapes[image_key][-1]:
                # TODO(alexander-soare): This limitation is solely because of code in the random shift
                # augmentation. It should be able to be removed.
                raise ValueError(
                    f"Only square images are handled now. Got image shape {self.input_shapes[image_key]}."
                )
        if self.n_gaussian_samples <= 0:
            raise ValueError(
                f"The number of gaussian samples for CEM must be greater than zero. Got `{self.n_gaussian_samples=}`"
            )
        if self.output_normalization_modes != {"action": "min_max"}:
            raise ValueError(
                "TD-MPC assumes the action space dimensions to all be in [-1, 1]. Therefore it is strongly "
                f"advised that you stick with the default. See {self.__class__.__name__} docstring for more "
                "information."
            )
        if self.n_action_steps > 1:
            if self.n_action_repeats != 1:
                raise ValueError(
                    "If `n_action_steps > 1`, `n_action_repeats` must be left to its default value of 1."
                )
            if not self.use_mpc:
                raise ValueError("If `n_action_steps > 1`, `use_mpc` must be set to `True`.")
            if self.n_action_steps > self.horizon:
                raise ValueError("`n_action_steps` must be less than or equal to `horizon`.")
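
Because the validation above runs in `__post_init__`, invalid combinations fail at construction time rather than deep inside training. A small sketch of overriding the environment-dependent fields; the shapes are illustrative, not taken from a real dataset:

```python
config = TDMPC2Config(
    input_shapes={"observation.image": [3, 84, 84], "observation.state": [4]},
    output_shapes={"action": [4]},
    n_action_steps=3,  # requires use_mpc=True, n_action_repeats=1, and <= horizon
)

# Non-square images are rejected by the validation above:
try:
    TDMPC2Config(input_shapes={"observation.image": [3, 84, 96], "observation.state": [4]})
except ValueError as e:
    print(e)
```
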
@@ -1,834 +0,0 @@
#!/usr/bin/env python

# Copyright 2024 Nicklas Hansen and The HuggingFace Inc. team.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Implementation of TD-MPC2: Scalable, Robust World Models for Continuous Control

We refer to the main paper and codebase:
TD-MPC2 paper: (https://arxiv.org/abs/2310.16828)
TD-MPC2 code: (https://github.com/nicklashansen/tdmpc2)
"""

# ruff: noqa: N806

from collections import deque
from copy import deepcopy
from functools import partial
from typing import Callable

import einops
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F  # noqa: N812
from huggingface_hub import PyTorchModelHubMixin
from torch import Tensor

from lerobot.common.policies.normalize import Normalize, Unnormalize
from lerobot.common.policies.tdmpc2.configuration_tdmpc2 import TDMPC2Config
from lerobot.common.policies.tdmpc2.tdmpc2_utils import (
    NormedLinear,
    SimNorm,
    gaussian_logprob,
    soft_cross_entropy,
    squash,
    two_hot_inv,
)
from lerobot.common.policies.utils import get_device_from_parameters, populate_queues


class TDMPC2Policy(
    nn.Module,
    PyTorchModelHubMixin,
    library_name="lerobot",
    repo_url="https://github.com/huggingface/lerobot",
    tags=["robotics", "tdmpc2"],
):
    """Implementation of TD-MPC2 learning + inference."""

    name = "tdmpc2"

    def __init__(
        self, config: TDMPC2Config | None = None, dataset_stats: dict[str, dict[str, Tensor]] | None = None
    ):
        """
        Args:
            config: Policy configuration class instance or None, in which case the default instantiation of
                the configuration class is used.
            dataset_stats: Dataset statistics to be used for normalization. If not passed here, it is expected
                that they will be passed with a call to `load_state_dict` before the policy is used.
        """
        super().__init__()

        if config is None:
            config = TDMPC2Config()
        self.config = config
        self.model = TDMPC2WorldModel(config)
        # TODO (michel-aractingi): temporary fix to place the model on GPU.
        self.model = self.model.to("cuda:0")

        if config.input_normalization_modes is not None:
            self.normalize_inputs = Normalize(
                config.input_shapes, config.input_normalization_modes, dataset_stats
            )
        else:
            self.normalize_inputs = nn.Identity()
        self.normalize_targets = Normalize(
            config.output_shapes, config.output_normalization_modes, dataset_stats
        )
        self.unnormalize_outputs = Unnormalize(
            config.output_shapes, config.output_normalization_modes, dataset_stats
        )

        image_keys = [k for k in config.input_shapes if k.startswith("observation.image")]
        # Note: This check is covered in the post-init of the config, but we keep a sanity check just in case.
        self._use_image = False
        self._use_env_state = False
        if len(image_keys) > 0:
            assert len(image_keys) == 1
            self._use_image = True
            self.input_image_key = image_keys[0]
        if "observation.environment_state" in config.input_shapes:
            self._use_env_state = True

        self.scale = RunningScale(self.config.target_model_momentum)
        self.discount = (
            self.config.discount
        )  # TODO (michel-aractingi): downscale discount according to episode length

        self.reset()

    def reset(self):
        """
        Clear observation and action queues. Clear previous means for warm starting of MPPI/CEM. Should be
        called on `env.reset()`.
        """
        self._queues = {
            "observation.state": deque(maxlen=1),
            "action": deque(maxlen=max(self.config.n_action_steps, self.config.n_action_repeats)),
        }
        if self._use_image:
            self._queues["observation.image"] = deque(maxlen=1)
        if self._use_env_state:
            self._queues["observation.environment_state"] = deque(maxlen=1)
        # Previous mean obtained from the cross-entropy method (CEM) used during MPC. It is used to warm start
        # CEM for the next step.
        self._prev_mean: torch.Tensor | None = None

    @torch.no_grad()
    def select_action(self, batch: dict[str, Tensor]) -> Tensor:
        """Select a single action given environment observations."""
        batch = self.normalize_inputs(batch)
        if self._use_image:
            batch = dict(batch)  # shallow copy so that adding a key doesn't modify the original
            batch["observation.image"] = batch[self.input_image_key]

        self._queues = populate_queues(self._queues, batch)

        # When the action queue is depleted, populate it again by querying the policy.
        if len(self._queues["action"]) == 0:
            batch = {key: torch.stack(list(self._queues[key]), dim=1) for key in batch}

            # Remove the time dimension as it is not handled yet.
            for key in batch:
                assert batch[key].shape[1] == 1
                batch[key] = batch[key][:, 0]

            # NOTE: Order of observations matters here.
            encode_keys = []
            if self._use_image:
                encode_keys.append("observation.image")
            if self._use_env_state:
                encode_keys.append("observation.environment_state")
            encode_keys.append("observation.state")
            z = self.model.encode({k: batch[k] for k in encode_keys})
            if self.config.use_mpc:  # noqa: SIM108
                actions = self.plan(z)  # (horizon, batch, action_dim)
            else:
                # Plan with the policy (π) alone. This always returns one action so unsqueeze to get a
                # sequence dimension like in the MPC branch.
                actions = self.model.pi(z)[0].unsqueeze(0)

            actions = torch.clamp(actions, -1, +1)

            actions = self.unnormalize_outputs({"action": actions})["action"]

            if self.config.n_action_repeats > 1:
                for _ in range(self.config.n_action_repeats):
                    self._queues["action"].append(actions[0])
            else:
                # Action queue is (n_action_steps, batch_size, action_dim), so we transpose the action.
                self._queues["action"].extend(actions[: self.config.n_action_steps])

        action = self._queues["action"].popleft()
        return action

    @torch.no_grad()
    def plan(self, z: Tensor) -> Tensor:
        """Plan sequence of actions using TD-MPC inference.

        Args:
            z: (batch, latent_dim,) tensor for the initial state.
        Returns:
            (horizon, batch, action_dim,) tensor for the planned trajectory of actions.
        """
        device = get_device_from_parameters(self)

        batch_size = z.shape[0]

        # Sample Nπ trajectories from the policy.
        pi_actions = torch.empty(
            self.config.horizon,
            self.config.n_pi_samples,
            batch_size,
            self.config.output_shapes["action"][0],
            device=device,
        )
        if self.config.n_pi_samples > 0:
            _z = einops.repeat(z, "b d -> n b d", n=self.config.n_pi_samples)
            for t in range(self.config.horizon):
                # Note: Adding a small amount of noise here doesn't hurt during inference and may even be
                # helpful for CEM.
                pi_actions[t] = self.model.pi(_z)[0]
                _z = self.model.latent_dynamics(_z, pi_actions[t])

        # In the CEM loop we will need this for a call to estimate_value with the gaussian sampled
        # trajectories.
        z = einops.repeat(z, "b d -> n b d", n=self.config.n_gaussian_samples + self.config.n_pi_samples)

        # Model Predictive Path Integral (MPPI) with the cross-entropy method (CEM) as the optimization
        # algorithm.
        # The initial mean and standard deviation for the cross-entropy method (CEM).
        mean = torch.zeros(
            self.config.horizon, batch_size, self.config.output_shapes["action"][0], device=device
        )
        # Maybe warm start CEM with the mean from the previous step.
        if self._prev_mean is not None:
            mean[:-1] = self._prev_mean[1:]
        std = self.config.max_std * torch.ones_like(mean)

        for _ in range(self.config.cem_iterations):
            # Randomly sample action trajectories for the gaussian distribution.
            std_normal_noise = torch.randn(
                self.config.horizon,
                self.config.n_gaussian_samples,
                batch_size,
                self.config.output_shapes["action"][0],
                device=std.device,
            )
            gaussian_actions = torch.clamp(mean.unsqueeze(1) + std.unsqueeze(1) * std_normal_noise, -1, 1)

            # Compute elite actions.
            actions = torch.cat([gaussian_actions, pi_actions], dim=1)
            value = self.estimate_value(z, actions).nan_to_num_(0).squeeze()
            elite_idxs = torch.topk(value, self.config.n_elites, dim=0).indices  # (n_elites, batch)
            elite_value = value.take_along_dim(elite_idxs, dim=0)  # (n_elites, batch)
            # (horizon, n_elites, batch, action_dim)
            elite_actions = actions.take_along_dim(einops.rearrange(elite_idxs, "n b -> 1 n b 1"), dim=1)

            # Update gaussian PDF parameters to be the (weighted) mean and standard deviation of the elites.
            max_value = elite_value.max(0, keepdim=True)[0]  # (1, batch)
            # The weighting is a softmax over trajectory values. Note that this is not the same as the usage
            # of Ω in eqn 4 of the TD-MPC paper. Instead it is the normalized version of it: s = Ω/ΣΩ. This
            # makes the equations: μ = Σ(s⋅Γ), σ = Σ(s⋅(Γ-μ)²).
            score = torch.exp(self.config.elite_weighting_temperature * (elite_value - max_value))
            score /= score.sum(axis=0, keepdim=True)
            # (horizon, batch, action_dim)
            mean = torch.sum(einops.rearrange(score, "n b -> n b 1") * elite_actions, dim=1) / (
                einops.rearrange(score.sum(0), "b -> 1 b 1") + 1e-9
            )
            std = torch.sqrt(
                torch.sum(
                    einops.rearrange(score, "n b -> n b 1")
                    * (elite_actions - einops.rearrange(mean, "h b d -> h 1 b d")) ** 2,
                    dim=1,
                )
                / (einops.rearrange(score.sum(0), "b -> 1 b 1") + 1e-9)
            ).clamp_(self.config.min_std, self.config.max_std)

        # Keep track of the mean for warm-starting subsequent steps.
        self._prev_mean = mean

        # Randomly select one of the elite actions from the last iteration of MPPI/CEM using the softmax
        # scores from the last iteration.
        actions = elite_actions[:, torch.multinomial(score.T, 1).squeeze(), torch.arange(batch_size)]
        return actions

    @torch.no_grad()
    def estimate_value(self, z: Tensor, actions: Tensor):
        """Estimates the value of a trajectory as per eqn 4 of the FOWM paper.

        Args:
            z: (batch, latent_dim) tensor of initial latent states.
            actions: (horizon, batch, action_dim) tensor of action trajectories.
        Returns:
            (batch,) tensor of values.
        """
        # Initialize return and running discount factor.
        G, running_discount = 0, 1
        # Iterate over the actions in the trajectory to simulate the trajectory using the latent dynamics
        # model. Keep track of return.
        for t in range(actions.shape[0]):
            # Estimate the next state (latent) and reward.
            z, reward = self.model.latent_dynamics_and_reward(z, actions[t], discretize_reward=True)
            # Update the return and running discount.
            G += running_discount * reward
            running_discount *= self.config.discount

        # next_action = self.model.pi(z)[0]  # (batch, action_dim)
        # terminal_values = self.model.Qs(z, next_action, return_type="avg")  # (ensemble, batch)

        return G + running_discount * self.model.Qs(z, self.model.pi(z)[0], return_type="avg")

    def forward(self, batch: dict[str, Tensor]) -> dict[str, Tensor | float]:
        """Run the batch through the model and compute the loss.

        Returns a dictionary with loss as a tensor, and other information as native floats.
        """
        device = get_device_from_parameters(self)

        batch = self.normalize_inputs(batch)
        if self._use_image:
            batch = dict(batch)  # shallow copy so that adding a key doesn't modify the original
            batch["observation.image"] = batch[self.input_image_key]
        batch = self.normalize_targets(batch)

        info = {}

        # (b, t) -> (t, b)
        for key in batch:
            if batch[key].ndim > 1:
                batch[key] = batch[key].transpose(1, 0)

        action = batch["action"]  # (t, b, action_dim)
        reward = batch["next.reward"]  # (t, b)
        observations = {k: v for k, v in batch.items() if k.startswith("observation.")}

        # Apply random image augmentations.
        if self._use_image and self.config.max_random_shift_ratio > 0:
            observations["observation.image"] = flatten_forward_unflatten(
                partial(random_shifts_aug, max_random_shift_ratio=self.config.max_random_shift_ratio),
                observations["observation.image"],
            )

        # Get the current observation for predicting trajectories, and all future observations for use in
        # the latent consistency loss and TD loss.
        current_observation, next_observations = {}, {}
        for k in observations:
            current_observation[k] = observations[k][0]
            next_observations[k] = observations[k][1:]
        horizon, batch_size = next_observations[
            "observation.image" if self._use_image else "observation.environment_state"
        ].shape[:2]

        # Run latent rollout using the latent dynamics model and policy model.
        # Note this has shape `horizon+1` because there are `horizon` actions and a current `z`. Each action
        # gives us a next `z`.
        batch_size = batch["index"].shape[0]
        z_preds = torch.empty(horizon + 1, batch_size, self.config.latent_dim, device=device)
        z_preds[0] = self.model.encode(current_observation)
        reward_preds = torch.empty(horizon, batch_size, self.config.num_bins, device=device)
        for t in range(horizon):
            z_preds[t + 1], reward_preds[t] = self.model.latent_dynamics_and_reward(z_preds[t], action[t])

        # Compute Q value predictions based on the latent rollout.
        q_preds_ensemble = self.model.Qs(
            z_preds[:-1], action, return_type="all"
        )  # (ensemble, horizon, batch)
        info.update({"Q": q_preds_ensemble.mean().item()})

        # Compute various targets with stopgrad.
        with torch.no_grad():
            # Latent state consistency targets for consistency loss.
            z_targets = self.model.encode(next_observations)

            # Compute the TD-target from a reward and the next observation.
            pi = self.model.pi(z_targets)[0]
            td_targets = (
                reward
                + self.config.discount
                * self.model.Qs(z_targets, pi, return_type="min", target=True).squeeze()
            )

        # Compute losses.
        # Exponentially decay the loss weight with respect to the timestep. Steps that are more distant in the
        # future have less impact on the loss. Note: unsqueeze will let us broadcast to (seq, batch).
        temporal_loss_coeffs = torch.pow(
            self.config.temporal_decay_coeff, torch.arange(horizon, device=device)
        ).unsqueeze(-1)

        # Compute consistency loss as MSE loss between latents predicted from the rollout and latents
        # predicted from the (target model's) observation encoder.
        consistency_loss = (
            (
                temporal_loss_coeffs
                * F.mse_loss(z_preds[1:], z_targets, reduction="none").mean(dim=-1)
                # `z_preds` depends on the current observation and the actions.
                * ~batch["observation.state_is_pad"][0]
                * ~batch["action_is_pad"]
                # `z_targets` depends on the next observation.
                * ~batch["observation.state_is_pad"][1:]
            )
            .sum(0)
            .mean()
        )
        # Compute the reward loss as MSE loss between rewards predicted from the rollout and the dataset
        # rewards.
        reward_loss = (
            (
                temporal_loss_coeffs
                * soft_cross_entropy(reward_preds, reward, self.config).mean(1)
                * ~batch["next.reward_is_pad"]
                * ~batch["observation.state_is_pad"][0]
                * ~batch["action_is_pad"]
            )
            .sum(0)
            .mean()
        )

        # Compute state-action value loss (TD loss) for all of the Q functions in the ensemble.
        ce_value_loss = 0.0
        for i in range(self.config.q_ensemble_size):
            ce_value_loss += soft_cross_entropy(q_preds_ensemble[i], td_targets, self.config).mean(1)

        q_value_loss = (
            (
                temporal_loss_coeffs
                * ce_value_loss
                # `q_preds_ensemble` depends on the first observation and the actions.
                * ~batch["observation.state_is_pad"][0]
                * ~batch["action_is_pad"]
                # q_targets depends on the reward and the next observations.
                * ~batch["next.reward_is_pad"]
                * ~batch["observation.state_is_pad"][1:]
            )
            .sum(0)
            .mean()
        )

        # Calculate the advantage weighted regression loss for π as detailed in FOWM 3.1.
        # We won't need these gradients again so detach.
        z_preds = z_preds.detach()
        action_preds, _, log_pis, _ = self.model.pi(z_preds[:-1])

        with torch.no_grad():
            # Avoid unnecessary computation of the gradients during policy optimization.
            # TODO (michel-aractingi): the same logic should be extended when adding task embeddings.
            qs = self.model.Qs(z_preds[:-1], action_preds, return_type="avg")
            self.scale.update(qs[0])
            qs = self.scale(qs)

        pi_loss = (
            (self.config.entropy_coef * log_pis - qs).mean(dim=2)
            * temporal_loss_coeffs
            # `action_preds` depends on the first observation and the actions.
            * ~batch["observation.state_is_pad"][0]
            * ~batch["action_is_pad"]
        ).mean()

        loss = (
            self.config.consistency_coeff * consistency_loss
            + self.config.reward_coeff * reward_loss
            + self.config.value_coeff * q_value_loss
            + pi_loss
        )

        info.update(
            {
                "consistency_loss": consistency_loss.item(),
                "reward_loss": reward_loss.item(),
                "Q_value_loss": q_value_loss.item(),
                "pi_loss": pi_loss.item(),
                "loss": loss,
                "sum_loss": loss.item() * self.config.horizon,
                "pi_scale": float(self.scale.value),
            }
        )

        # Undo (b, t) -> (t, b).
        for key in batch:
            if batch[key].ndim > 1:
                batch[key] = batch[key].transpose(1, 0)

        return info

    def update(self):
        """Update the target models using Polyak averaging."""
        self.model.update_target_Q()


class TDMPC2WorldModel(nn.Module):
    """Latent dynamics model used in TD-MPC2."""

    def __init__(self, config: TDMPC2Config):
        super().__init__()
        self.config = config

        self._encoder = TDMPC2ObservationEncoder(config)

        # Define latent dynamics head.
        self._dynamics = nn.Sequential(
            NormedLinear(config.latent_dim + config.output_shapes["action"][0], config.mlp_dim),
            NormedLinear(config.mlp_dim, config.mlp_dim),
            NormedLinear(config.mlp_dim, config.latent_dim, act=SimNorm(config.simnorm_dim)),
        )

        # Define reward head.
        self._reward = nn.Sequential(
            NormedLinear(config.latent_dim + config.output_shapes["action"][0], config.mlp_dim),
            NormedLinear(config.mlp_dim, config.mlp_dim),
            nn.Linear(config.mlp_dim, max(config.num_bins, 1)),
        )

        # Define policy head.
        self._pi = nn.Sequential(
            NormedLinear(config.latent_dim, config.mlp_dim),
            NormedLinear(config.mlp_dim, config.mlp_dim),
            nn.Linear(config.mlp_dim, 2 * config.output_shapes["action"][0]),
        )

        # Define ensemble of Q functions.
        self._Qs = nn.ModuleList(
            [
                nn.Sequential(
                    NormedLinear(
                        config.latent_dim + config.output_shapes["action"][0],
                        config.mlp_dim,
                        dropout=config.dropout,
                    ),
                    NormedLinear(config.mlp_dim, config.mlp_dim),
                    nn.Linear(config.mlp_dim, max(config.num_bins, 1)),
                )
                for _ in range(config.q_ensemble_size)
            ]
        )

        self._init_weights()

        self._target_Qs = deepcopy(self._Qs).requires_grad_(False)

        self.log_std_min = torch.tensor(config.log_std_min)
        self.log_std_dif = torch.tensor(config.log_std_max) - self.log_std_min

        self.bins = torch.linspace(config.vmin, config.vmax, config.num_bins)
        self.config.bin_size = (config.vmax - config.vmin) / (config.num_bins - 1)

    def _init_weights(self):
        """Initialize model weights with the custom initializations proposed in TD-MPC2."""

        def _apply_fn(m):
            if isinstance(m, nn.Linear):
                nn.init.trunc_normal_(m.weight, std=0.02)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.ParameterList):
                for i, p in enumerate(m):
                    if p.dim() == 3:  # Linear
                        nn.init.trunc_normal_(p, std=0.02)  # Weight
                        nn.init.constant_(m[i + 1], 0)  # Bias

        self.apply(_apply_fn)

        # Initialize the weights of the last linear layer of the reward and Q heads to zero.
        for m in [self._reward, *self._Qs]:
            assert isinstance(
                m[-1], nn.Linear
            ), "Sanity check. The last linear layer needs 0 initialization on weights."
            nn.init.zeros_(m[-1].weight)

    def to(self, *args, **kwargs):
        """
        Overriding `to` method to also move additional tensors to device.
        """
        super().to(*args, **kwargs)
        self.log_std_min = self.log_std_min.to(*args, **kwargs)
        self.log_std_dif = self.log_std_dif.to(*args, **kwargs)
        self.bins = self.bins.to(*args, **kwargs)
        return self

    def train(self, mode):
        super().train(mode)
        self._target_Qs.train(False)
        return self

    def encode(self, obs: dict[str, Tensor]) -> Tensor:
        """Encodes an observation into its latent representation."""
        return self._encoder(obs)

    def latent_dynamics_and_reward(
        self, z: Tensor, a: Tensor, discretize_reward: bool = False
    ) -> tuple[Tensor, Tensor]:
        """Predict the next state's latent representation and the reward given a current latent and action.

        Args:
            z: (*, latent_dim) tensor for the current state's latent representation.
            a: (*, action_dim) tensor for the action to be applied.
        Returns:
            A tuple containing:
                - (*, latent_dim) tensor for the next state's latent representation.
                - (*,) tensor for the estimated reward.
        """
        x = torch.cat([z, a], dim=-1)
        reward = self._reward(x).squeeze(-1)
        if discretize_reward:
            reward = two_hot_inv(reward, self.bins)
        return self._dynamics(x), reward

    def latent_dynamics(self, z: Tensor, a: Tensor) -> Tensor:
        """Predict the next state's latent representation given a current latent and action.

        Args:
            z: (*, latent_dim) tensor for the current state's latent representation.
            a: (*, action_dim) tensor for the action to be applied.
        Returns:
            (*, latent_dim) tensor for the next state's latent representation.
        """
        x = torch.cat([z, a], dim=-1)
        return self._dynamics(x)

    def pi(self, z: Tensor) -> tuple[Tensor, Tensor, Tensor, Tensor]:
        """Samples an action from the learned policy.

        The policy can also have added (truncated) Gaussian noise injected for encouraging exploration when
        generating rollouts for online training.

        Args:
            z: (*, latent_dim) tensor for the current state's latent representation.
        Returns:
            A tuple of (*, action_dim) tensors: the sampled action, the policy mean, the log-probability of
            the sampled action, and the log standard deviation.
        """
        mu, log_std = self._pi(z).chunk(2, dim=-1)
        log_std = self.log_std_min + 0.5 * self.log_std_dif * (torch.tanh(log_std) + 1)
        eps = torch.randn_like(mu)

        log_pi = gaussian_logprob(eps, log_std)
        pi = mu + eps * log_std.exp()
        mu, pi, log_pi = squash(mu, pi, log_pi)

        return pi, mu, log_pi, log_std

    def Qs(self, z: Tensor, a: Tensor, return_type: str = "min", target=False) -> Tensor:  # noqa: N802
        """Predict state-action value for all of the learned Q functions.

        Args:
            z: (*, latent_dim) tensor for the current state's latent representation.
            a: (*, action_dim) tensor for the action to be applied.
            return_type: 'min', 'all', or anything else for the average.
        Returns:
            (q_ensemble, *) tensor with the value predictions of each learned Q function in the ensemble if
            `return_type` is 'all'; otherwise the min or average of two randomly chosen ensemble members.
        """
        x = torch.cat([z, a], dim=-1)

        if target:
            out = torch.stack([q(x).squeeze(-1) for q in self._target_Qs], dim=0)
        else:
            out = torch.stack([q(x).squeeze(-1) for q in self._Qs], dim=0)

        if return_type == "all":
            return out

        Q1, Q2 = out[np.random.choice(len(self._Qs), size=2, replace=False)]
        Q1, Q2 = two_hot_inv(Q1, self.bins), two_hot_inv(Q2, self.bins)
        return torch.min(Q1, Q2) if return_type == "min" else (Q1 + Q2) / 2

    def update_target_Q(self):
        """
        Soft-update target Q-networks using Polyak averaging.
        """
        with torch.no_grad():
            for p, p_target in zip(self._Qs.parameters(), self._target_Qs.parameters(), strict=False):
                p_target.data.lerp_(p.data, self.config.target_model_momentum)


class TDMPC2ObservationEncoder(nn.Module):
    """Encode image and/or state vector observations."""

    def __init__(self, config: TDMPC2Config):
        """
        Creates encoders for pixel and/or state modalities.
        TODO(alexander-soare): The original work allows for multiple images by concatenating them along the
            channel dimension. Re-implement this capability.
        """
        super().__init__()
        self.config = config

        # Define an encoder for each observation key, whether it's pixels or states.
        encoder_dict = {}
        for obs_key in config.input_shapes:
            if "observation.image" in obs_key:
                encoder_module = nn.Sequential(
                    nn.Conv2d(config.input_shapes[obs_key][0], config.image_encoder_hidden_dim, 7, stride=2),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(config.image_encoder_hidden_dim, config.image_encoder_hidden_dim, 5, stride=2),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(config.image_encoder_hidden_dim, config.image_encoder_hidden_dim, 3, stride=2),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(config.image_encoder_hidden_dim, config.image_encoder_hidden_dim, 3, stride=1),
                )
                dummy_batch = torch.zeros(1, *config.input_shapes[obs_key])
                with torch.inference_mode():
                    out_shape = encoder_module(dummy_batch).shape[1:]
                encoder_module.extend(
                    nn.Sequential(
                        nn.Flatten(),
                        NormedLinear(np.prod(out_shape), config.latent_dim, act=SimNorm(config.simnorm_dim)),
                    )
                )

            elif obs_key in ("observation.state", "observation.environment_state"):
                encoder_module = nn.ModuleList()
                encoder_module.append(
                    NormedLinear(config.input_shapes[obs_key][0], config.state_encoder_hidden_dim)
                )
                assert config.num_enc_layers > 0
                for _ in range(config.num_enc_layers - 1):
                    encoder_module.append(
                        NormedLinear(config.state_encoder_hidden_dim, config.state_encoder_hidden_dim)
                    )
                encoder_module.append(
                    NormedLinear(
                        config.state_encoder_hidden_dim, config.latent_dim, act=SimNorm(config.simnorm_dim)
                    )
                )
                encoder_module = nn.Sequential(*encoder_module)

            else:
                raise NotImplementedError(f"No corresponding encoder module for key {obs_key}.")

            encoder_dict[obs_key.replace(".", "")] = encoder_module

        self.encoder = nn.ModuleDict(encoder_dict)

    def forward(self, obs_dict: dict[str, Tensor]) -> Tensor:
        """Encode the image and/or state vector.

        Each modality is encoded into a feature vector of size (latent_dim,) and then a uniform mean is taken
        over all features.
        """
        feat = []
        for obs_key in self.config.input_shapes:
            if "observation.image" in obs_key:
                feat.append(
                    flatten_forward_unflatten(self.encoder[obs_key.replace(".", "")], obs_dict[obs_key])
                )
            else:
                feat.append(self.encoder[obs_key.replace(".", "")](obs_dict[obs_key]))
        return torch.stack(feat, dim=0).mean(0)


def random_shifts_aug(x: Tensor, max_random_shift_ratio: float) -> Tensor:
    """Randomly shifts images horizontally and vertically.

    Adapted from https://github.com/facebookresearch/drqv2
    """
    b, _, h, w = x.size()
    assert h == w, "non-square images not handled yet"
    pad = int(round(max_random_shift_ratio * h))
    x = F.pad(x, tuple([pad] * 4), "replicate")
    eps = 1.0 / (h + 2 * pad)
    arange = torch.linspace(
        -1.0 + eps,
        1.0 - eps,
        h + 2 * pad,
        device=x.device,
        dtype=torch.float32,
    )[:h]
    arange = einops.repeat(arange, "w -> h w 1", h=h)
    base_grid = torch.cat([arange, arange.transpose(1, 0)], dim=2)
    base_grid = einops.repeat(base_grid, "h w c -> b h w c", b=b)
    # A random shift in units of pixels and within the boundaries of the padding.
    shift = torch.randint(
        0,
        2 * pad + 1,
        size=(b, 1, 1, 2),
        device=x.device,
        dtype=torch.float32,
    )
    shift *= 2.0 / (h + 2 * pad)
    grid = base_grid + shift
    return F.grid_sample(x, grid, padding_mode="zeros", align_corners=False)


def flatten_forward_unflatten(fn: Callable[[Tensor], Tensor], image_tensor: Tensor) -> Tensor:
    """Helper to temporarily flatten extra dims at the start of the image tensor.

    Args:
        fn: Callable that the image tensor will be passed to. It should accept (B, C, H, W) and return
            (B, *), where * is any number of dimensions.
        image_tensor: An image tensor of shape (**, C, H, W), where ** is any number of dimensions, generally
            different from *.
    Returns:
        A return value from the callable reshaped to (**, *).
    """
    if image_tensor.ndim == 4:
        return fn(image_tensor)
    start_dims = image_tensor.shape[:-3]
    inp = torch.flatten(image_tensor, end_dim=-4)
    flat_out = fn(inp)
    return torch.reshape(flat_out, (*start_dims, *flat_out.shape[1:]))


class RunningScale:
    """Running trimmed scale estimator."""

    def __init__(self, tau):
        self.tau = tau
        self._value = torch.ones(1, dtype=torch.float32, device=torch.device("cuda"))
        self._percentiles = torch.tensor([5, 95], dtype=torch.float32, device=torch.device("cuda"))

    def state_dict(self):
        return dict(value=self._value, percentiles=self._percentiles)

    def load_state_dict(self, state_dict):
        self._value.data.copy_(state_dict["value"])
        self._percentiles.data.copy_(state_dict["percentiles"])

    @property
    def value(self):
        return self._value.cpu().item()

    def _percentile(self, x):
        x_dtype, x_shape = x.dtype, x.shape
        x = x.view(x.shape[0], -1)
        in_sorted, _ = torch.sort(x, dim=0)
        positions = self._percentiles * (x.shape[0] - 1) / 100
        floored = torch.floor(positions)
        ceiled = floored + 1
        ceiled[ceiled > x.shape[0] - 1] = x.shape[0] - 1
        weight_ceiled = positions - floored
        weight_floored = 1.0 - weight_ceiled
        d0 = in_sorted[floored.long(), :] * weight_floored[:, None]
        d1 = in_sorted[ceiled.long(), :] * weight_ceiled[:, None]
        return (d0 + d1).view(-1, *x_shape[1:]).type(x_dtype)

    def update(self, x):
        percentiles = self._percentile(x.detach())
        value = torch.clamp(percentiles[1] - percentiles[0], min=1.0)
        self._value.data.lerp_(value, self.tau)

    def __call__(self, x, update=False):
        if update:
            self.update(x)
        return x * (1 / self.value)

    def __repr__(self):
        return f"RunningScale(S: {self.value})"
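
The CEM update inside `plan()` boils down to a temperature-weighted softmax over elite trajectory values followed by a weighted mean and standard deviation of the elite actions (μ = Σ(s⋅Γ), σ = √Σ(s⋅(Γ-μ)²)). A standalone sketch of one such update for a single batch element, with hypothetical sizes:

```python
import torch

horizon, n_elites, action_dim = 3, 4, 2
temperature = 0.5

elite_value = torch.tensor([1.0, 2.0, 3.0, 4.0])           # (n_elites,)
elite_actions = torch.randn(horizon, n_elites, action_dim)  # (horizon, n_elites, action_dim)

# s = Ω/ΣΩ with Ω = exp(τ(value - max)); matches the normalized weighting above.
score = torch.exp(temperature * (elite_value - elite_value.max()))
score = score / score.sum()

# Weighted mean and std over the elite axis.
mean = (score[None, :, None] * elite_actions).sum(dim=1)                         # (horizon, action_dim)
std = ((score[None, :, None] * (elite_actions - mean[:, None]) ** 2).sum(dim=1)).sqrt()
print(mean.shape, std.shape)  # torch.Size([3, 2]) torch.Size([3, 2])
```
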
@@ -1,164 +0,0 @@
import math

import torch
import torch.nn as nn
import torch.nn.functional as F
from functorch import combine_state_for_ensemble


class Ensemble(nn.Module):
    """
    Vectorized ensemble of modules.
    """

    def __init__(self, modules, **kwargs):
        super().__init__()
        modules = nn.ModuleList(modules)
        fn, params, _ = combine_state_for_ensemble(modules)
        self.vmap = torch.vmap(fn, in_dims=(0, 0, None), randomness="different", **kwargs)
        self.params = nn.ParameterList([nn.Parameter(p) for p in params])
        self._repr = str(modules)

    def forward(self, *args, **kwargs):
        return self.vmap([p for p in self.params], (), *args, **kwargs)

    def __repr__(self):
        return "Vectorized " + self._repr


class SimNorm(nn.Module):
    """
    Simplicial normalization.
    Adapted from https://arxiv.org/abs/2204.00616.
    """

    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        shp = x.shape
        x = x.view(*shp[:-1], -1, self.dim)
        x = F.softmax(x, dim=-1)
        return x.view(*shp)

    def __repr__(self):
        return f"SimNorm(dim={self.dim})"


class NormedLinear(nn.Linear):
    """
    Linear layer with LayerNorm, activation, and optionally dropout.
    """

    def __init__(self, *args, dropout=0.0, act=nn.Mish(inplace=True), **kwargs):
        super().__init__(*args, **kwargs)
        self.ln = nn.LayerNorm(self.out_features)
        self.act = act
        self.dropout = nn.Dropout(dropout, inplace=True) if dropout else None

    def forward(self, x):
        x = super().forward(x)
        if self.dropout:
            x = self.dropout(x)
        return self.act(self.ln(x))

    def __repr__(self):
        repr_dropout = f", dropout={self.dropout.p}" if self.dropout else ""
        return (
            f"NormedLinear(in_features={self.in_features}, "
            f"out_features={self.out_features}, "
            f"bias={self.bias is not None}{repr_dropout}, "
            f"act={self.act.__class__.__name__})"
        )


def soft_cross_entropy(pred, target, cfg):
    """Computes the cross entropy loss between predictions and soft targets."""
    pred = F.log_softmax(pred, dim=-1)
    target = two_hot(target, cfg)
    return -(target * pred).sum(-1, keepdim=True)


@torch.jit.script
def log_std(x, low, dif):
    return low + 0.5 * dif * (torch.tanh(x) + 1)


@torch.jit.script
def _gaussian_residual(eps, log_std):
    return -0.5 * eps.pow(2) - log_std


@torch.jit.script
def _gaussian_logprob(residual):
    return residual - 0.5 * math.log(2 * math.pi)


def gaussian_logprob(eps, log_std, size=None):
    """Compute Gaussian log probability."""
    residual = _gaussian_residual(eps, log_std).sum(-1, keepdim=True)
    if size is None:
        size = eps.size(-1)
    return _gaussian_logprob(residual) * size


@torch.jit.script
def _squash(pi):
    return torch.log(F.relu(1 - pi.pow(2)) + 1e-6)


def squash(mu, pi, log_pi):
    """Apply squashing function."""
    mu = torch.tanh(mu)
    pi = torch.tanh(pi)
    log_pi -= _squash(pi).sum(-1, keepdim=True)
    return mu, pi, log_pi


@torch.jit.script
def symlog(x):
    """
    Symmetric logarithmic function.
    Adapted from https://github.com/danijar/dreamerv3.
    """
    return torch.sign(x) * torch.log(1 + torch.abs(x))


@torch.jit.script
def symexp(x):
    """
    Symmetric exponential function.
    Adapted from https://github.com/danijar/dreamerv3.
    """
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1)


def two_hot(x, cfg):
    """Converts a batch of scalars to soft two-hot encoded targets for discrete regression."""
    # x shape: [horizon, num_features]
    if cfg.num_bins == 0:
        return x
    elif cfg.num_bins == 1:
        return symlog(x)
    x = torch.clamp(symlog(x), cfg.vmin, cfg.vmax)
    bin_idx = torch.floor((x - cfg.vmin) / cfg.bin_size).long()  # shape [num_features]
    bin_offset = ((x - cfg.vmin) / cfg.bin_size - bin_idx.float()).unsqueeze(-1)  # shape [num_features, 1]
|
||||
soft_two_hot = torch.zeros(
|
||||
*x.shape, cfg.num_bins, device=x.device
|
||||
) # shape [horizon, num_features, num_bins]
|
||||
soft_two_hot.scatter_(2, bin_idx.unsqueeze(-1), 1 - bin_offset)
|
||||
soft_two_hot.scatter_(2, (bin_idx.unsqueeze(-1) + 1) % cfg.num_bins, bin_offset)
|
||||
return soft_two_hot
|
||||
|
||||
|
||||
def two_hot_inv(x, bins):
|
||||
"""Converts a batch of soft two-hot encoded vectors to scalars."""
|
||||
num_bins = bins.shape[0]
|
||||
if num_bins == 0:
|
||||
return x
|
||||
elif num_bins == 1:
|
||||
return symexp(x)
|
||||
|
||||
x = F.softmax(x, dim=-1)
|
||||
x = torch.sum(x * bins, dim=-1, keepdim=True)
|
||||
return symexp(x)
|
||||
@@ -37,7 +37,7 @@ HALF_TURN_DEGREE = 180
|
||||
# See this link for STS3215 Memory Table:
|
||||
# https://docs.google.com/spreadsheets/d/1GVs7W1VS1PqdhA1nW-abeyAHhTUxKUdR/edit?usp=sharing&ouid=116566590112741600240&rtpof=true&sd=true
|
||||
# data_name: (address, size_byte)
|
||||
SCS_SERIES_CONTROL_TABLE = {
|
||||
STS_SERIES_CONTROL_TABLE = {
|
||||
"Model": (3, 2),
|
||||
"ID": (5, 1),
|
||||
"Baud_Rate": (6, 1),
|
||||
@@ -87,6 +87,70 @@ SCS_SERIES_CONTROL_TABLE = {
|
||||
"Maximum_Acceleration": (85, 2),
|
||||
}
|
||||
|
||||
SCS_SERIES_CONTROL_TABLE = {
|
||||
"Model": (3, 2),
|
||||
"ID": (5, 1),
|
||||
"Baud_Rate": (6, 1),
|
||||
"Return_Delay": (7, 1),
|
||||
"Response_Status_Level": (8, 1),
|
||||
"Min_Angle_Limit": (9, 2),
|
||||
"Max_Angle_Limit": (11, 2),
|
||||
"Max_Temperature_Limit": (13, 1),
|
||||
"Max_Voltage_Limit": (14, 1),
|
||||
"Min_Voltage_Limit": (15, 1),
|
||||
"Max_Torque_Limit": (16, 2),
|
||||
"Phase": (18, 1),
|
||||
"Unloading_Condition": (19, 1),
|
||||
"LED_Alarm_Condition": (20, 1),
|
||||
"P_Coefficient": (21, 1),
|
||||
"D_Coefficient": (22, 1),
|
||||
"I_Coefficient": (23, 1),
|
||||
"Minimum_Startup_Force": (24, 2),
|
||||
"CW_Dead_Zone": (26, 1),
|
||||
"CCW_Dead_Zone": (27, 1),
|
||||
# "Protection_Current": (28, 2),
|
||||
# "Angular_Resolution": (30, 1),
|
||||
# "Offset": (31, 2),
|
||||
# "Mode": (33, 1),
|
||||
"Protective_Torque": (37, 1),
|
||||
"Protection_Time": (38, 1),
|
||||
"Torque_Enable": (40, 1),
|
||||
"Acceleration": (41, 1),
|
||||
"Goal_Position": (42, 2),
|
||||
"Running_Time": (44, 2),
|
||||
"Goal_Speed": (46, 2),
|
||||
"Lock": (48, 1),
|
||||
"Present_Position": (56, 2),
|
||||
"Present_Speed": (58, 2),
|
||||
"Present_Load": (60, 2),
|
||||
"Present_Voltage": (62, 1),
|
||||
"Present_Temperature": (63, 1),
|
||||
"Sync_Write_Flag": (64, 1),
|
||||
"Status": (65, 1),
|
||||
"Moving": (66, 1),
|
||||
# "Overload_Torque": (36, 1),
|
||||
# "Speed_closed_loop_P_proportional_coefficient": (37, 1),
|
||||
# "Over_Current_Protection_Time": (38, 1),
|
||||
# "Velocity_closed_loop_I_integral_coefficient": (39, 1),
|
||||
# "Acceleration": (41, 1),
|
||||
# "Goal_Time": (44, 2),
|
||||
# "Torque_Limit": (48, 2),
|
||||
# "Present_Current": (69, 2),
|
||||
# # Not in the Memory Table
|
||||
# "Maximum_Acceleration": (85, 2),
|
||||
}
|
||||
|
||||
STS_SERIES_BAUDRATE_TABLE = {
|
||||
0: 1_000_000,
|
||||
1: 500_000,
|
||||
2: 250_000,
|
||||
3: 128_000,
|
||||
4: 115_200,
|
||||
5: 57_600,
|
||||
6: 38_400,
|
||||
7: 19_200,
|
||||
}
|
||||
|
||||
SCS_SERIES_BAUDRATE_TABLE = {
|
||||
0: 1_000_000,
|
||||
1: 500_000,
|
||||
@@ -103,22 +167,31 @@ CONVERT_UINT32_TO_INT32_REQUIRED = ["Goal_Position", "Present_Position"]
|
||||
|
||||
|
||||
MODEL_CONTROL_TABLE = {
|
||||
"sts_series": STS_SERIES_CONTROL_TABLE,
|
||||
"scs_series": SCS_SERIES_CONTROL_TABLE,
|
||||
"sts3215": SCS_SERIES_CONTROL_TABLE,
|
||||
"sts3215": STS_SERIES_CONTROL_TABLE,
|
||||
"sts3250": STS_SERIES_CONTROL_TABLE,
|
||||
"scs0009": SCS_SERIES_CONTROL_TABLE,
|
||||
}
|
||||
|
||||
MODEL_RESOLUTION = {
|
||||
"scs_series": 4096,
|
||||
"sts_series": 4096,
|
||||
"scs_series": 1024,
|
||||
"sts3215": 4096,
|
||||
"sts3250": 4096,
|
||||
"scs0009": 1024,
|
||||
}
|
||||
|
||||
MODEL_BAUDRATE_TABLE = {
|
||||
"sts_series": STS_SERIES_BAUDRATE_TABLE,
|
||||
"scs_series": SCS_SERIES_BAUDRATE_TABLE,
|
||||
"sts3215": SCS_SERIES_BAUDRATE_TABLE,
|
||||
"sts3215": STS_SERIES_BAUDRATE_TABLE,
|
||||
"sts3250": STS_SERIES_BAUDRATE_TABLE,
|
||||
"scs0009": SCS_SERIES_BAUDRATE_TABLE,
|
||||
}
|
||||
|
||||
# High number of retries is needed for feetech compared to dynamixel motors.
|
||||
NUM_READ_RETRY = 20
|
||||
NUM_READ_RETRY = 50
|
||||
NUM_WRITE_RETRY = 20
|
||||
|
||||
|
||||
@@ -273,12 +346,18 @@ class FeetechMotorsBus:
|
||||
self,
|
||||
port: str,
|
||||
motors: dict[str, tuple[int, str]],
|
||||
group_sync_read: bool = True,
|
||||
group_sync_write: bool = True,
|
||||
protocol_version: int = 0,
|
||||
extra_model_control_table: dict[str, list[tuple]] | None = None,
|
||||
extra_model_resolution: dict[str, int] | None = None,
|
||||
mock=False,
|
||||
):
|
||||
self.port = port
|
||||
self.motors = motors
|
||||
self.group_sync_read = group_sync_read
|
||||
self.group_sync_write = group_sync_write
|
||||
self.protocol_version = protocol_version
|
||||
self.mock = mock
|
||||
|
||||
self.model_ctrl_table = deepcopy(MODEL_CONTROL_TABLE)
|
||||
@@ -311,7 +390,7 @@ class FeetechMotorsBus:
|
||||
import scservo_sdk as scs
|
||||
|
||||
self.port_handler = scs.PortHandler(self.port)
|
||||
self.packet_handler = scs.PacketHandler(PROTOCOL_VERSION)
|
||||
self.packet_handler = scs.PacketHandler(self.protocol_version)
|
||||
|
||||
try:
|
||||
if not self.port_handler.openPort():
|
||||
@@ -335,7 +414,7 @@ class FeetechMotorsBus:
|
||||
import scservo_sdk as scs
|
||||
|
||||
self.port_handler = scs.PortHandler(self.port)
|
||||
self.packet_handler = scs.PacketHandler(PROTOCOL_VERSION)
|
||||
self.packet_handler = scs.PacketHandler(self.protocol_version)
|
||||
|
||||
if not self.port_handler.openPort():
|
||||
raise OSError(f"Failed to open port '{self.port}'.")
|
||||
@@ -661,6 +740,8 @@ class FeetechMotorsBus:
|
||||
else:
|
||||
import scservo_sdk as scs
|
||||
|
||||
scs.SCS_SETEND(self.protocol_version)
|
||||
|
||||
return_list = True
|
||||
if not isinstance(motor_ids, list):
|
||||
return_list = False
|
||||
@@ -699,6 +780,8 @@ class FeetechMotorsBus:
|
||||
else:
|
||||
import scservo_sdk as scs
|
||||
|
||||
scs.SCS_SETEND(self.protocol_version)
|
||||
|
||||
if not self.is_connected:
|
||||
raise RobotDeviceNotConnectedError(
|
||||
f"FeetechMotorsBus({self.port}) is not connected. You need to run `motors_bus.connect()`."
|
||||
@@ -721,31 +804,51 @@ class FeetechMotorsBus:
|
||||
|
||||
assert_same_address(self.model_ctrl_table, models, data_name)
|
||||
addr, bytes = self.model_ctrl_table[model][data_name]
|
||||
group_key = get_group_sync_key(data_name, motor_names)
|
||||
|
||||
if data_name not in self.group_readers:
|
||||
# create new group reader
|
||||
self.group_readers[group_key] = scs.GroupSyncRead(
|
||||
self.port_handler, self.packet_handler, addr, bytes
|
||||
)
|
||||
if self.group_sync_read:
|
||||
group_key = get_group_sync_key(data_name, motor_names)
|
||||
|
||||
if data_name not in self.group_readers:
|
||||
# create new group reader
|
||||
self.group_readers[group_key] = scs.GroupSyncRead(
|
||||
self.port_handler, self.packet_handler, addr, bytes
|
||||
)
|
||||
for idx in motor_ids:
|
||||
self.group_readers[group_key].addParam(idx)
|
||||
|
||||
for _ in range(NUM_READ_RETRY):
|
||||
comm = self.group_readers[group_key].txRxPacket()
|
||||
if comm == scs.COMM_SUCCESS:
|
||||
break
|
||||
|
||||
if comm != scs.COMM_SUCCESS:
|
||||
raise ConnectionError(
|
||||
f"Read failed due to communication error on port {self.port} for group_key {group_key}: "
|
||||
f"{self.packet_handler.getTxRxResult(comm)}"
|
||||
)
|
||||
|
||||
values = []
|
||||
for idx in motor_ids:
|
||||
self.group_readers[group_key].addParam(idx)
|
||||
value = self.group_readers[group_key].getData(idx, addr, bytes)
|
||||
values.append(value)
|
||||
else:
|
||||
values = []
|
||||
for idx in motor_ids:
|
||||
if bytes == 1:
|
||||
value, comm, error = self.packet_handler.read1ByteTxRx(self.port_handler, idx, addr)
|
||||
elif bytes == 2:
|
||||
value, comm, error = self.packet_handler.read2ByteTxRx(self.port_handler, idx, addr)
|
||||
elif bytes == 4:
|
||||
value, comm, error = self.packet_handler.read4ByteTxRx(self.port_handler, idx, addr)
|
||||
else:
|
||||
raise ValueError(bytes)
|
||||
|
||||
for _ in range(NUM_READ_RETRY):
|
||||
comm = self.group_readers[group_key].txRxPacket()
|
||||
if comm == scs.COMM_SUCCESS:
|
||||
break
|
||||
if comm != scs.COMM_SUCCESS:
|
||||
raise ConnectionError(self.packet_handler.getTxRxResult(comm))
|
||||
elif error != 0:
|
||||
raise ConnectionError(self.packet_handler.getRxPacketError(error))
|
||||
|
||||
if comm != scs.COMM_SUCCESS:
|
||||
raise ConnectionError(
|
||||
f"Read failed due to communication error on port {self.port} for group_key {group_key}: "
|
||||
f"{self.packet_handler.getTxRxResult(comm)}"
|
||||
)
|
||||
|
||||
values = []
|
||||
for idx in motor_ids:
|
||||
value = self.group_readers[group_key].getData(idx, addr, bytes)
|
||||
values.append(value)
|
||||
values.append(value)
|
||||
|
||||
values = np.array(values)
|
||||
|
||||
@@ -775,6 +878,8 @@ class FeetechMotorsBus:
|
||||
else:
|
||||
import scservo_sdk as scs
|
||||
|
||||
scs.SCS_SETEND(self.protocol_version)
|
||||
|
||||
if not isinstance(motor_ids, list):
|
||||
motor_ids = [motor_ids]
|
||||
if not isinstance(values, list):
|
||||
@@ -811,6 +916,8 @@ class FeetechMotorsBus:
|
||||
else:
|
||||
import scservo_sdk as scs
|
||||
|
||||
scs.SCS_SETEND(self.protocol_version)
|
||||
|
||||
if motor_names is None:
|
||||
motor_names = self.motor_names
|
||||
|
||||
@@ -836,27 +943,31 @@ class FeetechMotorsBus:
|
||||
|
||||
assert_same_address(self.model_ctrl_table, models, data_name)
|
||||
addr, bytes = self.model_ctrl_table[model][data_name]
|
||||
group_key = get_group_sync_key(data_name, motor_names)
|
||||
|
||||
init_group = data_name not in self.group_readers
|
||||
if init_group:
|
||||
self.group_writers[group_key] = scs.GroupSyncWrite(
|
||||
self.port_handler, self.packet_handler, addr, bytes
|
||||
)
|
||||
if self.group_sync_write:
|
||||
group_key = get_group_sync_key(data_name, motor_names)
|
||||
|
||||
for idx, value in zip(motor_ids, values, strict=True):
|
||||
data = convert_to_bytes(value, bytes, self.mock)
|
||||
init_group = data_name not in self.group_readers
|
||||
if init_group:
|
||||
self.group_writers[group_key].addParam(idx, data)
|
||||
else:
|
||||
self.group_writers[group_key].changeParam(idx, data)
|
||||
self.group_writers[group_key] = scs.GroupSyncWrite(
|
||||
self.port_handler, self.packet_handler, addr, bytes
|
||||
)
|
||||
|
||||
comm = self.group_writers[group_key].txPacket()
|
||||
if comm != scs.COMM_SUCCESS:
|
||||
raise ConnectionError(
|
||||
f"Write failed due to communication error on port {self.port} for group_key {group_key}: "
|
||||
f"{self.packet_handler.getTxRxResult(comm)}"
|
||||
)
|
||||
for idx, value in zip(motor_ids, values, strict=True):
|
||||
data = convert_to_bytes(value, bytes, self.mock)
|
||||
if init_group:
|
||||
self.group_writers[group_key].addParam(idx, data)
|
||||
else:
|
||||
self.group_writers[group_key].changeParam(idx, data)
|
||||
|
||||
comm = self.group_writers[group_key].txPacket()
|
||||
if comm != scs.COMM_SUCCESS:
|
||||
raise ConnectionError(
|
||||
f"Write failed due to communication error on port {self.port} for group_key {group_key}: "
|
||||
f"{self.packet_handler.getTxRxResult(comm)}"
|
||||
)
|
||||
else:
|
||||
raise NotImplementedError()
|
||||
|
||||
# log the number of seconds it took to write the data to the motors
|
||||
delta_ts_name = get_log_name("delta_timestamp_s", "write", data_name, motor_names)
|
||||
|
||||
@@ -64,7 +64,7 @@ def move_until_block(arm, motor_name, positive_direction=True, while_move_hook=N
|
||||
# print(f"{present_voltage=}")
|
||||
# print(f"{present_temperature=}")
|
||||
|
||||
if present_speed == 0 and present_current > 40:
|
||||
if present_speed == 0 and present_current > 50:
|
||||
count += 1
|
||||
if count > 100 or present_current > 300:
|
||||
return present_pos
|
||||
|
||||
@@ -338,10 +338,17 @@ class ManipulatorRobot:
|
||||
|
||||
elif self.robot_type in ["so100", "moss"]:
|
||||
from lerobot.common.robot_devices.robots.feetech_calibration import (
|
||||
run_arm_auto_calibration,
|
||||
run_arm_manual_calibration,
|
||||
)
|
||||
|
||||
calibration = run_arm_manual_calibration(arm, self.robot_type, name, arm_type)
|
||||
# TODO(rcadene): better way to handle mocking + test run_arm_auto_calibration
|
||||
if arm_type == "leader" or arm.mock:
|
||||
calibration = run_arm_manual_calibration(arm, self.robot_type, name, arm_type)
|
||||
elif arm_type == "follower":
|
||||
calibration = run_arm_auto_calibration(arm, self.robot_type, name, arm_type)
|
||||
else:
|
||||
raise ValueError(arm_type)
|
||||
|
||||
print(f"Calibration is done! Saving calibration file '{arm_calib_path}'")
|
||||
arm_calib_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
73
lerobot/configs/robot/hopejr.yaml
Normal file
@@ -0,0 +1,73 @@
|
||||
# [SO-100 robot arm](https://github.com/TheRobotStudio/SO-ARM100)
|
||||
|
||||
# Requires installing extras packages
|
||||
# With pip: `pip install -e ".[feetech]"`
|
||||
# With poetry: `poetry install --sync --extras "feetech"`
|
||||
|
||||
# See [tutorial](https://github.com/huggingface/lerobot/blob/main/examples/10_use_so100.md)
|
||||
|
||||
_target_: lerobot.common.robot_devices.robots.manipulator.ManipulatorRobot
|
||||
robot_type: hopejr
|
||||
calibration_dir: .cache/calibration/hopejr
|
||||
|
||||
# `max_relative_target` limits the magnitude of the relative positional target vector for safety purposes.
|
||||
# Set this to a positive scalar to have the same value for all motors, or a list that is the same length as
|
||||
# the number of motors in your follower arms.
|
||||
max_relative_target: null
|
||||
|
||||
# leader_arms:
|
||||
# main:
|
||||
# _target_: lerobot.common.robot_devices.motors.feetech.FeetechMotorsBus
|
||||
# port: /dev/tty.usbmodem585A0077581
|
||||
# motors:
|
||||
# # name: (index, model)
|
||||
# shoulder_pan: [1, "sts3215"]
|
||||
# shoulder_lift: [2, "sts3215"]
|
||||
# elbow_flex: [3, "sts3215"]
|
||||
# wrist_flex: [4, "sts3215"]
|
||||
# wrist_roll: [5, "sts3215"]
|
||||
# gripper: [6, "sts3215"]
|
||||
|
||||
follower_arms:
|
||||
main:
|
||||
_target_: lerobot.common.robot_devices.motors.feetech.FeetechMotorsBus
|
||||
port: /dev/tty.usbserial-2130
|
||||
motors:
|
||||
# name: (index, model)
|
||||
shoulder_pitch: [1, "sts3250"]
|
||||
shoulder_yaw: [2, "sts3215"] # TODO: sts3250
|
||||
shoulder_roll: [3, "sts3215"] # TODO: sts3250
|
||||
elbow_flex: [4, "sts3250"]
|
||||
wrist_roll: [5, "sts3215"]
|
||||
wrist_yaw: [6, "sts3215"]
|
||||
wrist_pitch: [7, "sts3215"]
|
||||
thumb_basel_rotation: [30, "scs0009"]
|
||||
thumb_flexion: [27, "scs0009"]
|
||||
thumb_pinky_side: [26, "scs0009"]
|
||||
thumb_thumb_side: [28, "scs0009"]
|
||||
index_flexor: [25, "scs0009"]
|
||||
index_pinky_side: [31, "scs0009"]
|
||||
index_thumb_side: [32, "scs0009"]
|
||||
middle_flexor: [24, "scs0009"]
|
||||
middle_pinky_side: [33, "scs0009"]
|
||||
middle_thumb_side: [34, "scs0009"]
|
||||
ring_flexor: [21, "scs0009"]
|
||||
ring_pinky_side: [36, "scs0009"]
|
||||
ring_thumb_side: [35, "scs0009"]
|
||||
pinky_flexor: [23, "scs0009"]
|
||||
pinky_pinky_side: [38, "scs0009"]
|
||||
pinky_thumb_side: [37, "scs0009"]
|
||||
|
||||
cameras:
|
||||
laptop:
|
||||
_target_: lerobot.common.robot_devices.cameras.opencv.OpenCVCamera
|
||||
camera_index: 0
|
||||
fps: 30
|
||||
width: 640
|
||||
height: 480
|
||||
phone:
|
||||
_target_: lerobot.common.robot_devices.cameras.opencv.OpenCVCamera
|
||||
camera_index: 1
|
||||
fps: 30
|
||||
width: 640
|
||||
height: 480
|
||||
@@ -93,18 +93,6 @@ def make_optimizer_and_scheduler(cfg, policy):
|
||||
elif policy.name == "tdmpc":
|
||||
optimizer = torch.optim.Adam(policy.parameters(), cfg.training.lr)
|
||||
lr_scheduler = None
|
||||
|
||||
elif policy.name == "tdmpc2":
|
||||
params_group = [
|
||||
{"params": policy.model._encoder.parameters(), "lr": cfg.training.lr * cfg.training.enc_lr_scale},
|
||||
{"params": policy.model._dynamics.parameters()},
|
||||
{"params": policy.model._reward.parameters()},
|
||||
{"params": policy.model._Qs.parameters()},
|
||||
{"params": policy.model._pi.parameters(), "eps": 1e-5},
|
||||
]
|
||||
optimizer = torch.optim.Adam(params_group, lr=cfg.training.lr)
|
||||
lr_scheduler = None
|
||||
|
||||
elif cfg.policy.name == "vqbet":
|
||||
from lerobot.common.policies.vqbet.modeling_vqbet import VQBeTOptimizer, VQBeTScheduler
|
||||
|
||||
|
||||
|
Before Width: | Height: | Size: 2.9 MiB |
|
Before Width: | Height: | Size: 185 KiB |
|
Before Width: | Height: | Size: 464 KiB |
|
Before Width: | Height: | Size: 153 KiB |
|
Before Width: | Height: | Size: 208 KiB |
|
Before Width: | Height: | Size: 296 KiB |
|
Before Width: | Height: | Size: 145 KiB |
|
Before Width: | Height: | Size: 95 KiB |
|
Before Width: | Height: | Size: 134 KiB |
@@ -18,6 +18,10 @@ def convert_to_bytes(value, bytes):
|
||||
return value
|
||||
|
||||
|
||||
def SCS_SETEND(protocol_version):
|
||||
del protocol_version
|
||||
|
||||
|
||||
def get_default_motor_values(motor_index):
|
||||
return {
|
||||
# Key (int) are from SCS_SERIES_CONTROL_TABLE
|
||||
|
||||