init commit

2026-03-16 11:44:10 +00:00
commit 94384a93c9
552 changed files with 363038 additions and 0 deletions
--- a/workflows/simbox/core/cameras/README.md
+++ b/workflows/simbox/core/cameras/README.md
@@ -0,0 +1,118 @@
+# Cameras
+
+Template-based cameras for simbox tasks. All cameras currently use a single generic implementation, `CustomCamera`, which is configured entirely from the task YAML.
+
+## Available cameras
+
+| Camera class    | Notes |
+|-----------------|-------|
+| `CustomCamera`  | Generic pinhole RGB-D camera with configurable intrinsics and pose. |
+
+Importing `CustomCamera` in your task (e.g. `banana.py`) is enough to register it via `@register_camera`.
+
+---
+
+## Customizing a camera configuration
+
+Camera behavior is controlled by the config (`cfg`) passed into `CustomCamera.__init__` in `banana.py`. You typically edit the YAML under `configs/simbox/...`.
+
+### 1. Top-level camera fields
+
+Each camera entry in the YAML should provide:
+
+- **`name`**: Unique camera name (string). Used for prim paths and as the key in `task.cameras`.
+- **`parent`**: Optional prim path (under the task root) that the camera mount is attached to. Empty string (`""`) means no specific parent.
+- **`translation`**: Initial camera translation in world or parent frame, as a list of three floats `[x, y, z]` (meters).
+- **`orientation`**: Initial camera orientation as a quaternion `[w, x, y, z]`.
+- **`camera_axes`**: Axes convention for `set_local_pose` (e.g. `[1.0, 0.0, 0.0]` etc. – follow existing configs).
+
+These values are used in `banana.py` when calling:
+
+```python
+camera.set_local_pose(
+    translation=cfg["translation"],
+    orientation=cfg["orientation"],
+    camera_axes=cfg["camera_axes"],
+)
+```
+
+### 2. Required `params` fields
+
+Inside each camera config there is a `params` dict that controls the optics and intrinsics. `CustomCamera` expects:
+
+- **`pixel_size`** (`float`, microns)
+  Physical pixel size on the sensor. Used to compute horizontal/vertical aperture and focal length.
+
+- **`f_number`** (`float`)
+  Lens f-number. Used in `set_lens_aperture(f_number * 100.0)`.
+
+- **`focus_distance`** (`float`, meters)
+  Focus distance passed to `set_focus_distance`.
+
+- **`camera_params`** (`[fx, fy, cx, cy]`)
+  Intrinsic matrix parameters in pixel units:
+  - `fx`, `fy`: focal lengths in x/y (pixels)
+  - `cx`, `cy`: principal point (pixels)
+
+- **`resolution_width`** (`int`)
+  Image width in pixels.
+
+- **`resolution_height`** (`int`)
+  Image height in pixels.
+
+Optional:
+
+- **`output_mode`** (`"rgb"` or `"diffuse_albedo"`, default `"rgb"`)
+  Controls which color source is used in `get_observations()`.
+
+### 3. How the parameters are used in `CustomCamera`
+
+Given `cfg["params"]`, `CustomCamera` does the following:
+
+- Computes the camera apertures and focal length:
+  - `horizontal_aperture = pixel_size * 1e-3 * width`
+  - `vertical_aperture = pixel_size * 1e-3 * height`
+  - `focal_length_x = fx * pixel_size * 1e-3`
+  - `focal_length_y = fy * pixel_size * 1e-3`
+  - `focal_length = (focal_length_x + focal_length_y) / 2`
+- Sets optical parameters:
+  - `set_focal_length(focal_length / 10.0)`
+  - `set_focus_distance(focus_distance)`
+  - `set_lens_aperture(f_number * 100.0)`
+  - `set_horizontal_aperture(horizontal_aperture / 10.0)`
+  - `set_vertical_aperture(vertical_aperture / 10.0)`
+  - `set_clipping_range(0.05, 1.0e5)`
+  - `set_projection_type("pinhole")`
+- Recomputes intrinsic matrix `K` on the fly:
+
+  ```python
+  fx = width * self.get_focal_length() / self.get_horizontal_aperture()
+  fy = height * self.get_focal_length() / self.get_vertical_aperture()
+  self.is_camera_matrix = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])
+  ```
+
+### 4. Outputs from `get_observations()`
+
+`CustomCamera.get_observations()` returns a dict:
+
+- **`color_image`**: RGB image (`H x W x 3`, float32), either from `get_rgba()` or `DiffuseAlbedo` depending on `output_mode`.
+- **`depth_image`**: Depth map from `get_depth()` (same resolution as color).
+- **`camera2env_pose`**: 4x4 transform from camera to environment, computed from USD prims.
+- **`camera_params`**: 3x3 intrinsic matrix `K` as a Python list.
+
+These are the values consumed by tasks (e.g. `banana.py`) for perception and planning.
+
+---
+
+## Summary checklist for a new camera
+
+To add or tweak a camera in a task YAML:
+
+1. **Choose a `name`** and, optionally, a `parent` prim under the task root.
+2. **Set pose**: `translation`, `orientation` (quaternion `[w, x, y, z]`), and `camera_axes`.
+3. Under `params`, provide:
+   - `pixel_size`, `f_number`, `focus_distance`
+   - `camera_params = [fx, fy, cx, cy]`
+   - `resolution_width`, `resolution_height`
+   - optional `output_mode` (`"rgb"` or `"diffuse_albedo"`).
+4. Ensure your task (e.g. `banana.py`) constructs `CustomCamera` with this `cfg` (this is already wired up in the current code).
--- a/workflows/simbox/core/cameras/init.py
+++ b/workflows/simbox/core/cameras/init.py
@@ -0,0 +1,21 @@
+"""Camera module initialization."""
+
+from core.cameras.base_camera import CAMERA_DICT
+
+from .custom_camera import CustomCamera
+
+__all__ = [
+    "CustomCamera",
+    "get_camera_cls",
+    "get_camera_dict",
+]
+
+
+def get_camera_cls(category_name):
+    """Get camera class by category name."""
+    return CAMERA_DICT[category_name]
+
+
+def get_camera_dict():
+    """Get camera dictionary."""
+    return CAMERA_DICT
--- a/workflows/simbox/core/cameras/base_camera.py
+++ b/workflows/simbox/core/cameras/base_camera.py
@@ -0,0 +1,9 @@
+CAMERA_DICT = {}
+
+
+def register_camera(target_class):
+    # key = "_".join(re.sub(r"([A-Z0-9])", r" \1", target_class.__name__).split()).lower()
+    key = target_class.__name__
+    assert key not in CAMERA_DICT
+    CAMERA_DICT[key] = target_class
+    return target_class
--- a/workflows/simbox/core/cameras/custom_camera.py
+++ b/workflows/simbox/core/cameras/custom_camera.py
@@ -0,0 +1,163 @@
+import numpy as np
+import omni.replicator.core as rep
+from core.cameras.base_camera import register_camera
+from omni.isaac.core.prims import XFormPrim
+from omni.isaac.core.utils.prims import get_prim_at_path
+from omni.isaac.core.utils.transformations import (
+    get_relative_transform,
+    pose_from_tf_matrix,
+)
+from omni.isaac.sensor import Camera
+
+
+@register_camera
+class CustomCamera(Camera):
+    """Generic pinhole RGB-D camera used in simbox tasks."""
+
+    def __init__(self, cfg, prim_path, root_prim_path, reference_path, name, *args, **kwargs):
+        """
+        Args:
+            cfg: Config dict with required keys:
+                - params: Dict containing:
+                    - pixel_size: Pixel size in microns
+                    - f_number: F-number
+                    - focus_distance: Focus distance in meters
+                    - camera_params: [fx, fy, cx, cy] camera intrinsics
+                    - resolution_width: Image width
+                    - resolution_height: Image height
+                - output_mode (optional): "rgb" or "diffuse_albedo"
+            prim_path: Camera prim path in USD stage
+            root_prim_path: Root prim path in USD stage
+            reference_path: Reference prim path for camera mounting
+            name: Camera name
+        """
+        # ===== Initialize camera =====
+        super().__init__(
+            prim_path=prim_path,
+            name=name,
+            resolution=(cfg["params"]["resolution_width"], cfg["params"]["resolution_height"]),
+            *args,
+            **kwargs,
+        )
+        self.initialize()
+        self.add_motion_vectors_to_frame()
+        self.add_semantic_segmentation_to_frame()
+        self.add_distance_to_image_plane_to_frame()
+
+        # ===== From cfg =====
+        pixel_size = cfg["params"].get("pixel_size")
+        f_number = cfg["params"].get("f_number")
+        focus_distance = cfg["params"].get("focus_distance")
+        fx, fy, cx, cy = cfg["params"].get("camera_params")
+        width = cfg["params"].get("resolution_width")
+        height = cfg["params"].get("resolution_height")
+        self.output_mode = cfg.get("output_mode", "rgb")
+
+        # ===== Compute and set camera parameters =====
+        horizontal_aperture = pixel_size * 1e-3 * width
+        vertical_aperture = pixel_size * 1e-3 * height
+        focal_length_x = fx * pixel_size * 1e-3
+        focal_length_y = fy * pixel_size * 1e-3
+        focal_length = (focal_length_x + focal_length_y) / 2
+
+        self.set_focal_length(focal_length / 10.0)
+        self.set_focus_distance(focus_distance)
+        self.set_lens_aperture(f_number * 100.0)
+        self.set_horizontal_aperture(horizontal_aperture / 10.0)
+        self.set_vertical_aperture(vertical_aperture / 10.0)
+        self.set_clipping_range(0.05, 1.0e5)
+        self.set_projection_type("pinhole")
+
+        fx = width * self.get_focal_length() / self.get_horizontal_aperture()
+        fy = height * self.get_focal_length() / self.get_vertical_aperture()
+        self.is_camera_matrix = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])
+
+        self.reference_path = reference_path
+        self.root_prim_path = root_prim_path
+        self.parent_camera_prim_path = str(self.prim.GetParent().GetPath())
+        self.parent_camera_xform = XFormPrim(self.parent_camera_prim_path)
+
+        if self.output_mode == "diffuse_albedo":
+            self.add_diffuse_albedo_to_frame()
+
+    def add_diffuse_albedo_to_frame(self) -> None:
+        """Attach the diffuse_albedo annotator to this camera."""
+        if "DiffuseAlbedo" not in self._custom_annotators:
+            self._custom_annotators["DiffuseAlbedo"] = rep.AnnotatorRegistry.get_annotator("DiffuseAlbedo")
+            self._custom_annotators["DiffuseAlbedo"].attach([self._render_product_path])
+        self._current_frame["DiffuseAlbedo"] = None
+
+    def remove_diffuse_albedo_from_frame(self) -> None:
+        if self._custom_annotators["DiffuseAlbedo"] is not None:
+            self._custom_annotators["DiffuseAlbedo"].detach([self._render_product_path])
+            self._custom_annotators["DiffuseAlbedo"] = None
+        self._current_frame.pop("DiffuseAlbedo", None)
+
+    def add_specular_albedo_to_frame(self) -> None:
+        """Attach the specular_albedo annotator to this camera."""
+        if self._custom_annotators["SpecularAlbedo"] is None:
+            self._custom_annotators["SpecularAlbedo"] = rep.AnnotatorRegistry.get_annotator("SpecularAlbedo")
+            self._custom_annotators["SpecularAlbedo"].attach([self._render_product_path])
+        self._current_frame["SpecularAlbedo"] = None
+
+    def remove_specular_albedo_from_frame(self) -> None:
+        if self._custom_annotators["SpecularAlbedo"] is not None:
+            self._custom_annotators["SpecularAlbedo"].detach([self._render_product_path])
+            self._custom_annotators["SpecularAlbedo"] = None
+        self._current_frame.pop("SpecularAlbedo", None)
+
+    def add_direct_diffuse_to_frame(self) -> None:
+        """Attach the direct_diffuse annotator to this camera."""
+        if self._custom_annotators["DirectDiffuse"] is None:
+            self._custom_annotators["DirectDiffuse"] = rep.AnnotatorRegistry.get_annotator("DirectDiffuse")
+            self._custom_annotators["DirectDiffuse"].attach([self._render_product_path])
+        self._current_frame["DirectDiffuse"] = None
+
+    def remove_direct_diffuse_from_frame(self) -> None:
+        if self._custom_annotators["DirectDiffuse"] is not None:
+            self._custom_annotators["DirectDiffuse"].detach([self._render_product_path])
+            self._custom_annotators["DirectDiffuse"] = None
+        self._current_frame.pop("DirectDiffuse", None)
+
+    def add_indirect_diffuse_to_frame(self) -> None:
+        """Attach the indirect_diffuse annotator to this camera."""
+        if self._custom_annotators["IndirectDiffuse"] is None:
+            self._custom_annotators["IndirectDiffuse"] = rep.AnnotatorRegistry.get_annotator("IndirectDiffuse")
+            self._custom_annotators["IndirectDiffuse"].attach([self._render_product_path])
+        self._current_frame["IndirectDiffuse"] = None
+
+    def remove_indirect_diffuse_from_frame(self) -> None:
+        if self._custom_annotators["IndirectDiffuse"] is not None:
+            self._custom_annotators["IndirectDiffuse"].detach([self._render_product_path])
+            self._custom_annotators["IndirectDiffuse"] = None
+        self._current_frame.pop("IndirectDiffuse", None)
+
+    def get_observations(self):
+        if self.reference_path:
+            camera_mount2env_pose = get_relative_transform(
+                get_prim_at_path(self.reference_path), get_prim_at_path(self.root_prim_path)
+            )
+            camera_mount2env_pose = pose_from_tf_matrix(camera_mount2env_pose)
+            self.parent_camera_xform.set_local_pose(
+                translation=camera_mount2env_pose[0],
+                orientation=camera_mount2env_pose[1],
+            )
+        camera2env_pose = get_relative_transform(
+            get_prim_at_path(self.prim_path), get_prim_at_path(self.root_prim_path)
+        )
+
+        if self.output_mode == "rgb":
+            color_image = self.get_rgba()[..., :3]
+        elif self.output_mode == "diffuse_albedo":
+            color_image = self._custom_annotators["DiffuseAlbedo"].get_data()[..., :3]
+        else:
+            raise NotImplementedError
+
+        obs = {
+            "color_image": color_image,
+            "depth_image": self.get_depth(),
+            "camera2env_pose": camera2env_pose,
+            "camera_params": self.is_camera_matrix.tolist(),
+        }
+
+        return obs