feat: add test tube pick task with custom assets and grasp annotations

- Add pick_test_tube task: USDC asset repackaging, grasp generation, task config
- Add tools: usdc_to_obj.py, repackage_test_tube.py, fix_test_tube_materials.py
- Add custom_task_guide.md: full Chinese documentation for creating custom tasks
- Add crawled InternDataEngine online docs (23 pages)
- Add grasp generation script (gen_tube_grasp.py) and pipeline config
This commit is contained in:
Tangger
2026-04-05 11:01:59 +08:00
parent 6314603676
commit 3d6b73753a
36 changed files with 18013 additions and 0 deletions

View File

@@ -0,0 +1,736 @@
# Source: https://internrobotics.github.io/InternDataEngine-Docs/concepts/robots.html
# Robots [](#robots)
InternDataEngine supports multiple robot platforms for manipulation tasks. Each robot has a dedicated wrapper class for articulation control and state management.
## Supported Robots [](#supported-robots)
| Robot | Type | DOF | Gripper (DOF) | Arm Model |
| **ARX Lift-2 ** | Dual-arm | 6+6 | Parallel (2) | R5a |
| **Agilex Split Aloha ** | Dual-arm | 6+6 | Parallel (2) | Piper-100 |
| **Genie-1 ** | Dual-arm | 7+7 | Parallel (2) | G1-120s |
| **Franka FR3 ** | Single-arm | 7 | Panda (1) | Franka |
| **Franka Robotiq85 ** | Single-arm | 7 | Robotiq 2F-85 (2) | Franka |
### Robot End Effector and TCP Frame Visualizations [](#robot-end-effector-and-tcp-frame-visualizations)
FR3 (Single-arm, Franka Panda Gripper)
![FR3 EE Head](/InternDataEngine-Docs/franka/ee_head_vis.png)
*FR3 end-effector head frame visualization *
![FR3 TCP Head](/InternDataEngine-Docs/franka/tcp_head_vis.png)
*FR3 TCP (Tool Center Point) head frame visualization *
![FR3 TCP Hand](/InternDataEngine-Docs/franka/tcp_hand_vis.png)
*FR3 TCP hand frame visualization *
Franka Robotiq85 (Single-arm, Robotiq 2F-85 Gripper)
![Franka Robotiq85 EE Head](/InternDataEngine-Docs/frankarobotiq/ee_head_vis.png)
*Franka Robotiq85 end-effector head frame visualization *
![Franka Robotiq85 TCP Head](/InternDataEngine-Docs/frankarobotiq/tcp_head_vis.png)
*Franka Robotiq85 TCP (Tool Center Point) head frame visualization *
![Franka Robotiq85 TCP Hand](/InternDataEngine-Docs/frankarobotiq/tcp_hand_vis.png)
*Franka Robotiq85 TCP hand frame visualization *
Genie-1 (Dual-arm, G1-120s)
![Genie-1 Left EE Head](/InternDataEngine-Docs/genie1/leftee_head_left_vis.png)
*Genie-1 left arm end-effector head frame visualization *
![Genie-1 Left TCP Head](/InternDataEngine-Docs/genie1/lefttcp_head_vis.png)
*Genie-1 left arm TCP head frame visualization *
![Genie-1 Left TCP Hand](/InternDataEngine-Docs/genie1/lefttcp_hand_left_vis.png)
*Genie-1 left arm TCP hand frame visualization *
![Genie-1 Right EE Head](/InternDataEngine-Docs/genie1/rightee_head_vis.png)
*Genie-1 right arm end-effector head frame visualization *
![Genie-1 Right TCP Head](/InternDataEngine-Docs/genie1/righttcp_head_vis.png)
*Genie-1 right arm TCP head frame visualization *
![Genie-1 Right TCP Hand](/InternDataEngine-Docs/genie1/righttcp_hand_right_vis.png)
*Genie-1 right arm TCP hand frame visualization *
ARX Lift-2 (Dual-arm, R5a)
![Lift-2 Left EE Head](/InternDataEngine-Docs/lift2/leftee_head_left_vis.jpeg)
*Lift-2 left arm end-effector head frame visualization *
![Lift-2 Left TCP Head](/InternDataEngine-Docs/lift2/lefttcp_head_vis.jpeg)
*Lift-2 left arm TCP head frame visualization *
![Lift-2 Left TCP Hand](/InternDataEngine-Docs/lift2/lefttcp_hand_left_vis.jpeg)
*Lift-2 left arm TCP hand frame visualization *
![Lift-2 Right EE Head](/InternDataEngine-Docs/lift2/rightee_head_vis.jpeg)
*Lift-2 right arm end-effector head frame visualization *
![Lift-2 Right TCP Head](/InternDataEngine-Docs/lift2/righttcp_head_vis.jpeg)
*Lift-2 right arm TCP head frame visualization *
![Lift-2 Right TCP Hand](/InternDataEngine-Docs/lift2/righttcp_hand_right_vis.jpeg)
*Lift-2 right arm TCP hand frame visualization *
Agilex Split Aloha (Dual-arm, Piper-100)
![Split Aloha Right EE Head](/InternDataEngine-Docs/split_aloha/rightee_head_vis.png)
*Split Aloha right arm end-effector head frame visualization *
![Split Aloha Right TCP Head](/InternDataEngine-Docs/split_aloha/righttcp_head_vis.png)
*Split Aloha right arm TCP head frame visualization *
![Split Aloha Right TCP Hand](/InternDataEngine-Docs/split_aloha/righttcp_hand_right_vis.png)
*Split Aloha right arm TCP hand frame visualization *
## Robot Configuration [](#robot-configuration)
Robot configuration is split into two parts: **task-level configuration **(in task YAML) and **robot-specific parameters **(in separate robot YAML files).
### Part 1: Task-Level Configuration (Task YAML) [](#part-1-task-level-configuration-task-yaml)
Configure robot instance in task YAML files:
yaml
```
robots:
- name: "lift2" # Robot identifier
robot_config_file: workflows/simbox/core/configs/robots/lift2.yaml # Path to robot-specific config
euler: [0.0, 0.0, 90.0] # Initial orientation [roll, pitch, yaw] in degrees
ignore_substring: ["material", "table", "gso_box"] # Collision filter substrings
```
1
2
3
4
5
### Part 2: Robot-Specific Parameters (Robot YAML) [](#part-2-robot-specific-parameters-robot-yaml)
Define robot hardware parameters in a separate YAML file (e.g., `workflows/simbox/core/configs/robots/lift2.yaml `):
yaml
```
# Robot info
target_class: Lift2 # Python class for robot wrapper
path: "lift2/robot_invisible.usd" # USD file path relative to asset root
# CuRobo kinematics config files (one per arm for dual-arm robots)
robot_file:
- workflows/simbox/curobo/src/curobo/content/configs/robot/r5a_left_arm.yml
- workflows/simbox/curobo/src/curobo/content/configs/robot/r5a_right_arm.yml
# Gripper parameters
gripper_max_width: 0.088 # Maximum gripper opening width (meters)
gripper_min_width: 0.0 # Minimum gripper closing width (meters)
tcp_offset: 0.125 # Tool center point offset from end-effector (meters)
# Solver parameters for physics simulation
solver_position_iteration_count: 128 # Position solver iterations
solver_velocity_iteration_count: 4 # Velocity solver iterations
stabilization_threshold: 0.005 # Stabilization threshold
# Joint indices in articulation
left_joint_indices: [10, 12, 14, 16, 18, 20] # Left arm joint indices
right_joint_indices: [9, 11, 13, 15, 17, 19] # Right arm joint indices
left_gripper_indices: [23] # Left end-effector (gripper) joint index
right_gripper_indices: [21] # Right end-effector (gripper) joint index
lift_indices: [6] # Lift joint index
# End-effector paths
fl_ee_path: "lift2/lift2/fl/link6" # Front-left end-effector prim path
fr_ee_path: "lift2/lift2/fr/link6" # Front-right end-effector prim path
fl_base_path: "lift2/lift2/fl/base_link" # Front-left base prim path
fr_base_path: "lift2/lift2/fr/base_link" # Front-right base prim path
# Gripper keypoints for visualization
fl_gripper_keypoints:
tool_head: [0.135, 0.0, 0.0, 1]
tool_tail: [0.085, 0.0, 0.0, 1]
tool_side: [0.135, -0.044, 0.0, 1]
fr_gripper_keypoints:
tool_head: [0.135, 0.0, 0.0, 1]
tool_tail: [0.085, 0.0, 0.0, 1]
tool_side: [0.135, -0.044, 0.0, 1]
# Collision filter paths
fl_filter_paths: # Paths to filter from collision (gripper fingers)
- "lift2/lift2/fl/link7"
- "lift2/lift2/fl/link8"
fr_filter_paths:
- "lift2/lift2/fr/link7"
- "lift2/lift2/fr/link8"
fl_forbid_collision_paths: # Paths forbidden for self-collision
- "lift2/lift2/fl/link2"
- "lift2/lift2/fl/link3"
- "lift2/lift2/fl/link4"
- "lift2/lift2/fl/link5"
fr_forbid_collision_paths:
- "lift2/lift2/fr/link2"
- "lift2/lift2/fr/link3"
- "lift2/lift2/fr/link4"
- "lift2/lift2/fr/link5"
# Pose processing parameters
R_ee_graspnet: [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]] # Grasp rotation correction
ee_axis: "x" # End-effector approach axis (x/y/z)
# Default joint home positions (radians)
left_joint_home: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right_joint_home: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
left_joint_home_std: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0] # Randomization std for left arm
right_joint_home_std: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0] # Randomization std for right arm
left_gripper_home: [0.044] # Left gripper default width (meters)
right_gripper_home: [0.044] # Right gripper default width (meters)
lift_home: [0.46] # Lift joint default position (meters)
```
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
## Understanding Robot Parameters [](#understanding-robot-parameters)
### Task-Level Fields [](#task-level-fields)
- **name **( str ): Robot identifier used in skills and cameras.
- **robot_config_file **( str ): Path to robot-specific YAML configuration file.
- **euler **( list ): Initial robot orientation in degrees [roll, pitch, yaw].
- **ignore_substring **( list ): Collision filter substrings to ignore during simulation.
### Robot-Specific Fields [](#robot-specific-fields)
Some fields require detailed explanation due to their importance in grasp pose processing and robot kinematics:
R_ee_graspnet
A 3×3 rotation matrix that transforms the grasp pose orientation from the pre-defined graspnet frame to the robot's end-effector frame. Our generated grasp pose follows the frame definition from GraspNet.
![GraspNet Gripper Frame Definition](/InternDataEngine-Docs/graspnet_def.png)
*GraspNet gripper frame definition (source: graspnetAPI) *
*Source: [graspnet/graspnetAPI](https://github.com/graspnet/graspnetAPI?tab=readme-ov-file)*
The following examples illustrate how `R_ee_graspnet `is configured for different robots. You can verify these values by comparing the GraspNet frame definition with each robot's end-effector orientation shown in the visualizations above.
Config Example:
yaml
```
# FR3
R_ee_graspnet: [[0.0, 0.0, -1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
# Genie-1
R_ee_graspnet: [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
# ARX Lift-2
R_ee_graspnet: [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
# Agilex Split Aloha
R_ee_graspnet: [[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]]
```
1
2
3
4
5
6
7
8
9
10
11
fl_ee_path / fr_ee_path
The USD prim path to the end-effector link (typically the last link of the arm **before **the TCP).
Warning
This link frame has a fixed transformation to the TCP frame. This path should be **aligned with the `ee_link `in the CuRobo config file **.
Usage:
- Compute the end-effector pose relative to the robot base via `get_relative_transform `
- Define where gripper keypoints are attached
- Serve as the reference frame for TCP (Tool Center Point) calculations
Config Example:
yaml
```
# FR3
fl_ee_path: "fr3/panda_hand" # relative to root robot prim path
# Franka Robotiq
fl_ee_path: "arm/panda_link8"
# Agilex Split Aloha
fl_ee_path: "split_aloha_mid_360_with_piper/split_aloha_mid_360_with_piper/fl/link6"
fr_ee_path: "split_aloha_mid_360_with_piper/split_aloha_mid_360_with_piper/fr/link6"
```
1
2
3
4
5
6
7
8
9
ee_axis
The axis along from end-effector origin to gripper TCP. Valid values are `"x" `, `"y" `, or `"z" `.
Usage:
- Used in `pose_post_process_fn `to calculate the actual EE position from TCP
- Affects the 180° rotation variant generation for grasp pose diversity
Config Example:
yaml
```
# FR3
ee_axis: "z"
# ARX Lift-2
ee_axis: "x"
# Agilex Split Aloha
ee_axis: "z"
```
1
2
3
4
5
6
7
8
tcp_offset
The distance from the end-effector frame origin to the Tool Center Point (TCP). The TCP is the point where the gripper fingertips meet when closed, which is the actual grasping point.
Usage:
- During grasp pose processing, the EE position is calculated as: `ee_center = tcp_center + approach_axis * (depth - tcp_offset) `
- This offset accounts for the physical distance between the robot's end-effector frame and the actual grasping point
fl_gripper_keypoints / fr_gripper_keypoints
3D keypoints defined in the end-effector frame for articulated object manipulation planning.
Assume the gripper is oriented upright facing the user. The keypoints are defined as follows:
- **`tool_head `**: The point at the midpoint of the gripper fingertips, along the approach direction.
- **`tool_tail `**: The point at the midpoint of the gripper finger bases, along the same approach direction.
- **`tool_side `**: A point on the side of the right fingertip, used to indicate the gripper width.
![Gripper Keypoints Visualization](/InternDataEngine-Docs/gripper_kps.jpg)
*Gripper keypoints visualization showing tool_head, tool_tail, and tool_side *
Config Example:
yaml
```
# ARX Lift-2
fl_gripper_keypoints:
tool_head: [0.135, 0.0, 0.0, 1] # Gripper fingertip (approach direction)
tool_tail: [0.085, 0.0, 0.0, 1] # Gripper base
tool_side: [0.135, -0.044, 0.0, 1] # Side point for width visualization
```
1
2
3
4
5
### Other Fields [](#other-fields)
- **target_class **( str ): Python class for robot wrapper (e.g., `Lift2 `, `FR3 `).
- **path **( str ): USD file path relative to asset root.
- **robot_file **( list ): CuRobo kinematics config file(s) - one per arm.
- **gripper_max_width **( float ): Maximum gripper opening width (meters).
- **gripper_min_width **( float ): Minimum gripper closing width (meters).
- **solver_position_iteration_count **( int ): Physics solver position iterations.
- **solver_velocity_iteration_count **( int ): Physics solver velocity iterations.
- **stabilization_threshold **( float ): Physics stabilization threshold.
- **left_joint_indices **( list ): Joint indices for left arm in articulation.
- **right_joint_indices **( list ): Joint indices for right arm in articulation.
- **left_gripper_indices **( list ): Gripper joint index for left arm.
- **right_gripper_indices **( list ): Gripper joint index for right arm.
- **lift_indices **( list ): Lift joint indices (for robots with lift mechanism).
- **fl_base_path **( str ): Left arm base prim path.
- **fr_base_path **( str ): Right arm base prim path.
- **fl_filter_paths **( list ): Collision filter prims' paths for left arm.
- **fr_filter_paths **( list ): Collision filter prims' paths for right arm.
- **fl_forbid_collision_paths **( list ): Forbidden collision prims' paths for left arm.
- **fr_forbid_collision_paths **( list ): Forbidden collision prims' paths for right arm.
- **left_joint_home **( list ): Default joint positions for left arm (radians).
- **right_joint_home **( list ): Default joint positions for right arm (radians).
- **left_joint_home_std **( list ): Standard deviation for randomizing left arm home position.
- **right_joint_home_std **( list ): Standard deviation for randomizing right arm home position.
- **left_gripper_home **( list ): Default gripper joint value for left gripper (Isaac).
- **right_gripper_home **( list ): Default gripper joint value for right gripper (Isaac).
- **lift_home **( list ): Default lift joint position (meters).
## Robot Wrappers [](#robot-wrappers)
Robot wrappers ( `workflows/simbox/core/robots/ `) provide a unified interface for:
- Articulation control
- Gripper interface
- State / observation management
- Grasp pose post-processing
All concrete robots (e.g., `FR3 `, `FrankaRobotiq `, `Genie1 `, `Lift2 `, `SplitAloha `) share the same `TemplateRobot `implementation, with differences configured through YAML files and minimal subclass code.
Template Code Example:
python
```
from copy import deepcopy
import numpy as np
from core.robots.base_robot import register_robot
from omni.isaac.core.robots.robot import Robot
from omni.isaac.core.utils.prims import create_prim, get_prim_at_path
from omni.isaac.core.utils.transformations import (
get_relative_transform,
tf_matrix_from_pose,
)
from scipy.interpolate import interp1d
@register_robot
class TemplateRobot(Robot):
"""
Template class for manipulator robots.
All important parameters should be prepared in cfg before instantiation.
The cfg is merged from: robot_config_file -> task_config_robots
"""
def __init__(self, asset_root: str, root_prim_path: str, cfg: dict, *args, **kwargs):
self.asset_root = asset_root
self.cfg = cfg
# Create prim from USD
usd_path = f"{asset_root}/{cfg['path']}"
prim_path = f"{root_prim_path}/{cfg['name']}"
create_prim(usd_path=usd_path, prim_path=prim_path)
super().__init__(prim_path, cfg["name"], *args, **kwargs)
self.robot_prim_path = prim_path
# Gripper parameters (from cfg)
self.gripper_max_width = cfg["gripper_max_width"]
self.gripper_min_width = cfg["gripper_min_width"]
# Solver parameters
self.set_solver_position_iteration_count(cfg["solver_position_iteration_count"])
self.set_stabilization_threshold(cfg["stabilization_threshold"])
self.set_solver_velocity_iteration_count(cfg["solver_velocity_iteration_count"])
# Setup from config
self._setup_joint_indices()
self._setup_paths()
self._setup_gripper_keypoints()
self._setup_collision_paths()
self._load_extra_depth(usd_path)
def initialize(self, *args, **kwargs):
super().initialize()
self._articulation_view.initialize()
self._setup_joint_velocities()
self._setup_joint_homes()
self._set_initial_positions()
def apply_action(self, joint_positions, joint_indices, *args, **kwargs):
self._articulation_view.set_joint_position_targets(joint_positions, joint_indices=joint_indices)
def get_observations(self) -> dict:
joint_state = self.get_joints_state()
qpos, qvel = joint_state.positions, joint_state.velocities
T_base_ee_fl = get_relative_transform(
get_prim_at_path(self.fl_ee_path), get_prim_at_path(self.fl_base_path)
)
T_world_base = tf_matrix_from_pose(*self.get_local_pose())
obs = self._build_observations(qpos, qvel, T_base_ee_fl, T_world_base)
return obs
def pose_post_process_fn(
self, poses, *args, lr_arm="left", grasp_scale=1, tcp_offset=None, constraints=None, **kwargs
):
if poses.shape[-2:] == (4, 4):
return poses
R_ee_graspnet = self._get_R_ee_graspnet()
n_grasps = poses.shape[0]
T_obj_tcp = np.repeat(np.eye(4)[np.newaxis, :, :], n_grasps, axis=0)
R_ee_graspnet = np.array(R_ee_graspnet)
T_obj_tcp[:, :3, :3] = np.matmul(poses[:, 4:13].reshape(-1, 3, 3), R_ee_graspnet.T)
T_obj_tcp[:, :3, 3] = poses[:, 13:16] * grasp_scale
scores = poses[:, 0]
widths = np.clip(poses[:, 1:2], self.gripper_min_width, self.gripper_max_width)
depths = poses[:, 3:4]
if tcp_offset is None:
tcp_offset = self.tcp_offset
if self._gripper_ed_func is not None:
depths = depths + self._gripper_ed_func(widths)
# ... see full implementation in workflows/simbox/core/robots/template_robot.py
```
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
__init__(self, asset_root: str , root_prim_path: str , cfg: dict , *args, **kwargs)
Create a robot instance from USD and initialize all geometry and dynamics-related paths/parameters based on configuration.
Parameters:
- **asset_root **( str ): Root directory for robot assets.
- **root_prim_path **( str ): Root prim path where the robot is mounted in the USD stage.
- **cfg **( dict ): Merged robot configuration (from robot YAML + task YAML).
- ***args, **kwargs **: Passed to the `Robot `base class.
initialize(self, *args, **kwargs)
Perform one-time initialization after the physics engine is ready, including joint velocity limits, home poses, and initial joint positions.
apply_action(self, joint_positions, joint_indices, *args, **kwargs)
Send joint position targets to the Isaac articulation. This is one of the main interfaces between upper-level controllers/policies and the robot.
Parameters:
- **joint_positions **( np.ndarray ): Target joint positions.
- **joint_indices **( np.ndarray ): Joint indices to control.
- ***args, **kwargs **: Reserved for future extensions (e.g., velocity, torque control).
get_observations(self)
Collect the robot's current state for use as observation input by upper-level policies/planning modules.
Returns:
- **dict **: Observation dictionary for policy/planning.
pose_post_process_fn(self, poses, *args, lr_arm="left", grasp_scale=1, tcp_offset=None, constraints=None, **kwargs)
Convert grasp poses from a graspnet-style annotations (e.g., `(score, width, depth, R, t) `) to robot-specific end-effector pose sets, including TCP offset, grasp width clipping, and optional spatial constraints.
Core Logic:
- If `poses `is already a transformation matrix of shape `(N, 4, 4) `, return directly.
- Use `R_ee_graspnet = self._get_R_ee_graspnet() `to correct the grasp rotation and construct `T_obj_tcp `:
- Rotation: `R_tcp = poses_rot @ R_ee_graspnet^T `.
- Translation: `t_tcp = poses[:, 13:16] * grasp_scale `.
- Clip grasp widths `widths `to `[gripper_min_width, gripper_max_width] `, and optionally correct insertion depth `depths `using `_gripper_ed_func `.
- Use `tcp_offset `and the end-effector axis (determined by `ee_axis `) to transform TCP to actual EE transform `T_obj_ee `.
- If `constraints = [axis, min_ratio, max_ratio] `is provided, filter grasps along the specified axis, keeping only those within the given range.
- Call `_apply_rotation_variant `to generate a 180° rotation variant around the grasp axis, returning two sets of candidate grasp poses and their scores.
Parameters:
- **poses **( np.ndarray ): Raw grasp annotations, typically shape `(N, 16) `.
- **lr_arm **( str ): Left/right arm marker ("left" or "right", currently mainly used by upper layers).
- **grasp_scale **( float ): Grasp position scale factor.
- **tcp_offset **( float , optional): TCP offset relative to EE; defaults to `self.tcp_offset `from config.
- **constraints **( list , optional): Spatial filtering constraint of the form `[axis, min_ratio, max_ratio] `.
- ***args, **kwargs **: Reserved for extensions.
Returns:
- **tuple **:
- First item: `np.ndarray `of EE poses, shape approximately `(2N_filtered, 4, 4) `.
- Second item: `np.ndarray `of corresponding scores, shape approximately `(2N_filtered,) `.
## References [](#references)
- [Isaac Sim Robot Manipulators Documentation](https://docs.isaacsim.omniverse.nvidia.com/5.1.0/py/source/extensions/isaacsim.robot.manipulators/docs/index.html)