
InternDataEngine

High-Fidelity Synthetic Data Generator for Robotic Manipulation

Links: InternData-A1 Paper · Nimbus Paper · InternVLA-M1 Paper · InternData-A1 Data · InternData-M1 Data · Docs


InternDataEngine Overview

InternDataEngine is a synthetic data generation engine for embodied AI, built on NVIDIA Isaac Sim. It unifies high-fidelity physical interaction (InternData-A1), semantic task and scene generation (InternData-M1), and high-throughput scheduling (Nimbus) to deliver realistic, task-aligned, and scalable robotic manipulation data.

Key capabilities:

  • Realistic physical interaction -- Unified simulation of rigid, articulated, deformable, and fluid objects across single-arm, dual-arm, and humanoid robots. Supports long-horizon, skill-composed manipulation for sim-to-real transfer.
  • Diverse data generation -- Multi-dimensional domain randomization (layout, texture, structure, lighting) with rich multimodal annotations (bounding boxes, segmentation masks, keypoints).
  • Efficient large-scale production -- Nimbus-powered asynchronous pipelines that decouple planning, rendering, and storage, achieving 2-3x end-to-end throughput with cluster-level load balancing and fault tolerance.
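The decoupling that Nimbus performs can be illustrated with a minimal queue-based sketch (hypothetical code, not the Nimbus API): planning, rendering, and storage run as independent stages connected by bounded queues, so a slow stage buffers work instead of blocking the whole pipeline.

```python
import queue
import threading

def run_stage(work, inbox, outbox):
    """Pull items from inbox, process them, push results to outbox.
    None is the shutdown signal and is forwarded downstream."""
    while (item := inbox.get()) is not None:
        outbox.put(work(item))
    outbox.put(None)

# Illustrative stand-ins for plan / render (not the real engine calls).
plan = lambda task: f"traj({task})"
render = lambda traj: f"frames({traj})"

def pipeline(tasks):
    # Bounded queues decouple the stages while limiting memory use.
    q_plan, q_render, q_out = (queue.Queue(maxsize=4) for _ in range(3))
    stages = [
        threading.Thread(target=run_stage, args=(plan, q_plan, q_render)),
        threading.Thread(target=run_stage, args=(render, q_render, q_out)),
    ]
    for t in stages:
        t.start()
    for task in tasks + [None]:
        q_plan.put(task)
    results = []
    while (item := q_out.get()) is not None:
        results.append(item)  # the "save" stage: collect rendered episodes
    for t in stages:
        t.join()
    return results
```

In the real engine each stage would be a separate process (or node) rather than a thread, which is where the cluster-level load balancing comes in.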

Prerequisites

Dependency          Version
NVIDIA Isaac Sim    5.0.0 (Kit 107.x)
CUDA Toolkit        >= 12.8
Python              3.10
GPU                 NVIDIA RTX (tested on RTX PRO 6000 Blackwell)

For detailed environment setup (conda, CUDA, PyTorch, curobo), see install.md.

If migrating from Isaac Sim 4.5.0, see migrate/migrate.md for known issues and fixes.
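Before installing, a small helper like the following (hypothetical, not shipped with the repo) can check a locally reported CUDA toolkit version against the >= 12.8 requirement:

```shell
# Hypothetical check: does a "major.minor" CUDA version satisfy >= 12.8?
cuda_ok() {
  awk -v v="$1" 'BEGIN { split(v, a, "."); exit !(a[1] > 12 || (a[1] == 12 && a[2] >= 8)) }'
}

# Example: feed it the release number reported by `nvcc --version`.
cuda_ok 12.8 && echo "CUDA toolkit is new enough"
```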

Quick Start

1. Install

# Create conda environment
conda create -n banana500 python=3.10
conda activate banana500

# Install CUDA 12.8 and set up Isaac Sim 5.0.0
conda install -y cuda-toolkit=12.8
source ~/isaacsim500/setup_conda_env.sh
export CUDA_HOME="$CONDA_PREFIX"

# Install PyTorch (CUDA 12.8)
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128

# Install project dependencies
pip install -r requirements.txt

# Install curobo (motion planning)
cd workflows/simbox/curobo
export TORCH_CUDA_ARCH_LIST="12.0+PTX"  # Set to your GPU's compute capability
pip install -e .[isaacsim] --no-build-isolation
cd ../../..

See install.md for the full step-by-step guide including troubleshooting.
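After installing, a quick import check can confirm the key dependencies are visible in the environment. This is a generic sketch; the package names are assumptions taken from the install steps above:

```python
import importlib.util

def installed(name: str) -> bool:
    """Return True if a module can be located without importing it."""
    return importlib.util.find_spec(name) is not None

# Package names assumed from the install steps above.
for pkg in ("torch", "torchvision", "curobo"):
    print(f"{pkg}: {'ok' if installed(pkg) else 'MISSING'}")
```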

2. Run Data Generation

# Full pipeline: plan trajectories + render + save
python launcher.py --config configs/simbox/de_plan_and_render_template.yaml

Output is saved to output/simbox_plan_and_render/ including:

  • demo.mp4 -- rendered video from robot cameras
  • LMDB data files for model training

3. Available Pipeline Configs

Config                               Description
de_plan_and_render_template.yaml     Full pipeline: plan + render + save
de_plan_template.yaml                Plan trajectories only (no rendering)
de_render_template.yaml              Render from existing plans
de_plan_with_render_template.yaml    Plan with live rendering preview
de_pipe_template.yaml                Pipelined mode for throughput
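All five configs follow the same naming pattern, so a small wrapper (hypothetical, not part of the repo) can select one by stage name:

```shell
# Hypothetical helper: map a stage name to its template config path.
stage_config() {
  echo "configs/simbox/de_${1}_template.yaml"
}

# Examples (the launcher invocations mirror the Quick Start):
# python launcher.py --config "$(stage_config plan)"
# python launcher.py --config "$(stage_config plan_and_render)"
```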

4. Configuration

The main config file (configs/simbox/de_plan_and_render_template.yaml) controls:

simulator:
  headless: True              # Set False for GUI debugging
  renderer: "RayTracedLighting"  # Or "PathTracing" for higher quality
  physics_dt: 1/30
  rendering_dt: 1/30
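Note that physics_dt and rendering_dt are written as fractions, which YAML reads as strings; a loader has to evaluate them to seconds. A minimal sketch of that conversion (the engine's actual config parsing may differ):

```python
from fractions import Fraction

def parse_dt(value):
    """Convert a timestep such as '1/30' (or a plain number) to seconds.
    Hypothetical helper; the real config loader may handle this differently."""
    return float(Fraction(str(value)))
```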

Task configs are in workflows/simbox/core/configs/tasks/. The example task (sort_the_rubbish) demonstrates a dual-arm pick-and-place scenario.
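The skill-composed structure mentioned above can be pictured as a task that sequences parameterized skill invocations. The sketch below is illustrative only; the skill, arm, and object names are hypothetical and not taken from the sort_the_rubbish config:

```python
from dataclasses import dataclass

@dataclass
class SkillStep:
    """One parameterized skill invocation in a long-horizon task."""
    skill: str   # e.g. "pick" or "place"
    arm: str     # "left" or "right" for a dual-arm robot
    target: str  # object or location identifier

def pick_and_place(arm: str, obj: str, dest: str):
    """Compose a pick skill and a place skill into one sub-task."""
    return [SkillStep("pick", arm, obj), SkillStep("place", arm, dest)]

# A dual-arm sorting task as a flat sequence of skill steps.
task = pick_and_place("left", "bottle", "recyclable_bin") + \
       pick_and_place("right", "peel", "food_waste_bin")
```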

Project Structure

InternDataEngine/
  configs/simbox/             # Pipeline configuration files
  launcher.py                 # Main entry point
  nimbus_extension/           # Nimbus framework components
  workflows/simbox/
    core/
      configs/                # Task, robot, arena, camera configs
      controllers/            # Motion planning (curobo integration)
      skills/                 # Manipulation skills (pick, place, etc.)
      tasks/                  # Task definitions
    example_assets/           # Example USD assets (robots, objects, tables)
    curobo/                   # GPU-accelerated motion planning library
  migrate/                    # Migration tools and documentation
  output/                     # Generated data output

Documentation

License and Citation

This project is based on InternDataEngine by InternRobotics, licensed under CC BY-NC-SA 4.0.

If this project helps your research, please cite the following papers:

@article{tian2025interndata,
  title={InternData-A1: Pioneering high-fidelity synthetic data for pre-training generalist policy},
  author={Tian, Yang and Yang, Yuyin and Xie, Yiman and Cai, Zetao and Shi, Xu and Gao, Ning and Liu, Hangxu and Jiang, Xuekun and Qiu, Zherui and Yuan, Feng and others},
  journal={arXiv preprint arXiv:2511.16651},
  year={2025}
}

@article{he2026nimbus,
  title={Nimbus: A Unified Embodied Synthetic Data Generation Framework},
  author={He, Zeyu and Zhang, Yuchang and Zhou, Yuanzhen and Tao, Miao and Li, Hengjie and Tian, Yang and Zeng, Jia and Wang, Tai and Cai, Wenzhe and Chen, Yilun and others},
  journal={arXiv preprint arXiv:2601.21449},
  year={2026}
}

@article{chen2025internvla,
  title={InternVLA-M1: A spatially guided vision-language-action framework for generalist robot policy},
  author={Chen, Xinyi and Chen, Yilun and Fu, Yanwei and Gao, Ning and Jia, Jiaya and Jin, Weiyang and Li, Hao and Mu, Yao and Pang, Jiangmiao and Qiao, Yu and others},
  journal={arXiv preprint arXiv:2510.13778},
  year={2025}
}