<div align="center">

# InternDataEngine

**High-Fidelity Synthetic Data Generator for Robotic Manipulation**

</div>

<div align="center">

[Paper: InternData-A1](https://arxiv.org/abs/2511.16651)
[Paper: Nimbus](https://arxiv.org/abs/2601.21449)
[Paper: InternVLA-M1](https://arxiv.org/abs/2510.13778)
[Dataset: InternData-A1](https://huggingface.co/datasets/InternRobotics/InternData-A1)
[Dataset: InternData-M1](https://huggingface.co/datasets/InternRobotics/InternData-M1)
[Docs](https://internrobotics.github.io/InternDataEngine-Docs/)

</div>

## About

<div align="center">
<img src="./docs/images/intern_data_engine.jpeg" alt="InternDataEngine Overview" width="80%">
</div>

InternDataEngine is a synthetic data generation engine for embodied AI, built on NVIDIA Isaac Sim. It unifies high-fidelity physical interaction (InternData-A1), semantic task and scene generation (InternData-M1), and high-throughput scheduling (Nimbus) to deliver realistic, task-aligned, and scalable robotic manipulation data.

**Key capabilities:**

- **Realistic physical interaction** -- Unified simulation of rigid, articulated, deformable, and fluid objects across single-arm, dual-arm, and humanoid robots. Supports long-horizon, skill-composed manipulation for sim-to-real transfer.
- **Diverse data generation** -- Multi-dimensional domain randomization (layout, texture, structure, lighting) with rich multimodal annotations (bounding boxes, segmentation masks, keypoints).
- **Efficient large-scale production** -- Nimbus-powered asynchronous pipelines that decouple planning, rendering, and storage, achieving a 2-3x end-to-end throughput improvement with cluster-level load balancing and fault tolerance.

## Prerequisites

| Dependency | Requirement |
|------------|-------------|
| NVIDIA Isaac Sim | 5.0.0 (Kit 107.x) |
| CUDA Toolkit | >= 12.8 |
| Python | 3.10 |
| GPU | NVIDIA RTX (tested on RTX PRO 6000 Blackwell) |

> For detailed environment setup (conda, CUDA, PyTorch, curobo), see [install.md](install.md).
>
> If migrating from Isaac Sim 4.5.0, see [migrate/migrate.md](migrate/migrate.md) for known issues and fixes.
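
The version constraints in the table can be checked before installing anything else. The following is a minimal sketch and not part of InternDataEngine: `parse_version`, `meets_minimum`, and `check_python` are hypothetical helper names, and the CUDA version string is assumed to be supplied by hand (for example, copied from the output of `nvcc --version`).

```python
import sys


def parse_version(text):
    """Turn a dotted version string such as '12.8' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found, minimum):
    """Return True if version `found` is at least `minimum` (hypothetical helper)."""
    return parse_version(found) >= parse_version(minimum)


def check_python(required=(3, 10)):
    """Return True if the running interpreter matches the required major.minor."""
    return tuple(sys.version_info[:2]) == tuple(required)


if __name__ == "__main__":
    # The CUDA version is passed in by hand; this sketch does not query the toolkit.
    print("Python 3.10:", check_python())
    print("CUDA >= 12.8:", meets_minimum("12.8", "12.8"))
```

Comparing integer tuples rather than strings keeps `12.10` ordered after `12.8`, which a plain string comparison would get wrong.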

## Quick Start

### 1. Install

```bash
# Create conda environment
conda create -n banana500 python=3.10
conda activate banana500

# Install CUDA 12.8 and set up Isaac Sim 5.0.0
conda install -y cuda-toolkit=12.8
source ~/isaacsim500/setup_conda_env.sh  # Adjust the path to your Isaac Sim install
export CUDA_HOME="$CONDA_PREFIX"

# Install PyTorch (CUDA 12.8)
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128

# Install project dependencies
pip install -r requirements.txt

# Install curobo (motion planning)
cd workflows/simbox/curobo
export TORCH_CUDA_ARCH_LIST="12.0+PTX"  # Set to your GPU's compute capability
pip install -e ".[isaacsim]" --no-build-isolation  # Quoted so the brackets are not expanded by the shell
cd ../../..
```

See [install.md](install.md) for the full step-by-step guide, including troubleshooting.
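
The `TORCH_CUDA_ARCH_LIST` value in the curobo step has to match the GPU's compute capability. As a hedged sketch (not part of the project), the value can be derived from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple. `arch_list_for` and `detect_arch_list` are hypothetical helpers, and appending `+PTX` simply mirrors the example value in the install commands.

```python
def arch_list_for(capability):
    """Format a (major, minor) compute capability as a TORCH_CUDA_ARCH_LIST value."""
    major, minor = capability
    return f"{major}.{minor}+PTX"


def detect_arch_list():
    """Return the value for the first visible GPU, or None if it cannot be determined."""
    try:
        import torch  # Imported lazily so arch_list_for works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return arch_list_for(torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    value = detect_arch_list()
    if value is None:
        print("No CUDA device detected; set TORCH_CUDA_ARCH_LIST manually.")
    else:
        print(f'export TORCH_CUDA_ARCH_LIST="{value}"')
```

Run it after PyTorch is installed and before building curobo, then paste the printed `export` line into the shell.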

### 2. Run Data Generation

```bash
# Full pipeline: plan trajectories + render + save
python launcher.py --config configs/simbox/de_plan_and_render_template.yaml
```

Output is saved to `output/simbox_plan_and_render/` and includes:

- `demo.mp4` -- rendered video from robot cameras
- LMDB data files for model training
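
After a run, it can be useful to confirm that the expected artifacts exist. The sketch below walks an output directory and reports `demo.mp4` files and LMDB files. It assumes LMDB data is stored in files ending in `.mdb` (the LMDB library's default `data.mdb` and `lock.mdb` naming); the exact layout under `output/simbox_plan_and_render/` is not specified here, so the search is recursive. `summarize_output` is a hypothetical helper, not a project function.

```python
from pathlib import Path


def summarize_output(root):
    """Collect rendered videos and LMDB files found anywhere under `root`."""
    root = Path(root)
    if not root.is_dir():
        return {"videos": [], "lmdb_files": []}
    videos = sorted(p for p in root.rglob("demo.mp4") if p.is_file())
    # Assumption: LMDB environments use the library's default *.mdb file names.
    lmdb_files = sorted(p for p in root.rglob("*.mdb") if p.is_file())
    return {"videos": videos, "lmdb_files": lmdb_files}


if __name__ == "__main__":
    summary = summarize_output("output/simbox_plan_and_render")
    for kind, paths in summary.items():
        print(f"{kind}: {len(paths)} found")
        for path in paths:
            print(f"  {path}")
```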

### 3. Available Pipeline Configs

| Config | Description |
|--------|-------------|
| `de_plan_and_render_template.yaml` | Full pipeline: plan + render + save |
| `de_plan_template.yaml` | Plan trajectories only (no rendering) |
| `de_render_template.yaml` | Render from existing plans |
| `de_plan_with_render_template.yaml` | Plan with live rendering preview |
| `de_pipe_template.yaml` | Pipelined mode for throughput |
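
Each config in the table is passed to `launcher.py` through `--config`, as in the run command in step 2. A small sketch of selecting a pipeline mode by name follows. The short mode names and the `build_command` helper are hypothetical, and the sketch assumes every listed template sits in `configs/simbox/`, as `de_plan_and_render_template.yaml` does.

```python
import sys

CONFIG_DIR = "configs/simbox"

# Hypothetical short names mapped to the template files listed in the table.
PIPELINE_CONFIGS = {
    "plan_and_render": "de_plan_and_render_template.yaml",
    "plan": "de_plan_template.yaml",
    "render": "de_render_template.yaml",
    "plan_with_render": "de_plan_with_render_template.yaml",
    "pipe": "de_pipe_template.yaml",
}


def build_command(mode):
    """Return the launcher command (as an argument list) for a pipeline mode."""
    if mode not in PIPELINE_CONFIGS:
        raise ValueError(f"unknown mode {mode!r}; choose from {sorted(PIPELINE_CONFIGS)}")
    config_path = f"{CONFIG_DIR}/{PIPELINE_CONFIGS[mode]}"
    return [sys.executable, "launcher.py", "--config", config_path]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to execute.
    print(" ".join(build_command("plan")))
```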

### 4. Configuration

The main config file (`configs/simbox/de_plan_and_render_template.yaml`) controls simulator settings such as:

```yaml
simulator:
  headless: True                  # Set False for GUI debugging
  renderer: "RayTracedLighting"   # Or "PathTracing" for higher quality
  physics_dt: 1/30
  rendering_dt: 1/30
```

Task configs are in `workflows/simbox/core/configs/tasks/`. The example task (`sort_the_rubbish`) demonstrates a dual-arm pick-and-place scenario.
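
A value written as `1/30` in the simulator config is not a YAML number: a loader such as PyYAML returns it as the string `"1/30"`. How InternDataEngine converts it is not described here. As an illustrative sketch only, such a value could be turned into seconds with Python's `fractions.Fraction`; `parse_dt` and `physics_steps_per_frame` are hypothetical helpers.

```python
from fractions import Fraction


def parse_dt(value):
    """Convert a timestep such as '1/30', '0.0333', or 0.0333 to a Fraction of a second."""
    dt = Fraction(str(value))
    if dt <= 0:
        raise ValueError(f"timestep must be positive, got {value!r}")
    return dt


def physics_steps_per_frame(physics_dt, rendering_dt):
    """Number of physics steps per rendered frame, assuming rendering_dt is a multiple."""
    ratio = parse_dt(rendering_dt) / parse_dt(physics_dt)
    if ratio.denominator != 1:
        raise ValueError("rendering_dt is not an integer multiple of physics_dt")
    return int(ratio)


if __name__ == "__main__":
    print(float(parse_dt("1/30")))                  # timestep in seconds
    print(physics_steps_per_frame("1/30", "1/30"))  # 1
```

Keeping the value as an exact fraction avoids rounding drift when the two timesteps are compared or divided.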

## Project Structure

```
InternDataEngine/
  configs/simbox/          # Pipeline configuration files
  launcher.py              # Main entry point
  nimbus_extension/        # Nimbus framework components
  workflows/simbox/
    core/
      configs/             # Task, robot, arena, camera configs
      controllers/         # Motion planning (curobo integration)
      skills/              # Manipulation skills (pick, place, etc.)
      tasks/               # Task definitions
    example_assets/        # Example USD assets (robots, objects, tables)
    curobo/                # GPU-accelerated motion planning library
  migrate/                 # Migration tools and documentation
  output/                  # Generated data output
```

## Documentation

- [Installation Guide](install.md) -- Environment setup and dependency installation
- [Migration Guide](migrate/migrate.md) -- Isaac Sim 4.5.0 to 5.0.0 migration notes and tools
- [Online Documentation](https://internrobotics.github.io/InternDataEngine-Docs/) -- Full API docs, tutorials, and advanced usage

## License and Citation

This project is based on [InternDataEngine](https://github.com/InternRobotics/InternDataEngine) by InternRobotics, licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).

If this project helps your research, please cite the following papers:

```BibTeX
@article{tian2025interndata,
  title={{InternData-A1}: Pioneering high-fidelity synthetic data for pre-training generalist policy},
  author={Tian, Yang and Yang, Yuyin and Xie, Yiman and Cai, Zetao and Shi, Xu and Gao, Ning and Liu, Hangxu and Jiang, Xuekun and Qiu, Zherui and Yuan, Feng and others},
  journal={arXiv preprint arXiv:2511.16651},
  year={2025}
}

@article{he2026nimbus,
  title={Nimbus: A Unified Embodied Synthetic Data Generation Framework},
  author={He, Zeyu and Zhang, Yuchang and Zhou, Yuanzhen and Tao, Miao and Li, Hengjie and Tian, Yang and Zeng, Jia and Wang, Tai and Cai, Wenzhe and Chen, Yilun and others},
  journal={arXiv preprint arXiv:2601.21449},
  year={2026}
}

@article{chen2025internvla,
  title={{InternVLA-M1}: A spatially guided vision-language-action framework for generalist robot policy},
  author={Chen, Xinyi and Chen, Yilun and Fu, Yanwei and Gao, Ning and Jia, Jiaya and Jin, Weiyang and Li, Hao and Mu, Yao and Pang, Jiangmiao and Qiao, Yu and others},
  journal={arXiv preprint arXiv:2510.13778},
  year={2025}
}
```