<div align="center">

# InternDataEngine

**High-Fidelity Synthetic Data Generator for Robotic Manipulation**

</div>
<div align="center">

[![Paper InternData-A1](https://img.shields.io/badge/Paper-InternData--A1-red.svg)](https://arxiv.org/abs/2511.16651)
[![Paper Nimbus](https://img.shields.io/badge/Paper-Nimbus-red.svg)](https://arxiv.org/abs/2601.21449)
[![Paper InternVLA-M1](https://img.shields.io/badge/Paper-InternVLA--M1-red.svg)](https://arxiv.org/abs/2510.13778)
[![Data InternData-A1](https://img.shields.io/badge/Data-InternData--A1-blue?logo=huggingface)](https://huggingface.co/datasets/InternRobotics/InternData-A1)
[![Data InternData-M1](https://img.shields.io/badge/Data-InternData--M1-blue?logo=huggingface)](https://huggingface.co/datasets/InternRobotics/InternData-M1)
[![Docs](https://img.shields.io/badge/Docs-Online-green.svg)](https://internrobotics.github.io/InternDataEngine-Docs/)

</div>
## About
<div align="center">
<img src="./docs/images/intern_data_engine.jpeg" alt="InternDataEngine Overview" width="80%">
</div>
InternDataEngine is a synthetic data generation engine for embodied AI, built on NVIDIA Isaac Sim. It unifies high-fidelity physical interaction (InternData-A1), semantic task and scene generation (InternData-M1), and high-throughput scheduling (Nimbus) to deliver realistic, task-aligned, and scalable robotic manipulation data.
**Key capabilities:**
- **Realistic physical interaction** -- Unified simulation of rigid, articulated, deformable, and fluid objects across single-arm, dual-arm, and humanoid robots. Supports long-horizon, skill-composed manipulation for sim-to-real transfer.
- **Diverse data generation** -- Multi-dimensional domain randomization (layout, texture, structure, lighting) with rich multimodal annotations (bounding boxes, segmentation masks, keypoints).
- **Efficient large-scale production** -- Nimbus-powered asynchronous pipelines that decouple planning, rendering, and storage, achieving 2-3x end-to-end throughput with cluster-level load balancing and fault tolerance.
## Prerequisites
| Dependency | Version |
|------------|---------|
| NVIDIA Isaac Sim | 5.0.0 (Kit 107.x) |
| CUDA Toolkit | >= 12.8 |
| Python | 3.10 |
| GPU | NVIDIA RTX (tested on RTX PRO 6000 Blackwell) |
> For detailed environment setup (conda, CUDA, PyTorch, curobo), see [install.md](install.md).
>
> If migrating from Isaac Sim 4.5.0, see [migerate/migerate.md](migerate/migerate.md) for known issues and fixes.
## Quick Start
### 1. Install
```bash
# Create conda environment
conda create -n banana500 python=3.10
conda activate banana500
# Install CUDA 12.8 and set up Isaac Sim 5.0.0
conda install -y cuda-toolkit=12.8
source ~/isaacsim500/setup_conda_env.sh
export CUDA_HOME="$CONDA_PREFIX"
# Install PyTorch (CUDA 12.8)
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
# Install project dependencies
pip install -r requirements.txt
# Install curobo (motion planning)
cd workflows/simbox/curobo
export TORCH_CUDA_ARCH_LIST="12.0+PTX" # Set to your GPU's compute capability
pip install -e .[isaacsim] --no-build-isolation
cd ../../..
```
See [install.md](install.md) for the full step-by-step guide including troubleshooting.
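Before running any pipeline, it is worth confirming that PyTorch was installed with working CUDA support. The snippet below is a minimal sanity-check sketch (not part of the repo); `check_env` is a hypothetical helper name:

```python
# Hedged sanity check: verify PyTorch is importable and CUDA is usable
# inside the activated conda environment from the steps above.
import importlib.util

def check_env():
    """Report whether torch is installed and, if so, whether CUDA is available."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if report["torch_installed"]:
        import torch  # only imported when actually present
        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report

if __name__ == "__main__":
    print(check_env())
```

If `cuda_available` comes back `False`, revisit the CUDA toolkit and `TORCH_CUDA_ARCH_LIST` steps before installing curobo.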
### 2. Run Data Generation
```bash
# Full pipeline: plan trajectories + render + save
python launcher.py --config configs/simbox/de_plan_and_render_template.yaml
```
Output is saved to `output/simbox_plan_and_render/`, including:
- `demo.mp4` -- rendered video from robot cameras
- LMDB data files for model training
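To get a quick overview of what a run produced, you can walk the output directory. This is an illustrative sketch (not part of the repo); `list_outputs` is a hypothetical helper and the default path assumes the run above:

```python
# Sketch: list every file a data-generation run wrote under its output dir.
from pathlib import Path

def list_outputs(out_dir="output/simbox_plan_and_render"):
    """Return sorted relative paths of all files under the output directory."""
    root = Path(out_dir)
    if not root.exists():
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )

for name in list_outputs():
    print(name)
```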
### 3. Available Pipeline Configs
| Config | Description |
|--------|-------------|
| `de_plan_and_render_template.yaml` | Full pipeline: plan + render + save |
| `de_plan_template.yaml` | Plan trajectories only (no rendering) |
| `de_render_template.yaml` | Render from existing plans |
| `de_plan_with_render_template.yaml` | Plan with live rendering preview |
| `de_pipe_template.yaml` | Pipelined mode for throughput |
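The configs above compose naturally into multi-stage runs, e.g. planning trajectories once and rendering them later. A minimal sketch of such a two-stage driver, assuming the config names from the table and `launcher.py` as the entry point (`run_stage` is a hypothetical helper):

```python
# Sketch: run planning first, then render from the saved plans.
import subprocess

def run_stage(config: str) -> int:
    """Invoke launcher.py with one pipeline config; return its exit code."""
    cmd = ["python", "launcher.py", "--config", f"configs/simbox/{config}"]
    print("running:", " ".join(cmd))
    return subprocess.call(cmd)

if __name__ == "__main__":
    for cfg in ["de_plan_template.yaml", "de_render_template.yaml"]:
        if run_stage(cfg) != 0:
            raise SystemExit(f"stage failed: {cfg}")
```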
### 4. Configuration
The main config file (`configs/simbox/de_plan_and_render_template.yaml`) controls:
```yaml
simulator:
  headless: True                  # Set False for GUI debugging
  renderer: "RayTracedLighting"   # Or "PathTracing" for higher quality
  physics_dt: 1/30
  rendering_dt: 1/30
```
Task configs are in `workflows/simbox/core/configs/tasks/`. The example task (`sort_the_rubbish`) demonstrates a dual-arm pick-and-place scenario.
## Project Structure
```
InternDataEngine/
  configs/simbox/        # Pipeline configuration files
  launcher.py            # Main entry point
  nimbus_extension/      # Nimbus framework components
  workflows/simbox/
    core/
      configs/           # Task, robot, arena, camera configs
      controllers/       # Motion planning (curobo integration)
      skills/            # Manipulation skills (pick, place, etc.)
      tasks/             # Task definitions
    example_assets/      # Example USD assets (robots, objects, tables)
    curobo/              # GPU-accelerated motion planning library
  migerate/              # Migration tools and documentation
  output/                # Generated data output
```
## Documentation
- [Installation Guide](install.md) -- Environment setup and dependency installation
- [Migration Guide](migerate/migerate.md) -- Isaac Sim 4.5.0 to 5.0.0 migration notes and tools
- [Online Documentation](https://internrobotics.github.io/InternDataEngine-Docs/) -- Full API docs, tutorials, and advanced usage
## License and Citation
This project is based on [InternDataEngine](https://github.com/InternRobotics/InternDataEngine) by InternRobotics, licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).
If this project helps your research, please cite the following papers:
```BibTeX
@article{tian2025interndata,
  title={{InternData-A1}: Pioneering high-fidelity synthetic data for pre-training generalist policy},
  author={Tian, Yang and Yang, Yuyin and Xie, Yiman and Cai, Zetao and Shi, Xu and Gao, Ning and Liu, Hangxu and Jiang, Xuekun and Qiu, Zherui and Yuan, Feng and others},
  journal={arXiv preprint arXiv:2511.16651},
  year={2025}
}

@article{he2026nimbus,
  title={{Nimbus}: A Unified Embodied Synthetic Data Generation Framework},
  author={He, Zeyu and Zhang, Yuchang and Zhou, Yuanzhen and Tao, Miao and Li, Hengjie and Tian, Yang and Zeng, Jia and Wang, Tai and Cai, Wenzhe and Chen, Yilun and others},
  journal={arXiv preprint arXiv:2601.21449},
  year={2026}
}

@article{chen2025internvla,
  title={{InternVLA-M1}: A spatially guided vision-language-action framework for generalist robot policy},
  author={Chen, Xinyi and Chen, Yilun and Fu, Yanwei and Gao, Ning and Jia, Jiaya and Jin, Weiyang and Li, Hao and Mu, Yao and Pang, Jiangmiao and Qiao, Yu and others},
  journal={arXiv preprint arXiv:2510.13778},
  year={2025}
}
```