* Enhance training and logging functionality with accelerator support
- Added support for multi-GPU training by introducing an `accelerator` parameter in training functions.
- Updated `update_policy` to handle gradient updates based on whether an accelerator is present (see the sketch after this list).
- Modified logging to prevent duplicate messages in non-main processes.
- Enhanced `set_seed` and `get_safe_torch_device` functions to accommodate accelerator usage.
- Updated `MetricsTracker` to account for the number of processes when calculating metrics.
- Added the `accelerate` library as an optional dependency extra in `pyproject.toml`.
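  A minimal sketch of the accelerator-aware update path described in the bullets above; only the `update_policy` name and the `accelerator` parameter come from this PR, while the signature and body are illustrative assumptions rather than the actual implementation.

  ```python
  import torch


  def update_policy(policy, batch, optimizer, grad_clip_norm, accelerator=None):
      """Single gradient step; branches on whether an accelerate Accelerator is present."""
      loss = policy.forward(batch)  # assumed to return a scalar loss tensor

      if accelerator is not None:
          # accelerate handles mixed-precision scaling and gradient sync across GPUs.
          accelerator.backward(loss)
          grad_norm = accelerator.clip_grad_norm_(policy.parameters(), grad_clip_norm)
      else:
          loss.backward()
          grad_norm = torch.nn.utils.clip_grad_norm_(policy.parameters(), grad_clip_norm)

      optimizer.step()
      optimizer.zero_grad()
      return {"loss": loss.item(), "grad_norm": float(grad_norm)}
  ```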
* Initialize logging in training script for both main and non-main processes
- Added `init_logging` calls to ensure proper logging setup when using the accelerator and in standard training mode.
- This keeps logging output clear and consistent across processes during training; a sketch of the per-process setup follows.
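  A hedged sketch of the per-process logging setup these bullets describe; `init_logging` is the name from the bullets, while the use of `accelerate.PartialState` and the exact format string are assumptions.

  ```python
  import logging

  from accelerate import PartialState


  def init_logging() -> None:
      # Non-main processes log at WARNING so progress messages are not
      # duplicated once per GPU; the main process keeps full INFO output.
      is_main = PartialState().is_main_process
      logging.basicConfig(
          level=logging.INFO if is_main else logging.WARNING,
          format="%(asctime)s %(levelname)s %(message)s",
      )
  ```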
* add docs and only push model once
* Place logging under accelerate and update docs
* fix pre-commit
* only log in main process
* main logging
* try with local rank
* add tests
* change runner
* fix test
* don't push to hub in multi-GPU tests
* pre-download dataset in tests
* small fixes
* fix optimizer state path
* update docs, and small improvements in train
* simplify accelerate main process detection
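  The main-process detection referenced in several commits above ("only log in main process", "only push model once") can be reduced to accelerate's own flag. The sketch below is illustrative; `save_and_push` is a hypothetical stand-in for the real checkpoint/hub logic.

  ```python
  from accelerate import Accelerator

  accelerator = Accelerator()


  def save_and_push(step: int) -> None:
      # Hypothetical stand-in for checkpoint saving and hub pushing.
      print(f"step {step}: saving checkpoint and pushing to the hub")


  # Side effects run exactly once, on the main process only.
  if accelerator.is_main_process:
      save_and_push(step=1000)

  # Keep all ranks synchronized before continuing training.
  accelerator.wait_for_everyone()
  ```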
* small improvements in train
* fix OOM bug
* change accelerate detection
* add some debugging
* always use accelerate
* cleanup update method
* cleanup
* fix bug
* scale lr decay if we reduce steps
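  One possible reading of the scheduler change above: with N processes the number of optimizer steps per process shrinks, so the decay horizon is scaled to match. The function name and the choice of `CosineAnnealingLR` below are assumptions for illustration only.

  ```python
  import torch


  def make_lr_scheduler(optimizer, total_steps, accelerator=None):
      # If accelerate splits the run across N processes, each process performs
      # roughly total_steps / N optimizer steps, so shrink the decay horizon.
      num_processes = accelerator.num_processes if accelerator is not None else 1
      effective_steps = max(1, total_steps // num_processes)
      return torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=effective_steps)
  ```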
* cleanup logging
* fix formatting
* incorporate PR feedback
* add min memory to cpu tests
* use accelerate to determine logging
* fix pre-commit and fix tests
* chore: minor details
---------
Co-authored-by: AdilZouitine <adilzouitinegm@gmail.com>
Co-authored-by: Steven Palma <steven.palma@huggingface.co>