Update pre-commit-config.yaml + pyproject.toml + ceil rerun & transformer dependencies version (#1520)

* chore: update .gitignore

* chore: update pre-commit

* chore(deps): update pyproject

* fix(ci): multiple fixes

* chore: pre-commit apply

* chore: address review comments

* Update pyproject.toml

Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(deps): add todo

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Ben Zhang <5977478+ben-z@users.noreply.github.com>
Author: Steven Palma
Date: 2025-07-17 14:30:20 +02:00
Committed by: GitHub
Parent: 0938a1d816
Commit: 378e1f0338
78 changed files with 1450 additions and 636 deletions


@@ -5,17 +5,18 @@ In this tutorial, we'll show how to use asynchronous inference (_async inference
**Try async inference with all the policies** supported by LeRobot!
**What you'll learn:**
1. Why asynchronous inference matters and how it compares to more traditional, sequential inference.
2. How to spin up a `PolicyServer` and connect a `RobotClient` from the same machine, or even over the network.
3. How to tune key parameters (`actions_per_chunk`, `chunk_size_threshold`) for your robot and policy.
If you get stuck, hop into our [Discord community](https://discord.gg/s3KuuzsPFb)!
In a nutshell: with _async inference_, your robot keeps acting while the policy server is already busy computing the next chunk of actions---eliminating "wait-for-inference" lags and unlocking smoother, more reactive behaviours.
This is fundamentally different from synchronous inference (sync), where the robot stays idle while the policy computes the next chunk of actions.
---
## Getting started with async inference
You can read more about asynchronous inference in our [blogpost](https://huggingface.co/blog/async-robot-inference). This guide is designed to help you quickly set up and run asynchronous inference in your environment.
@@ -53,40 +54,53 @@ python src/lerobot/scripts/server/robot_client.py \
--aggregate_fn_name=weighted_average \ # CLIENT: the function to aggregate actions on overlapping portions
--debug_visualize_queue_size=True # CLIENT: whether to visualize the queue size at runtime
```
In summary, you need to specify instructions for:
- `SERVER`: the address and port of the policy server
- `ROBOT`: the type of robot to connect to, the port to connect to, and the local `id` of the robot
- `POLICY`: the type of policy to run, and the model name (or path on the server) of the checkpoint to run. You also need to specify which device the server should use, and how many actions to output at once (capped at the policy's maximum number of actions).
- `CLIENT`: the queue-size threshold below which a new observation is sent to the server, and the function used to aggregate actions on overlapping portions. Optionally, you can also visualize the queue size at runtime to help you tune the `CLIENT` parameters.
Importantly,
- `actions_per_chunk` and `chunk_size_threshold` are key parameters to tune for your setup.
- `aggregate_fn_name` is the function used to aggregate actions on overlapping portions. You can either pick one from the existing registry of functions or register your own in `robot_client.py` (see [here](NOTE:addlinktoLOC)); an illustrative sketch of such a function follows this list.
- `debug_visualize_queue_size` is a useful tool to tune the `CLIENT` parameters.
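As referenced above, here is a minimal sketch of what a custom aggregation function could look like. This is not LeRobot's actual registry API: the function name, signature, and the way it would be registered in `robot_client.py` are assumptions for illustration; the point is only how the overlapping portion of two chunks can be blended.
```python
# Illustrative sketch only: the real registry and signature live in
# `robot_client.py` and may differ. An aggregation function receives the
# actions still queued from the previous chunk and the freshly received
# actions for the same timesteps, and returns the actions to execute.
import numpy as np


def latest_biased_average(old_actions: np.ndarray, new_actions: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """Blend overlapping actions, trusting the newer chunk more (alpha -> 1 keeps only the new plan)."""
    assert old_actions.shape == new_actions.shape, "overlap must be aligned timestep by timestep"
    return alpha * new_actions + (1.0 - alpha) * old_actions


# Example: a 3-step overlap of 2-DoF actions.
old_overlap = np.array([[0.10, 0.20], [0.12, 0.22], [0.14, 0.24]])
new_overlap = np.array([[0.11, 0.19], [0.15, 0.21], [0.18, 0.23]])
print(latest_biased_average(old_overlap, new_overlap))
```
Biasing towards the newer chunk keeps the plan up to date while still smoothing the hand-off between chunks, in the same spirit as the built-in `weighted_average` option used in the command above.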
Done! You should see your robot moving around by now 😉
---
## Async vs. synchronous inference
Synchronous inference relies on interleaving action chunk prediction and action execution. This inherently results in _idle frames_: frames where the robot sits idle, awaiting the policy's output (a new action chunk).
As a consequence, control is plagued by evident real-time lags, during which the robot simply stops acting because no actions are available.
With robotics models increasing in size, this problem is only likely to become more severe.
<p align="center">
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/async-inference/sync.png"
width="80%"
></img>
</p>
<p align="center">
<i>Synchronous inference</i> makes the robot idle while the policy is
computing the next chunk of actions.
</p>
<p align="center"><i>Synchronous inference</i> makes the robot idle while the policy is computing the next chunk of actions.</p>
To overcome this, we design async inference, a paradigm where action planning and execution are decoupled, resulting in (1) higher adaptability and, most importantly, (2) no idle frames.
Crucially, with async inference, the next action chunk is computed _before_ the current one is exhausted, resulting in no idleness.
Higher adaptability is ensured by aggregating the different action chunks on overlapping portions, obtaining an up-to-date plan and a tighter control loop.
<p align="center">
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/async-inference/async.png"
width="80%"
></img>
</p>
<p align="center">
<i>Asynchronous inference</i> results in no idleness because the next chunk is
computed before the current chunk is exhausted.
</p>
<p align="center"><i>Asynchronous inference</i> results in no idleness because the next chunk is computed before the current chunk is exhausted.</p>
---
@@ -105,6 +119,8 @@ python -m lerobot.scripts.server.policy_server \
```
</hfoption>
<hfoption id="API example">
<!-- prettier-ignore-start -->
```python
from lerobot.scripts.server.configs import PolicyServerConfig
from lerobot.scripts.server.policy_server import serve
@@ -115,6 +131,8 @@ config = PolicyServerConfig(
)
serve(config)
```
<!-- prettier-ignore-end -->
</hfoption>
</hfoptions>
@@ -147,6 +165,8 @@ python src/lerobot/scripts/server/robot_client.py \
```
</hfoption>
<hfoption id="API example">
<!-- prettier-ignore-start -->
```python
import threading
from lerobot.robots.so100_follower import SO100FollowerConfig
@@ -201,6 +221,8 @@ if client.start():
# (Optionally) plot the action queue size
visualize_action_queue_size(client.action_queue_size)
```
<!-- prettier-ignore-end -->
</hfoption>
</hfoptions>
@@ -216,20 +238,30 @@ The following two parameters are key in every setup:
</thead>
<tbody>
<tr>
<td>
<code>actions_per_chunk</code>
</td>
<td>50</td>
<td>
How many actions the policy outputs at once. Typical values: 10-50.
</td>
</tr>
<tr>
<td>
<code>chunk_size_threshold</code>
</td>
<td>0.7</td>
<td>
When the queue drops to this fraction of its maximum size or below (with
the default, ≤ 70% full), the client sends a fresh observation. Value in
[0, 1].
</td>
</tr>
</tbody>
</table>
<Tip>
Different values of `actions_per_chunk` and `chunk_size_threshold` do result
in different behaviours.
</Tip>
On the one hand, increasing the value of `actions_per_chunk` reduces the likelihood of running out of actions to execute, because more actions remain available by the time the new chunk is computed.
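As a back-of-the-envelope illustration (assuming the client compares the current queue length against `chunk_size_threshold * actions_per_chunk`), this is when a fresh observation would be sent:
```python
# Illustrative numbers only; the comparison rule is an assumption based on the
# description of `chunk_size_threshold` above.
actions_per_chunk = 50
chunk_size_threshold = 0.7

trigger_level = chunk_size_threshold * actions_per_chunk  # 35 actions
for queue_size in (50, 40, 35, 10):
    send_observation = queue_size <= trigger_level
    print(f"queue={queue_size:2d} -> send new observation: {send_observation}")
```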
@@ -249,10 +281,18 @@ We found the default values of `actions_per_chunk` and `chunk_size_threshold` to
- We found values around 0.5-0.6 to work well. If you want to tweak this, spin up a `RobotClient` with `--debug_visualize_queue_size=True`. This plots the evolution of the action queue size at runtime, which you can use to find the value of `chunk_size_threshold` that works best for your setup.
<p align="center">
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/async-inference/queues.png"
width="80%"
></img>
</p>
<p align="center">
<i>
The action queue size is plotted at runtime when the
`--debug_visualize_queue_size` flag is passed, for various levels of
`chunk_size_threshold` (`g` in the SmolVLA paper).
</i>
</p>
<p align="center"><i>The action queue size is plotted at runtime when the `--debug-visualize-queue-size` flag is passed, for various levels of `chunk_size_threshold` (`g` in the SmolVLA paper).</i></p>
---