chore(rl): move rl related code to its directory at top level (#2002)

* chore(rl): move rl related code to its directory at top level

* chore(style): apply pre-commit to renamed headers

* test(rl): fix rl imports

* docs(rl): update rl headers doc
Steven Palma
2025-09-23 16:32:34 +02:00
committed by GitHub
parent 9d0cf64da6
commit d6a32e9742
12 changed files with 44 additions and 41 deletions
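
Code outside this repository that still imports from `lerobot.scripts.rl` will need the same rename as the files below. As a minimal sketch, assuming a caller must run against checkouts from both before and after this commit, the helper below tries a list of module paths in order. The name `import_first_available` is hypothetical and is not part of lerobot.

```python
import importlib
from types import ModuleType


def import_first_available(*module_names: str) -> ModuleType:
    """Return the first module in `module_names` that can be imported.

    Hypothetical helper for illustration only; it is not part of lerobot.
    """
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# Assumed usage: prefer the post-move path, fall back to the pre-move one.
# actor = import_first_available("lerobot.rl.actor", "lerobot.scripts.rl.actor")
```

Inside the repository itself the commit takes the simpler route and rewrites each import to the new path.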

@@ -518,7 +518,7 @@ During the online training, press `space` to take over the policy and `space` ag
 Start the recording process, an example of the config file can be found [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/env_config_so100.json):
 ```bash
-python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/env_config_so100.json
+python -m lerobot.rl.gym_manipulator --config_path src/lerobot/configs/env_config_so100.json
 ```
 During recording:
@@ -549,7 +549,7 @@ Note: If you already know the crop parameters, you can skip this step and just s
 Use the `crop_dataset_roi.py` script to interactively select regions of interest in your camera images:
 ```bash
-python -m lerobot.scripts.rl.crop_dataset_roi --repo-id username/pick_lift_cube
+python -m lerobot.rl.crop_dataset_roi --repo-id username/pick_lift_cube
 ```
 1. For each camera view, the script will display the first frame
@@ -618,7 +618,7 @@ Before training, you need to collect a dataset with labeled examples. The `recor
 To collect a dataset, you need to modify some parameters in the environment configuration based on HILSerlRobotEnvConfig.
 ```bash
-python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/reward_classifier_train_config.json
+python -m lerobot.rl.gym_manipulator --config_path src/lerobot/configs/reward_classifier_train_config.json
 ```
 **Key Parameters for Data Collection**
@@ -764,7 +764,7 @@ or set the argument in the json config file.
 Run `gym_manipulator.py` to test the model.
 ```bash
-python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/env_config.json
+python -m lerobot.rl.gym_manipulator --config_path path/to/env_config.json
 ```
 The reward classifier will automatically provide rewards based on the visual input from the robot's cameras.
@@ -777,7 +777,7 @@ The reward classifier will automatically provide rewards based on the visual inp
 2. **Collect a dataset**:
 ```bash
-python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/env_config.json
+python -m lerobot.rl.gym_manipulator --config_path src/lerobot/configs/env_config.json
 ```
 3. **Train the classifier**:
@@ -788,7 +788,7 @@ The reward classifier will automatically provide rewards based on the visual inp
 4. **Test the classifier**:
 ```bash
-python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/env_config.json
+python -m lerobot.rl.gym_manipulator --config_path src/lerobot/configs/env_config.json
 ```
 ### Training with Actor-Learner
@@ -810,7 +810,7 @@ Create a training configuration file (example available [here](https://huggingfa
 First, start the learner server process:
 ```bash
-python -m lerobot.scripts.rl.learner --config_path src/lerobot/configs/train_config_hilserl_so100.json
+python -m lerobot.rl.learner --config_path src/lerobot/configs/train_config_hilserl_so100.json
 ```
 The learner:
@@ -825,7 +825,7 @@ The learner:
 In a separate terminal, start the actor process with the same configuration:
 ```bash
-python -m lerobot.scripts.rl.actor --config_path src/lerobot/configs/train_config_hilserl_so100.json
+python -m lerobot.rl.actor --config_path src/lerobot/configs/train_config_hilserl_so100.json
 ```
 The actor:

@@ -91,7 +91,7 @@ Important parameters:
 To run the environment, set mode to null:
 ```bash
-python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/gym_hil_env.json
+python -m lerobot.rl.gym_manipulator --config_path path/to/gym_hil_env.json
 ```
 ### Recording a Dataset
@@ -118,7 +118,7 @@ To collect a dataset, set the mode to `record` whilst defining the repo_id and n
 ```
 ```bash
-python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/gym_hil_env.json
+python -m lerobot.rl.gym_manipulator --config_path path/to/gym_hil_env.json
 ```
 ### Training a Policy
@@ -126,13 +126,13 @@ python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/gym_hil_env.j
 To train a policy, checkout the configuration example available [here](https://huggingface.co/datasets/lerobot/config_examples/resolve/main/rl/gym_hil/train_config.json) and run the actor and learner servers:
 ```bash
-python -m lerobot.scripts.rl.actor --config_path path/to/train_gym_hil_env.json
+python -m lerobot.rl.actor --config_path path/to/train_gym_hil_env.json
 ```
 In a different terminal, run the learner server:
 ```bash
-python -m lerobot.scripts.rl.learner --config_path path/to/train_gym_hil_env.json
+python -m lerobot.rl.learner --config_path path/to/train_gym_hil_env.json
 ```
 The simulation environment provides a safe and repeatable way to develop and test your Human-In-the-Loop reinforcement learning components before deploying to real robots.

@@ -61,14 +61,14 @@ Then we can run this command to start:
 <hfoption id="Linux">
 ```bash
-python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/env_config_gym_hil_il.json
+python -m lerobot.rl.gym_manipulator --config_path path/to/env_config_gym_hil_il.json
 ```
 </hfoption>
 <hfoption id="MacOS">
 ```bash
-mjpython -m lerobot.scripts.rl.gym_manipulator --config_path path/to/env_config_gym_hil_il.json
+mjpython -m lerobot.rl.gym_manipulator --config_path path/to/env_config_gym_hil_il.json
 ```
 </hfoption>
@@ -198,14 +198,14 @@ Then you can run this command to visualize your trained policy
 <hfoption id="Linux">
 ```bash
-python -m lerobot.scripts.rl.eval_policy --config_path=path/to/eval_config_gym_hil.json
+python -m lerobot.rl.eval_policy --config_path=path/to/eval_config_gym_hil.json
 ```
 </hfoption>
 <hfoption id="MacOS">
 ```bash
-mjpython -m lerobot.scripts.rl.eval_policy --config_path=path/to/eval_config_gym_hil.json
+mjpython -m lerobot.rl.eval_policy --config_path=path/to/eval_config_gym_hil.json
 ```
 </hfoption>

@@ -24,7 +24,7 @@ Examples of usage:
 - Start an actor server for real robot training with human-in-the-loop intervention:
 ```bash
-python -m lerobot.scripts.rl.actor --config_path src/lerobot/configs/train_config_hilserl_so100.json
+python -m lerobot.rl.actor --config_path src/lerobot/configs/train_config_hilserl_so100.json
 ```
 **NOTE**: The actor server requires a running learner server to connect to. Ensure the learner
@@ -64,12 +64,6 @@ from lerobot.policies.factory import make_policy
 from lerobot.policies.sac.modeling_sac import SACPolicy
 from lerobot.processor import TransitionKey
 from lerobot.robots import so100_follower  # noqa: F401
-from lerobot.scripts.rl.gym_manipulator import (
-    create_transition,
-    make_processors,
-    make_robot_env,
-    step_env_and_process_transition,
-)
 from lerobot.teleoperators import gamepad, so101_leader  # noqa: F401
 from lerobot.teleoperators.utils import TeleopEvents
 from lerobot.transport import services_pb2, services_pb2_grpc
@@ -96,6 +90,13 @@ from lerobot.utils.utils import (
     init_logging,
 )
+from .gym_manipulator import (
+    create_transition,
+    make_processors,
+    make_robot_env,
+    step_env_and_process_transition,
+)
 ACTOR_SHUTDOWN_TIMEOUT = 30
 # Main entry point

@@ -25,12 +25,13 @@ from lerobot.robots import ( # noqa: F401
     make_robot_from_config,
     so100_follower,
 )
-from lerobot.scripts.rl.gym_manipulator import make_robot_env
 from lerobot.teleoperators import (
     gamepad,  # noqa: F401
     so101_leader,  # noqa: F401
 )
+from .gym_manipulator import make_robot_env
 logging.basicConfig(level=logging.INFO)

@@ -25,7 +25,7 @@ Examples of usage:
 - Start a learner server for training:
 ```bash
-python -m lerobot.scripts.rl.learner --config_path src/lerobot/configs/train_config_hilserl_so100.json
+python -m lerobot.rl.learner --config_path src/lerobot/configs/train_config_hilserl_so100.json
 ```
 **NOTE**: Start the learner server before launching the actor server. The learner opens a gRPC server
@@ -73,7 +73,6 @@ from lerobot.datasets.lerobot_dataset import LeRobotDataset
 from lerobot.policies.factory import make_policy
 from lerobot.policies.sac.modeling_sac import SACPolicy
 from lerobot.robots import so100_follower  # noqa: F401
-from lerobot.scripts.rl import learner_service
 from lerobot.teleoperators import gamepad, so101_leader  # noqa: F401
 from lerobot.teleoperators.utils import TeleopEvents
 from lerobot.transport import services_pb2_grpc
@@ -100,6 +99,8 @@ from lerobot.utils.utils import (
 )
 from lerobot.utils.wandb_utils import WandBLogger
+from .learner_service import MAX_WORKERS, SHUTDOWN_TIMEOUT, LearnerService
 LOG_PREFIX = "[LEARNER]"
@@ -639,7 +640,7 @@ def start_learner(
     # TODO: Check if its useful
     _ = ProcessSignalHandler(False, display_pid=True)
-    service = learner_service.LearnerService(
+    service = LearnerService(
         shutdown_event=shutdown_event,
         parameters_queue=parameters_queue,
         seconds_between_pushes=cfg.policy.actor_learner_config.policy_parameters_push_frequency,
@@ -649,7 +650,7 @@ def start_learner(
     )
     server = grpc.server(
-        ThreadPoolExecutor(max_workers=learner_service.MAX_WORKERS),
+        ThreadPoolExecutor(max_workers=MAX_WORKERS),
         options=[
             ("grpc.max_receive_message_length", MAX_MESSAGE_SIZE),
             ("grpc.max_send_message_length", MAX_MESSAGE_SIZE),
@@ -670,7 +671,7 @@ def start_learner(
     shutdown_event.wait()
     logging.info("[LEARNER] Stopping gRPC server...")
-    server.stop(learner_service.SHUTDOWN_TIMEOUT)
+    server.stop(SHUTDOWN_TIMEOUT)
     logging.info("[LEARNER] gRPC server stopped")

@@ -65,7 +65,7 @@ def close_service_stub(channel, server):
 @require_package("grpc")
 def test_establish_learner_connection_success():
-    from lerobot.scripts.rl.actor import establish_learner_connection
+    from lerobot.rl.actor import establish_learner_connection
     """Test successful connection establishment."""
     stub, _servicer, channel, server = create_learner_service_stub()
@@ -82,7 +82,7 @@ def test_establish_learner_connection_success():
 @require_package("grpc")
 def test_establish_learner_connection_failure():
-    from lerobot.scripts.rl.actor import establish_learner_connection
+    from lerobot.rl.actor import establish_learner_connection
     """Test connection failure."""
     stub, servicer, channel, server = create_learner_service_stub()
@@ -101,7 +101,7 @@ def test_establish_learner_connection_failure():
 @require_package("grpc")
 def test_push_transitions_to_transport_queue():
-    from lerobot.scripts.rl.actor import push_transitions_to_transport_queue
+    from lerobot.rl.actor import push_transitions_to_transport_queue
     from lerobot.transport.utils import bytes_to_transitions
     from tests.transport.test_transport_utils import assert_transitions_equal
@@ -137,7 +137,7 @@ def test_push_transitions_to_transport_queue():
 @require_package("grpc")
 @pytest.mark.timeout(3)  # force cross-platform watchdog
 def test_transitions_stream():
-    from lerobot.scripts.rl.actor import transitions_stream
+    from lerobot.rl.actor import transitions_stream
     """Test transitions stream functionality."""
     shutdown_event = Event()
@@ -169,7 +169,7 @@ def test_transitions_stream():
 @require_package("grpc")
 @pytest.mark.timeout(3)  # force cross-platform watchdog
 def test_interactions_stream():
-    from lerobot.scripts.rl.actor import interactions_stream
+    from lerobot.rl.actor import interactions_stream
     from lerobot.transport.utils import bytes_to_python_object, python_object_to_bytes
     """Test interactions stream functionality."""

@@ -90,13 +90,13 @@ def cfg():
 @require_package("grpc")
 @pytest.mark.timeout(10)  # force cross-platform watchdog
 def test_end_to_end_transitions_flow(cfg):
-    from lerobot.scripts.rl.actor import (
+    from lerobot.rl.actor import (
         establish_learner_connection,
         learner_service_client,
         push_transitions_to_transport_queue,
         send_transitions,
     )
-    from lerobot.scripts.rl.learner import start_learner
+    from lerobot.rl.learner import start_learner
     from lerobot.transport.utils import bytes_to_transitions
     from tests.transport.test_transport_utils import assert_transitions_equal
@@ -152,12 +152,12 @@ def test_end_to_end_transitions_flow(cfg):
 @require_package("grpc")
 @pytest.mark.timeout(10)
 def test_end_to_end_interactions_flow(cfg):
-    from lerobot.scripts.rl.actor import (
+    from lerobot.rl.actor import (
         establish_learner_connection,
         learner_service_client,
         send_interactions,
     )
-    from lerobot.scripts.rl.learner import start_learner
+    from lerobot.rl.learner import start_learner
     from lerobot.transport.utils import bytes_to_python_object, python_object_to_bytes
     """Test complete interactions flow from actor to learner."""
@@ -226,8 +226,8 @@ def test_end_to_end_interactions_flow(cfg):
 @pytest.mark.parametrize("data_size", ["small", "large"])
 @pytest.mark.timeout(10)
 def test_end_to_end_parameters_flow(cfg, data_size):
-    from lerobot.scripts.rl.actor import establish_learner_connection, learner_service_client, receive_policy
-    from lerobot.scripts.rl.learner import start_learner
+    from lerobot.rl.actor import establish_learner_connection, learner_service_client, receive_policy
+    from lerobot.rl.learner import start_learner
     from lerobot.transport.utils import bytes_to_state_dict, state_to_bytes
     """Test complete parameter flow from learner to actor, with small and large data."""

@@ -50,7 +50,7 @@ def create_learner_service_stub(
 ):
     import grpc
-    from lerobot.scripts.rl.learner_service import LearnerService
+    from lerobot.rl.learner_service import LearnerService
     from lerobot.transport import services_pb2_grpc  # generated from .proto
     """Fixture to start a LearnerService gRPC server and provide a connected stub."""