* initial commit
* change device in test
* do detailed import
* adhere to python 3.11 syntax
* fix autodocstring
* additionally
* do same in other files
* add model. prefix to all keys in state dict
* use dummy stats
* add pi05
* also shorten action_steps
* fix test
* all tests pass! and fix tokenizer max length between 05 and 0
* remove test
* fix transformer dependency
* fix test
* split pi0 and pi05 policy into separate files
* fix test
* fix push to hub test
* add some comments, license and readme
* remove warning in config
* add pi05 to factory
* remove check
* rename action_horizon to chunk_size
* clean up padding of state and action (more in line with lerobot pi0)
* add openpi image transforms for training and add more flexibility to _preprocess_images, similar to lerobot pi0
* fix key match from pytorch state dict (similar keys to openpi implementation now)
* also for pi05
* update to python 3.11
* revert to openpi transformer, replace python 3.11
* fix(modeling pi0): nit warning message
* use safeauto_docstring
* fix: remove unused param
* fix from pretrained
* add preprocess tests
* also compile forward method
* Do not add model prefix to normalization
* use same name for action and state dim as lerobot pi0 and remove fixed image keys
* load from pretrained_path
* temp: hardcode base model
* fix override self.pretrained_path = None overwrite
* rename to loss
* remove additional image augmentations, lerobot dataset already does this
* add docs
* put tests in test folder
* add test to instantiate all base models
* go back to python 3.10
* update docs
* adapt docs pi05
* change docs: finetune base model options
* minor docs fixes and dependencies
* remove todo
* cast float64 to float32 for mps
* skip if no transformers
* fix tests
* add new models to modelcard
* add back init
* fix circular input
* feat: only run pi test on GPU
* remove require_nightly_gpu
* replace decorator test_pi0_openpi
* rename action_dim, state_dim to max_action_dim, max_state_dim
* fix doc and constants
* cleanup tests
* fix from pretrained
* fix tests
* add comment pi0 pi05 tests, add image features to pi0 pi05 hub tests
* fix: state is included in language, not in flow head
* move test to specific folder
* end paligemma task with newline
* remove add_special_tokens, not needed
* PR feedback
* remove previous pi0 and rename pi0_openpi and pi05_openpi
* Add Quantile stats to LeRobotDataset (#1985)
  - Add RunningQuantileStats class for efficient histogram-based quantile computation
  - Integrate quantile parameters (compute_quantiles, quantiles) into LeRobotDataset
  - Support quantile computation during episode collection and aggregation
  - Add comprehensive function-based test suite (24 tests) for quantile functionality
  - Maintain full backward compatibility with existing stats computation
  - Enable configurable quantiles (default: [0.01, 0.99]) for robust normalization
* style fixes, make quantile computation the default for new datasets
* fix tests
* Added DEFAULT_QUANTILES=[0.01, 0.10, 0.50, 0.90, 0.99] to be computed for each feature instead of being chosen by the user
* Fortified tests
* add helper functions to reshape stats
* add missing test for quantiles
* Add QUANTILE normalization mode to normalize the data with the 1st and 99th percentiles
* Add QUANTILE10 normalization mode to normalize the data with the 10th and 90th percentiles
* style fixes
* added missing license
* simplify compute_stats
* added script `augment_dataset_quantile_stats.py` so that we can add quantile stats to existing v3 datasets that don't have quantiles
* modified quantile computation: instead of using the bin edge for the value, interpolate the values within the bin
* rename pi0/pi05 files
* remove openpi patch and use custom transformer branch for now
* renaming
* fix
* Revert "fix" (reverts commit 1ea65730ac2cbca6e5869df734fbd4392561b3c6)
* fix naming
* feat(pi0/pi0.5): add pipeline (#2009)
* feat(processor): convert openpi model with processor
* TODO: make test work
* fix(modeling_pi0openpi): update attention mask value and time scaling; improve task handling in tests
  - Changed the attention mask value from `self.config.attention_mask_value` to a fixed value of `-2.3819763e38`
  - Updated time scaling in the `sample_noise` method to use a constant factor of `0.999` and an offset of `0.001`
  - Enhanced task handling in tests to ensure proper formatting and batch size consistency
  - Cleaned up commented-out test code for clarity
* refactor(pi0): rename PI0OpenPIConfig and PI0OpenPIPolicy to PI0Config and PI0Policy
  - Updated imports and references throughout the codebase to reflect the new naming convention
  - Introduced a new processor file for PI0 to handle pre-processing and post-processing steps
  - Adjusted tests to utilize the renamed classes, ensuring consistency and functionality
  - Enhanced clarity and maintainability by removing outdated naming conventions
* refactor(pi05): rename PI0OpenPIPolicy to PI0Policy and update configuration
  - Renamed `PI0OpenPIPolicy` to `PI0Policy` for consistency with naming conventions
  - Updated the `PI05OpenPIConfig` to include a new `tokenizer_max_length` attribute and changed the normalization mode for state from `MEAN_STD` to `QUANTILES`
  - Simplified model initialization in `PI05OpenPIPolicy` by removing the unused `dataset_stats` parameter
  - Added a new processor class `Pi05PrepareStateTokenizerProcessorStep` (a `@dataclass`) for improved readability
  - Introduced a test script to compare the integration of the PI0OpenPI policy with the original implementation, ensuring local testing compatibility
* refactor(pi05): update imports and rename configuration classes
  - Changed imports to reflect the new naming convention for PI05 configuration and policy classes
  - Renamed `PI05OpenPIConfig` to `PI05Config` and `PI05OpenPIPolicy` to `PI05Policy` for consistency
  - Introduced a new processor file for PI05, implementing pre-processing and post-processing steps
  - Updated tests to utilize the renamed classes, ensuring functionality and consistency across the codebase
* update(pi05): increase tokenizer_max_length for improved processing
  - Changed the `tokenizer_max_length` from 48 to 200 to enhance the model's capability in handling longer sequences
  - This adjustment aims to improve the overall performance and flexibility of the PI05 configuration
* add default for state (max_state_dim)
* correct naming
* fix import
* cleanup code
* remove unused test
* use quantiles for action
* move to device
* remove discrete state assert
* fix pi05 test
* move pi05 to device
* use base models in comparison tests
* small renames for tests
* change number of tokens pi05 test
* fix openpi tokenization in test
* fix hub test
* fix test
* assert lerobot vs openpi tests

---------

Co-authored-by: Pepijn <pepijn@huggingface.co>

* add headers
* add back previously removed imports
* update if statement: load processor with dataset stats
* remove to avoid circular import
* inject dataset stats for pretrained models
* check normalization before applying
* add link to quantile augment script
* fix(policies): transformers import for ci in PI0 & PI05 (#2039)
* fix(policies): transformers import for ci in PI0
* fix(policies): transformers import for ci in PI05
* test(processor): fix expected raise when normalization types are missing (#2040)
* switch normalization order pipeline for pi05
* Fix/quantiles script (#2064)
* refactor augment stats with quantiles script: add parallelization for faster processing; shift the quantile normalization between -1 and 1
* fix replay buffer tests
* fix comment
* overwrite the pipeline normalization features with the policy features
* remove double normalization overwrite
* cleanup from pretrained
* remove typo
* also set norm_map
* fix(augment_quantiles): images incorrectly divided by 255
* clamp quantiles
* link to lerobot base models
* rename tests
* incorporate PR feedback
* update docstring for RunningQuantileStats
* update doc links
* Revert "clamp quantiles" (reverts commit 172207471c8f2cb62958e9a9e6a0535ba3ff67d4)
* fix self.paligemma
* fix tests related to quantiles that were scaled to [0, 1]; the new range is [-1, 1]
* fix libero doc and use different transformer branch
* use fix branch instead of feat
* update results libero
* add new line
* fix formatting
* precommit
* update results libero
* update libero doc
* update title
* final changes
* add quantiles to test
* run pre-commit

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Steven Palma <steven.palma@huggingface.co>
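
For reference, the QUANTILE/QUANTILE10 normalization introduced above maps each feature into [-1, 1] using its 1st/99th (or 10th/90th) percentiles. The sketch below only illustrates that mapping, assuming plain per-dimension `np.quantile` statistics; the PR itself uses a histogram-based `RunningQuantileStats`, and the helper names here are made up.

```python
import numpy as np

# Default quantiles stored per feature, as described in the commit message.
DEFAULT_QUANTILES = [0.01, 0.10, 0.50, 0.90, 0.99]


def compute_quantile_stats(values: np.ndarray, quantiles=DEFAULT_QUANTILES) -> dict[str, np.ndarray]:
    """Per-dimension quantiles over a (num_frames, dim) array, keyed as q01, q10, ..."""
    return {f"q{int(q * 100):02d}": np.quantile(values, q, axis=0) for q in quantiles}


def normalize_with_quantiles(x: np.ndarray, stats: dict[str, np.ndarray], low="q01", high="q99") -> np.ndarray:
    """Shift/scale so that stats[low] maps to -1 and stats[high] maps to +1."""
    scale = np.maximum(stats[high] - stats[low], 1e-8)  # guard against constant dimensions
    return 2.0 * (x - stats[low]) / scale - 1.0
```
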
#!/usr/bin/env python

# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging
import time
from contextlib import nullcontext
from pprint import pformat
from typing import Any

import torch
from termcolor import colored
from torch.amp import GradScaler
from torch.optim import Optimizer

from lerobot.configs import parser
from lerobot.configs.train import TrainPipelineConfig
from lerobot.datasets.factory import make_dataset
from lerobot.datasets.sampler import EpisodeAwareSampler
from lerobot.datasets.utils import cycle
from lerobot.envs.factory import make_env
from lerobot.envs.utils import close_envs
from lerobot.optim.factory import make_optimizer_and_scheduler
from lerobot.policies.factory import make_policy, make_pre_post_processors
from lerobot.policies.pretrained import PreTrainedPolicy
from lerobot.policies.utils import get_device_from_parameters
from lerobot.rl.wandb_utils import WandBLogger
from lerobot.scripts.lerobot_eval import eval_policy_all
from lerobot.utils.logging_utils import AverageMeter, MetricsTracker
from lerobot.utils.random_utils import set_seed
from lerobot.utils.train_utils import (
    get_step_checkpoint_dir,
    get_step_identifier,
    load_training_state,
    save_checkpoint,
    update_last_checkpoint,
)
from lerobot.utils.utils import (
    format_big_number,
    get_safe_torch_device,
    has_method,
    init_logging,
)

def update_policy(
    train_metrics: MetricsTracker,
    policy: PreTrainedPolicy,
    batch: Any,
    optimizer: Optimizer,
    grad_clip_norm: float,
    grad_scaler: GradScaler,
    lr_scheduler=None,
    use_amp: bool = False,
    lock=None,
) -> tuple[MetricsTracker, dict]:
    """
    Performs a single training step to update the policy's weights.

    This function executes the forward and backward passes, clips gradients, and steps the optimizer and
    learning rate scheduler. It also handles mixed-precision training via a GradScaler.

    Args:
        train_metrics: A MetricsTracker instance to record training statistics.
        policy: The policy model to be trained.
        batch: A batch of training data.
        optimizer: The optimizer used to update the policy's parameters.
        grad_clip_norm: The maximum norm for gradient clipping.
        grad_scaler: The GradScaler for automatic mixed-precision training.
        lr_scheduler: An optional learning rate scheduler.
        use_amp: A boolean indicating whether to use automatic mixed precision.
        lock: An optional lock for thread-safe optimizer updates.

    Returns:
        A tuple containing:
            - The updated MetricsTracker with new statistics for this step.
            - A dictionary of outputs from the policy's forward pass, for logging purposes.
    """
    start_time = time.perf_counter()
    device = get_device_from_parameters(policy)
    policy.train()
    with torch.autocast(device_type=device.type) if use_amp else nullcontext():
        loss, output_dict = policy.forward(batch)
        # TODO(rcadene): policy.unnormalize_outputs(out_dict)
    grad_scaler.scale(loss).backward()

    # Unscale the gradient of the optimizer's assigned params in-place **prior to gradient clipping**.
    grad_scaler.unscale_(optimizer)

    grad_norm = torch.nn.utils.clip_grad_norm_(
        policy.parameters(),
        grad_clip_norm,
        error_if_nonfinite=False,
    )

    # Optimizer's gradients are already unscaled, so scaler.step does not unscale them,
    # although it still skips optimizer.step() if the gradients contain infs or NaNs.
    with lock if lock is not None else nullcontext():
        grad_scaler.step(optimizer)
    # Updates the scale for next iteration.
    grad_scaler.update()

    optimizer.zero_grad()

    # Step through pytorch scheduler at every batch instead of epoch
    if lr_scheduler is not None:
        lr_scheduler.step()

    if has_method(policy, "update"):
        # To possibly update an internal buffer (for instance an Exponential Moving Average like in TDMPC).
        policy.update()

    train_metrics.loss = loss.item()
    train_metrics.grad_norm = grad_norm.item()
    train_metrics.lr = optimizer.param_groups[0]["lr"]
    train_metrics.update_s = time.perf_counter() - start_time
    return train_metrics, output_dict
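
# Illustrative usage of `update_policy` for a single step (hypothetical values such as
# grad_clip_norm=10.0; in practice `train()` below wires these objects from `TrainPipelineConfig`):
#
#     grad_scaler = GradScaler(device.type, enabled=True)
#     batch = preprocessor(next(dl_iter))
#     train_tracker, output_dict = update_policy(
#         train_tracker, policy, batch, optimizer,
#         grad_clip_norm=10.0, grad_scaler=grad_scaler, use_amp=True,
#     )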


@parser.wrap()
def train(cfg: TrainPipelineConfig):
    """
    Main function to train a policy.

    This function orchestrates the entire training pipeline, including:
    - Setting up logging, seeding, and device configuration.
    - Creating the dataset, evaluation environment (if applicable), policy, and optimizer.
    - Handling resumption from a checkpoint.
    - Running the main training loop, which involves fetching data batches and calling `update_policy`.
    - Periodically logging metrics, saving model checkpoints, and evaluating the policy.
    - Pushing the final trained model to the Hugging Face Hub if configured.

    Args:
        cfg: A `TrainPipelineConfig` object containing all training configurations.
    """
    cfg.validate()
    logging.info(pformat(cfg.to_dict()))

    if cfg.wandb.enable and cfg.wandb.project:
        wandb_logger = WandBLogger(cfg)
    else:
        wandb_logger = None
        logging.info(colored("Logs will be saved locally.", "yellow", attrs=["bold"]))

    if cfg.seed is not None:
        set_seed(cfg.seed)

    # Check device is available
    device = get_safe_torch_device(cfg.policy.device, log=True)
    torch.backends.cudnn.benchmark = True
    torch.backends.cuda.matmul.allow_tf32 = True

    logging.info("Creating dataset")
    dataset = make_dataset(cfg)

    # Create environment used for evaluating checkpoints during training on simulation data.
    # On real-world data, no need to create an environment as evaluations are done outside train.py,
    # using the eval.py instead, with gym_dora environment and dora-rs.
    eval_env = None
    if cfg.eval_freq > 0 and cfg.env is not None:
        logging.info("Creating env")
        eval_env = make_env(cfg.env, n_envs=cfg.eval.batch_size, use_async_envs=cfg.eval.use_async_envs)

    logging.info("Creating policy")
    policy = make_policy(
        cfg=cfg.policy,
        ds_meta=dataset.meta,
    )

    # Create processors - only provide dataset_stats if not resuming from saved processors
    processor_kwargs = {}
    postprocessor_kwargs = {}
    if (cfg.policy.pretrained_path and not cfg.resume) or not cfg.policy.pretrained_path:
        # Only provide dataset_stats when not resuming from saved processor state
        processor_kwargs["dataset_stats"] = dataset.meta.stats

    if cfg.policy.pretrained_path is not None:
        processor_kwargs["preprocessor_overrides"] = {
            "device_processor": {"device": device.type},
            "normalizer_processor": {
                "stats": dataset.meta.stats,
                "features": {**policy.config.input_features, **policy.config.output_features},
                "norm_map": policy.config.normalization_mapping,
            },
        }
        postprocessor_kwargs["postprocessor_overrides"] = {
            "unnormalizer_processor": {
                "stats": dataset.meta.stats,
                "features": policy.config.output_features,
                "norm_map": policy.config.normalization_mapping,
            },
        }

    preprocessor, postprocessor = make_pre_post_processors(
        policy_cfg=cfg.policy,
        pretrained_path=cfg.policy.pretrained_path,
        **processor_kwargs,
        **postprocessor_kwargs,
    )
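    # When fine-tuning from a pretrained policy, the overrides above re-inject the current dataset's
    # stats (including quantile stats, when available) into the saved normalizer/unnormalizer steps and
    # align their features/norm_map with this policy's config, rather than reusing the stats stored
    # alongside the pretrained processors.
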
logging.info("Creating optimizer and scheduler")
|
|
optimizer, lr_scheduler = make_optimizer_and_scheduler(cfg, policy)
|
|
grad_scaler = GradScaler(device.type, enabled=cfg.policy.use_amp)
|
|
|
|
step = 0 # number of policy updates (forward + backward + optim)
|
|
|
|
if cfg.resume:
|
|
step, optimizer, lr_scheduler = load_training_state(cfg.checkpoint_path, optimizer, lr_scheduler)
|
|
|
|
num_learnable_params = sum(p.numel() for p in policy.parameters() if p.requires_grad)
|
|
num_total_params = sum(p.numel() for p in policy.parameters())
|
|
|
|
logging.info(colored("Output dir:", "yellow", attrs=["bold"]) + f" {cfg.output_dir}")
|
|
if cfg.env is not None:
|
|
logging.info(f"{cfg.env.task=}")
|
|
logging.info(f"{cfg.steps=} ({format_big_number(cfg.steps)})")
|
|
logging.info(f"{dataset.num_frames=} ({format_big_number(dataset.num_frames)})")
|
|
logging.info(f"{dataset.num_episodes=}")
|
|
logging.info(f"{num_learnable_params=} ({format_big_number(num_learnable_params)})")
|
|
logging.info(f"{num_total_params=} ({format_big_number(num_total_params)})")
|
|
|
|
# create dataloader for offline training
|
|
if hasattr(cfg.policy, "drop_n_last_frames"):
|
|
shuffle = False
|
|
sampler = EpisodeAwareSampler(
|
|
dataset.meta.episodes["dataset_from_index"],
|
|
dataset.meta.episodes["dataset_to_index"],
|
|
drop_n_last_frames=cfg.policy.drop_n_last_frames,
|
|
shuffle=True,
|
|
)
|
|
else:
|
|
shuffle = True
|
|
sampler = None
|
|
|
|
dataloader = torch.utils.data.DataLoader(
|
|
dataset,
|
|
num_workers=cfg.num_workers,
|
|
batch_size=cfg.batch_size,
|
|
shuffle=shuffle and not cfg.dataset.streaming,
|
|
sampler=sampler,
|
|
pin_memory=device.type == "cuda",
|
|
drop_last=False,
|
|
prefetch_factor=2,
|
|
)
|
|
dl_iter = cycle(dataloader)
|
|
|
|
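    # `cycle` wraps the dataloader in an infinite iterator (restarting it once exhausted), so the
    # training loop below is bounded by optimization steps (`cfg.steps`) rather than dataset epochs.
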
    policy.train()

    train_metrics = {
        "loss": AverageMeter("loss", ":.3f"),
        "grad_norm": AverageMeter("grdn", ":.3f"),
        "lr": AverageMeter("lr", ":0.1e"),
        "update_s": AverageMeter("updt_s", ":.3f"),
        "dataloading_s": AverageMeter("data_s", ":.3f"),
    }

    train_tracker = MetricsTracker(
        cfg.batch_size, dataset.num_frames, dataset.num_episodes, train_metrics, initial_step=step
    )

    logging.info("Start offline training on a fixed dataset")
    for _ in range(step, cfg.steps):
        start_time = time.perf_counter()
        batch = next(dl_iter)
        batch = preprocessor(batch)
        train_tracker.dataloading_s = time.perf_counter() - start_time

        train_tracker, output_dict = update_policy(
            train_tracker,
            policy,
            batch,
            optimizer,
            cfg.optimizer.grad_clip_norm,
            grad_scaler=grad_scaler,
            lr_scheduler=lr_scheduler,
            use_amp=cfg.policy.use_amp,
        )

        # Note: eval and checkpoint happens *after* the `step`th training update has completed, so we
        # increment `step` here.
        step += 1
        train_tracker.step()
        is_log_step = cfg.log_freq > 0 and step % cfg.log_freq == 0
        is_saving_step = step % cfg.save_freq == 0 or step == cfg.steps
        is_eval_step = cfg.eval_freq > 0 and step % cfg.eval_freq == 0

        if is_log_step:
            logging.info(train_tracker)
            if wandb_logger:
                wandb_log_dict = train_tracker.to_dict()
                if output_dict:
                    wandb_log_dict.update(output_dict)
                wandb_logger.log_dict(wandb_log_dict, step)
            train_tracker.reset_averages()

        if cfg.save_checkpoint and is_saving_step:
            logging.info(f"Checkpoint policy after step {step}")
            checkpoint_dir = get_step_checkpoint_dir(cfg.output_dir, cfg.steps, step)
            save_checkpoint(
                checkpoint_dir, step, cfg, policy, optimizer, lr_scheduler, preprocessor, postprocessor
            )
            update_last_checkpoint(checkpoint_dir)
            if wandb_logger:
                wandb_logger.log_policy(checkpoint_dir)

        if cfg.env and is_eval_step:
            step_id = get_step_identifier(step, cfg.steps)
            logging.info(f"Eval policy at step {step}")
            with (
                torch.no_grad(),
                torch.autocast(device_type=device.type) if cfg.policy.use_amp else nullcontext(),
            ):
                eval_info = eval_policy_all(
                    envs=eval_env,  # dict[suite][task_id] -> vec_env
                    policy=policy,
                    preprocessor=preprocessor,
                    postprocessor=postprocessor,
                    n_episodes=cfg.eval.n_episodes,
                    videos_dir=cfg.output_dir / "eval" / f"videos_step_{step_id}",
                    max_episodes_rendered=4,
                    start_seed=cfg.seed,
                    max_parallel_tasks=cfg.env.max_parallel_tasks,
                )

            # overall metrics (suite-agnostic)
            aggregated = eval_info["overall"]

            # optional: per-suite logging
            for suite, suite_info in eval_info.items():
                logging.info("Suite %s aggregated: %s", suite, suite_info)

            # meters/tracker
            eval_metrics = {
                "avg_sum_reward": AverageMeter("∑rwrd", ":.3f"),
                "pc_success": AverageMeter("success", ":.1f"),
                "eval_s": AverageMeter("eval_s", ":.3f"),
            }
            eval_tracker = MetricsTracker(
                cfg.batch_size, dataset.num_frames, dataset.num_episodes, eval_metrics, initial_step=step
            )
            eval_tracker.eval_s = aggregated.pop("eval_s")
            eval_tracker.avg_sum_reward = aggregated.pop("avg_sum_reward")
            eval_tracker.pc_success = aggregated.pop("pc_success")
            if wandb_logger:
                wandb_log_dict = {**eval_tracker.to_dict(), **eval_info}
                wandb_logger.log_dict(wandb_log_dict, step, mode="eval")
                wandb_logger.log_video(eval_info["overall"]["video_paths"][0], step, mode="eval")

    if eval_env:
        close_envs(eval_env)
    logging.info("End of training")

    if cfg.policy.push_to_hub:
        policy.push_model_to_hub(cfg)
        preprocessor.push_to_hub(cfg.policy.repo_id)
        postprocessor.push_to_hub(cfg.policy.repo_id)


def main():
    init_logging()
    train()


if __name__ == "__main__":
    main()
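
# Example invocation (illustrative only; assumes this file is exposed as `lerobot.scripts.lerobot_train`,
# and flag names follow the `TrainPipelineConfig` fields used above via `parser.wrap()` - adjust to your setup):
#
#     python -m lerobot.scripts.lerobot_train \
#         --dataset.repo_id=<user/dataset> \
#         --policy.type=pi05 \
#         --output_dir=outputs/train/pi05_example \
#         --batch_size=8 \
#         --steps=100000 \
#         --wandb.enable=true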