Add Aloha env and ACT policy

* WIP Aloha env
* Tests pass
* Rendering works (fps looks fast though? TODO: action bounding is too wide [-1,1])
* Update README
* Copy paste from act repo
* Remove download.py, add a WIP for Simxarm
* Add act yaml (TODO: try train.py)
* Training runs (TODO: eval)
* Add tasks without end_effector that are compatible with the dataset; eval can run (TODO: training and pretrained model)
* Add AbstractEnv, refactor AlohaEnv, add rendering_hook in env, minor modifications (TODO: refactor Pusht and Simxarm)
* poetry lock
* Fix bug in compute_stats for action normalization
* Fix more bugs in normalization
* Fix training
* Fix import
* PushtEnv inherits from AbstractEnv, improve factory normalization
* Add _make_env to EnvAbstract
* Add call_rendering_hooks to pusht env
* SimxarmEnv inherits from AbstractEnv (NOT TESTED)
* Add aloha tests artifacts + update pusht stats
* Fix image normalization: the env produced images in [0,1] while the dataset was in [0,255]; both are now in [0,255]
* Small fix in simxarm
* Add next to obs
* Add top camera to Aloha env (TODO: make it compatible with a set of cameras)
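The normalization fixes in this commit can be illustrated with a small, self-contained sketch. It assumes per-dimension mean/std normalization of actions and a simple x255 rescale of env frames. The helpers `mean_std`, `normalize`, `unnormalize` and `env_frame_to_dataset_range` are hypothetical and written only for illustration. They are not lerobot's `compute_stats` or its normalization code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating the normalization fixes; NOT lerobot's code.


def mean_std(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population std over a dataset of vectors."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[d] for r in rows) / n for d in range(dim)]
    std = [math.sqrt(sum((r[d] - mean[d]) ** 2 for r in rows) / n) for d in range(dim)]
    return mean, std


def normalize(x: Sequence[float], mean: Sequence[float], std: Sequence[float],
              eps: float = 1e-8) -> List[float]:
    """Map x to zero mean / unit std using dataset statistics."""
    return [(v - m) / (s + eps) for v, m, s in zip(x, mean, std)]


def unnormalize(y: Sequence[float], mean: Sequence[float], std: Sequence[float],
                eps: float = 1e-8) -> List[float]:
    """Inverse of normalize: turn policy outputs back into env-scale actions."""
    return [v * (s + eps) + m for v, m, s in zip(y, mean, std)]


def env_frame_to_dataset_range(frame: Sequence[float]) -> List[float]:
    """Assumed fix: rescale env pixels from [0, 1] to [0, 255] like the dataset."""
    return [p * 255.0 for p in frame]


if __name__ == "__main__":
    mean, std = mean_std([[0.0, 2.0], [2.0, 4.0]])
    print(mean, std)  # [1.0, 3.0] [1.0, 1.0]

    # Pixel stats computed on [0, 255] data (toy example: one black, one white pixel).
    pix_mean, pix_std = mean_std([[0.0], [255.0]])
    print(normalize([0.5], pix_mean, pix_std))  # about -0.996: env frame left in [0, 1]
    print(normalize(env_frame_to_dataset_range([0.5]), pix_mean, pix_std))  # [0.0]
```

With this toy data, normalizing an unscaled [0,1] frame against statistics computed on [0,255] pixels pushes the value to nearly -1. Rescaling the frame first maps a mid-gray pixel to 0, as intended. The image-normalization fix removes this kind of mismatch.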
2024-03-08 09:47:39 +00:00
parent 060bac7672
commit 9d002032d1
116 changed files with 3658 additions and 301 deletions
--- a/lerobot/configs/env/aloha.yaml
+++ b/lerobot/configs/env/aloha.yaml
@@ -15,11 +15,11 @@ env:
  task: sim_insertion_human
  from_pixels: True
  pixels_only: False
-  image_size: 96
+  image_size: [3, 480, 640]
  action_repeat: 1
-  episode_length: 300
+  episode_length: 400
  fps: ${fps}

 policy:
-  state_dim: 2
-  action_dim: 2
+  state_dim: 14
+  action_dim: 14
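The new dimensions follow from the bimanual ALOHA setup. The sketch below relies on two assumptions:

- It uses the joint ordering from the ACT sim code: left arm 6 joints, left gripper, right arm 6 joints, right gripper.
- Rendered frames arrive as (H, W, C) and are then reordered to the channels-first `image_size`.

Neither assumption is defined in this diff. `split_bimanual` and `hwc_shape_to_chw` are hypothetical helpers.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed ALOHA layout (from the ACT sim code, not defined in this diff):
# [left arm 6 joints, left gripper, right arm 6 joints, right gripper]
JOINTS_PER_ARM = 6
STATE_DIM = 2 * (JOINTS_PER_ARM + 1)  # 14, matching state_dim and action_dim
IMAGE_SIZE = (3, 480, 640)  # channels-first (C, H, W), matching image_size


def split_bimanual(vec: Sequence[float]) -> Dict[str, List[float]]:
    """Split a 14-D ALOHA state/action vector into per-arm parts (hypothetical helper)."""
    if len(vec) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} values, got {len(vec)}")
    per_arm = JOINTS_PER_ARM + 1
    left, right = list(vec[:per_arm]), list(vec[per_arm:])
    return {
        "left_joints": left[:JOINTS_PER_ARM],
        "left_gripper": left[JOINTS_PER_ARM:],
        "right_joints": right[:JOINTS_PER_ARM],
        "right_gripper": right[JOINTS_PER_ARM:],
    }


def hwc_shape_to_chw(shape: Tuple[int, int, int]) -> Tuple[int, int, int]:
    """Reorder an (H, W, C) frame shape to the channels-first layout of image_size."""
    h, w, c = shape
    return (c, h, w)


if __name__ == "__main__":
    print(split_bimanual(list(range(STATE_DIM)))["right_gripper"])  # [13]
    print(hwc_shape_to_chw((480, 640, 3)) == IMAGE_SIZE)  # True
```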
--- a/lerobot/configs/policy/act.yaml
+++ b/lerobot/configs/policy/act.yaml
@@ -0,0 +1,58 @@
+# @package _global_
+
+offline_steps: 1344000
+online_steps: 0
+
+eval_episodes: 1
+eval_freq: 10000
+save_freq: 100000
+log_freq: 250
+
+horizon: 100
+n_obs_steps: 1
+n_latency_steps: 0
+# when temporal_agg=False, n_action_steps=horizon
+n_action_steps: ${horizon}
+
+policy:
+  name: act
+
+  pretrained_model_path:
+
+  lr: 1e-5
+  lr_backbone: 1e-5
+  weight_decay: 1e-4
+  grad_clip_norm: 10
+  backbone: resnet18
+  num_queries: ${horizon} # chunk_size
+  horizon: ${horizon} # chunk_size
+  kl_weight: 10
+  hidden_dim: 512
+  dim_feedforward: 3200
+  enc_layers: 4
+  dec_layers: 7
+  nheads: 8
+  #camera_names: [top, front_close, left_pillar, right_pillar]
+  camera_names: [top]
+  position_embedding: sine
+  masks: false
+  dilation: false
+  dropout: 0.1
+  pre_norm: false
+
+  vae: true
+
+  batch_size: 8
+
+  per_alpha: 0.6
+  per_beta: 0.4
+
+  balanced_sampling: false
+  utd: 1
+
+  n_obs_steps: ${n_obs_steps}
+
+  temporal_agg: false
+
+  state_dim: ???
+  action_dim: ???
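The `horizon`, `n_action_steps` and `temporal_agg` settings control how ACT's action chunks are executed. This is a minimal sketch that assumes scalar actions for readability.

- **`temporal_agg: false`:** the policy is queried once every `n_action_steps` (= `horizon`) steps, and the whole chunk is played back.
- **Temporal aggregation:** the policy is queried every step, and overlapping predictions for the same timestep are averaged with exponential weights, as described in the ACT paper.

The decay constant `k = 0.01` and the helper names `query_steps` and `temporal_ensemble` are assumptions, not values taken from this config.

```python
import math
from typing import Dict, List


def query_steps(episode_length: int, n_action_steps: int) -> List[int]:
    """Timesteps at which the policy is queried when temporal_agg is False."""
    return list(range(0, episode_length, n_action_steps))


def temporal_ensemble(chunks: Dict[int, List[float]], t: int, k: float = 0.01) -> float:
    """Weighted average of every chunk's prediction for timestep t (temporal_agg=True).

    chunks[s] is the action chunk predicted at query step s, so chunks[s][t - s]
    is that chunk's action for step t. Weights are exp(-k * i), where i = 0 is the
    oldest prediction, so older predictions weigh slightly more (assumed k = 0.01).
    """
    preds = [chunk[t - s] for s, chunk in sorted(chunks.items()) if 0 <= t - s < len(chunk)]
    if not preds:
        raise ValueError(f"no chunk covers timestep {t}")
    weights = [math.exp(-k * i) for i in range(len(preds))]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


if __name__ == "__main__":
    # episode_length=400 (aloha.yaml) with n_action_steps=horizon=100 (act.yaml)
    print(query_steps(400, 100))  # [0, 100, 200, 300]
    chunks = {0: [1.0, 1.0, 1.0], 1: [3.0, 3.0, 3.0]}
    print(temporal_ensemble(chunks, 1, k=0.0))  # 2.0 (plain mean)
    print(temporal_ensemble(chunks, 1))  # slightly below 2.0: older chunk weighs more
```

Executing full chunks keeps inference cheap: with `horizon: 100`, only four queries are needed for a 400-step episode. Temporal aggregation trades extra queries for smoother motion.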