feat: add run_multienv_o3.py script for multi-environment evaluation

- Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments.
- Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters.
- Integrated signal handling for graceful shutdown of environments and processes.
- Enhanced logging capabilities for better traceability during execution.
- Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
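
For reviewers unfamiliar with the graceful-shutdown pattern mentioned above, here is a minimal sketch of one way it can be done. It is not taken from `run_multienv_o3.py`: the names `close_all`, `install_handlers`, and the logger name are hypothetical, environments are assumed to expose a `close()` method, and workers are assumed to behave like `multiprocessing.Process` objects.

```python
import logging
import signal

logger = logging.getLogger("multienv-sketch")  # hypothetical logger name


def close_all(environments, processes, timeout=5.0):
    """Close every environment, then stop any worker process still alive."""
    for env in environments:
        try:
            env.close()  # assumed interface: each environment has close()
        except Exception as exc:
            # Keep going so one failing environment does not block the rest.
            logger.error("error closing environment: %s", exc)
    for proc in processes:
        if proc.is_alive():
            proc.terminate()
            proc.join(timeout=timeout)
            if proc.is_alive():
                proc.kill()  # escalate if the worker ignored terminate()


def install_handlers(environments, processes):
    """Register SIGINT/SIGTERM handlers that clean up once, then exit."""
    state = {"shutting_down": False}

    def handler(signum, frame):
        if state["shutting_down"]:
            return  # ignore repeated signals while cleanup is running
        state["shutting_down"] = True
        logger.info("received signal %s, shutting down", signum)
        close_all(environments, processes)
        raise SystemExit(0)

    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
    return handler
```

The guard flag keeps a second Ctrl+C from re-entering cleanup, and escalating from `terminate()` to `kill()` after a timeout avoids hanging on a stuck worker.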
2025-07-27 16:47:24 +00:00
parent 1342bfe5ce
commit 0f00788c4d
5 changed files with 1148 additions and 209 deletions
--- a/run_multienv_uitars15.py
+++ b/run_multienv_uitars15.py
@@ -207,7 +207,6 @@ def run_env_tasks(task_queue: Queue, args: argparse.Namespace, shared_scores: li
            max_tokens=args.max_tokens,
            top_p=args.top_p,
            temperature=args.temperature,
-            
            max_trajectory_length=args.max_trajectory_length,
            max_image_history_length=args.max_image_history_length,
            use_thinking=args.use_thinking,