feat: add run_multienv_o3.py script for multi-environment evaluation

- Introduced `run_multienv_o3.py`, a new script for end-to-end evaluation across multiple environments.
- Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters.
- Integrated signal handling for graceful shutdown of environments and processes.
- Enhanced logging capabilities for better traceability during execution.
- Kept the existing logic from previous scripts while adding new functionality to the evaluation process.
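
The argument parsing and graceful-shutdown points above could be sketched roughly as follows. This is a minimal illustration, not the contents of `run_multienv_o3.py`: the flag names (`--num_envs`, `--log_level`, `--region`), their defaults, the `active_environments` registry, and the helper names are all assumptions made for the example.

```python
import argparse
import logging
import signal
import sys

logger = logging.getLogger("multienv_sketch")

# Hypothetical registry of live environments; the real script may track them differently.
active_environments = []
is_terminating = False


def parse_args(argv=None):
    """Parse a small, illustrative subset of options (flag names are assumptions)."""
    parser = argparse.ArgumentParser(description="Multi-environment evaluation (sketch)")
    parser.add_argument("--num_envs", type=int, default=1)
    parser.add_argument(
        "--log_level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    )
    parser.add_argument("--region", default="us-east-1")  # assumed AWS parameter
    return parser.parse_args(argv)


def close_all_environments():
    """Close every registered environment, continuing past individual failures."""
    closed = 0
    while active_environments:
        env = active_environments.pop()
        try:
            env.close()
            closed += 1
        except Exception as exc:  # keep shutting down the remaining environments
            logger.error("Failed to close environment: %s", exc)
    return closed


def handle_signal(signum, frame):
    """On SIGINT/SIGTERM, clean up once and then exit."""
    global is_terminating
    if is_terminating:
        return  # a second signal arrived while cleanup is already running
    is_terminating = True
    logger.info("Received signal %s, shutting down", signum)
    close_all_environments()
    sys.exit(0)


def install_signal_handlers():
    """Register the same handler for Ctrl+C and for termination requests."""
    signal.signal(signal.SIGINT, handle_signal)
    signal.signal(signal.SIGTERM, handle_signal)
```

In a script like this, `install_signal_handlers()` would typically be called once at startup, after `parse_args()` and before any environment is created, so that an interrupt at any later point still goes through the cleanup path.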
yuanmengqi
2025-07-27 16:47:24 +00:00
parent 1342bfe5ce
commit 0f00788c4d
5 changed files with 1148 additions and 209 deletions

@@ -207,7 +207,6 @@ def run_env_tasks(task_queue: Queue, args: argparse.Namespace, shared_scores: li
max_tokens=args.max_tokens,
top_p=args.top_p,
temperature=args.temperature,
max_trajectory_length=args.max_trajectory_length,
max_image_history_length=args.max_image_history_length,
use_thinking=args.use_thinking,