Commit Graph

30 Commits

Author SHA1 Message Date
Daphne Barretto
21c2b7629b Add consistent scores validation (#368)
* Add consistent scores validation

* revert osworld_run_maestro.py changes
2025-10-29 01:44:48 +08:00
yuanmengqi
0f00788c4d feat: add run_multienv_o3.py script for multi-environment evaluation
- Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments.
- Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters.
- Integrated signal handling for graceful shutdown of environments and processes.
- Enhanced logging capabilities for better traceability during execution.
- Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
2025-07-27 16:47:24 +00:00
张逸群
4d6e0fd031 Add --provider_name parameter to run.py and fix Docker provider initialization (#277)
- Add command-line argument --provider_name to support flexible provider selection
- Default provider remains vmware for backward compatibility
- Fix Docker provider controller initialization issue with delayed setup
- Add safety checks for controller existence in error handling

This enables users to specify different virtualization providers directly
from the command line and resolves Docker container lifecycle issues.
2025-07-23 04:09:36 +08:00
Zilong Zhou
dc164d5269 feat&fix: update configuration management to save model arguments and enhance UI display for model args (#262) 2025-07-16 21:46:35 +08:00
Dunjie Lu
8be2a40967 Docker (#92)
* multi_env

* multi_env

---------

Co-authored-by: Timothyxxx <384084775@qq.com>
2024-11-02 22:28:23 +08:00
Xiaochuan Li
c7e3004456 Fix the auto-download bug; the default VMware path is now None, which triggers the automatic download (#58) 2024-07-24 08:28:40 +08:00
Tianbao Xie
fffa8f8da6 Refactoring VMware Integration and Implementing AWS Support (#44)
* Initialize aws support

* Add README for the VM server

* Refactor OSWorld for supporting more cloud services.

* Initialize vmware and aws implementation v1, waiting for verification

* Initialize files for azure, gcp and virtualbox support

* Debug on the VMware provider

* Fix on aws interface mapping

* Fix instance type

* Refactor

* Clean

* hk region; debug

* Fix lock

* Remove print

* Remove key_name requirements when allocating aws vm

* Clean README

---------

Co-authored-by: XinyuanWangCS <xywang626@gmail.com>
2024-06-15 20:52:29 +08:00
Timothyxxx
5568dfd141 Handling more exceptions; Fix hyperparameter passing 2024-05-20 17:22:07 +08:00
Timothyxxx
9c75df5dce Clean code; Refactor environment to pass screenshot content instead of path 2024-04-13 23:34:01 +08:00
Timothyxxx
d79d5d2c01 Clean Code 2024-03-27 14:46:29 +08:00
Timothyxxx
d1e2b12b41 Fix GIMP bug; Speed up the environment by skipping controller.get when no a11y tree is needed 2024-03-20 22:22:59 +08:00
Timothyxxx
f992d1f694 Disable a11y tree temporarily 2024-03-18 21:43:35 +08:00
Timothyxxx
a145b97bd0 Minor fix 2024-03-18 15:02:22 +08:00
Timothyxxx
c1c7ac298f Update claude endpoint 2024-03-18 14:59:02 +08:00
Jason Lee
8080828a84 update wandb settings 2024-03-18 00:02:41 +08:00
Jason Lee
716cf7b9ff add wandb settings 2024-03-17 22:31:43 +08:00
Jason Lee
48aedb09a7 add wandb settings, remember to set WANDB_KEY 2024-03-17 22:30:29 +08:00
Timothyxxx
e156a20e3d Update new func 2024-03-17 22:25:13 +08:00
Timothyxxx
639f8c7db8 Fix small bugs in max time limit setting 2024-03-16 14:34:40 +08:00
Timothyxxx
5a062c423f Delete the unused IDs; Fix bugs in run.py 2024-03-16 14:12:54 +08:00
Jason Lee
053da203b8 New timer; it currently has to be set in the setting.json file and should later be turned into parameters 2024-03-16 12:36:23 +08:00
Jason Lee
44679724b8 try new timer 2024-03-16 11:54:45 +08:00
Jason Lee
1a53a28475 update timer 2024-03-15 23:28:45 +08:00
Jason Lee
afec1a3a23 update 2024-03-15 23:26:04 +08:00
Timothyxxx
99e86a2cd4 Update unfinished function and error catching 2024-03-15 23:12:18 +08:00
Timothyxxx
51d644c88b Merge 2024-03-15 21:12:18 +08:00
Timothyxxx
4db207fc27 Merge remote-tracking branch 'origin/main'
# Conflicts:
#	mm_agents/agent.py
#	run.py
2024-03-15 21:10:32 +08:00
Timothyxxx
5cbf1b28ca Fix bugs 2024-03-15 21:06:50 +08:00
Jason Lee
815c7ab67c Filter unfinished examples and add a timer to enforce an upper time limit on each example 2024-03-15 16:52:17 +08:00
Timothyxxx
44ff027801 Refactor experiments and agent implementation 2024-03-14 22:32:49 +08:00