95 Commits

Author SHA1 Message Date
Zilong Zhou
595a704aff fix: fix proxy setup (#227)
* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example
2025-07-02 01:36:32 +08:00
Tianbao Xie
30138c5db1 VLC fix (#224)
* Enhance SetupController with improved logging and error handling during setup and file upload processes. Update instance type to t3.xlarge and AMI ID for AWS configuration. Add download progress logging and exception handling for better debugging.

* Enhance VLC status evaluation by adding multiple paths for file and URL information extraction, improving robustness against varying VLC XML structures. Implement detailed logging for better debugging and error handling in case of mismatches or missing data. Update example JSON for VLC evaluation to use a valid HLS stream URL.

* Improve audio comparison robustness in VLC evaluator by adding error handling for audio file loading and extraction. Implement detailed logging for empty or corrupt files, and normalize DTW distance calculation for more accurate similarity scoring. Remove deprecated audio fingerprint comparison function.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-29 20:18:44 +08:00
Tianbao Xie
0cc93543a8 Environment is_used flag; OS domain fix (#219)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* Add more delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

* Enhance DesktopEnv to track environment usage for optimized snapshot management. Introduce is_environment_used flag to determine if a snapshot revert is necessary based on provider type. Update setup and step methods to mark environment usage appropriately. Add new execute_with_verification method in SetupController for command execution with result verification, improving reliability. Change AWS instance type to m5.large for better performance and update AMI ID for compatibility. Update file opening logic in main.py to handle both file paths and application commands more effectively.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-28 00:45:53 +08:00
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* Add more delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
yuanmengqi
2bae228803 merge upstream 2025-06-10 13:23:03 +00:00
yuanmengqi
7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
adlsdztony
bfae51d74d fix: enhance setup method with retry logic and return status 2025-06-09 16:07:13 +00:00
adlsdztony
493abdeeab feat&refactor: add proxy setup functionality and update .gitignore for proxy config file 2025-06-07 11:24:49 +00:00
adlsdztony
71e9a1ead8 fix&refactor: improve error handling in download process and enhance start_emulator method signature 2025-06-06 09:08:14 +00:00
adlsdztony
0ca0085b18 fix: improve connection logging in SetupController 2025-06-05 11:04:33 +08:00
adlsdztony
d8ae209162 fix&refactor: improve connection retry logic and remove unnecessary wait time for AWS instance readiness 2025-05-28 13:05:32 +08:00
adlsdztony
431a762421 feat&fix: add logging for setup function calls and include snapshot name in AWS provider configuration 2025-05-26 20:37:20 +08:00
Tianbao Xie
20442244fa [Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Clean

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-11 12:36:16 +08:00
Pierre Carrier
b35dc40ff4 SetupController: no server_port for chrome (#96) 2024-11-07 00:33:03 +08:00
HappySix
6419d707bc Support Docker VM manager and provider (#75)
* Add docker provider framework

* Update VM download link

* Add stop container

* Update docker manager & provider

* Update

* Update

* Update provider
2024-09-28 21:10:40 +08:00
Timothyxxx
df231889c9 Fix minor bug 2024-08-04 11:35:44 +08:00
Jason Lee
fcdaf7ce0b Update setup.py for update_browse_history function 2024-07-04 09:37:13 -05:00
Tianbao Xie
fffa8f8da6 Refactoring VMware Integration and Implementing AWS Support (#44)
* Initialize AWS support

* Add README for the VM server

* Refactor OSWorld for supporting more cloud services.

* Initialize vmware and aws implementation v1, waiting for verification

* Initialize files for Azure, GCP, and VirtualBox support

* Debug on the VMware provider

* Fix AWS interface mapping

* Fix instance type

* Refactor

* Clean

* hk region; debug

* Fix lock

* Remove print

* Remove key_name requirements when allocating aws vm

* Clean README

---------

Co-authored-by: XinyuanWangCS <xywang626@gmail.com>
2024-06-15 20:52:29 +08:00
rhythmcao
c121869219 fix a small bug in computer_13 action space 2024-06-11 14:22:31 +08:00
Timothyxxx
306dcbda71 Add support for Qwen-VL models from API (QWEN-VL-max, etc.); improve the robustness of getting observations/files, etc. 2024-05-21 21:08:22 +08:00
Timothyxxx
f9594e476e Add support for Qwen models from API (QWEN-max, etc.); improve the robustness of getting observations 2024-05-20 00:47:43 +08:00
Timothyxxx
97b567a287 Update README and ROADMAP; fix typos; optimize the LLM-calling code in agent.py 2024-04-26 13:32:41 +08:00
Timothyxxx
9c75df5dce Clean code; Refactor environment to pass screenshot content instead of path 2024-04-13 23:34:01 +08:00
Timothyxxx
7ca91ca8c9 Add action execution timeout for corner cases 2024-03-21 11:16:57 +08:00
David Chang
15e01e7ccc ver Mar20thv2
fixed bugs in server/main.py (_create_pywinauto_node and get_screen_size)
finished migration of a few task configs to Windows
fixed bug in python.py
2024-03-20 22:22:57 +08:00
Jason Lee
48aedb09a7 add wandb settings, remember to set WANDB_KEY 2024-03-17 22:30:29 +08:00
rhythmcao
da0dafc32c add 5 multi-apps examples by ruisheng 2024-03-06 2024-03-06 21:20:26 +08:00
David Chang
c39926fc57 Merge branch 'main' into zdy 2024-02-15 22:27:10 +08:00
Timothyxxx
fdb5655c89 Update chrome examples 2024-02-08 13:49:29 +08:00
Timothyxxx
e07a3d52ce Merge remote-tracking branch 'origin/main'
# Conflicts:
#	mm_agents/gpt_4v_agent.py
2024-02-02 14:37:23 +08:00
Timothyxxx
068c6f5769 122324154 2024-02-02 14:36:53 +08:00
David Chang
c46fcbfcbe ver Feb2ndv3
working on human eval for multi_apps
2024-02-02 09:30:10 +08:00
David Chang
5ee9621e0d ver Feb2nd
human evaluation as non-expert on chrome tasks
2024-02-02 05:13:12 +08:00
Timothyxxx
d65b6994d3 Fix minor bugs of multiple apps examples 2024-01-31 19:40:41 +08:00
BlankCheng
7d2d8c855e Merge main 2024-01-29 21:51:26 +08:00
BlankCheng
284d6fb379 Add human operation time log 2024-01-29 21:42:16 +08:00
Timothyxxx
6952b45de4 Improve agent and task configs 2024-01-26 23:30:04 +08:00
tsuky_chen
932b73c67d load libreoffice writer eval -batch 2 2024-01-26 02:15:42 +08:00
tsuky_chen
3e7cfa8699 load libreoffice writer eval -batch 2 2024-01-26 02:07:26 +08:00
rhythmcao
5ac80dc309 update examples 2024-01-26 00:53:35 +08:00
rhythmcao
5a5309c0fd add multi-app example, fix googledrive functions 2024-01-25 20:30:54 +08:00
Timothyxxx
b9ae4174b1 Fix OS examples annotated by Yitao 2024-01-25 19:57:32 +08:00
rhythmcao
f194fb8d75 add multi_apps; update chrome utilities 2024-01-25 13:53:19 +08:00
David Chang
ffc4c32bac ver Jan17th
updated the existing task configs
2024-01-17 17:27:08 +08:00
Timothyxxx
186bf2e97c Implement heuristic cutting on the accessibility tree to get the important nodes; Finish accessibility tree text agent 2024-01-16 16:43:32 +08:00
Timothyxxx
1141232d80 Merge remote-tracking branch 'origin/main'
# Conflicts:
#	desktop_env/controllers/setup.py
2024-01-15 13:51:11 +08:00
Timothyxxx
24169a65d0 Complete the exp scripts v1; add video recording and trajectory recording of desktop agent; fix minor bugs 2024-01-15 13:49:48 +08:00
David Chang
fc289a3427 Merge branch 'main' into zdy 2024-01-15 12:12:05 +08:00
rhythmcao
69b0514f99 fix error in pyautogui.typewrite() 2024-01-14 23:53:31 +08:00
Timothyxxx
f153a4c253 Add 'WAIT', 'FAIL', 'DONE' to the action space; Debug basic prompting-based GPT-4 and Gemini agents; Initialize experiments script; 2024-01-14 23:36:19 +08:00