adlsdztony
64f47d1a32
fix: fix proxy setup
2025-07-01 13:20:26 +00:00
Tianbao Xie
4e11eafd1d
Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
...
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.
* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.
* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.
* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.
* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.
* Clean debug code
---------
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
yuanmengqi
2bae228803
merge upstream
2025-06-10 13:23:03 +00:00
yuanmengqi
7315aec6e6
clean code
2025-06-10 04:06:54 +00:00
adlsdztony
bfae51d74d
fix: enhance setup method with retry logic and return status
2025-06-09 16:07:13 +00:00
adlsdztony
493abdeeab
feat&refactor: add proxy setup functionality and update .gitignore for proxy config file
2025-06-07 11:24:49 +00:00
adlsdztony
71e9a1ead8
fix&refactor: improve error handling in download process and enhance start_emulator method signature
2025-06-06 09:08:14 +00:00
adlsdztony
0ca0085b18
fix: improve connection logging in SetupController
2025-06-05 11:04:33 +08:00
adlsdztony
d8ae209162
fix&refactor: improve connection retry logic and remove unnecessary wait time for AWS instance readiness
2025-05-28 13:05:32 +08:00
adlsdztony
431a762421
feat&fix: add logging for setup function calls and include snapshot name in AWS provider configuration
2025-05-26 20:37:20 +08:00
Tianbao Xie
20442244fa
[Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
...
* Initialize Aguvis eval on OSWorld
* Debug
* Debug
* v1, internal version
* Add experiments script
* Fix minor bugs
* Update new endpoint
* Update ip
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Fix model name
* Fix docker close issues; update prompting
* Fix missed
* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'
* Fix server and chromium ports in setup
* Revert and add missed dependency
* Add VLC port for docker
* Update
* Clean
---------
Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-11 12:36:16 +08:00
Pierre Carrier
b35dc40ff4
SetupController: no server_port for chrome (#96)
2024-11-07 00:33:03 +08:00
HappySix
6419d707bc
Support Docker VM manager and provider (#75)
...
* Add docker provider framework
* Update VM download link
* Add stop container
* Update docker manager & provider
* Update
* Update
* Update provider
2024-09-28 21:10:40 +08:00
Timothyxxx
df231889c9
Fix minor bug
2024-08-04 11:35:44 +08:00
Jason Lee
fcdaf7ce0b
Update setup.py for update_browse_history function
2024-07-04 09:37:13 -05:00
Tianbao Xie
fffa8f8da6
Refactoring VMware Integration and Implementing AWS Support (#44)
...
* Initialize aws support
* Add README for the VM server
* Refactor OSWorld for supporting more cloud services.
* Initialize vmware and aws implementation v1, waiting for verification
* Initialize files for azure, gcp and virtualbox support
* Debug on the VMware provider
* Fix on aws interface mapping
* Fix instance type
* Refactor
* Clean
* hk region; debug
* Fix lock
* Remove print
* Remove key_name requirements when allocating aws vm
* Clean README
---------
Co-authored-by: XinyuanWangCS <xywang626@gmail.com>
2024-06-15 20:52:29 +08:00
rhythmcao
c121869219
fix a small bug in computer_13 action space
2024-06-11 14:22:31 +08:00
Timothyxxx
306dcbda71
Add Support for QWEN VL models from API (QWEN-VL-max, etc.); Improve on the robustness of getting observation/files, etc.
2024-05-21 21:08:22 +08:00
Timothyxxx
f9594e476e
Add Support for QWEN models from API (QWEN-max, etc.); Improve on the robustness of getting observation
2024-05-20 00:47:43 +08:00
Timothyxxx
97b567a287
Update README and ROADMAP; Fix typos; optimize the code for llm calling in agent.py
2024-04-26 13:32:41 +08:00
Timothyxxx
9c75df5dce
Clean code; Refactor environment to pass screenshot content instead of path
2024-04-13 23:34:01 +08:00
Timothyxxx
7ca91ca8c9
Add action execution timeout for corner cases
2024-03-21 11:16:57 +08:00
David Chang
15e01e7ccc
ver Mar20thv2
...
fixed bugs in server/main.py (_create_pywinauto_node and
get_screen_size)
finished migration of a few task configs to Windows
fixed bug in python.py
2024-03-20 22:22:57 +08:00
Jason Lee
48aedb09a7
add wandb settings, remember to set WANDB_KEY
2024-03-17 22:30:29 +08:00
rhythmcao
da0dafc32c
add multi-apps 5 examples by ruisheng 2024-03-06
2024-03-06 21:20:26 +08:00
David Chang
c39926fc57
Merge branch 'main' into zdy
2024-02-15 22:27:10 +08:00
Timothyxxx
fdb5655c89
Update chrome examples
2024-02-08 13:49:29 +08:00
Timothyxxx
e07a3d52ce
Merge remote-tracking branch 'origin/main'
...
# Conflicts:
# mm_agents/gpt_4v_agent.py
2024-02-02 14:37:23 +08:00
Timothyxxx
068c6f5769
122324154
2024-02-02 14:36:53 +08:00
David Chang
c46fcbfcbe
ver Feb2ndv3
...
working on human eval for multi_apps
2024-02-02 09:30:10 +08:00
David Chang
5ee9621e0d
ver Feb2nd
...
human evaluation as non-expert on chrome tasks
2024-02-02 05:13:12 +08:00
Timothyxxx
d65b6994d3
Fix minor bugs of multiple apps examples
2024-01-31 19:40:41 +08:00
BlankCheng
7d2d8c855e
Merge main
2024-01-29 21:51:26 +08:00
BlankCheng
284d6fb379
Add human operation time log
2024-01-29 21:42:16 +08:00
Timothyxxx
6952b45de4
Improve on agent and tasks configs
2024-01-26 23:30:04 +08:00
tsuky_chen
932b73c67d
load libreoffice writer eval -batch 2
2024-01-26 02:15:42 +08:00
tsuky_chen
3e7cfa8699
load libreoffice writer eval -batch 2
2024-01-26 02:07:26 +08:00
rhythmcao
5ac80dc309
update examples
2024-01-26 00:53:35 +08:00
rhythmcao
5a5309c0fd
add multi-app example, fix googledrive functions
2024-01-25 20:30:54 +08:00
Timothyxxx
b9ae4174b1
Fix OS examples annotated by Yitao
2024-01-25 19:57:32 +08:00
rhythmcao
f194fb8d75
add multi_apps; update chrome utilities
2024-01-25 13:53:19 +08:00
David Chang
ffc4c32bac
ver Jan17th
...
updated the existing task configs
2024-01-17 17:27:08 +08:00
Timothyxxx
186bf2e97c
Implement heuristic cutting on the accessibility tree to get the important nodes; Finish accessibility tree text agent
2024-01-16 16:43:32 +08:00
Timothyxxx
1141232d80
Merge remote-tracking branch 'origin/main'
...
# Conflicts:
# desktop_env/controllers/setup.py
2024-01-15 13:51:11 +08:00
Timothyxxx
24169a65d0
Accomplish the exp scripts v1; Add video recording and trajectory recording of desktop agent; Fix minor bugs
2024-01-15 13:49:48 +08:00
David Chang
fc289a3427
Merge branch 'main' into zdy
2024-01-15 12:12:05 +08:00
rhythmcao
69b0514f99
fix error in pyautogui.typewrite()
2024-01-14 23:53:31 +08:00
Timothyxxx
f153a4c253
Add 'WAIT', 'FAIL', 'DONE' to the action space; Debug basic prompting-based GPT-4 and Gemini agents; Initialize experiments script;
2024-01-14 23:36:19 +08:00
David Chang
59fdd9f1a2
ver Jan14th
...
setup method for Thunderbird composing tasks
2024-01-14 23:16:54 +08:00
Timothyxxx
d52b692ee5
Finish loading the vscode examples v1; Improve on the infra: Add accessibility tree into the observation; Add activate window function, etc
2024-01-14 18:30:49 +08:00