Commit Graph

83 Commits

Author SHA1 Message Date
Tianbao Xie
20442244fa [Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Clean

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-11 12:36:16 +08:00
Pierre Carrier
b35dc40ff4 SetupController: no server_port for chrome (#96) 2024-11-07 00:33:03 +08:00
HappySix
6419d707bc Support Docker VM manager and provider (#75)
* Add docker provider framework

* Update VM download link

* Add stop container

* Update docker manager & provider

* Update

* Update

* Update provider
2024-09-28 21:10:40 +08:00
Timothyxxx
df231889c9 Fix minor bug 2024-08-04 11:35:44 +08:00
Jason Lee
fcdaf7ce0b Update setup.py for update_browse_history function 2024-07-04 09:37:13 -05:00
Tianbao Xie
fffa8f8da6 Refactoring VMware Integration and Implementing AWS Support (#44)
* Initialize aws support

* Add README for the VM server

* Refactor OSWorld for supporting more cloud services.

* Initialize vmware and aws implementation v1, waiting for verification

* Initialize files for azure, gcp and virtualbox support

* Debug on the VMware provider

* Fix on aws interface mapping

* Fix instance type

* Refactor

* Clean

* hk region; debug

* Fix lock

* Remove print

* Remove key_name requirements when allocating aws vm

* Clean README

---------

Co-authored-by: XinyuanWangCS <xywang626@gmail.com>
2024-06-15 20:52:29 +08:00
rhythmcao
c121869219 fix a small bug in computer_13 action space 2024-06-11 14:22:31 +08:00
Timothyxxx
306dcbda71 Add Support for QWEN VL models from API (QWEN-VL-max, etc.); Improve on the robustness of getting observation/files, etc. 2024-05-21 21:08:22 +08:00
Timothyxxx
f9594e476e Add Support for QWEN models from API (QWEN-max, etc.); Improve on the robustness of getting observation 2024-05-20 00:47:43 +08:00
Timothyxxx
97b567a287 Update README and ROADMAP; Fix typos; optimize the code for llm calling in agent.py 2024-04-26 13:32:41 +08:00
Timothyxxx
9c75df5dce Clean code; Refactor environment to pass screenshot content instead of path 2024-04-13 23:34:01 +08:00
Timothyxxx
7ca91ca8c9 Add action execution timeout for corner cases 2024-03-21 11:16:57 +08:00
David Chang
15e01e7ccc ver Mar20thv2
fixed bugs in server/main.py (_create_pywinauto_node and
  get_screen_size)
finished migration of a few task configs to Windows
fixed bug in python.py
2024-03-20 22:22:57 +08:00
Jason Lee
48aedb09a7 add wandb settings, remember to set WANDB_KEY 2024-03-17 22:30:29 +08:00
rhythmcao
da0dafc32c add multi-apps 5 examples by ruisheng 2024-03-06 2024-03-06 21:20:26 +08:00
David Chang
c39926fc57 Merge branch 'main' into zdy 2024-02-15 22:27:10 +08:00
Timothyxxx
fdb5655c89 Update chrome examples 2024-02-08 13:49:29 +08:00
Timothyxxx
e07a3d52ce Merge remote-tracking branch 'origin/main'
# Conflicts:
#	mm_agents/gpt_4v_agent.py
2024-02-02 14:37:23 +08:00
Timothyxxx
068c6f5769 122324154 2024-02-02 14:36:53 +08:00
David Chang
c46fcbfcbe ver Feb2ndv3
working on human eval for multi_apps
2024-02-02 09:30:10 +08:00
David Chang
5ee9621e0d ver Feb2nd
human evaluation as non-expert on chrome tasks
2024-02-02 05:13:12 +08:00
Timothyxxx
d65b6994d3 Fix minor bugs of multiple apps examples 2024-01-31 19:40:41 +08:00
BlankCheng
7d2d8c855e Merge main 2024-01-29 21:51:26 +08:00
BlankCheng
284d6fb379 Add human operation time log 2024-01-29 21:42:16 +08:00
Timothyxxx
6952b45de4 Improve on agent and tasks configs 2024-01-26 23:30:04 +08:00
tsuky_chen
932b73c67d load libreoffice writer eval -batch 2 2024-01-26 02:15:42 +08:00
tsuky_chen
3e7cfa8699 load libreoffice writer eval -batch 2 2024-01-26 02:07:26 +08:00
rhythmcao
5ac80dc309 update examples 2024-01-26 00:53:35 +08:00
rhythmcao
5a5309c0fd add multi-app example, fix googledrive functions 2024-01-25 20:30:54 +08:00
Timothyxxx
b9ae4174b1 Fix OS examples annotated by Yitao 2024-01-25 19:57:32 +08:00
rhythmcao
f194fb8d75 add multi_apps; update chrome utilities 2024-01-25 13:53:19 +08:00
David Chang
ffc4c32bac ver Jan17th
updated the existing task configs
2024-01-17 17:27:08 +08:00
Timothyxxx
186bf2e97c Implement heuristic cutting on the accessibility tree to get the important nodes; Finish accessibility tree text agent 2024-01-16 16:43:32 +08:00
Timothyxxx
1141232d80 Merge remote-tracking branch 'origin/main'
# Conflicts:
#	desktop_env/controllers/setup.py
2024-01-15 13:51:11 +08:00
Timothyxxx
24169a65d0 Accomplish the exp scripts v1; Add video recording and trajectory recording of desktop agent; Fix minor bugs 2024-01-15 13:49:48 +08:00
David Chang
fc289a3427 Merge branch 'main' into zdy 2024-01-15 12:12:05 +08:00
rhythmcao
69b0514f99 fix error in pyautogui.typewrite() 2024-01-14 23:53:31 +08:00
Timothyxxx
f153a4c253 Add 'WAIT', 'FAIL', 'DONE' to the action space; Debug basic prompting-based GPT-4 and Gemini agents; Initialize experiments script; 2024-01-14 23:36:19 +08:00
David Chang
59fdd9f1a2 ver Jan14th
setup method for Thunderbird composing tasks
2024-01-14 23:16:54 +08:00
Timothyxxx
d52b692ee5 Finish loading the vscode examples v1; Improve on the infra: Add accessibility tree into the observation; Add activate window function, etc 2024-01-14 18:30:49 +08:00
Timothyxxx
2228f346a9 Fix minor bugs caused from merging in setupcontroller; Initialize vscode example loading 2024-01-14 00:51:26 +08:00
Timothyxxx
a1c3e4c294 Finish Chrome example loading v1 2024-01-13 22:56:50 +08:00
rhythmcao
d4116458ff 1. fix quote and \ characters in execute_command ; 2. add terminal output text as extra observation ; 3. move get_vm_*() to reset() 2024-01-12 18:09:05 +08:00
Timothyxxx
186df65683 Merge remote-tracking branch 'origin/main'
# Conflicts:
#	desktop_env/controllers/setup.py
#	desktop_env/evaluators/metrics/utils.py
2024-01-12 17:30:15 +08:00
Timothyxxx
5a93a32958 Update on Chrome examples; Refactor on logic of controlling 2024-01-12 17:24:47 +08:00
David Chang
127a101994 Merge branch 'main' into zdy 2024-01-11 23:02:00 +08:00
Timothyxxx
820579a5a2 Make up missing getters and metrics; Update VLC scripts; Start to work on Chrome, update examples instructions 2024-01-11 21:27:40 +08:00
David Chang
27eaf2f5d5 ver Jan11th
finally set up a simple task, or which should be simple
2024-01-11 20:03:33 +08:00
Timothyxxx
287876affc Merge remote-tracking branch 'origin/main'
# Conflicts:
#	desktop_env/evaluators/getters/__init__.py
#	desktop_env/evaluators/metrics/__init__.py
#	requirements.txt
2024-01-10 23:20:49 +08:00
Timothyxxx
49ece15ac3 VLC v1 finished, improve on instructions, improve on infra 2024-01-10 23:18:30 +08:00