52 Commits

Author SHA1 Message Date
uvheart
a845824f06 add azure_gpt_4o (#197) 2025-05-23 03:57:42 +08:00
Tianbao Xie
75601efc6a Update requirements.txt 2025-02-12 02:22:49 -08:00
Junli Wang
1503eb3994 Finish Aguvis eval on OSWorld (#107)
* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Aguvis Grounding

* Add Aguvis as planner

* fix parse bug

* fix pause

* fix planner prompt

* Aguvis Grounding

* fix

* fix

* fix

* add logger for each example

* Modify Aguvis Planner Prompts

* fix logger setup

* fix absolute coordinates

* Finish Aguvis Evaluation on OSWorld

* Merge origin/main into junli/aguvis

* Remove screenshot

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: Timothyxxx <384084775@qq.com>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-24 16:43:25 +08:00
Tianbao Xie
20442244fa [Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Clean

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-11 12:36:16 +08:00
Pierre Carrier
324371e78b requirements.txt: faster install on latest macOS (#86)
Prebuilt binaries are only available on latest macOS with an upgraded pandas.
2024-10-30 09:43:21 +08:00
Pierre Carrier
2b22d49c22 [completely optional] direnv+mise autosetup (#87)
Makes life a lot easier in my experience.
2024-10-30 09:43:10 +08:00
Pierre Carrier
9229c44393 requirements.txt: Python 3.12 compatibility (#82) 2024-10-24 22:46:04 +08:00
FredWuCZ
6bb27d3ddd Merge branch 'main' into docker 2024-10-02 12:18:44 +08:00
FredWuCZ
24bad80b53 Add requirements for docker 2024-09-28 22:01:06 +08:00
HappySix
6419d707bc Support Docker VM manager and provider (#75)
* Add docker provider framework

* Update VM download link

* Add stop container

* Update docker manager & provider

* Update

* Update

* Update provider
2024-09-28 21:10:40 +08:00
FredWuCZ
d0b37f0831 Update 2024-09-28 12:49:29 +08:00
HappySix
19106467f8 VirtualBox (#46)
* Initialize aws support

* Add README for the VM server

* Refactor OSWorld for supporting more cloud services.

* Initialize vmware and aws implementation v1, waiting for verification

* Initialize files for azure, gcp and virtualbox support

* Debug on the VMware provider

* Fix on aws interface mapping

* Fix instance type

* Refactor

* Clean

* Add Azure provider

* hk region; debug

* Fix lock

* Remove print

* Remove key_name requirements when allocating aws vm

* Clean README

* Fix reset

* Fix bugs

* Add VirtualBox and Azure providers

* Add VirtualBox OVF link

* Raise exception on macOS host

* Init README for VBox

* Update VirtualBox VM download link

* Update requirements and setup.py; Improve robustness on Windows

* Fix network adapter

* Go through on Windows machine

* Add default adapter option

* Fix minor error

---------

Co-authored-by: Timothyxxx <384084775@qq.com>
Co-authored-by: XinyuanWangCS <xywang626@gmail.com>
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2024-06-17 22:46:04 +08:00
Timothyxxx
54905380e6 Add Llama3-70B Support (from Groq) 2024-05-09 02:04:02 +08:00
Fangyu Lei
24fbca785d Update requirements.txt 2024-04-07 18:24:08 +08:00
Fangyu Lei
e84e77563a Update requirements.txt
Add gdown
2024-04-03 23:59:18 +08:00
Fangyu Lei
866ac3fbd9 Update requirements.txt add wandb and wrapt_timeout_decorator 2024-03-18 21:43:59 +08:00
tsuky_chen
aae848196b merge 2024-03-09 18:53:27 +08:00
tsuky_chen
f4ec36bdfb fix multi apps 2024-03-09 18:48:17 +08:00
Timothyxxx
62b3b2390d Fix bugs from merging 2024-03-08 23:09:11 +08:00
Timothyxxx
fd9f6cbc59 Update requirements.txt 2024-03-07 18:00:51 +08:00
David Chang
054e016aff ver Mar6thv3
new multi_app tasks and metrics
2024-03-06 23:29:01 +08:00
David Chang
459e247736 ver Mar4thv3
some new multi_app configs
2024-03-04 23:26:22 +08:00
Jason Lee
17cd897780 add new examples for chrome 2024-02-18 22:11:16 +08:00
David Chang
9df0854469 ver Feb1stv3
rerun SoM experiment on thunderbird
2024-02-01 22:56:09 +08:00
rhythmcao
fc15a33b70 finish multi-app examples 2024-02-01 00:53:31 +08:00
Timothyxxx
0a351eefdc Merge remote-tracking branch 'origin/main' 2024-01-30 01:25:46 +08:00
Timothyxxx
1756d3b672 Fix writer examples 2024-01-30 01:25:30 +08:00
David Chang
d8a497a417 ver Jan29th
updated the position of SoM marks
2024-01-29 21:49:53 +08:00
David Chang
5a486b6b37 ver Jan27th
debugged at+screenshot implementation, no issues found
fixed a few small bugs
2024-01-27 23:10:48 +08:00
rhythmcao
f194fb8d75 add multi_apps; update chrome utilities 2024-01-25 13:53:19 +08:00
David Chang
93229ce98c ver Jan22ndv3
updated style metric to compare_table
2024-01-22 23:45:15 +08:00
Timothyxxx
09f3e776ae Initialize all baselines: screenshot, a11y tree, both, SoM, SeeAct 2024-01-20 00:13:46 +08:00
Timothyxxx
493b719821 Add gemini agent implementation; Add missing requirements; Fix some small bugs 2024-01-15 21:58:33 +08:00
Timothyxxx
57a41a279c Resolve conflicts 2024-01-13 22:58:20 +08:00
Timothyxxx
a1c3e4c294 Finish Chrome example loading v1 2024-01-13 22:56:50 +08:00
David Chang
d4192d3d9c ver Jan12thv3
debugged
2024-01-13 00:06:11 +08:00
Timothyxxx
287876affc Merge remote-tracking branch 'origin/main'
# Conflicts:
#	desktop_env/evaluators/getters/__init__.py
#	desktop_env/evaluators/metrics/__init__.py
#	requirements.txt
2024-01-10 23:20:49 +08:00
Timothyxxx
49ece15ac3 VLC v1 finished, improve on instructions, improve on infra 2024-01-10 23:18:30 +08:00
David Chang
18cc1fc52c ver Jan10thv3
minor fixes
2024-01-10 22:23:48 +08:00
David Chang
1515b05666 ver Jan10thv2
a new example config for Thunderbird
fixed several bugs
2024-01-10 21:58:29 +08:00
David Chang
fbb4918734 ver Jan5thv2
tested correctness of merging
2024-01-05 16:08:29 +08:00
David Chang
f831aa93df ver Jan3rd
exploring impress metrics
2024-01-03 22:42:19 +08:00
David Chang
6e6ef03bc9 ver Jan2nd
calc metrics are prepared by and large
2024-01-02 21:03:57 +08:00
David Chang
a6b6022ecb ver Dec26th
evaluation metric checking result file according to rules
2023-12-26 16:46:50 +08:00
David Chang
ba77c276e6 ver Dec25thv2
implemented functions to load sparklines from xlsx
2023-12-25 20:14:03 +08:00
David Chang
82e3353f65 ver Dec25th
added cache and upload function for setup
2023-12-25 14:40:30 +08:00
Timothyxxx
2ca36109b5 Initialize evaluation protocols and examples; Implement one kind of eval; Update requirements 2023-12-12 18:10:55 +08:00
Timothyxxx
8c0525c20e Adapt for Windows os; Refine README 2023-11-27 00:29:09 +08:00
Jing Hua
a8aebf5d15 mouse and keyboard controllers for windows and linux 2023-11-08 09:22:43 +08:00
Jing Hua
b3da09a860 gym interface 2023-10-30 00:28:33 +08:00