Commit Graph

141 Commits

Author SHA1 Message Date
Yuan Mengqi
5ca516ac7a add uitars agent code (#265) 2025-07-17 18:17:13 +08:00
Xinyuan Wang
24fbad9015 Merge pull request #264 from yuanmengqi/main
Improve the parallel logic
2025-07-17 12:28:48 +08:00
yuanmengqi
bb8b0b2582 Improve the parallel logic 2025-07-17 04:19:44 +00:00
yuanmengqi
9eeabfc52d Improve the parallel logic 2025-07-17 04:14:20 +00:00
Xinyuan Wang
0f2655249c Wxy/opencua (#260)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines
2025-07-16 17:53:12 +08:00
yuanmengqi
175b4b46c2 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-15 14:50:48 +00:00
Yuan Mengqi
af47ed8fb1 fix infeasible&chrome tasks (#258)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

* Improve code logic for password & resolution

* edit

* Merge branch 'main' into fix_chrome

* fix chrome tasks

* Merge branch 'fix_chrome'

* fix infeasible&chrome tasks

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-15 13:02:42 +08:00
yuanmengqi
08b4cf2c2f fix infeasible&chrome tasks 2025-07-15 02:09:40 +00:00
Xinyuan Wang
db83b9cb2c Wxy/opencua (#256)
* OpenCUA Agent code base

* update url

* debug, modify url input
2025-07-14 20:26:39 +08:00
Zilong Zhou
74b7c189af Feat/monitor (#254)
* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py

* fix: update logger usage to use global logger and improve error handling

* feat&fix: add configuration management API endpoints and update UI for configuration selection

* feat&fix: update environment configuration, enhance task statistics, and improve UI responsiveness

* feat&fix: add configuration toggle button in UI and improve task loading performance

* feat&fix: add accuracy percentage display to score and style updates for UI
2025-07-14 13:43:41 +08:00
Zilong Zhou
349f2fd9fe Feat/claude cua support (#253)
* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py
2025-07-13 21:10:49 +08:00
Yuan Mengqi
38a30734a6 Improve code logic for password & resolution (#252)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

* Improve code logic for password & resolution

* edit

* Merge branch 'main' into fix_chrome

* fix chrome tasks

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 21:04:07 +08:00
Yuan Mengqi
27319ce1e3 fix password&resolution (#251)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 00:25:37 +08:00
Yan98
4e3446d6fe Fix Name (#249)
* init

* init
2025-07-11 00:15:46 +08:00
Yan98
0a5058342d init (#246) 2025-07-10 00:29:42 +08:00
yuanmengqi
7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
yuanmengqi
3da32fe5cf update operator prompt 2025-06-10 02:35:53 +00:00
yuanmengqi
692486f8e7 add GDrive guideline 2025-06-09 14:59:47 +00:00
yuanmengqi
aee1207fff fix error 2025-06-09 04:20:59 +00:00
yuanmengqi
d8872634ee edit prompt 2025-06-08 03:59:31 +00:00
yuanmengqi
c57b1d4e7a eval update 2025-06-07 13:19:22 +00:00
yuanmengqi
a146c1e0b7 edit prompt 2025-06-07 05:21:04 +00:00
yuanmengqi
64177045b5 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-06 10:22:56 +00:00
Timothyxxx
8373f7cff2 refactor: remove AWSVMManagerWithProxy and integrate proxy support directly into AWSVMManager for streamlined VM allocation;
minor fix on openai_cua_agent
2025-06-06 02:55:50 +08:00
yuanmengqi
a6300e05c9 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-05 13:31:42 +00:00
adlsdztony
3b1540ed23 feat&fix: enhance task status handling and update logging configuration 2025-06-05 09:33:36 +00:00
yuanmengqi
b211df3385 fix timeout 2025-06-04 10:23:45 +00:00
yuanmengqi
98a810d31e edit operator 2025-06-02 12:11:25 +00:00
yuanmengqi
228849ab03 add openai cua agent 2025-05-31 11:22:38 +00:00
uvheart
a845824f06 add azure_gpt_4o (#197) 2025-05-23 03:57:42 +08:00
Shihao Liang
119bef25e2 Dev/uitars 15 (#194)
* debug uitars1.0, add uitars1.5

* update pyautogui parser

* modify function name

* update parser

* update prompt

* FIX: bug in ui tars
2025-05-19 17:15:17 +08:00
MillanK
51f5ddea04 Add Jedi agent implementation to mm_agents (#192)
* feat: implement Jedi agent

* chore: code clean
2025-05-10 19:55:33 +08:00
Thomas Kuntz
5678b510d7 fix: Invalid escape sequence in prompts (#191)
Fixes the warning: SyntaxWarning: invalid escape sequence '\`'
2025-05-10 18:19:07 +08:00
Thomas Kuntz
7d88283f8a feat: Support newer Gemini models (#188) 2025-05-06 16:04:30 +08:00
Shihao Liang
b92c716df7 Dev/uitars 15 (#181)
* debug uitars1.0, add uitars1.5

* update pyautogui parser

* modify function name

* update parser

* update prompt
2025-04-21 13:44:08 +08:00
Shihao Liang
bd2e980666 Dev/uitars 15 (#178)
* debug uitars1.0, add uitars1.5

* update pyautogui parser

* modify function name

* update parser
2025-04-17 18:49:21 +08:00
Shiqian Su
c4d818c5cf Update aguvis_agent.py (#141)
Fix Aguvis prompt bug
2025-02-28 16:48:41 +08:00
Shihao Liang
339a13e1d5 Dev/uitars (#132)
* init uitars

* change agent class name

* FIX: return bug in agent predict
2025-02-14 11:17:37 +08:00
Shihao Liang
0bc1e08440 Dev/uitars (#129)
* init uitars

* change agent class name
2025-02-08 12:49:40 +08:00
Timothyxxx
2c8e8a58f6 Fix minor bug caused by new logging feat in aguvis agent traj 2024-12-05 15:45:09 +08:00
Junli Wang
1503eb3994 Finish Aguvis eval on OSWorld (#107)
* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Aguvis Grounding

* Add Aguvis as planner

* fix parse bug

* fix pause

* fix planner prompt

* Aguvis Grounding

* fix

* fix

* fix

* add logger for each example

* Modify Aguvis Planner Prompts

* fix logger setup

* fix absolute coordinates

* Finish Aguvis Evaluation on OSWorld

* Merge origin/main into junli/aguvis

* Remove screenshot

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: Timothyxxx <384084775@qq.com>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-24 16:43:25 +08:00
Tianbao Xie
20442244fa [Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Clean

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-11 12:36:16 +08:00
Tianbao Xie
a156f8a3d6 Modify the namespace of a11y tree (#62) 2024-07-25 20:20:34 +08:00
Timothyxxx
cfc5500a8a Merge remote-tracking branch 'origin/main' 2024-05-21 21:08:43 +08:00
Timothyxxx
306dcbda71 Add Support for QWEN VL models from API (QWEN-VL-max, etc.); Improve on the robustness of getting observation/files, etc. 2024-05-21 21:08:22 +08:00
Timothyxxx
5568dfd141 Handling more exceptions; Fix hyperparameter passing 2024-05-20 17:22:07 +08:00
Timothyxxx
f9594e476e Add Support for QWEN models from API (QWEN-max, etc.); Improve on the robustness of getting observation 2024-05-20 00:47:43 +08:00
Timothyxxx
a500f59419 Add Llama3-70B Support (from Groq) 2024-05-09 02:04:58 +08:00
Timothyxxx
54905380e6 Add Llama3-70B Support (from Groq) 2024-05-09 02:04:02 +08:00
Timothyxxx
97b567a287 Update README and ROADMAP; Fix typos; optimize the code for llm calling in agent.py 2024-04-26 13:32:41 +08:00