Yuan Mengqi
5ca516ac7a
add uitars agent code (#265)
2025-07-17 18:17:13 +08:00
Xinyuan Wang
24fbad9015
Merge pull request #264 from yuanmengqi/main
...
Improve the parallel logic
2025-07-17 12:28:48 +08:00
yuanmengqi
bb8b0b2582
Improve the parallel logic
2025-07-17 04:19:44 +00:00
yuanmengqi
9eeabfc52d
Improve the parallel logic
2025-07-17 04:14:20 +00:00
Xinyuan Wang
0f2655249c
Wxy/opencua (#260)
...
* OpenCUA Agent code base
* update url
* debug, modify url input
* debug opencua
* show result
* debug agent history overlap
* modify opencua agent; add comment lines
2025-07-16 17:53:12 +08:00
yuanmengqi
175b4b46c2
Merge remote-tracking branch 'upstream/main' into fix_chrome
2025-07-15 14:50:48 +00:00
Yuan Mengqi
af47ed8fb1
fix infeasible&chrome tasks (#258)
...
* fix chrome
* fix: fix proxy setup
* feat&fix: add proxy support in setup and remove hardcoded proxy from example
* fix tasks
* fix chrome finished
* fix
* clean chrome_fix code
* clean chrome_fix code
* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774
* fix multiapps
* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774
* fix some multi_apps tasks
* fix some multi_apps tasks
* fix password&resolution
* fix password&resolution
* Improve code logic for password & resolution
* edit
* Merge branch 'main' into fix_chrome
* fix chrome tasks
* Merge branch 'fix_chrome'
* fix insensible&chrome tasks
---------
Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-15 13:02:42 +08:00
yuanmengqi
08b4cf2c2f
fix infeasible&chome tasks
2025-07-15 02:09:40 +00:00
Xinyuan Wang
db83b9cb2c
Wxy/opencua (#256)
...
* OpenCUA Agent code base
* update url
* debug, modify url input
2025-07-14 20:26:39 +08:00
Zilong Zhou
74b7c189af
Feat/monitor (#254)
...
* feat: add claude support
* feat: add script for end-to-end evaluation with logging and task distribution
* feat&fix: add tool result handling and update model default in evaluation script
* chore: remove run_test_env.py script
* feat&fix: implement action parsing for tool calls and update default action space
* fix: update text formatting in action parsing and replace logger import
* feat&fix: implement action parsing for tool calls and add screen size handling
* feat: add setup instructions for Anthropic API integration
* feat: add notice about image size limitations for Anthropic API
* Delete test_env/logger.py
* Delete test_env/utils.py
* fix: update logger usage to use global logger and improve error handling
* feat&fix: add configuration management API endpoints and update UI for configuration selection
* feat&fix: update environment configuration, enhance task statistics, and improve UI responsiveness
* feat&fix: add configuration toggle button in UI and improve task loading performance
* feat&fix: add accuracy percentage display to score and style updates for UI
2025-07-14 13:43:41 +08:00
Zilong Zhou
349f2fd9fe
Feat/claude cua support (#253)
...
* feat: add claude support
* feat: add script for end-to-end evaluation with logging and task distribution
* feat&fix: add tool result handling and update model default in evaluation script
* chore: remove run_test_env.py script
* feat&fix: implement action parsing for tool calls and update default action space
* fix: update text formatting in action parsing and replace logger import
* feat&fix: implement action parsing for tool calls and add screen size handling
* feat: add setup instructions for Anthropic API integration
* feat: add notice about image size limitations for Anthropic API
* Delete test_env/logger.py
* Delete test_env/utils.py
2025-07-13 21:10:49 +08:00
Yuan Mengqi
38a30734a6
Improve code logic for password & resolution (#252)
...
* fix chrome
* fix: fix proxy setup
* feat&fix: add proxy support in setup and remove hardcoded proxy from example
* fix tasks
* fix chrome finished
* fix
* clean chrome_fix code
* clean chrome_fix code
* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774
* fix multiapps
* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774
* fix some multi_apps tasks
* fix some multi_apps tasks
* fix password&resolution
* fix password&resolution
* Improve code logic for password & resolution
* edit
* Merge branch 'main' into fix_chrome
* fix chrome tasks
---------
Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 21:04:07 +08:00
Yuan Mengqi
27319ce1e3
fix password&resolution (#251)
...
* fix chrome
* fix: fix proxy setup
* feat&fix: add proxy support in setup and remove hardcoded proxy from example
* fix tasks
* fix chrome finished
* fix
* clean chrome_fix code
* clean chrome_fix code
* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774
* fix multiapps
* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774
* fix some multi_apps tasks
* fix some multi_apps tasks
* fix password&resolution
* fix password&resolution
---------
Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 00:25:37 +08:00
Yan98
4e3446d6fe
Fix Name (#249)
...
* init
* init
2025-07-11 00:15:46 +08:00
Yan98
0a5058342d
init (#246)
2025-07-10 00:29:42 +08:00
yuanmengqi
7315aec6e6
clean code
2025-06-10 04:06:54 +00:00
yuanmengqi
3da32fe5cf
update operator prompt
2025-06-10 02:35:53 +00:00
yuanmengqi
692486f8e7
add GDrive guideline
2025-06-09 14:59:47 +00:00
yuanmengqi
aee1207fff
fix error
2025-06-09 04:20:59 +00:00
yuanmengqi
d8872634ee
edit prompt
2025-06-08 03:59:31 +00:00
yuanmengqi
c57b1d4e7a
eval update
2025-06-07 13:19:22 +00:00
yuanmengqi
a146c1e0b7
edit prompt
2025-06-07 05:21:04 +00:00
yuanmengqi
64177045b5
Merge remote-tracking branch 'upstream/feat/aws-provider-support'
2025-06-06 10:22:56 +00:00
Timothyxxx
8373f7cff2
refactor: remove AWSVMManagerWithProxy and integrate proxy support directly into AWSVMManager for streamlined VM allocation;
...
minor fix on openai_cua_agent
2025-06-06 02:55:50 +08:00
yuanmengqi
a6300e05c9
Merge remote-tracking branch 'upstream/feat/aws-provider-support'
2025-06-05 13:31:42 +00:00
adlsdztony
3b1540ed23
feat&fix: enhance task status handling and update logging configuration
2025-06-05 09:33:36 +00:00
yuanmengqi
b211df3385
fix timeout
2025-06-04 10:23:45 +00:00
yuanmengqi
98a810d31e
edit operator
2025-06-02 12:11:25 +00:00
yuanmengqi
228849ab03
add openai cua agent
2025-05-31 11:22:38 +00:00
uvheart
a845824f06
add azure_gpt_4o (#197)
2025-05-23 03:57:42 +08:00
Shihao Liang
119bef25e2
Dev/uitars 15 (#194)
...
* debug uitars1.0, add uitars1.5
* update pyautogui parser
* modify function name
* update parser
* update prompt
* FIX: bug in ui tars
2025-05-19 17:15:17 +08:00
MillanK
51f5ddea04
Add Jedi agent implementation to mm_agents (#192)
...
* feat: implement Jedi agent
* chore: code clean
2025-05-10 19:55:33 +08:00
Thomas Kuntz
5678b510d7
fix: Invalid escape sequence in prompts (#191)
...
Fixes the warning: SyntaxWarning: invalid escape sequence '\`'
2025-05-10 18:19:07 +08:00
Thomas Kuntz
7d88283f8a
feat: Support newer Gemini models (#188)
2025-05-06 16:04:30 +08:00
Shihao Liang
b92c716df7
Dev/uitars 15 (#181)
...
* debug uitars1.0, add uitars1.5
* update pyautogui parser
* modify function name
* update parser
* update prompt
2025-04-21 13:44:08 +08:00
Shihao Liang
bd2e980666
Dev/uitars 15 (#178)
...
* debug uitars1.0, add uitars1.5
* update pyautogui parser
* modify function name
* update parser
2025-04-17 18:49:21 +08:00
Shiqian Su
c4d818c5cf
Update aguvis_agent.py (#141)
...
Fix Aguvis prompt bug
2025-02-28 16:48:41 +08:00
Shihao Liang
339a13e1d5
Dev/uitars (#132)
...
* init uitars
* change agent class name
* FIX: return bug in agent predict
2025-02-14 11:17:37 +08:00
Shihao Liang
0bc1e08440
Dev/uitars (#129)
...
* init uitars
* change agent class name
2025-02-08 12:49:40 +08:00
Timothyxxx
2c8e8a58f6
Fix minor bug caused by new logging feat in aguvis agent traj
2024-12-05 15:45:09 +08:00
Junli Wang
1503eb3994
Finish Aguvis eval on OSWorld (#107)
...
* Initialize Aguvis eval on OSWorld
* Debug
* Debug
* v1, internal version
* Add experiments script
* Fix minor bugs
* Update new endpoint
* Update ip
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Fix model name
* Fix docker close issues; update prompting
* Fix missed
* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'
* Fix server and chromium ports in setup
* Revert and add missed dependency
* Add VLC port for docker
* Update
* Aguvis Grounding
* Add Aguvis as planner
* fix parse bug
* fix pause
* fix planner prompt
* Aguvis Grounding
* fix
* fix
* fix
* add logger for each example
* Modify Aguvis Planner Prompts
* fix logger setup
* fix absolute coordinates
* Finish Aguvis Evaluation on OSWorld
* Merge origin/main into junli/aguvis
* Remove screenshot
---------
Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: Timothyxxx <384084775@qq.com>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-24 16:43:25 +08:00
Tianbao Xie
20442244fa
[Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
...
* Initialize Aguvis eval on OSWorld
* Debug
* Debug
* v1, internal version
* Add experiments script
* Fix minor bugs
* Update new endpoint
* Update ip
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Fix model name
* Fix docker close issues; update prompting
* Fix missed
* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'
* Fix server and chromium ports in setup
* Revert and add missed dependency
* Add VLC port for docker
* Update
* Clean
---------
Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-11 12:36:16 +08:00
Tianbao Xie
a156f8a3d6
Modify the namespace of a11y tree (#62)
2024-07-25 20:20:34 +08:00
Timothyxxx
cfc5500a8a
Merge remote-tracking branch 'origin/main'
2024-05-21 21:08:43 +08:00
Timothyxxx
306dcbda71
Add Support for QWEN VL models from API (QWEN-VL-max, etc.); Improve on the robustness of getting observation/files, etc.
2024-05-21 21:08:22 +08:00
Timothyxxx
5568dfd141
Handling more exceptions; Fix hyperparameter passing
2024-05-20 17:22:07 +08:00
Timothyxxx
f9594e476e
Add Support for QWEN models from API (QWEN-max, etc.); Improve on the robustness of getting observation
2024-05-20 00:47:43 +08:00
Timothyxxx
a500f59419
Add Llama3-70B Support (from Groq)
2024-05-09 02:04:58 +08:00
Timothyxxx
54905380e6
Add Llama3-70B Support (from Groq)
2024-05-09 02:04:02 +08:00
Timothyxxx
97b567a287
Update README and ROADMAP; Fix typos; optimize the code for llm calling in agent.py
2024-04-26 13:32:41 +08:00