Commit Graph

153 Commits

Author SHA1 Message Date
MillanK0817
4ae9d41da4 feat: update jedi agent with support for o3 as planner 2025-07-30 14:06:37 +08:00
yuanmengqi
0f00788c4d feat: add run_multienv_o3.py script for multi-environment evaluation
- Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments.
- Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters.
- Integrated signal handling for graceful shutdown of environments and processes.
- Enhanced logging capabilities for better traceability during execution.
- Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
2025-07-27 16:47:24 +00:00
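The commit above mentions signal handling for graceful shutdown of environments and processes, but the message does not show the code. The following is a minimal sketch, assuming a runner that keeps module-level registries of open environments and worker processes. The names `active_environments`, `processes`, `signal_handler` and `install_signal_handlers` are hypothetical and are not taken from `run_multienv_o3.py`.

```python
import signal
import sys
from multiprocessing import Process
from typing import List

# Hypothetical registries: a real runner would append to these as it
# creates environments and spawns worker processes.
active_environments: List = []
processes: List[Process] = []


def signal_handler(signum, frame):
    """Close every environment, stop every worker, then exit cleanly."""
    for env in active_environments:
        try:
            env.close()
        except Exception as exc:
            # Keep shutting down the remaining environments even if one fails.
            print(f"error closing environment: {exc}", file=sys.stderr)
    for p in processes:
        if p.is_alive():
            p.terminate()
    for p in processes:
        p.join(timeout=5)
    sys.exit(0)


def install_signal_handlers():
    """Route Ctrl+C (SIGINT) and SIGTERM through the cleanup handler."""
    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)
```

`install_signal_handlers()` has to be called from the main thread (Python only allows `signal.signal` there), typically once at startup before any workers are spawned. Closing environments before terminating processes is a design choice in this sketch so that remote VMs are released even if a worker is killed mid-task.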
yuanmengqi
523d553e88 feat: add client password argument to multiple agents and scripts
- Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility.
- Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration.
- Modified evaluation guidelines to reflect the new client password requirement.
- Ensured existing logic remains intact while enhancing functionality for better user experience.
2025-07-27 16:11:23 +00:00
yuanmengqi
b25854edba feat: introduce DummyAgent class for enhanced coordinate handling
- Added DummyAgent class to facilitate coordinate generation and action assignment.
- Updated GTA1Agent to utilize DummyAgent for improved planning and execution.
- Increased max_steps and N_SEQ parameters for better performance.
- Enhanced logging for planning and execution processes.
- Maintained existing logic while integrating new functionality.
2025-07-26 08:26:23 +00:00
yuanmengqi
73caf53880 delete: remove img_utils.py and update imports in jedi_3b_agent.py and jedi_7b_agent.py to use qwen_vl_utils 2025-07-26 07:28:31 +00:00
yuanmengqi
f5595df71c delete: remove gat1_agent.py file 2025-07-25 07:11:55 +00:00
张逸群
bf78b6d05e Add OPENAI_BASE_URL support for custom OpenAI-compatible endpoints (#283)
Enables GPT models to use custom API endpoints through OPENAI_BASE_URL environment variable. This addresses the limitation where only Azure OpenAI supported custom endpoints while standard GPT models were hardcoded to api.openai.com.

- Add intelligent URL handling to avoid duplicate /v1 paths
- Maintain backward compatibility with default OpenAI API
- Update README with configuration instructions
- Non-breaking change preserving existing functionality

Fixes API integration issues for users with custom OpenAI-compatible services.
2025-07-24 12:31:08 +08:00
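The commit above says the `OPENAI_BASE_URL` support avoids duplicate `/v1` paths while keeping the default OpenAI API. The actual code is not shown here; the following is a minimal sketch of that idea, assuming the client builds a chat-completions URL from the base URL. The helper `build_chat_completions_url` and the `/chat/completions` suffix are illustrative assumptions, not the repository's implementation.

```python
import os

# Default used when OPENAI_BASE_URL is not set (backward-compatible behaviour).
DEFAULT_BASE_URL = "https://api.openai.com"


def build_chat_completions_url(base_url=None):
    """Hypothetical helper: resolve the endpoint from OPENAI_BASE_URL."""
    if base_url is None:
        base_url = os.environ.get("OPENAI_BASE_URL", DEFAULT_BASE_URL)
    base = base_url.rstrip("/")
    # Append /v1 only when the configured URL does not already end with it,
    # so "https://host/v1" does not become "https://host/v1/v1".
    if not base.endswith("/v1"):
        base += "/v1"
    return base + "/chat/completions"
```

With this approach both `https://my-proxy.example.com` and `https://my-proxy.example.com/v1/` resolve to `https://my-proxy.example.com/v1/chat/completions`, so users can paste either form of their provider's URL.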
Yan98
2f3a6c48f6 Fix Typos (#275)
* init

* init

* fix typo
2025-07-24 00:06:04 +08:00
yuanmengqi
82c3cdd590 feat: refactor run_multienv_qwen25vl.py and qwen25vl_agent.py for improved logging and task management
- Introduced signal handling for graceful shutdown of environments and processes.
- Enhanced logging configuration to support dynamic log levels and structured output.
- Updated argument parsing to include new parameters for model selection and task execution.
- Refactored task distribution logic to streamline environment task management.
- Improved error handling during task execution and environment cleanup.
- Adjusted Qwen25VLAgent initialization to support new model and thought prefix options.
- Reduced max tries for LLM calls to optimize performance.
2025-07-22 19:46:42 +00:00
Yuan Mengqi
0a37cccd53 update claude (#280)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude
2025-07-23 03:35:49 +08:00
Dunjie Lu
53fb96298a support_qwen25vl (#276)
Co-authored-by: root <ludunjie1219@github.com>
2025-07-22 16:33:03 +08:00
Xinyuan Wang
e10dd9267c Wxy/opencua (#274)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717
2025-07-20 15:52:23 +08:00
Yuan Mengqi
5ca516ac7a add uitars agent code (#265) 2025-07-17 18:17:13 +08:00
Xinyuan Wang
24fbad9015 Merge pull request #264 from yuanmengqi/main
Improve the parallel logic
2025-07-17 12:28:48 +08:00
yuanmengqi
bb8b0b2582 Improve the parallel logic 2025-07-17 04:19:44 +00:00
yuanmengqi
9eeabfc52d Improve the parallel logic 2025-07-17 04:14:20 +00:00
Xinyuan Wang
0f2655249c Wxy/opencua (#260)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines
2025-07-16 17:53:12 +08:00
yuanmengqi
175b4b46c2 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-15 14:50:48 +00:00
Yuan Mengqi
af47ed8fb1 fix infeasible&chrome tasks (#258)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

* Improve code logic for password & resolution

* edit

* Merge branch 'main' into fix_chrome

* fix chrome tasks

* Merge branch 'fix_chrome'

* fix infeasible&chrome tasks

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-15 13:02:42 +08:00
yuanmengqi
08b4cf2c2f fix infeasible&chrome tasks 2025-07-15 02:09:40 +00:00
Xinyuan Wang
db83b9cb2c Wxy/opencua (#256)
* OpenCUA Agent code base

* update url

* debug, modify url input
2025-07-14 20:26:39 +08:00
Zilong Zhou
74b7c189af Feat/monitor (#254)
* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py

* fix: update logger usage to use global logger and improve error handling

* feat&fix: add configuration management API endpoints and update UI for configuration selection

* feat&fix: update environment configuration, enhance task statistics, and improve UI responsiveness

* feat&fix: add configuration toggle button in UI and improve task loading performance

* feat&fix: add accuracy percentage display to score and style updates for UI
2025-07-14 13:43:41 +08:00
Zilong Zhou
349f2fd9fe Feat/claude cua support (#253)
* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py
2025-07-13 21:10:49 +08:00
Yuan Mengqi
38a30734a6 Improve code logic for password & resolution (#252)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

* Improve code logic for password & resolution

* edit

* Merge branch 'main' into fix_chrome

* fix chrome tasks

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 21:04:07 +08:00
Yuan Mengqi
27319ce1e3 fix password&resolution (#251)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 00:25:37 +08:00
Yan98
4e3446d6fe Fix Name (#249)
* init

* init
2025-07-11 00:15:46 +08:00
Yan98
0a5058342d init (#246) 2025-07-10 00:29:42 +08:00
yuanmengqi
7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
yuanmengqi
3da32fe5cf update operator prompt 2025-06-10 02:35:53 +00:00
yuanmengqi
692486f8e7 add GDrive guideline 2025-06-09 14:59:47 +00:00
yuanmengqi
aee1207fff fix error 2025-06-09 04:20:59 +00:00
yuanmengqi
d8872634ee edit prompt 2025-06-08 03:59:31 +00:00
yuanmengqi
c57b1d4e7a eval update 2025-06-07 13:19:22 +00:00
yuanmengqi
a146c1e0b7 edit prompt 2025-06-07 05:21:04 +00:00
yuanmengqi
64177045b5 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-06 10:22:56 +00:00
Timothyxxx
8373f7cff2 refactor: remove AWSVMManagerWithProxy and integrate proxy support directly into AWSVMManager for streamlined VM allocation;
minor fix on openai_cua_agent
2025-06-06 02:55:50 +08:00
yuanmengqi
a6300e05c9 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-05 13:31:42 +00:00
adlsdztony
3b1540ed23 feat&fix: enhance task status handling and update logging configuration 2025-06-05 09:33:36 +00:00
yuanmengqi
b211df3385 fix timeout 2025-06-04 10:23:45 +00:00
yuanmengqi
98a810d31e edit operator 2025-06-02 12:11:25 +00:00
yuanmengqi
228849ab03 add openai cua agent 2025-05-31 11:22:38 +00:00
uvheart
a845824f06 add azure_gpt_4o (#197) 2025-05-23 03:57:42 +08:00
Shihao Liang
119bef25e2 Dev/uitars 15 (#194)
* debug uitars1.0, add uitars1.5

* update pyautogui parser

* modify function name

* update parser

* update prompt

* FIX: bug in ui tars
2025-05-19 17:15:17 +08:00
MillanK
51f5ddea04 Add Jedi agent implementation to mm_agents (#192)
* feat: implement Jedi agent

* chore: code clean
2025-05-10 19:55:33 +08:00
Thomas Kuntz
5678b510d7 fix: Invalid escape sequence in prompts (#191)
Fixes the warning: SyntaxWarning: invalid escape sequence '\`'
2025-05-10 18:19:07 +08:00
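The warning fixed in the commit above comes from writing a backslash before a character that is not a recognised escape (here a backtick) in an ordinary string literal. Python 3.12+ reports it as `SyntaxWarning`, and 3.6 to 3.11 as `DeprecationWarning`. The sketch below reproduces the warning and checks two common fixes, a raw string and an escaped backslash. The helper `escape_warnings` and the sample `PROMPT` strings are illustrative only and are not the repository's prompt text.

```python
import warnings


def escape_warnings(source: str) -> list:
    """Compile a snippet of source and return the warning categories it raises."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<prompt>", "exec")
    return [w.category for w in caught]


# Each string holds Python source; the doubled backslashes here produce a
# single backslash in the compiled snippet.
bad = 'PROMPT = "Use \\` for code"'            # compiles "\`": invalid escape
raw_fix = 'PROMPT = r"Use \\` for code"'       # raw string: backslash kept as-is
escaped_fix = 'PROMPT = "Use \\\\` for code"'  # "\\" is a valid escape
```

Both fixes yield the same runtime string (a literal backslash followed by a backtick), so prompt text sent to the model is unchanged; only the compile-time warning disappears.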
Thomas Kuntz
7d88283f8a feat: Support newer Gemini models (#188) 2025-05-06 16:04:30 +08:00
Shihao Liang
b92c716df7 Dev/uitars 15 (#181)
* debug uitars1.0, add uitars1.5

* update pyautogui parser

* modify function name

* update parser

* update prompt
2025-04-21 13:44:08 +08:00
Shihao Liang
bd2e980666 Dev/uitars 15 (#178)
* debug uitars1.0, add uitars1.5

* update pyautogui parser

* modify function name

* update parser
2025-04-17 18:49:21 +08:00
Shiqian Su
c4d818c5cf Update aguvis_agent.py (#141)
Fix Aguvis prompt bug
2025-02-28 16:48:41 +08:00
Shihao Liang
339a13e1d5 Dev/uitars (#132)
* init uitars

* change agent class name

* FIX: return bug in agent predict
2025-02-14 11:17:37 +08:00