Commit Graph

188 Commits

Author SHA1 Message Date
alexandruilie7
5463d3bb89 uipath v2 (#413)
* submission v2

* small updates
2026-01-09 08:47:20 +08:00
蘑菇先生
5ef8bdfa35 EvoCUA Update (2025.01.05) (#412)
* evocua init

* setup max_token

* evocua update

---------

Co-authored-by: xuetaofeng <xuetaofeng@meituan.com>
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2026-01-05 16:14:53 +08:00
Bowen Yang
662826f57e fix(os_symphony):prompt (#402)
* add_os_symphony

* fix(os_symphony)

* fix(os_symphony):prompt

---------

Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2025-12-29 20:45:36 +08:00
xuetf
410ec63a89 Add EvoCUA Support (#401)
* evocua init

* setup max_token

---------

Co-authored-by: xuetaofeng <xuetaofeng@meituan.com>
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2025-12-23 20:46:23 +08:00
Bowen Yang
f593f35b1c add_os_symphony (#399) 2025-12-23 14:30:44 +08:00
Ubuntu
41477a9c40 Update: seed agent 2025-12-15 11:45:57 +00:00
Ubuntu
78433ecfcf Add agent: seed agent 2025-12-12 05:35:20 +00:00
Meshal Nayim
9540454b0a Fix demo agent (PromptAgent) reset(): add vm_ip and kwargs for compatibility with lib_run_single.py (#388) 2025-12-09 15:59:25 +08:00
Qichen Fu
903ed36715 Add Claude Sonnet 4.5 support and improve action handling (#362)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-14 13:54:32 +08:00
Subash Shibu
3167339e45 Add hosted GBOX agent for OSWorld evaluation (#376) 2025-11-13 13:13:31 +08:00
Pengxiang-Li
00b6468eb7 feat/dart_gui (#371) 2025-11-07 21:50:01 +08:00
Atharva Gundawar
9f97535ef9 osworld agent wrapper for trained v7 (#360) 2025-10-18 02:29:15 +08:00
ludunjie.ldj
afd29115da support aliyun eval of qwen3vl 2025-10-16 16:20:54 +08:00
Dunjie Lu
55372c4432 Fix API base URLs for OpenAI and DashScope
Updated the base URLs for OpenAI and DashScope API calls.
2025-10-14 12:57:00 +08:00
Dunjie Lu
d25464c203 Djlu/qwen3vl dash (#356)
* support dashscope sdk to call qwen3-vl-plus

* support dashscope sdk to call qwen3-vl-plus

---------

Co-authored-by: Timothyxxx <Timothyxxx@users.noreply.github.com>
2025-10-13 16:31:06 +08:00
Xinyuan Wang
f9e9273b3b OpenCUA-72B (#354)
* use aws pub ip

* os task fix: set the default dim screen time to be 300s

* OpenCUA-72B

* update password

* update

* update

* update opencua72b agent

* change provider ip

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-10-13 10:39:33 +08:00
Yan98
ddb8372a6c init public release (#350) 2025-10-06 22:16:31 +08:00
eun2ce
5eb5417188 fix #210: add a11y_tree support to UITARSAgent (#346) 2025-09-26 18:25:28 +08:00
Yanxiao Zhao
a4f8fe2f00 Add autoglm-os-9b-v (#344)
* update for autoglm-v

* Update run_autoglm.py

---------

Co-authored-by: hanyullai <hanyullai@outlook.com>
2025-09-24 19:43:28 +08:00
alexandruilie7
f59cf00cae Add ui agent (#343)
* add uipath agent

* readme update
2025-09-24 19:42:46 +08:00
Long Chen
088e68798c update aworldguiAgent code (#342) 2025-09-23 16:50:29 +08:00
molanhand
7213eca069 support mano agent (#338)
Co-authored-by: Fei Hu <molanhand@users.noreply.github.com>
2025-09-16 18:10:29 +08:00
Dunjie Lu
b012301609 support qwen3vl agent (#336)
Co-authored-by: root <ludunjie1219@github.com>
2025-09-15 16:04:29 +08:00
Hiroid
3a4b67304f Add multiple new modules and tools to enhance the functionality and extensibility of the Maestro project (#333)
* Added a **pyproject.toml** file to define project metadata and dependencies.
* Added **run_maestro.py** and **osworld_run_maestro.py** to provide the main execution logic.
* Introduced multiple new modules, including **Evaluator**, **Controller**, **Manager**, and **Sub-Worker**, supporting task planning, state management, and data analysis.
* Added a **tools module** containing utility functions and tool configurations to improve code reusability.
* Updated the **README** and documentation with usage examples and module descriptions.

These changes lay the foundation for expanding the Maestro project’s functionality and improving the user experience.

Co-authored-by: Hiroid <guoliangxuan@deepmatrix.com>
2025-09-08 16:07:21 +09:00
Howie
756e006af6 add support for mobile agent v3 (#328)
* add support for mobile agent v3

* add mobile_agent

* add support for mobile agent v3
2025-08-31 22:58:41 +08:00
Howie
3344abd641 Add support for GUI-Owl agent (#318)
* add run_multienv_owl.py

* add owl_agent.py
2025-08-27 18:03:39 +08:00
Timothyxxx
15d9ddb612 update coact: add autogen/cache 2025-08-21 19:03:35 +00:00
Adam Yanxiao Zhao
aa05f6cc26 Add AutoGLM-OS agent (#309)
* autoglm-os initialize

* clean code

* chore: use proxy for download setup

* feat(autoglm-os): add parameter to toggle images

* fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

* update

* add client_password

* update multienv

* fix

* fix prompt

* fix prompt

* fix prompt

* fix sys prompt

* feat: use proxy in file evaluator

* fix client_password

* fix note_prompt

* fix autoglm agent cmd type

* fix

* revert: fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

reverts commit bab5473eea1de0e61b0e1d68b23ce324a5b0ee57

* feat(autoglm): setup tools

* fix(autoglm): remove second time of get a11y tree

* add osworld server restart

* Revert "add osworld server restart"

This reverts commit 7bd9d84122e246ce2a26de0e49c25494244c2b3d.

* fix _launch_setup

* fix autoglm agent tools & xml tree

* fix desktop_env

* fix bug for tool name capitalization

* fix: always use proxy for setup download

* add fail after exceeding max turns

* fix(autoglm): avoid adding image to message when screenshot is empty

* fix maximize_window

* fix maximize_window

* fix maximize_window

* fix import browsertools module bug

* fix task proxy config bug

* restore setup

* refactor desktop env

* restore image in provider

* restore file.py

* refactor desktop_env

* quick fix

* refactor desktop_env.step

* fix our env reset

* add max turns constraint

* clean run script

* clean lib_run_single.py

---------

Co-authored-by: hanyullai <hanyullai@outlook.com>
Co-authored-by: JingBh <jingbohao@yeah.net>
2025-08-17 12:08:40 +08:00
Timothyxxx
7fb5860da0 feat: enhance run_coact.py and related agents with improved task handling and configuration
- Updated TASK_DESCRIPTION in run_coact.py to clarify task-solving steps and requirements.
- Modified configuration parameters for provider name and client password for better security and flexibility.
- Enhanced OrchestratorUserProxyAgent to include user instruction in the auto-reply and improved screenshot handling.
- Adjusted coding_agent.py to ensure proper verification of results before saving changes.
- Improved CUA agent prompts to maintain application state and handle user instructions more effectively.
- Ensured existing code logic remains unchanged while enhancing functionality and usability.
2025-08-13 09:04:09 +00:00
Timothyxxx
d2ae0f697d feat: enhance AnthropicAgent with start_coordinate handling and modifier key support
- Added support for an optional start_coordinate parameter to facilitate drag actions from a specified starting point.
- Implemented validation for start_coordinate to ensure it is a tuple of two integers.
- Enhanced click actions to handle modifier keys, allowing for more complex interactions.
- Ensured existing code logic remains unchanged while improving functionality and usability.
2025-08-12 05:34:18 +00:00
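The start_coordinate validation described in d2ae0f697d (an optional value that must be a tuple of two integers) might look roughly like the sketch below. The function name `validate_start_coordinate`, the choice of `ValueError`, and the rejection of booleans are assumptions made for illustration; they are not taken from the AnthropicAgent code.

```python
from typing import Optional, Tuple


def validate_start_coordinate(start_coordinate) -> Optional[Tuple[int, int]]:
    """Return the coordinate unchanged if valid, or None if it was omitted.

    Hypothetical helper: the name and the error type are illustrative only.
    """
    if start_coordinate is None:
        # The parameter is optional, so a missing value is accepted.
        return None
    is_pair = isinstance(start_coordinate, tuple) and len(start_coordinate) == 2
    # bool is a subclass of int in Python; excluding it here is an assumption.
    if not is_pair or not all(
        isinstance(v, int) and not isinstance(v, bool) for v in start_coordinate
    ):
        raise ValueError(
            f"start_coordinate must be a tuple of two integers, got {start_coordinate!r}"
        )
    return start_coordinate
```

A drag action could call this helper before using the value as its starting point, so that malformed input fails early with a readable message.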
yuanmengqi
84f407afdd feat: enhance run_coact.py with logging and configuration options
- Added logging configuration to capture runtime logs in both file and console with adjustable log levels.
- Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security.
- Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic.
- Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
2025-07-31 05:47:58 +00:00
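As a rough illustration of the logging setup described in 84f407afdd (runtime logs sent to both a file and the console, with an adjustable level), a minimal sketch using the standard `logging` module follows. The logger name `coact_sketch`, the function name `configure_logging`, and the format string are assumptions, not values from run_coact.py.

```python
import logging
import sys


def configure_logging(log_path: str, level: str = "INFO") -> logging.Logger:
    """Attach a file handler and a console handler at the requested level.

    Hypothetical helper: the names and the format string are illustrative only.
    """
    logger = logging.getLogger("coact_sketch")
    logger.setLevel(level.upper())  # accepts level names such as "DEBUG" or "INFO"

    # Remove handlers from earlier calls so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (
        logging.FileHandler(log_path, encoding="utf-8"),
        logging.StreamHandler(sys.stdout),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)

    logger.propagate = False  # keep records out of the root logger's handlers
    return logger
```

The level string would typically come from a command-line argument, which is how the commit describes making it adjustable.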
Yuan Mengqi
239dd37d2e clean claude run code (#293)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude

* add nogdrive json

* merge claude code

* clean code claude run

* clean code claude run

* clean code claude run
2025-07-31 12:09:08 +08:00
Linxin Song
b968155757 CoACT initialize (#292) 2025-07-31 10:35:20 +08:00
Xinyuan Wang
862d704b8c Wxy/opencua (#290)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717

* update detail

* add system password to system prompt

* add running command
2025-07-31 08:53:49 +08:00
Xinyuan Wang
3d32556085 Uitars/dev (#291)
* use aws pub ip

* os task fix: set the default dim screen time to be 300s

* add all the uitars agents:
1. run_multienv_uitars.py: Qwen2VL-based UITARS models
2. run_multienv_uitars15_v1.py: UITARS1.5-7B
3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-07-31 08:52:27 +08:00
MillanK0817
4ae9d41da4 feat: update jedi agent with support for o3 as planner 2025-07-30 14:06:37 +08:00
yuanmengqi
0f00788c4d feat: add run_multienv_o3.py script for multi-environment evaluation
- Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments.
- Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters.
- Integrated signal handling for graceful shutdown of environments and processes.
- Enhanced logging capabilities for better traceability during execution.
- Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
2025-07-27 16:47:24 +00:00
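The graceful-shutdown signal handling mentioned in 0f00788c4d could be sketched as below, using only the standard `signal` module. The class `GracefulShutdown` and its methods are hypothetical names for illustration; the actual script may organize its cleanup of environments and processes differently.

```python
import signal


class GracefulShutdown:
    """Run registered cleanup callbacks once when SIGINT or SIGTERM arrives.

    Hypothetical class: not taken from run_multienv_o3.py.
    """

    def __init__(self):
        self.cleanups = []
        self.requested = False

    def register(self, cleanup):
        # Each callback would close one environment or stop one worker process.
        self.cleanups.append(cleanup)

    def handle(self, signum, frame):
        if self.requested:
            return  # ignore repeated signals while shutdown is under way
        self.requested = True
        # Tear down in reverse registration order, newest resource first.
        for cleanup in reversed(self.cleanups):
            try:
                cleanup()
            except Exception as exc:
                print(f"cleanup failed: {exc}")

    def install(self):
        # signal.signal must be called from the main thread.
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self.handle)
```

In this sketch a failing callback is reported and skipped so that the remaining resources are still released.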
yuanmengqi
523d553e88 feat: add client password argument to multiple agents and scripts
- Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility.
- Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration.
- Modified evaluation guidelines to reflect the new client password requirement.
- Ensured existing logic remains intact while enhancing functionality for better user experience.
2025-07-27 16:11:23 +00:00
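A minimal sketch of how a `--client_password` argument (523d553e88) might be parsed and handed to an agent is shown below. Only the flag name comes from the commit message; the empty-string default, the help text, and the `SketchAgent` class are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Evaluation run (illustrative sketch)")
    # The flag name is from the commit; the default and help text are assumptions.
    parser.add_argument(
        "--client_password",
        type=str,
        default="",
        help="Password of the user account inside the VM",
    )
    return parser


class SketchAgent:
    """Hypothetical stand-in for an agent class that accepts client_password."""

    def __init__(self, client_password: str = ""):
        self.client_password = client_password


if __name__ == "__main__":
    args = build_parser().parse_args()
    agent = SketchAgent(client_password=args.client_password)
```

Passing the value through the constructor keeps the password out of hardcoded prompt templates, which matches the intent stated in the commit.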
yuanmengqi
b25854edba feat: introduce DummyAgent class for enhanced coordinate handling
- Added DummyAgent class to facilitate coordinate generation and action assignment.
- Updated GTA1Agent to utilize DummyAgent for improved planning and execution.
- Increased max_steps and N_SEQ parameters for better performance.
- Enhanced logging for planning and execution processes.
- Maintained existing logic while integrating new functionality.
2025-07-26 08:26:23 +00:00
yuanmengqi
73caf53880 delete: remove img_utils.py and update imports in jedi_3b_agent.py and jedi_7b_agent.py to use qwen_vl_utils 2025-07-26 07:28:31 +00:00
yuanmengqi
f5595df71c delete: remove gat1_agent.py file 2025-07-25 07:11:55 +00:00
张逸群
bf78b6d05e Add OPENAI_BASE_URL support for custom OpenAI-compatible endpoints (#283)
Enables GPT models to use custom API endpoints through OPENAI_BASE_URL environment variable. This addresses the limitation where only Azure OpenAI supported custom endpoints while standard GPT models were hardcoded to api.openai.com.

- Add intelligent URL handling to avoid duplicate /v1 paths
- Maintain backward compatibility with default OpenAI API
- Update README with configuration instructions
- Non-breaking change preserving existing functionality

Fixes API integration issues for users with custom OpenAI-compatible services.
2025-07-24 12:31:08 +08:00
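The duplicate-/v1 handling described in bf78b6d05e might be approximated as follows. This is a sketch under stated assumptions: the function name `chat_completions_url` is hypothetical, and the rule of appending `/v1` only when the base URL does not already end with it is one plausible reading of "avoid duplicate /v1 paths", not the repository's confirmed logic.

```python
import os
from typing import Optional

# The commit names api.openai.com as the previously hardcoded host.
DEFAULT_BASE_URL = "https://api.openai.com"


def chat_completions_url(base_url: Optional[str] = None) -> str:
    """Build the chat-completions URL without doubling the /v1 segment.

    Hypothetical helper: falls back to OPENAI_BASE_URL, then to the default host.
    """
    base = base_url or os.environ.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    base = base.rstrip("/")
    if not base.endswith("/v1"):
        base += "/v1"
    return base + "/chat/completions"
```

With this rule, a custom endpoint can be given with or without the trailing `/v1` and resolves to the same request URL, while leaving the variable unset keeps the default OpenAI host.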
Yan98
2f3a6c48f6 Fix Typos (#275)
* init

* init

* fix typo
2025-07-24 00:06:04 +08:00
yuanmengqi
82c3cdd590 feat: refactor run_multienv_qwen25vl.py and qwen25vl_agent.py for improved logging and task management
- Introduced signal handling for graceful shutdown of environments and processes.
- Enhanced logging configuration to support dynamic log levels and structured output.
- Updated argument parsing to include new parameters for model selection and task execution.
- Refactored task distribution logic to streamline environment task management.
- Improved error handling during task execution and environment cleanup.
- Adjusted Qwen25VLAgent initialization to support new model and thought prefix options.
- Reduced max tries for LLM calls to optimize performance.
2025-07-22 19:46:42 +00:00
Yuan Mengqi
0a37cccd53 update claude (#280)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude
2025-07-23 03:35:49 +08:00
Dunjie Lu
53fb96298a support_qwen25vl (#276)
Co-authored-by: root <ludunjie1219@github.com>
2025-07-22 16:33:03 +08:00
Xinyuan Wang
e10dd9267c Wxy/opencua (#274)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717
2025-07-20 15:52:23 +08:00
Yuan Mengqi
5ca516ac7a add uitars agent code (#265) 2025-07-17 18:17:13 +08:00
Xinyuan Wang
24fbad9015 Merge pull request #264 from yuanmengqi/main
Improve the parallel logic
2025-07-17 12:28:48 +08:00
yuanmengqi
bb8b0b2582 Improve the parallel logic 2025-07-17 04:19:44 +00:00