sci-gui-agent-benchmark

Author	SHA1	Message	Date
alexandruilie7	5463d3bb89	uipath v2 (#413 ) * submission v2 * small updates	2026-01-09 08:47:20 +08:00
蘑菇先生	5ef8bdfa35	EvoCUA Update (2025.01.05) (#412 ) * evocua init * setup max_token * evocua update --------- Co-authored-by: xuetaofeng <xuetaofeng@meituan.com> Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>	2026-01-05 16:14:53 +08:00
Bowen Yang	662826f57e	fix(os_symphony):prompt (#402 ) * add_os_symphony * fix(os_symphony) * fix(os_symphony):prompt --------- Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>	2025-12-29 20:45:36 +08:00
xuetf	410ec63a89	Add EvoCUA Support (#401 ) * evocua init * setup max_token --------- Co-authored-by: xuetaofeng <xuetaofeng@meituan.com> Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>	2025-12-23 20:46:23 +08:00
Bowen Yang	f593f35b1c	add_os_symphony (#399 )	2025-12-23 14:30:44 +08:00
Ubuntu	41477a9c40	Update: seed agent	2025-12-15 11:45:57 +00:00
Ubuntu	78433ecfcf	Add agent: seed agent	2025-12-12 05:35:20 +00:00
Meshal Nayim	9540454b0a	Fix demo agent (PromptAgent) reset(): add vm_ip and kwargs for compatibility with lib_run_single.py (#388 )	2025-12-09 15:59:25 +08:00
Qichen Fu	903ed36715	Add Claude Sonnet 4.5 support and improve action handling (#362 ) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>	2025-11-14 13:54:32 +08:00
Subash Shibu	3167339e45	Add hosted GBOX agent for OSWorld evaluation (#376 )	2025-11-13 13:13:31 +08:00
Pengxiang-Li	00b6468eb7	feat/dart_gui (#371 )	2025-11-07 21:50:01 +08:00
Atharva Gundawar	9f97535ef9	osworld agent wrapper for trained v7 (#360 )	2025-10-18 02:29:15 +08:00
ludunjie.ldj	afd29115da	support aliyun eval of qwen3vl	2025-10-16 16:20:54 +08:00
Dunjie Lu	55372c4432	Fix API base URLs for OpenAI and DashScope Updated the base URLs for OpenAI and DashScope API calls.	2025-10-14 12:57:00 +08:00
Dunjie Lu	d25464c203	Djlu/qwen3vl dash (#356 ) * support dashscope sdk to call qwen3-vl-plus * support dashscope sdk to call qwen3-vl-plus --------- Co-authored-by: Timothyxxx <Timothyxxx@users.noreply.github.com>	2025-10-13 16:31:06 +08:00
Xinyuan Wang	f9e9273b3b	OpenCUA-72B (#354 ) * use aws pub ip * os task fix: set the default dim screen time to be 300s * OpenCUA-72B * update password * update * update * update opencua72b agent * change provider ip --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>	2025-10-13 10:39:33 +08:00
Yan98	ddb8372a6c	init public release (#350 )	2025-10-06 22:16:31 +08:00
eun2ce	5eb5417188	fix #210 : add a11y_tree support to UITARSAgent (#346 )	2025-09-26 18:25:28 +08:00
Yanxiao Zhao	a4f8fe2f00	Add autoglm-os-9b-v (#344 ) * update for autoglm-v * Update run_autoglm.py --------- Co-authored-by: hanyullai <hanyullai@outlook.com>	2025-09-24 19:43:28 +08:00
alexandruilie7	f59cf00cae	Add ui agent (#343 ) * add uipath agent * readme update	2025-09-24 19:42:46 +08:00
Long Chen	088e68798c	update aworldguiAgent code (#342 )	2025-09-23 16:50:29 +08:00
molanhand	7213eca069	support mano agent (#338 ) Co-authored-by: Fei Hu <molanhand@users.noreply.github.com>	2025-09-16 18:10:29 +08:00
Dunjie Lu	b012301609	support qwen3vl agent (#336 ) Co-authored-by: root <ludunjie1219@github.com>	2025-09-15 16:04:29 +08:00
Hiroid	3a4b67304f	Add multiple new modules and tools to enhance the functionality and extensibility of the Maestro project (#333 ) * Added a pyproject.toml file to define project metadata and dependencies. * Added run_maestro.py and osworld_run_maestro.py to provide the main execution logic. * Introduced multiple new modules, including Evaluator, Controller, Manager, and Sub-Worker, supporting task planning, state management, and data analysis. * Added a tools module containing utility functions and tool configurations to improve code reusability. * Updated the README and documentation with usage examples and module descriptions. These changes lay the foundation for expanding the Maestro project’s functionality and improving the user experience. Co-authored-by: Hiroid <guoliangxuan@deepmatrix.com>	2025-09-08 16:07:21 +09:00
Howie	756e006af6	add support for mobile agent v3 (#328 ) * add support for mobile agent v3 * add mobile_agent * add support for mobile agent v3	2025-08-31 22:58:41 +08:00
Howie	3344abd641	Add support for GUI-Owl agent (#318 ) * add run_multienv_owl.py * add owl_agent.py	2025-08-27 18:03:39 +08:00
Timothyxxx	15d9ddb612	update coact: add autogen/cache	2025-08-21 19:03:35 +00:00
Adam Yanxiao Zhao	aa05f6cc26	Add AutoGLM-OS agent (#309 ) * autoglm-os initialize * clean code * chore: use proxy for download setup * feat(autoglm-os): add parameter to toggle images * fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel * update * add client_password * update multienv * fix * fix prompt * fix prompt * fix prompt * fix sys prompt * feat: use proxy in file evaluator * fix client_password * fix note_prompt * fix autoglm agent cmd type * fix * revert: fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel reverts commit bab5473eea1de0e61b0e1d68b23ce324a5b0ee57 * feat(autoglm): setup tools * fix(autoglm): remove second time of get a11y tree * add osworld server restart * Revert "add osworld server restart" This reverts commit 7bd9d84122e246ce2a26de0e49c25494244c2b3d. * fix _launch_setup * fix autoglm agent tools & xml tree * fix desktop_env * fix bug for tool name capitalization * fix: always use proxy for setup download * add fail after exceeding max turns * fix(autoglm): avoid adding image to message when screenshot is empty * fix maximize_window * fix maximize_window * fix maximize_window * fix import browsertools module bug * fix task proxy config bug * restore setup * refactor desktop env * restore image in provider * restore file.py * refactor desktop_env * quick fix * refactor desktop_env.step * fix our env reset * add max turns constraint * clean run script * clean lib_run_single.py --------- Co-authored-by: hanyullai <hanyullai@outlook.com> Co-authored-by: JingBh <jingbohao@yeah.net>	2025-08-17 12:08:40 +08:00
Timothyxxx	7fb5860da0	feat: enhance run_coact.py and related agents with improved task handling and configuration - Updated TASK_DESCRIPTION in run_coact.py to clarify task-solving steps and requirements. - Modified configuration parameters for provider name and client password for better security and flexibility. - Enhanced OrchestratorUserProxyAgent to include user instruction in the auto-reply and improved screenshot handling. - Adjusted coding_agent.py to ensure proper verification of results before saving changes. - Improved CUA agent prompts to maintain application state and handle user instructions more effectively. - Ensured existing code logic remains unchanged while enhancing functionality and usability.	2025-08-13 09:04:09 +00:00
Timothyxxx	d2ae0f697d	feat: enhance AnthropicAgent with start_coordinate handling and modifier key support - Added support for an optional start_coordinate parameter to facilitate drag actions from a specified starting point. - Implemented validation for start_coordinate to ensure it is a tuple of two integers. - Enhanced click actions to handle modifier keys, allowing for more complex interactions. - Ensured existing code logic remains unchanged while improving functionality and usability.	2025-08-12 05:34:18 +00:00
yuanmengqi	84f407afdd	feat: enhance run_coact.py with logging and configuration options - Added logging configuration to capture runtime logs in both file and console with adjustable log levels. - Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security. - Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic. - Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.	2025-07-31 05:47:58 +00:00
Yuan Mengqi	239dd37d2e	clean claude run code (#293 ) * add uitars agent code * improve claude * improve claude * improve claude * improve claude * improve claude * add nogdrive json * merge claude code * clean code claude run * clean code claude run * clean code claude run	2025-07-31 12:09:08 +08:00
Linxin Song	b968155757	CoACT initialize (#292 )	2025-07-31 10:35:20 +08:00
Xinyuan Wang	862d704b8c	Wxy/opencua (#290 ) * OpenCUA Agent code base * update url * debug, modify url input * debug opencua * show result * debug agent history overlap * modify opencua agent; add comment lines * update parallel; clean code; use sleep 3s * ui-tars-0717 * update detail * add system password to system prompt * add running command	2025-07-31 08:53:49 +08:00
Xinyuan Wang	3d32556085	Uitars/dev (#291 ) * use aws pub ip * os task fix: set the default dim screen time to be 300s * add all the uitars agents: 1. run_multienv_uitars.py: Qwen2VL-based UITARS models 2. run_multienv_uitars15_v1.py: UITARS1.5-7B 3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>	2025-07-31 08:52:27 +08:00
MillanK0817	4ae9d41da4	feat: update jedi agent with support for o3 as planner	2025-07-30 14:06:37 +08:00
yuanmengqi	0f00788c4d	feat: add run_multienv_o3.py script for multi-environment evaluation - Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments. - Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters. - Integrated signal handling for graceful shutdown of environments and processes. - Enhanced logging capabilities for better traceability during execution. - Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.	2025-07-27 16:47:24 +00:00
yuanmengqi	523d553e88	feat: add client password argument to multiple agents and scripts - Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility. - Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration. - Modified evaluation guidelines to reflect the new client password requirement. - Ensured existing logic remains intact while enhancing functionality for better user experience.	2025-07-27 16:11:23 +00:00
yuanmengqi	b25854edba	feat: introduce DummyAgent class for enhanced coordinate handling - Added DummyAgent class to facilitate coordinate generation and action assignment. - Updated GTA1Agent to utilize DummyAgent for improved planning and execution. - Increased max_steps and N_SEQ parameters for better performance. - Enhanced logging for planning and execution processes. - Maintained existing logic while integrating new functionality.	2025-07-26 08:26:23 +00:00
yuanmengqi	73caf53880	delete: remove img_utils.py and update imports in jedi_3b_agent.py and jedi_7b_agent.py to use qwen_vl_utils	2025-07-26 07:28:31 +00:00
yuanmengqi	f5595df71c	delete: remove gat1_agent.py file	2025-07-25 07:11:55 +00:00
张逸群	bf78b6d05e	Add OPENAI_BASE_URL support for custom OpenAI-compatible endpoints (#283 ) Enables GPT models to use custom API endpoints through OPENAI_BASE_URL environment variable. This addresses the limitation where only Azure OpenAI supported custom endpoints while standard GPT models were hardcoded to api.openai.com. - Add intelligent URL handling to avoid duplicate /v1 paths - Maintain backward compatibility with default OpenAI API - Update README with configuration instructions - Non-breaking change preserving existing functionality Fixes API integration issues for users with custom OpenAI-compatible services.	2025-07-24 12:31:08 +08:00
Yan98	2f3a6c48f6	Fix Typos (#275 ) * init * init * fix typo	2025-07-24 00:06:04 +08:00
yuanmengqi	82c3cdd590	feat: refactor run_multienv_qwen25vl.py and qwen25vl_agent.py for improved logging and task management - Introduced signal handling for graceful shutdown of environments and processes. - Enhanced logging configuration to support dynamic log levels and structured output. - Updated argument parsing to include new parameters for model selection and task execution. - Refactored task distribution logic to streamline environment task management. - Improved error handling during task execution and environment cleanup. - Adjusted Qwen25VLAgent initialization to support new model and thought prefix options. - Reduced max tries for LLM calls to optimize performance.	2025-07-22 19:46:42 +00:00
Yuan Mengqi	0a37cccd53	update claude (#280 ) * add uitars agent code * improve claude * improve claude * improve claude * improve claude * improve claude	2025-07-23 03:35:49 +08:00
Dunjie Lu	53fb96298a	support_qwen25vl (#276 ) Co-authored-by: root <ludunjie1219@github.com>	2025-07-22 16:33:03 +08:00
Xinyuan Wang	e10dd9267c	Wxy/opencua (#274 ) * OpenCUA Agent code base * update url * debug, modify url input * debug opencua * show result * debug agent history overlap * modify opencua agent; add comment lines * update parallel; clean code; use sleep 3s * ui-tars-0717	2025-07-20 15:52:23 +08:00
Yuan Mengqi	5ca516ac7a	add uitars agent code (#265 )	2025-07-17 18:17:13 +08:00
Xinyuan Wang	24fbad9015	Merge pull request #264 from yuanmengqi/main Improve the parallel logic	2025-07-17 12:28:48 +08:00
yuanmengqi	bb8b0b2582	Improve the parallel logic	2025-07-17 04:19:44 +00:00

188 Commits