Commit Graph

1325 Commits

Author SHA1 Message Date
ZhangZuhao
dc7e46e7aa Refactor platform detection for VM image download (#337)
Sometimes the platform detection for VM image download is wrong
2025-09-15 21:00:15 +08:00
Dunjie Lu
b012301609 support qwen3vl agent (#336)
Co-authored-by: root <ludunjie1219@github.com>
2025-09-15 16:04:29 +08:00
Hiroid
a668670349 fix(maestro): Fixed the debug logging level (#334)
Co-authored-by: Liangxuan Guo <guoliangxuan@deepmatrix.com.cn>
2025-09-11 01:03:59 +08:00
Hiroid
3a4b67304f Add multiple new modules and tools to enhance the functionality and extensibility of the Maestro project (#333)
* Added a **pyproject.toml** file to define project metadata and dependencies.
* Added **run_maestro.py** and **osworld_run_maestro.py** to provide the main execution logic.
* Introduced multiple new modules, including **Evaluator**, **Controller**, **Manager**, and **Sub-Worker**, supporting task planning, state management, and data analysis.
* Added a **tools module** containing utility functions and tool configurations to improve code reusability.
* Updated the **README** and documentation with usage examples and module descriptions.

These changes lay the foundation for expanding the Maestro project’s functionality and improving the user experience.

Co-authored-by: Hiroid <guoliangxuan@deepmatrix.com>
2025-09-08 16:07:21 +09:00
Timothyxxx
029885e78c Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-09-05 15:36:39 +00:00
Timothyxxx
640f3fcd96 Update default path_to_vm argument to None in quickstart.py for improved flexibility 2025-09-05 15:36:31 +00:00
Timothyxxx
756923beea Update instruction wording in LibreOffice Impress example to clarify text color change requirements. Address https://github.com/xlang-ai/OSWorld/issues/324 2025-09-01 23:29:47 +08:00
Timothyxxx
0c681b91e0 Fix README update 2025-09-01 15:15:50 +00:00
aneeshprasad1
8513e8c89e Add quickstart script and update README (#325)
Co-authored-by: Aneesh Prasad <aneeshprasad@Aneeshs-MacBook-Pro.local>
2025-09-01 23:14:24 +08:00
Howie
756e006af6 add support for mobile agent v3 (#328)
* add support for mobile agent v3

* add mobile_agent

* add support for mobile agent v3
2025-08-31 22:58:41 +08:00
hanyullai
54a14cbc07 fix multienv bug (#327) 2025-08-30 11:10:53 +08:00
Howie
3344abd641 Add support for GUI-Owl agent (#318)
* add run_multienv_owl.py

* add owl_agent.py
2025-08-27 18:03:39 +08:00
Timothyxxx
ef2f35de22 Add resource group ID support for Aliyun VM allocation
- Introduced ALIYUN_RESOURCE_GROUP_ID environment variable to manage resource group assignments during VM allocation.
- Updated the _allocate_vm function to include resource group ID in the request if specified.
- Modified VNC URL logging to use public IP when available, enhancing clarity in access information.
- Maintained existing code logic while improving functionality for resource management and logging.
2025-08-26 13:28:23 +08:00
Timothyxxx
4c773f6f7c Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-22 23:29:21 +08:00
Timothyxxx
ebda4d8b3f Add Aliyun SDK dependencies and implement TTL configuration for ECS instances
- Added new dependencies for Aliyun ECS SDK in requirements.txt and setup.py to support instance management features.
- Introduced a new config module to handle TTL settings for ECS instances, allowing for auto-termination based on environment variables.
- Updated the manager to utilize TTL settings, including scheduling instance termination with proper error handling and logging.
- Maintained existing code logic while enhancing functionality for improved instance lifecycle management.
2025-08-22 23:28:58 +08:00
Timothyxxx
15d9ddb612 update coact: add autogen/cache 2025-08-21 19:03:35 +00:00
Timothyxxx
b14f1c7345 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-21 09:38:37 +00:00
Timothyxxx
ead564c92b Update dependencies and refactor DesktopEnv initialization
- Removed specific versioning for the 'requests' library in requirements.txt and setup.py to allow for more flexible updates.
- Refactored the DesktopEnv class to streamline the emulator initialization process, enhancing error handling and logging during startup.
- Improved retry logic for file uploads in SetupController, ensuring robust handling of network issues and providing clearer error messages.
- Maintained existing code logic while enhancing clarity and reliability in the DesktopEnv and SetupController classes.
2025-08-21 09:38:28 +00:00
Timothyxxx
b3e1c0344d Update OpenCV dependency to headless version in requirements and setup files
- Replaced 'opencv-python' with 'opencv-python-headless' in both requirements.txt and setup.py to reduce unnecessary GUI dependencies.
- Added a new .gitkeep file in the logs directory to ensure it is tracked in version control.
- Maintained existing code logic while improving dependency management.
2025-08-20 01:26:24 +08:00
Timothyxxx
492c910e94 Refactor AWS scheduler role handling in scheduler_utils.py
- Improved error handling and logging for role resolution and creation.
- Added checks to ensure the trust policy allows for AWS EventBridge Scheduler to assume the role.
- Implemented retry logic for scheduling EC2 termination to handle IAM eventual consistency.
- Maintained existing code logic while enhancing robustness and clarity in role management.
2025-08-18 17:57:31 +00:00
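The retry logic mentioned in the commit above (retrying a scheduling call until a newly created IAM role becomes usable) can be sketched roughly as follows. This is a generic illustration, not the code in scheduler_utils.py: the helper name `retry_with_backoff`, its parameters, and the exponential delay schedule are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Hypothetical helper: the delay doubles after every failure
    (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Re-raise the last error once no attempts are left.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a caller would wrap the scheduling request in a zero-argument function and hand it to `retry_with_backoff`.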
Timothyxxx
3a96fd5046 Add TTL configuration for AWS instance management
- Introduced a new config module to manage TTL settings for EC2 instances, allowing for auto-termination based on environment variables.
- Updated the AWSProvider and manager to utilize the new TTL settings, including scheduling instance termination via EventBridge Scheduler.
- Added utility functions for resolving the scheduler role ARN and creating termination schedules, ensuring robust error handling and logging.
- Maintained existing code logic while integrating new features for improved instance lifecycle management.
2025-08-18 17:30:49 +00:00
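A TTL setting driven by environment variables, as described in the commit above, might look something like the sketch below. The variable names `EXAMPLE_ENABLE_TTL` and `EXAMPLE_TTL_MINUTES`, the default of 180 minutes, and the function name are placeholders chosen for illustration; the names and defaults actually used by the config module are not shown in this log.

```python
import os
from typing import Mapping, Optional


def read_ttl_minutes(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the instance TTL in minutes, or None if auto-termination is off.

    Both variable names below are hypothetical placeholders.
    """
    flag = env.get("EXAMPLE_ENABLE_TTL", "true").strip().lower()
    if flag not in {"1", "true", "yes"}:
        return None
    raw = env.get("EXAMPLE_TTL_MINUTES", "180")
    try:
        minutes = int(raw)
    except ValueError:
        # An unparsable value is treated as "no TTL" in this sketch.
        return None
    return minutes if minutes > 0 else None
```

Accepting the environment as a mapping makes it possible to exercise the function with a plain dictionary instead of mutating `os.environ`.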
Adam Yanxiao Zhao
75f00fea62 Fix AutoGLM-OS custom env reset func (#312)
* Add AWS config for autoglm-os agent script

* update default password

* fix autoglm-os reset
2025-08-18 18:12:09 +08:00
Timothyxxx
a5dc64c943 Update Aliyun guidelines to include SSH and VNC password setup script 2025-08-18 07:24:39 +00:00
Adam Yanxiao Zhao
deff1fe385 Add AWS config for autoglm-os agent script (#311)
* Add AWS config for autoglm-os agent script

* update default password
2025-08-17 22:54:23 +08:00
Adam Yanxiao Zhao
2664eba23b fix_run_autoglm (#310) 2025-08-17 18:32:46 +08:00
Adam Yanxiao Zhao
aa05f6cc26 Add AutoGLM-OS agent (#309)
* autoglm-os initialize

* clean code

* chore: use proxy for download setup

* feat(autoglm-os): add parameter to toggle images

* fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

* update

* add client_password

* update multienv

* fix

* fix prompt

* fix prompt

* fix prompt

* fix sys prompt

* feat: use proxy in file evaluator

* fix client_password

* fix note_prompt

* fix autoglm agent cmd type

* fix

* revert: fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

reverts commit bab5473eea1de0e61b0e1d68b23ce324a5b0ee57

* feat(autoglm): setup tools

* fix(autoglm): remove second time of get a11y tree

* add osworld server restart

* Revert "add osworld server restart"

This reverts commit 7bd9d84122e246ce2a26de0e49c25494244c2b3d.

* fix _launch_setup

* fix autoglm agent tools & xml tree

* fix desktop_env

* fix bug for tool name capitalization

* fix: always use proxy for setup download

* add fail after exceeding max turns

* fix(autoglm): avoid adding image to message when screenshot is empty

* fix maximize_window

* fix maximize_window

* fix maximize_window

* fix import browsertools module bug

* fix task proxy config bug

* restore setup

* refactor desktop env

* restore image in provider

* restore file.py

* refactor desktop_env

* quick fix

* refactor desktop_env.step

* fix our env reset

* add max turns constraint

* clean run script

* clean lib_run_single.py

---------

Co-authored-by: hanyullai <hanyullai@outlook.com>
Co-authored-by: JingBh <jingbohao@yeah.net>
2025-08-17 12:08:40 +08:00
SaiLong Li
c833d03a4b feat: Update eip charge type to 'PayByTraffic' for volcengine. (#308)
Co-authored-by: lisailong <lisailong.ze@bytedance.com>
2025-08-15 20:17:52 +08:00
SaiLong Li
cc6eddb466 feat: Add Volcengine provider support for desktop environment. (#307)
Co-authored-by: lisailong <lisailong.ze@bytedance.com>
2025-08-15 18:53:13 +08:00
Timothyxxx
6ecbcf006b chore: add ag2 dependency to requirements and setup files for CoACT-1 support
- Included ag2 version 0.9.7 in requirements.txt and setup.py to ensure proper package installation.
- Maintained existing code logic while enhancing dependency management.
2025-08-13 09:25:49 +00:00
Timothyxxx
50388cfe61 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-13 09:04:17 +00:00
Timothyxxx
7fb5860da0 feat: enhance run_coact.py and related agents with improved task handling and configuration
- Updated TASK_DESCRIPTION in run_coact.py to clarify task-solving steps and requirements.
- Modified configuration parameters for provider name and client password for better security and flexibility.
- Enhanced OrchestratorUserProxyAgent to include user instruction in the auto-reply and improved screenshot handling.
- Adjusted coding_agent.py to ensure proper verification of results before saving changes.
- Improved CUA agent prompts to maintain application state and handle user instructions more effectively.
- Ensured existing code logic remains unchanged while enhancing functionality and usability.
2025-08-13 09:04:09 +00:00
Quyu Kong
893b059e55 feat: Add Aliyun provider support for desktop environment (#304)
* Adding support for aliyun as a provider

* feat: enhance Aliyun provider support

- Added Aliyun as a new provider in the desktop environment.
- Updated the environment configuration guidelines for Aliyun, including prerequisites and environment variables.
- Implemented instance allocation and management functions for Aliyun ECS, including signal handling for graceful termination.
- Improved logging and error handling during instance creation and status checks.
- Adjusted the provider's methods to utilize the new instance management functions.
2025-08-12 14:31:08 +08:00
Timothyxxx
d2ae0f697d feat: enhance AnthropicAgent with start_coordinate handling and modifier key support
- Added support for an optional start_coordinate parameter to facilitate drag actions from a specified starting point.
- Implemented validation for start_coordinate to ensure it is a tuple of two integers.
- Enhanced click actions to handle modifier keys, allowing for more complex interactions.
- Ensured existing code logic remains unchanged while improving functionality and usability.
2025-08-12 05:34:18 +00:00
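The validation described in the commit above (an optional start_coordinate that must be a tuple of two integers) could be written along these lines. The function name and error messages are assumptions made for this sketch and are not taken from AnthropicAgent.

```python
from typing import Any, Optional, Tuple


def validate_start_coordinate(value: Any) -> Optional[Tuple[int, int]]:
    """Return `value` unchanged if it is None or an (x, y) tuple of two ints.

    Hypothetical helper; raises ValueError for anything else.
    """
    if value is None:
        # The parameter is optional, so None passes through.
        return None
    if not isinstance(value, tuple) or len(value) != 2:
        raise ValueError("start_coordinate must be a tuple of two integers")
    for item in value:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(item, bool) or not isinstance(item, int):
            raise ValueError("start_coordinate must be a tuple of two integers")
    return value
```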
Timothyxxx
7418f5cf2f chore: add traceback import for enhanced error handling
- Introduced the traceback module to improve error reporting and debugging capabilities.
- Ensured that existing code logic remains unchanged while preparing for future enhancements.
2025-08-12 05:15:54 +00:00
Timothyxxx
9e4d717cde fix: update AMI mappings in AWS manager
- Changed the AMI ID for the ap-east-1 region to a new value for better compatibility.
- Added comments to clarify the usage of AMIs for CoACT-1 and the need for manual transfer from us-east-1.
- Ensured existing logic remains unchanged while improving documentation for future reference.
2025-08-11 12:19:18 +00:00
Timothyxxx
e2d1887662 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-10 14:40:19 +00:00
Timothyxxx
bd6efcfc4d fix: enhance screenshot retrieval in PythonController
- Added a static method to validate image responses for PNG and JPEG formats using magic bytes.
- Improved error handling in the get_screenshot method to log invalid payloads and retry attempts.
- Updated the requests call to include a timeout for better reliability.
2025-08-10 14:40:18 +00:00
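Checking an image payload by its magic bytes, as the commit above describes for PNG and JPEG, can be illustrated with a short sketch. The function name `looks_like_image` is hypothetical; the signatures themselves are the standard leading bytes of the two formats.

```python
# Standard file signatures: the 8-byte PNG header and the JPEG start-of-image marker.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def looks_like_image(payload: bytes) -> bool:
    """Return True if the payload starts with a PNG or JPEG signature."""
    return payload.startswith(PNG_SIGNATURE) or payload.startswith(JPEG_SIGNATURE)
```

A check like this only inspects the leading bytes, so it catches error pages or empty responses returned in place of a screenshot but does not prove the rest of the file is intact.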
Timothyxxx
bc1db8d623 chore: update setup.py for version 1.0.0 release
- Bumped version to 1.0.0.
- Updated Python requirement to >=3.10.
- Upgraded dependencies: numpy, Pillow, pandas, torch, and added new dependencies including pygame, backoff, openai, dashscope, google-generativeai, wandb, gdown, tiktoken, groq, docker, loguru, dotenv, tldextract, and anthropic.
- Ensured existing logic remains intact while enhancing package capabilities.
2025-08-05 22:19:42 +08:00
Danyang Zhang
7364a720a6 Calc eval fix (#297)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation failures

* ver Jul19th

added support for cellIs and some other new types of conditional formatting for calc evaluation

* ver Aug4th

updated some instructions

* ver Aug4thv2

fixed a typo

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-08-04 12:39:35 +08:00
yuanmengqi
84f407afdd feat: enhance run_coact.py with logging and configuration options
- Added logging configuration to capture runtime logs in both file and console with adjustable log levels.
- Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security.
- Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic.
- Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
2025-07-31 05:47:58 +00:00
yuanmengqi
a5b51e8010 refactor: update command in JSON example to use placeholder for client password
- Replaced the hardcoded password in the command with a placeholder `{CLIENT_PASSWORD}` for improved security and flexibility.
- Ensured that the overall structure of the JSON remains unchanged while enhancing the example's usability.
2025-07-31 05:20:04 +00:00
yuanmengqi
5e24d72da6 fix: correct IP address return logic in AWSProvider
- Reverted the return value in the AWSProvider class to use private IP address instead of public IP address.
- Ensured that the logic remains intact while addressing the specific requirement for VNC access.
2025-07-31 05:14:00 +00:00
yuanmengqi
b081c328bf Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-31 04:16:42 +00:00
yuanmengqi
acd75476d8 docs: add acknowledgements section in README.md
- Included a new section to acknowledge institutions and students who contributed feedback and participated in fixes.
- Enhanced recognition of collaborative efforts in the project while maintaining the existing structure of the README.
2025-07-31 04:16:35 +00:00
Yuan Mengqi
239dd37d2e clean claude run code (#293)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude

* add nogdrive json

* merge claude code

* clean code claude run

* clean code claude run

* clean code claude run
2025-07-31 12:09:08 +08:00
Linxin Song
b968155757 CoACT initialize (#292) 2025-07-31 10:35:20 +08:00
Xinyuan Wang
862d704b8c Wxy/opencua (#290)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717

* update detail

* add system password to system prompt

* add running command
2025-07-31 08:53:49 +08:00
Xinyuan Wang
3d32556085 Uitars/dev (#291)
* use aws pub ip

* os task fix: set the default dim screen time to be 300s

* add all the uitars agents:
1. run_multienv_uitars.py: Qwen2VL-based UITARS models
2. run_multienv_uitars15_v1.py: UITARS1.5-7B
3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-07-31 08:52:27 +08:00
yuanmengqi
dd488c7294 feat: enhance image comparison functionality in gimp.py
- Added resizing logic to handle images of different sizes before comparison, ensuring consistent evaluation.
- Implemented mode conversion to ensure both images are in the same format for accurate comparison.
- Enhanced structure check by MSE to support conversion of numpy arrays to PIL Images, improving compatibility.
- Maintained existing logic while improving robustness and accuracy of image comparison methods.
2025-07-30 06:07:49 +00:00
MillanK0817
4ae9d41da4 feat: update jedi agent with support for o3 as planner 2025-07-30 14:06:37 +08:00