Commit Graph

1323 Commits

Author SHA1 Message Date
Timothyxxx
4c685bed99 Update run_maestro.py to run in headless mode with a single environment and specify result directory. Adjust default TTL for AWS instances from 60 to 180 minutes in config.py. Enhance AWSProvider to handle missing security groups, subnet IDs, and instance types with fallbacks, and improve termination logic to skip already terminated instances while logging relevant information. 2025-10-01 06:56:33 +00:00
Hiroid
3a4b67304f Add multiple new modules and tools to enhance the functionality and extensibility of the Maestro project (#333)
* Added a **pyproject.toml** file to define project metadata and dependencies.
* Added **run_maestro.py** and **osworld_run_maestro.py** to provide the main execution logic.
* Introduced multiple new modules, including **Evaluator**, **Controller**, **Manager**, and **Sub-Worker**, supporting task planning, state management, and data analysis.
* Added a **tools module** containing utility functions and tool configurations to improve code reusability.
* Updated the **README** and documentation with usage examples and module descriptions.

These changes lay the foundation for expanding the Maestro project’s functionality and improving the user experience.

Co-authored-by: Hiroid <guoliangxuan@deepmatrix.com>
2025-09-08 16:07:21 +09:00
Timothyxxx
029885e78c Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-09-05 15:36:39 +00:00
Timothyxxx
640f3fcd96 Update default path_to_vm argument to None in quickstart.py for improved flexibility 2025-09-05 15:36:31 +00:00
Timothyxxx
756923beea Update instruction wording in LibreOffice Impress example to clarify text color change requirements. Address https://github.com/xlang-ai/OSWorld/issues/324 2025-09-01 23:29:47 +08:00
Timothyxxx
0c681b91e0 Fix README update 2025-09-01 15:15:50 +00:00
aneeshprasad1
8513e8c89e Add quickstart script and update README (#325)
Co-authored-by: Aneesh Prasad <aneeshprasad@Aneeshs-MacBook-Pro.local>
2025-09-01 23:14:24 +08:00
Howie
756e006af6 add support for mobile agent v3 (#328)
* add support for mobile agent v3

* add mobile_agent

* add support for mobile agent v3
2025-08-31 22:58:41 +08:00
hanyullai
54a14cbc07 fix multienv bug (#327) 2025-08-30 11:10:53 +08:00
Howie
3344abd641 Add support for GUI-Owl agent (#318)
* add run_multienv_owl.py

* add owl_agent.py
2025-08-27 18:03:39 +08:00
Timothyxxx
ef2f35de22 Add resource group ID support for Aliyun VM allocation
- Introduced ALIYUN_RESOURCE_GROUP_ID environment variable to manage resource group assignments during VM allocation.
- Updated the _allocate_vm function to include resource group ID in the request if specified.
- Modified VNC URL logging to use public IP when available, enhancing clarity in access information.
- Maintained existing code logic while improving functionality for resource management and logging.
2025-08-26 13:28:23 +08:00
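The optional resource group handling described in commit ef2f35de22 can be pictured with a small sketch. Only the `ALIYUN_RESOURCE_GROUP_ID` variable name comes from the commit message; the function name, the request dictionary, and its `resource_group_id` key are hypothetical stand-ins, not the Aliyun SDK's actual request type.

```python
import os


def build_allocation_request(base_request, env=None):
    """Return a copy of base_request, adding a resource group only if one is configured.

    The dict and its keys are illustrative assumptions, not the real SDK request.
    """
    env = os.environ if env is None else env
    request = dict(base_request)  # copy so the caller's dict is not mutated
    group_id = env.get("ALIYUN_RESOURCE_GROUP_ID", "").strip()
    if group_id:
        request["resource_group_id"] = group_id  # hypothetical field name
    return request
```

Leaving the key out entirely when the variable is unset keeps the request identical to what it was before the feature existed.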
Timothyxxx
4c773f6f7c Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-22 23:29:21 +08:00
Timothyxxx
ebda4d8b3f Add Aliyun SDK dependencies and implement TTL configuration for ECS instances
- Added new dependencies for Aliyun ECS SDK in requirements.txt and setup.py to support instance management features.
- Introduced a new config module to handle TTL settings for ECS instances, allowing for auto-termination based on environment variables.
- Updated the manager to utilize TTL settings, including scheduling instance termination with proper error handling and logging.
- Maintained existing code logic while enhancing functionality for improved instance lifecycle management.
2025-08-22 23:28:58 +08:00
Timothyxxx
15d9ddb612 update coact: add autogen/cache 2025-08-21 19:03:35 +00:00
Timothyxxx
b14f1c7345 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-21 09:38:37 +00:00
Timothyxxx
ead564c92b Update dependencies and refactor DesktopEnv initialization
- Removed specific versioning for the 'requests' library in requirements.txt and setup.py to allow for more flexible updates.
- Refactored the DesktopEnv class to streamline the emulator initialization process, enhancing error handling and logging during startup.
- Improved retry logic for file uploads in SetupController, ensuring robust handling of network issues and providing clearer error messages.
- Maintained existing code logic while enhancing clarity and reliability in the DesktopEnv and SetupController classes.
2025-08-21 09:38:28 +00:00
Timothyxxx
b3e1c0344d Update OpenCV dependency to headless version in requirements and setup files
- Replaced 'opencv-python' with 'opencv-python-headless' in both requirements.txt and setup.py to reduce unnecessary GUI dependencies.
- Added a new .gitkeep file in the logs directory to ensure it is tracked in version control.
- Maintained existing code logic while improving dependency management.
2025-08-20 01:26:24 +08:00
Timothyxxx
492c910e94 Refactor AWS scheduler role handling in scheduler_utils.py
- Improved error handling and logging for role resolution and creation.
- Added checks to ensure the trust policy allows for AWS EventBridge Scheduler to assume the role.
- Implemented retry logic for scheduling EC2 termination to handle IAM eventual consistency.
- Maintained existing code logic while enhancing robustness and clarity in role management.
2025-08-18 17:57:31 +00:00
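The retry logic mentioned in commit 492c910e94 is not shown on this page. A minimal sketch of the general idea, assuming a plain exponential backoff, might look like the following. The function name and parameters are hypothetical, and the real code may retry only on specific AWS errors rather than on every exception.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are used up.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    The sleep argument is injectable so the wait can be stubbed out in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

A newly created IAM role can take a short while to become visible to other services, which is why a scheduling call that fails at first may succeed a few seconds later.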
Timothyxxx
3a96fd5046 Add TTL configuration for AWS instance management
- Introduced a new config module to manage TTL settings for EC2 instances, allowing for auto-termination based on environment variables.
- Updated the AWSProvider and manager to utilize the new TTL settings, including scheduling instance termination via EventBridge Scheduler.
- Added utility functions for resolving the scheduler role ARN and creating termination schedules, ensuring robust error handling and logging.
- Maintained existing code logic while integrating new features for improved instance lifecycle management.
2025-08-18 17:30:49 +00:00
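Commit 3a96fd5046 describes TTL settings read from environment variables. The sketch below shows one way such a setting could be resolved and turned into a termination time. The variable name `EXAMPLE_INSTANCE_TTL_MINUTES`, the default of 60 minutes, and the rule that a non-positive value disables auto-termination are all assumptions for illustration, not the project's actual configuration.

```python
import os
from datetime import datetime, timedelta, timezone

# Hypothetical variable name, chosen for this sketch only.
TTL_ENV_VAR = "EXAMPLE_INSTANCE_TTL_MINUTES"


def resolve_ttl_minutes(env=None, default_minutes=60):
    """Return the TTL in minutes, the default if unset or invalid, or None if disabled."""
    env = os.environ if env is None else env
    raw = env.get(TTL_ENV_VAR, "").strip()
    if not raw:
        return default_minutes
    try:
        minutes = int(raw)
    except ValueError:
        return default_minutes  # unparsable value: fall back to the default
    # Assumption: zero or a negative value switches auto-termination off.
    return minutes if minutes > 0 else None


def termination_time(launched_at, ttl_minutes):
    """Return when the instance should be terminated, or None if the TTL is disabled."""
    if ttl_minutes is None:
        return None
    return launched_at + timedelta(minutes=ttl_minutes)
```

The resulting timestamp is the kind of value a one-off termination schedule would be created for.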
Adam Yanxiao Zhao
75f00fea62 Fix AutoGLM-OS custom env reset func (#312)
* Add AWS config for autoglm-os agent script

* update default password

* fix autoglm-os reset
2025-08-18 18:12:09 +08:00
Timothyxxx
a5dc64c943 Update Aliyun guidelines to include SSH and VNC password setup script 2025-08-18 07:24:39 +00:00
Adam Yanxiao Zhao
deff1fe385 Add AWS config for autoglm-os agent script (#311)
* Add AWS config for autoglm-os agent script

* update default password
2025-08-17 22:54:23 +08:00
Adam Yanxiao Zhao
2664eba23b fix_run_autoglm (#310) 2025-08-17 18:32:46 +08:00
Adam Yanxiao Zhao
aa05f6cc26 Add AutoGLM-OS agent (#309)
* autoglm-os initialize

* clean code

* chore: use proxy for download setup

* feat(autoglm-os): add parameter to toggle images

* fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

* update

* add client_password

* update multienv

* fix

* fix prompt

* fix prompt

* fix prompt

* fix sys prompt

* feat: use proxy in file evaluator

* fix client_password

* fix note_prompt

* fix autoglm agent cmd type

* fix

* revert: fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

reverts commit bab5473eea1de0e61b0e1d68b23ce324a5b0ee57

* feat(autoglm): setup tools

* fix(autoglm): remove second time of get a11y tree

* add osworld server restart

* Revert "add osworld server restart"

This reverts commit 7bd9d84122e246ce2a26de0e49c25494244c2b3d.

* fix _launch_setup

* fix autoglm agent tools & xml tree

* fix desktop_env

* fix bug for tool name capitalization

* fix: always use proxy for setup download

* add fail after exceeding max turns

* fix(autoglm): avoid adding image to message when screenshot is empty

* fix maximize_window

* fix maximize_window

* fix maximize_window

* fix import browsertools module bug

* fix task proxy config bug

* restore setup

* refactor desktop env

* restore image in provider

* restore file.py

* refactor desktop_env

* quick fix

* refactor desktop_env.step

* fix our env reset

* add max turns constraint

* clean run script

* clean lib_run_single.py

---------

Co-authored-by: hanyullai <hanyullai@outlook.com>
Co-authored-by: JingBh <jingbohao@yeah.net>
2025-08-17 12:08:40 +08:00
SaiLong Li
c833d03a4b feat: Update eip charge type to 'PayByTraffic' for volcengine. (#308)
Co-authored-by: lisailong <lisailong.ze@bytedance.com>
2025-08-15 20:17:52 +08:00
SaiLong Li
cc6eddb466 feat: Add Volcengine provider support for desktop environment. (#307)
Co-authored-by: lisailong <lisailong.ze@bytedance.com>
2025-08-15 18:53:13 +08:00
Timothyxxx
6ecbcf006b chore: add ag2 dependency to requirements and setup files for CoACT-1 support
- Included ag2 version 0.9.7 in requirements.txt and setup.py to ensure proper package installation.
- Maintained existing code logic while enhancing dependency management.
2025-08-13 09:25:49 +00:00
Timothyxxx
50388cfe61 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-13 09:04:17 +00:00
Timothyxxx
7fb5860da0 feat: enhance run_coact.py and related agents with improved task handling and configuration
- Updated TASK_DESCRIPTION in run_coact.py to clarify task-solving steps and requirements.
- Modified configuration parameters for provider name and client password for better security and flexibility.
- Enhanced OrchestratorUserProxyAgent to include user instruction in the auto-reply and improved screenshot handling.
- Adjusted coding_agent.py to ensure proper verification of results before saving changes.
- Improved CUA agent prompts to maintain application state and handle user instructions more effectively.
- Ensured existing code logic remains unchanged while enhancing functionality and usability.
2025-08-13 09:04:09 +00:00
Quyu Kong
893b059e55 feat: Add Aliyun provider support for desktop environment (#304)
* Adding support for aliyun as a provider

* feat: enhance Aliyun provider support

- Added Aliyun as a new provider in the desktop environment.
- Updated the environment configuration guidelines for Aliyun, including prerequisites and environment variables.
- Implemented instance allocation and management functions for Aliyun ECS, including signal handling for graceful termination.
- Improved logging and error handling during instance creation and status checks.
- Adjusted the provider's methods to utilize the new instance management functions.
2025-08-12 14:31:08 +08:00
Timothyxxx
d2ae0f697d feat: enhance AnthropicAgent with start_coordinate handling and modifier key support
- Added support for an optional start_coordinate parameter to facilitate drag actions from a specified starting point.
- Implemented validation for start_coordinate to ensure it is a tuple of two integers.
- Enhanced click actions to handle modifier keys, allowing for more complex interactions.
- Ensured existing code logic remains unchanged while improving functionality and usability.
2025-08-12 05:34:18 +00:00
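The `start_coordinate` validation described in commit d2ae0f697d (a tuple of two integers) could be sketched as below. The function name and error messages are hypothetical, and the actual agent may handle invalid input differently, for example by returning an error result instead of raising.

```python
def validate_start_coordinate(value):
    """Return value unchanged if it is None or a tuple of two ints; raise ValueError otherwise."""
    if value is None:
        return None  # the parameter is optional
    if not isinstance(value, tuple) or len(value) != 2:
        raise ValueError(f"start_coordinate must be a tuple of two integers, got {value!r}")
    for part in value:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(part, bool) or not isinstance(part, int):
            raise ValueError(f"start_coordinate must contain integers, got {value!r}")
    return value
```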
Timothyxxx
7418f5cf2f chore: add traceback import for enhanced error handling
- Introduced the traceback module to improve error reporting and debugging capabilities.
- Ensured that existing code logic remains unchanged while preparing for future enhancements.
2025-08-12 05:15:54 +00:00
Timothyxxx
9e4d717cde fix: update AMI mappings in AWS manager
- Changed the AMI ID for the ap-east-1 region to a new value for better compatibility.
- Added comments to clarify the usage of AMIs for CoACT-1 and the need for manual transfer from us-east-1.
- Ensured existing logic remains unchanged while improving documentation for future reference.
2025-08-11 12:19:18 +00:00
Timothyxxx
e2d1887662 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-10 14:40:19 +00:00
Timothyxxx
bd6efcfc4d fix: enhance screenshot retrieval in PythonController
- Added a static method to validate image responses for PNG and JPEG formats using magic bytes.
- Improved error handling in the get_screenshot method to log invalid payloads and retry attempts.
- Updated the requests call to include a timeout for better reliability.
2025-08-10 14:40:18 +00:00
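The magic-byte check mentioned in commit bd6efcfc4d can be illustrated with a short sketch. The PNG and JPEG signatures are the standard ones; the function name is hypothetical, and the real static method in `PythonController` may also inspect headers or status codes.

```python
# Standard file signatures ("magic bytes") at the start of each format.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def looks_like_image(payload):
    """Return True if payload starts with a PNG or JPEG signature."""
    if not isinstance(payload, (bytes, bytearray)):
        return False
    data = bytes(payload)
    return data.startswith(PNG_SIGNATURE) or data.startswith(JPEG_SIGNATURE)
```

A check like this catches cases where the server answers with an HTML error page or an empty body instead of a screenshot, so the caller can log the bad payload and retry.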
Timothyxxx
bc1db8d623 chore: update setup.py for version 1.0.0 release
- Bumped version to 1.0.0.
- Updated Python requirement to >=3.10.
- Upgraded dependencies: numpy, Pillow, pandas, torch, and added new dependencies including pygame, backoff, openai, dashscope, google-generativeai, wandb, gdown, tiktoken, groq, docker, loguru, dotenv, tldextract, and anthropic.
- Ensured existing logic remains intact while enhancing package capabilities.
2025-08-05 22:19:42 +08:00
Danyang Zhang
7364a720a6 Calc eval fix (#297)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

* ver Jul19th

added support for cellIs and some other new types of conditional
formatting for calc evaluation

* ver Aug4th

updated some instructions

* ver Aug4thv2

fixed a typo

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-08-04 12:39:35 +08:00
yuanmengqi
84f407afdd feat: enhance run_coact.py with logging and configuration options
- Added logging configuration to capture runtime logs in both file and console with adjustable log levels.
- Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security.
- Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic.
- Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
2025-07-31 05:47:58 +00:00
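Logging to both a file and the console with an adjustable level, as described in commit 84f407afdd, can be set up with the standard `logging` module. This is a minimal sketch, not the code in `run_coact.py`; the logger name, format string, and function name are assumptions.

```python
import logging
import sys


def configure_logging(log_path, level="INFO"):
    """Attach a file handler and a console handler to one logger and return it."""
    logger = logging.getLogger("example.run")  # hypothetical logger name
    logger.setLevel(level.upper())  # accepts names such as "DEBUG" or "INFO"
    logger.handlers.clear()  # avoid duplicate output if called twice
    logger.propagate = False
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (
        logging.FileHandler(log_path, encoding="utf-8"),
        logging.StreamHandler(sys.stdout),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

The level string would typically come from a command-line argument, so a run can be made more verbose without editing the script.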
yuanmengqi
a5b51e8010 refactor: update command in JSON example to use placeholder for client password
- Replaced the hardcoded password in the command with a placeholder `{CLIENT_PASSWORD}` for improved security and flexibility.
- Ensured that the overall structure of the JSON remains unchanged while enhancing the example's usability.
2025-07-31 05:20:04 +00:00
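The `{CLIENT_PASSWORD}` placeholder introduced in commit a5b51e8010 has to be filled in before a command runs. One way to do that is sketched below; the helper name and the example config structure are hypothetical, and only the placeholder string comes from the commit message.

```python
PLACEHOLDER = "{CLIENT_PASSWORD}"


def fill_client_password(value, password):
    """Recursively replace the placeholder in every string inside nested dicts and lists."""
    if isinstance(value, str):
        return value.replace(PLACEHOLDER, password)
    if isinstance(value, list):
        return [fill_client_password(item, password) for item in value]
    if isinstance(value, dict):
        return {key: fill_client_password(item, password) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Hypothetical task config, shaped like a parsed JSON example.
example = {"type": "execute", "parameters": {"command": "echo {CLIENT_PASSWORD} | sudo -S true"}}
filled = fill_client_password(example, "s3cret")
```

The sketch uses `str.replace` rather than `str.format` because shell commands and JSON snippets often contain other braces, which `format` would try to interpret.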
yuanmengqi
5e24d72da6 fix: correct IP address return logic in AWSProvider
- Reverted the return value in the AWSProvider class to use private IP address instead of public IP address.
- Ensured that the logic remains intact while addressing the specific requirement for VNC access.
2025-07-31 05:14:00 +00:00
yuanmengqi
b081c328bf Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-31 04:16:42 +00:00
yuanmengqi
acd75476d8 docs: add acknowledgements section in README.md
- Included a new section to acknowledge institutions and students who contributed feedback and participated in fixes.
- Enhanced recognition of collaborative efforts in the project while maintaining the existing structure of the README.
2025-07-31 04:16:35 +00:00
Yuan Mengqi
239dd37d2e clean claude run code (#293)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude

* add nogdrive json

* merge claude code

* clean code claude run

* clean code claude run

* clean code claude run
2025-07-31 12:09:08 +08:00
Linxin Song
b968155757 CoACT initialize (#292) 2025-07-31 10:35:20 +08:00
Xinyuan Wang
862d704b8c Wxy/opencua (#290)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717

* update detail

* add system password to system prompt

* add running command
2025-07-31 08:53:49 +08:00
Xinyuan Wang
3d32556085 Uitars/dev (#291)
* use aws pub ip

* os task fix: set the default dim screen time to be 300s

* add all the uitars agents:
1. run_multienv_uitars.py: Qwen2VL-based UITARS models
2. run_multienv_uitars15_v1.py: UITARS1.5-7B
3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-07-31 08:52:27 +08:00
yuanmengqi
dd488c7294 feat: enhance image comparison functionality in gimp.py
- Added resizing logic to handle images of different sizes before comparison, ensuring consistent evaluation.
- Implemented mode conversion to ensure both images are in the same format for accurate comparison.
- Enhanced structure check by MSE to support conversion of numpy arrays to PIL Images, improving compatibility.
- Maintained existing logic while improving robustness and accuracy of image comparison methods.
2025-07-30 06:07:49 +00:00
MillanK0817
4ae9d41da4 feat: update jedi agent with support for o3 as planner 2025-07-30 14:06:37 +08:00
yuanmengqi
99fa3b7cb9 docs: refine proxy configuration note in README.md for clarity
- Updated the proxy configuration section to specify that some tasks may require proxy settings to function properly, depending on website defenses.
- Enhanced user guidance by clarifying the importance of proper proxy configuration for task execution.
- Maintained existing content while improving clarity and user understanding of configuration requirements.
2025-07-29 09:59:31 +00:00
yuanmengqi
c3469835f2 docs: update README.md with important configuration requirements for tasks
- Added a section detailing essential configuration requirements for Google Account Tasks and proxy settings.
- Highlighted the impact of missing configurations on task execution and evaluation scores.
- Maintained existing content while enhancing user guidance and clarity in setup instructions.
2025-07-29 09:57:04 +00:00