Commit Graph

630 Commits

Author SHA1 Message Date
cui0711
308282e830 feat(server): add cross-platform support and improve screenshot handling 2026-01-30 16:27:49 +08:00
Bowen Yang
439e178a2e fix(os_symphony_evaluation) (#410)
* fix(os_symphony)

* Update desktop_env_os_symphony.py

* fix(os_symphony_desktop)

* fix(os_symphony_start)

* Add docstring to run_multienv_os_symphony.py

Added documentation header for the evaluation script.
2026-01-04 15:56:51 +08:00
Bowen Yang
951e1928c8 fix(desktop_os_symphony):support aws (#406)
* fix(os_symphony)

* Update desktop_env_os_symphony.py
2026-01-01 11:27:34 +08:00
Bowen Yang
f593f35b1c add_os_symphony (#399) 2025-12-23 14:30:44 +08:00
MillanK
cbc3b590ff Task fix batch (#383)
* update 873cafdd-a581-47f6-8b33-b9696ddb7b05 task eval

* c1fa57f3-c3db-4596-8f09-020701085416 fix, add tolerance to url matching

* 8df7e444-8e06-4f93-8a1a-c5c974269d82 add clearer instruction on the filename for compression

* add address string normalization for 6f4073b8-d8ea-4ade-8a18-c5d1d5d5aa9a

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-11-19 17:24:25 +08:00
Qichen Fu
903ed36715 Add Claude Sonnet 4.5 support and improve action handling (#362)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-14 13:54:32 +08:00
Timothyxxx
a484f2e484 Update setup.py for version bump and dependency adjustments
- Bump version from 1.0.0 to 1.0.1
- Update numpy dependency to allow versions >=1.26 and <3
- Adjust pandas dependency to allow versions >=2.2 and <2.3
- Add new __init__.py file in the docker provider directory
2025-10-23 14:27:52 +08:00
eun2ce
5eff00a9e3 Fix #347: Fix NameError in open_file timeout message (#351)
- Fix undefined 'timeout' variable in error message
- Use defined TIMEOUT constant instead of undefined timeout variable
- Prevents NameError when LibreOffice crashes during file opening
2025-10-06 22:14:15 +08:00
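
The class of bug fixed in 5eff00a9e3 can be illustrated with a small sketch. The function name `timeout_message` and the value of `TIMEOUT` are hypothetical; only the idea (format the message from the defined constant rather than an undefined local name) comes from the commit message.

```python
# Hypothetical value; the real constant is defined in the repository.
TIMEOUT = 1800


def timeout_message(path: str) -> str:
    """Build the error text from the module-level constant.

    Referring to an undefined local such as `timeout` here would raise
    NameError at the moment the message is formatted, hiding the real error.
    """
    return f"Failed to open {path} within {TIMEOUT} seconds."
```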
Timothyxxx
ff6285cfbb Add safe browsing feature to Chrome evaluator
- Implemented `get_enable_safe_browsing` function to retrieve safe browsing settings based on the operating system.
- Updated the `__init__.py` to include the new function.
- Modified JSON examples to reflect the change from enabling enhanced safety browsing to enabling safe browsing.
- Added necessary commands in the JSON examples for setting up preferences for safe browsing.
2025-10-05 04:56:08 +00:00
Timothyxxx
bfb467da18 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-10-01 06:56:43 +00:00
Timothyxxx
4c685bed99 Update run_maestro.py to run in headless mode with a single environment and specify result directory. Adjust default TTL for AWS instances from 60 to 180 minutes in config.py. Enhance AWSProvider to handle missing security groups, subnet IDs, and instance types with fallbacks, and improve termination logic to skip already terminated instances while logging relevant information. 2025-10-01 06:56:33 +00:00
Yanxiao Zhao
6827949418 fix _update_browse_history_setup (#345) 2025-09-25 13:22:40 +08:00
Timothyxxx
584c7a9875 Enhance AWSProvider instance handling with fallback mechanisms for security groups, subnet IDs, and instance types. Implement checks to skip termination of instances already in 'shutting-down' or 'terminated' states, and handle potential termination errors gracefully. 2025-09-18 07:16:10 +00:00
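
A minimal sketch of the state check described in 584c7a9875, assuming each instance is represented as a plain dict with hypothetical keys `InstanceId` and `State`. The actual provider code may structure this differently; only the two skipped state names come from the commit message.

```python
# States named in the commit message for which termination is skipped.
SKIP_STATES = {"shutting-down", "terminated"}


def instances_to_terminate(instances):
    """Return the IDs of instances that still need a terminate request."""
    return [
        item["InstanceId"]
        for item in instances
        if item["State"] not in SKIP_STATES
    ]
```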
ZhangZuhao
dc7e46e7aa Refactor platform detection for VM image download (#337)
Sometimes the platform detection for VM image download is wrong
2025-09-15 21:00:15 +08:00
Hiroid
3a4b67304f Add multiple new modules and tools to enhance the functionality and extensibility of the Maestro project (#333)
* Added a **pyproject.toml** file to define project metadata and dependencies.
* Added **run\_maestro.py** and **osworld\_run\_maestro.py** to provide the main execution logic.
* Introduced multiple new modules, including **Evaluator**, **Controller**, **Manager**, and **Sub-Worker**, supporting task planning, state management, and data analysis.
* Added a **tools module** containing utility functions and tool configurations to improve code reusability.
* Updated the **README** and documentation with usage examples and module descriptions.

These changes lay the foundation for expanding the Maestro project’s functionality and improving the user experience.

Co-authored-by: Hiroid <guoliangxuan@deepmatrix.com>
2025-09-08 16:07:21 +09:00
Timothyxxx
ef2f35de22 Add resource group ID support for Aliyun VM allocation
- Introduced ALIYUN_RESOURCE_GROUP_ID environment variable to manage resource group assignments during VM allocation.
- Updated the _allocate_vm function to include resource group ID in the request if specified.
- Modified VNC URL logging to use public IP when available, enhancing clarity in access information.
- Maintained existing code logic while improving functionality for resource management and logging.
2025-08-26 13:28:23 +08:00
Timothyxxx
4c773f6f7c Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-22 23:29:21 +08:00
Timothyxxx
ebda4d8b3f Add Aliyun SDK dependencies and implement TTL configuration for ECS instances
- Added new dependencies for Aliyun ECS SDK in requirements.txt and setup.py to support instance management features.
- Introduced a new config module to handle TTL settings for ECS instances, allowing for auto-termination based on environment variables.
- Updated the manager to utilize TTL settings, including scheduling instance termination with proper error handling and logging.
- Maintained existing code logic while enhancing functionality for improved instance lifecycle management.
2025-08-22 23:28:58 +08:00
Timothyxxx
ead564c92b Update dependencies and refactor DesktopEnv initialization
- Removed specific versioning for the 'requests' library in requirements.txt and setup.py to allow for more flexible updates.
- Refactored the DesktopEnv class to streamline the emulator initialization process, enhancing error handling and logging during startup.
- Improved retry logic for file uploads in SetupController, ensuring robust handling of network issues and providing clearer error messages.
- Maintained existing code logic while enhancing clarity and reliability in the DesktopEnv and SetupController classes.
2025-08-21 09:38:28 +00:00
Timothyxxx
492c910e94 Refactor AWS scheduler role handling in scheduler_utils.py
- Improved error handling and logging for role resolution and creation.
- Added checks to ensure the trust policy allows for AWS EventBridge Scheduler to assume the role.
- Implemented retry logic for scheduling EC2 termination to handle IAM eventual consistency.
- Maintained existing code logic while enhancing robustness and clarity in role management.
2025-08-18 17:57:31 +00:00
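
The retry idea mentioned in 492c910e94 (waiting out IAM eventual consistency before scheduling) could look roughly like the generic helper below. `retry_with_backoff` and its parameters are illustrative names, not the repository's API, and the exponential backoff is an assumption about how the delay might grow.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, retriable=(Exception,)):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    The delay doubles after each failed attempt; the last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))
```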
Timothyxxx
3a96fd5046 Add TTL configuration for AWS instance management
- Introduced a new config module to manage TTL settings for EC2 instances, allowing for auto-termination based on environment variables.
- Updated the AWSProvider and manager to utilize the new TTL settings, including scheduling instance termination via EventBridge Scheduler.
- Added utility functions for resolving the scheduler role ARN and creating termination schedules, ensuring robust error handling and logging.
- Maintained existing code logic while integrating new features for improved instance lifecycle management.
2025-08-18 17:30:49 +00:00
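
As a rough illustration of the environment-driven TTL setting introduced in 3a96fd5046: the variable name `EXAMPLE_TTL_MINUTES` and the function `read_ttl_minutes` are hypothetical, and the default of 60 minutes is assumed from the later commit 4c685bed99, which mentions raising the AWS default from 60 to 180.

```python
import os


def read_ttl_minutes(env=None, name="EXAMPLE_TTL_MINUTES", default=60):
    """Read a TTL in minutes from the environment, falling back to a default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default  # unset or blank: use the default
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{name} must be a positive number of minutes")
    return value
```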
Timothyxxx
a5dc64c943 Update Aliyun guidelines to include SSH and VNC password setup script 2025-08-18 07:24:39 +00:00
Adam Yanxiao Zhao
aa05f6cc26 Add AutoGLM-OS agent (#309)
* autoglm-os initialize

* clean code

* chore: use proxy for download setup

* feat(autoglm-os): add parameter to toggle images

* fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

* update

* add client_password

* update multienv

* fix

* fix prompt

* fix prompt

* fix prompt

* fix sys prompt

* feat: use proxy in file evaluator

* fix client_password

* fix note_prompt

* fix autoglm agent cmd type

* fix

* revert: fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

reverts commit bab5473eea1de0e61b0e1d68b23ce324a5b0ee57

* feat(autoglm): setup tools

* fix(autoglm): remove second time of get a11y tree

* add osworld server restart

* Revert "add osworld server restart"

This reverts commit 7bd9d84122e246ce2a26de0e49c25494244c2b3d.

* fix _launch_setup

* fix autoglm agent tools & xml tree

* fix desktop_env

* fix bug for tool name capitalization

* fix: always use proxy for setup download

* add fail after exceeding max turns

* fix(autoglm): avoid adding image to message when screenshot is empty

* fix maximize_window

* fix maximize_window

* fix maximize_window

* fix import browsertools module bug

* fix task proxy config bug

* restore setup

* refactor desktop env

* restore image in provider

* restore file.py

* refactor desktop_env

* quick fix

* refactor desktop_env.step

* fix our env reset

* add max turns constraint

* clean run script

* clean lib_run_single.py

---------

Co-authored-by: hanyullai <hanyullai@outlook.com>
Co-authored-by: JingBh <jingbohao@yeah.net>
2025-08-17 12:08:40 +08:00
SaiLong Li
c833d03a4b feat: Update eip charge type to 'PayByTraffic' for volcengine. (#308)
Co-authored-by: lisailong <lisailong.ze@bytedance.com>
2025-08-15 20:17:52 +08:00
SaiLong Li
cc6eddb466 feat: Add Volcengine provider support for desktop environment. (#307)
Co-authored-by: lisailong <lisailong.ze@bytedance.com>
2025-08-15 18:53:13 +08:00
Quyu Kong
893b059e55 feat: Add Aliyun provider support for desktop environment (#304)
* Adding support for aliyun as a provider

* feat: enhance Aliyun provider support

- Added Aliyun as a new provider in the desktop environment.
- Updated the environment configuration guidelines for Aliyun, including prerequisites and environment variables.
- Implemented instance allocation and management functions for Aliyun ECS, including signal handling for graceful termination.
- Improved logging and error handling during instance creation and status checks.
- Adjusted the provider's methods to utilize the new instance management functions.
2025-08-12 14:31:08 +08:00
Timothyxxx
7418f5cf2f chore: add traceback import for enhanced error handling
- Introduced the traceback module to improve error reporting and debugging capabilities.
- Ensured that existing code logic remains unchanged while preparing for future enhancements.
2025-08-12 05:15:54 +00:00
Timothyxxx
9e4d717cde fix: update AMI mappings in AWS manager
- Changed the AMI ID for the ap-east-1 region to a new value for better compatibility.
- Added comments to clarify the usage of AMIs for CoACT-1 and the need for manual transfer from us-east-1.
- Ensured existing logic remains unchanged while improving documentation for future reference.
2025-08-11 12:19:18 +00:00
Timothyxxx
bd6efcfc4d fix: enhance screenshot retrieval in PythonController
- Added a static method to validate image responses for PNG and JPEG formats using magic bytes.
- Improved error handling in the get_screenshot method to log invalid payloads and retry attempts.
- Updated the requests call to include a timeout for better reliability.
2025-08-10 14:40:18 +00:00
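
The magic-byte validation described in bd6efcfc4d can be sketched as below. The function name `looks_like_image` is hypothetical; the byte signatures are the standard PNG and JPEG file headers.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"      # JPEG start-of-image marker plus next marker byte


def looks_like_image(payload: bytes) -> bool:
    """Return True if the payload starts with a PNG or JPEG signature."""
    return payload.startswith(PNG_MAGIC) or payload.startswith(JPEG_MAGIC)
```

A caller could use such a check to reject an HTML error page or an empty body returned in place of a screenshot, and retry the request.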
yuanmengqi
5e24d72da6 fix: correct IP address return logic in AWSProvider
- Reverted the return value in the AWSProvider class to use private IP address instead of public IP address.
- Ensured that the logic remains intact while addressing the specific requirement for VNC access.
2025-07-31 05:14:00 +00:00
Linxin Song
b968155757 CoACT initialize (#292) 2025-07-31 10:35:20 +08:00
Xinyuan Wang
3d32556085 Uitars/dev (#291)
* use aws pub ip

* os task fix: set the default dim screen time to be 300s

* add all the uitars agents:
1. run_multienv_uitars.py: Qwen2VL-based UITARS models
2. run_multienv_uitars15_v1.py: UITARS1.5-7B
3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-07-31 08:52:27 +08:00
yuanmengqi
dd488c7294 feat: enhance image comparison functionality in gimp.py
- Added resizing logic to handle images of different sizes before comparison, ensuring consistent evaluation.
- Implemented mode conversion to ensure both images are in the same format for accurate comparison.
- Enhanced structure check by MSE to support conversion of numpy arrays to PIL Images, improving compatibility.
- Maintained existing logic while improving robustness and accuracy of image comparison methods.
2025-07-30 06:07:49 +00:00
yuanmengqi
00804f8118 feat: update provider and action space in DesktopEnv class
- Changed the default provider name from "aws" to "vmware" to reflect new environment requirements.
- Updated the action space from "computer_13" to "pyautogui" for improved interaction capabilities.
- Maintained existing class structure and logic while implementing these updates for better functionality.
2025-07-29 06:48:41 +00:00
yuanmengqi
af64f4ef49 docs: update README.md with font download link and VSCode trust settings
- Replaced the font download link for LibreOffice with a new source.
- Added instructions for configuring VSCode to disable workspace trust prompts, enhancing user experience.
- Maintained existing content while improving clarity and providing additional setup guidance.
2025-07-28 15:13:37 +00:00
yuanmengqi
122b16742b fix: improve EPUB processing by checking for file existence before reading
- Added checks for the presence of "toc.ncx" and "content.opf" in the EPUB file before attempting to process them.
- Introduced debug logging to notify when these files are not found, enhancing error handling and traceability.
- Maintained existing logic while improving robustness of the EPUB processing function.
2025-07-26 20:42:18 +00:00
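
A minimal sketch of the existence check from 122b16742b, using the standard-library `zipfile` module (an EPUB is a ZIP archive). The function name `read_epub_members` and the return shape are assumptions for illustration; the member names come from the commit message.

```python
import io
import logging
import zipfile

logger = logging.getLogger(__name__)


def read_epub_members(data: bytes, names=("toc.ncx", "content.opf")):
    """Return the bytes of each requested member that exists in the archive."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        present = set(archive.namelist())
        for name in names:
            if name in present:
                found[name] = archive.read(name)
            else:
                # Missing members are logged and skipped instead of raising.
                logger.debug("%s not found in EPUB archive", name)
    return found
```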
yuanmengqi
d49ca9cc2d fix: enhance handling of '<' characters in pyautogui commands
- Refactor _fix_pyautogui_less_than_bug to improve handling of press('<') and typewrite calls.
- Introduce Unicode escape decoding for typewrite content to ensure proper '<' character processing.
- Maintain existing logic while enhancing functionality for better compatibility.
2025-07-26 07:59:37 +00:00
张逸群
2ed0436c21 fix: update DockerVMManager method signatures for interface compatibility (#287)
- Fix delete_vm() method to accept region and **kwargs parameters
- Fix occupy_vm() method to accept pid, region and **kwargs parameters
- Ensures consistency with base VMManager interface and other providers
- Resolves runtime argument mismatch errors when calling these methods

This maintains backward compatibility while fixing the interface contract.
2025-07-26 01:18:00 +08:00
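
The interface alignment described in 2ed0436c21 might look roughly like this. The base class body, the `vm_path` parameter name, and the `occupied` bookkeeping are assumptions for illustration; only the added `pid`, `region`, and `**kwargs` parameters come from the commit message.

```python
class VMManager:
    """Hypothetical stand-in for the base manager interface."""

    def delete_vm(self, vm_path, region=None, **kwargs):
        raise NotImplementedError

    def occupy_vm(self, vm_path, pid, region=None, **kwargs):
        raise NotImplementedError


class DockerVMManager(VMManager):
    """Accepts the same arguments as the base class, ignoring unused ones."""

    def __init__(self):
        self.occupied = {}

    def delete_vm(self, vm_path, region=None, **kwargs):
        # region and extra keyword arguments are accepted but not needed here.
        self.occupied.pop(vm_path, None)

    def occupy_vm(self, vm_path, pid, region=None, **kwargs):
        self.occupied[vm_path] = pid
```

Matching the base signature lets shared calling code pass `region` or other provider-specific keywords without raising `TypeError` for the Docker provider.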
yuanmengqi
40fdc6266f chore: update default AWS instance type from t3.xlarge to t3.medium 2025-07-25 15:56:42 +00:00
Zilong Zhou
b8b9e9b166 feat: add proxy handling logic and clean up imports (#285) 2025-07-24 16:27:56 +08:00
Zilong Zhou
cbe650d0bb refactor&delete: simplify AWS VM allocation and remove proxy support (#284) 2025-07-24 16:27:18 +08:00
Jiarui Yao
4fd8b5be0a support docker without kvm (#282) 2025-07-24 12:31:43 +08:00
yuanmengqi
0a2929137b Simplify the logic for Docker provider 2025-07-23 17:19:47 +00:00
yuanmengqi
fd7381210e Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-23 16:05:42 +00:00
yuanmengqi
5d219e7a5b Clean code 2025-07-23 16:05:39 +00:00
张逸群
4d6e0fd031 Add --provider_name parameter to run.py and fix Docker provider initialization (#277)
- Add command-line argument --provider_name to support flexible provider selection
- Default provider remains vmware for backward compatibility
- Fix Docker provider controller initialization issue with delayed setup
- Add safety checks for controller existence in error handling

This enables users to specify different virtualization providers directly
from the command line and resolves Docker container lifecycle issues.
2025-07-23 04:09:36 +08:00
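
A sketch of how the `--provider_name` option from 4d6e0fd031 could be declared with `argparse`. The function `build_parser` and the help text are illustrative; the flag name and the `vmware` default come from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create a parser exposing the provider selection flag."""
    parser = argparse.ArgumentParser(description="Run evaluation (sketch)")
    parser.add_argument(
        "--provider_name",
        type=str,
        default="vmware",  # default kept for backward compatibility
        help="Virtualization provider to use, for example vmware or docker",
    )
    return parser
```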
张逸群
4a5d48000f feat: add HuggingFace mirror support for VM providers (#278)
Add support for HuggingFace mirror (hf-mirror.com) to improve download
speeds in regions where huggingface.co access is slow.

- Support HF_ENDPOINT environment variable detection
- Automatically switch to hf-mirror.com when HF_ENDPOINT is set
- Apply to Docker, VMware, and VirtualBox providers
- Maintain backward compatibility with default huggingface.co URLs

Users can now set HF_ENDPOINT=https://hf-mirror.com to use the mirror.
2025-07-23 03:40:35 +08:00
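
One way the `HF_ENDPOINT` switch from 4a5d48000f could work is a simple host rewrite, sketched below. The function `resolve_download_url` is hypothetical, and substituting the variable's value for the default host is an assumption; the commit message only states that the mirror is used when the variable is set.

```python
import os

DEFAULT_HOST = "https://huggingface.co"


def resolve_download_url(url: str, env=None) -> str:
    """Point a huggingface.co URL at the mirror when HF_ENDPOINT is set."""
    env = os.environ if env is None else env
    endpoint = env.get("HF_ENDPOINT", "").rstrip("/")
    if endpoint and url.startswith(DEFAULT_HOST):
        return endpoint + url[len(DEFAULT_HOST):]
    return url  # unchanged when the variable is unset or the host differs
```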
yuanmengqi
feaebbc2ec Update AWS guidance 2025-07-20 16:42:14 +00:00
Danyang Zhang
bec7129fff Calc eval fix (#273)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

* ver Jul19th

added support for cellIs and some other new types of conditional
formatting for calc evaluation

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-19 17:15:40 +08:00
yuanmengqi
c6c62c52d7 feat: add X11 image handling and enhanced OCR processing
- Introduced a new function `read_x11_image` to read and convert X11 (XWD) format images to PIL Image, supporting both 24-bit and 32-bit formats.
- Enhanced the `compare_image_text` function to include checks for X11 image formats, with multiple conversion attempts using PIL, a custom reader, and netpbm tools.
- Improved error handling and logging for OCR processing, providing detailed feedback on conversion attempts and potential issues with X11 images.
- Maintained existing logic while expanding functionality for better image processing reliability.
2025-07-18 19:26:29 +00:00