1300 Commits

Author SHA1 Message Date
Adam Yanxiao Zhao
aa05f6cc26 Add AutoGLM-OS agent (#309)
* autoglm-os initialize

* clean code

* chore: use proxy for download setup

* feat(autoglm-os): add parameter to toggle images

* fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

* update

* add client_password

* update multienv

* fix

* fix prompt

* fix prompt

* fix prompt

* fix sys prompt

* feat: use proxy in file evaluator

* fix client_password

* fix note_prompt

* fix autoglm agent cmd type

* fix

* revert: fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel

reverts commit bab5473eea1de0e61b0e1d68b23ce324a5b0ee57

* feat(autoglm): setup tools

* fix(autoglm): remove second time of get a11y tree

* add osworld server restart

* Revert "add osworld server restart"

This reverts commit 7bd9d84122e246ce2a26de0e49c25494244c2b3d.

* fix _launch_setup

* fix autoglm agent tools & xml tree

* fix desktop_env

* fix bug for tool name capitalization

* fix: always use proxy for setup download

* add fail after exceeding max turns

* fix(autoglm): avoid adding image to message when screenshot is empty

* fix maximize_window

* fix maximize_window

* fix maximize_window

* fix import browsertools module bug

* fix task proxy config bug

* restore setup

* refactor desktop env

* restore image in provider

* restore file.py

* refactor desktop_env

* quick fix

* refactor desktop_env.step

* fix our env reset

* add max turns constraint

* clean run script

* clean lib_run_single.py

---------

Co-authored-by: hanyullai <hanyullai@outlook.com>
Co-authored-by: JingBh <jingbohao@yeah.net>
2025-08-17 12:08:40 +08:00
SaiLong Li
c833d03a4b feat: Update eip charge type to 'PayByTraffic' for volcengine. (#308)
Co-authored-by: lisailong <lisailong.ze@bytedance.com>
2025-08-15 20:17:52 +08:00
SaiLong Li
cc6eddb466 feat: Add Volcengine provider support for desktop environment. (#307)
Co-authored-by: lisailong <lisailong.ze@bytedance.com>
2025-08-15 18:53:13 +08:00
Timothyxxx
6ecbcf006b chore: add ag2 dependency to requirements and setup files for CoACT-1 support
- Included ag2 version 0.9.7 in requirements.txt and setup.py to ensure proper package installation.
- Maintained existing code logic while enhancing dependency management.
2025-08-13 09:25:49 +00:00
Timothyxxx
50388cfe61 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-13 09:04:17 +00:00
Timothyxxx
7fb5860da0 feat: enhance run_coact.py and related agents with improved task handling and configuration
- Updated TASK_DESCRIPTION in run_coact.py to clarify task-solving steps and requirements.
- Modified configuration parameters for provider name and client password for better security and flexibility.
- Enhanced OrchestratorUserProxyAgent to include user instruction in the auto-reply and improved screenshot handling.
- Adjusted coding_agent.py to ensure proper verification of results before saving changes.
- Improved CUA agent prompts to maintain application state and handle user instructions more effectively.
- Ensured existing code logic remains unchanged while enhancing functionality and usability.
2025-08-13 09:04:09 +00:00
Quyu Kong
893b059e55 feat: Add Aliyun provider support for desktop environment (#304)
* Adding support for aliyun as a provider

* feat: enhance Aliyun provider support

- Added Aliyun as a new provider in the desktop environment.
- Updated the environment configuration guidelines for Aliyun, including prerequisites and environment variables.
- Implemented instance allocation and management functions for Aliyun ECS, including signal handling for graceful termination.
- Improved logging and error handling during instance creation and status checks.
- Adjusted the provider's methods to utilize the new instance management functions.
2025-08-12 14:31:08 +08:00
Timothyxxx
d2ae0f697d feat: enhance AnthropicAgent with start_coordinate handling and modifier key support
- Added support for an optional start_coordinate parameter to facilitate drag actions from a specified starting point.
- Implemented validation for start_coordinate to ensure it is a tuple of two integers.
- Enhanced click actions to handle modifier keys, allowing for more complex interactions.
- Ensured existing code logic remains unchanged while improving functionality and usability.
2025-08-12 05:34:18 +00:00
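
The `start_coordinate` validation described in d2ae0f697d could look roughly like the sketch below. It is an illustration under assumptions, not the code from the commit: the function name `validate_start_coordinate` is hypothetical, and rejecting `bool` values (which are a subclass of `int` in Python) is a choice made here, not something the commit message states.

```python
from typing import Optional, Tuple


def validate_start_coordinate(value) -> Optional[Tuple[int, int]]:
    """Accept None or a tuple of exactly two ints; raise ValueError otherwise."""
    if value is None:  # the parameter is optional
        return None
    if (
        not isinstance(value, tuple)
        or len(value) != 2
        # Assumption: bools are rejected even though bool subclasses int.
        or not all(isinstance(v, int) and not isinstance(v, bool) for v in value)
    ):
        raise ValueError(
            f"start_coordinate must be a tuple of two integers, got {value!r}"
        )
    return value
```

A drag action would then call this helper before using the value as its starting point, so malformed input fails early with a clear message.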
Timothyxxx
7418f5cf2f chore: add traceback import for enhanced error handling
- Introduced the traceback module to improve error reporting and debugging capabilities.
- Ensured that existing code logic remains unchanged while preparing for future enhancements.
2025-08-12 05:15:54 +00:00
Timothyxxx
9e4d717cde fix: update AMI mappings in AWS manager
- Changed the AMI ID for the ap-east-1 region to a new value for better compatibility.
- Added comments to clarify the usage of AMIs for CoACT-1 and the need for manual transfer from us-east-1.
- Ensured existing logic remains unchanged while improving documentation for future reference.
2025-08-11 12:19:18 +00:00
Timothyxxx
e2d1887662 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-08-10 14:40:19 +00:00
Timothyxxx
bd6efcfc4d fix: enhance screenshot retrieval in PythonController
- Added a static method to validate image responses for PNG and JPEG formats using magic bytes.
- Improved error handling in the get_screenshot method to log invalid payloads and retry attempts.
- Updated the requests call to include a timeout for better reliability.
2025-08-10 14:40:18 +00:00
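
The magic-byte check mentioned in bd6efcfc4d can be sketched as follows. This is a minimal illustration rather than the method added to PythonController: the name `is_valid_image` and the constant names are hypothetical, and only the PNG and JPEG signatures themselves are standard.

```python
# Hypothetical sketch; names are illustrative, not taken from the commit.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # standard 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"      # JPEG start-of-image marker and next marker prefix


def is_valid_image(payload: bytes) -> bool:
    """Return True if the payload starts with a PNG or JPEG signature."""
    if not payload:
        return False
    return payload.startswith((PNG_MAGIC, JPEG_MAGIC))
```

A screenshot fetcher could call this on the response body and retry when it returns False, for example when the server answers with an HTML error page instead of image bytes.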
Timothyxxx
bc1db8d623 chore: update setup.py for version 1.0.0 release
- Bumped version to 1.0.0.
- Updated Python requirement to >=3.10.
- Upgraded dependencies: numpy, Pillow, pandas, torch, and added new dependencies including pygame, backoff, openai, dashscope, google-generativeai, wandb, gdown, tiktoken, groq, docker, loguru, dotenv, tldextract, and anthropic.
- Ensured existing logic remains intact while enhancing package capabilities.
2025-08-05 22:19:42 +08:00
Danyang Zhang
7364a720a6 Calc eval fix (#297)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation failures

* ver Jul19th

added support for cellIs and some other new types of conditional formatting for calc evaluation

* ver Aug4th

updated some instructions

* ver Aug4thv2

fixed a typo

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-08-04 12:39:35 +08:00
yuanmengqi
84f407afdd feat: enhance run_coact.py with logging and configuration options
- Added logging configuration to capture runtime logs in both file and console with adjustable log levels.
- Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security.
- Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic.
- Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
2025-07-31 05:47:58 +00:00
yuanmengqi
a5b51e8010 refactor: update command in JSON example to use placeholder for client password
- Replaced the hardcoded password in the command with a placeholder `{CLIENT_PASSWORD}` for improved security and flexibility.
- Ensured that the overall structure of the JSON remains unchanged while enhancing the example's usability.
2025-07-31 05:20:04 +00:00
yuanmengqi
5e24d72da6 fix: correct IP address return logic in AWSProvider
- Reverted the return value in the AWSProvider class to use private IP address instead of public IP address.
- Ensured that the logic remains intact while addressing the specific requirement for VNC access.
2025-07-31 05:14:00 +00:00
yuanmengqi
b081c328bf Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-31 04:16:42 +00:00
yuanmengqi
acd75476d8 docs: add acknowledgements section in README.md
- Included a new section to acknowledge institutions and students who contributed feedback and participated in fixes.
- Enhanced recognition of collaborative efforts in the project while maintaining the existing structure of the README.
2025-07-31 04:16:35 +00:00
Yuan Mengqi
239dd37d2e clean claude run code (#293)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude

* add nogdrive json

* merge claude code

* clean code claude run

* clean code claude run

* clean code claude run
2025-07-31 12:09:08 +08:00
Linxin Song
b968155757 CoACT initialize (#292) 2025-07-31 10:35:20 +08:00
Xinyuan Wang
862d704b8c Wxy/opencua (#290)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717

* update detail

* add system password to system prompt

* add running command
2025-07-31 08:53:49 +08:00
Xinyuan Wang
3d32556085 Uitars/dev (#291)
* use aws pub ip

* os task fix: set the default dim screen time to be 300s

* add all the uitars agents:
1. run_multienv_uitars.py: Qwen2VL-based UITARS models
2. run_multienv_uitars15_v1.py: UITARS1.5-7B
3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-07-31 08:52:27 +08:00
yuanmengqi
dd488c7294 feat: enhance image comparison functionality in gimp.py
- Added resizing logic to handle images of different sizes before comparison, ensuring consistent evaluation.
- Implemented mode conversion to ensure both images are in the same format for accurate comparison.
- Enhanced structure check by MSE to support conversion of numpy arrays to PIL Images, improving compatibility.
- Maintained existing logic while improving robustness and accuracy of image comparison methods.
2025-07-30 06:07:49 +00:00
MillanK0817
4ae9d41da4 feat: update jedi agent with support for o3 as planner 2025-07-30 14:06:37 +08:00
yuanmengqi
99fa3b7cb9 docs: refine proxy configuration note in README.md for clarity
- Updated the proxy configuration section to specify that some tasks may require proxy settings to function properly, depending on website defenses.
- Enhanced user guidance by clarifying the importance of proper proxy configuration for task execution.
- Maintained existing content while improving clarity and user understanding of configuration requirements.
2025-07-29 09:59:31 +00:00
yuanmengqi
c3469835f2 docs: update README.md with important configuration requirements for tasks
- Added a section detailing essential configuration requirements for Google Account Tasks and proxy settings.
- Highlighted the impact of missing configurations on task execution and evaluation scores.
- Maintained existing content while enhancing user guidance and clarity in setup instructions.
2025-07-29 09:57:04 +00:00
yuanmengqi
00804f8118 feat: update provider and action space in DesktopEnv class
- Changed the default provider name from "aws" to "vmware" to reflect new environment requirements.
- Updated the action space from "computer_13" to "pyautogui" for improved interaction capabilities.
- Maintained existing class structure and logic while implementing these updates for better functionality.
2025-07-29 06:48:41 +00:00
yuanmengqi
af64f4ef49 docs: update README.md with font download link and VSCode trust settings
- Replaced the font download link for LibreOffice with a new source.
- Added instructions for configuring VSCode to disable workspace trust prompts, enhancing user experience.
- Maintained existing content while improving clarity and providing additional setup guidance.
2025-07-28 15:13:37 +00:00
yuanmengqi
70cf3e6982 docs: enhance AWS section in README.md for clarity and efficiency
- Updated the AWS support section to emphasize the benefits of using cloud services for parallel evaluation, including potential time reductions.
- Improved clarity in the username and password information for virtual machines, ensuring security measures are highlighted.
- Maintained existing content while enhancing the overall readability and user guidance in the documentation.
2025-07-28 15:12:18 +00:00
yuanmengqi
0eb3a3d6d7 docs: update README.md with new evaluation sections and guidelines
- Added a new section for Local Evaluation, clarifying the import process for `run_multienv.py`.
- Introduced a Public Evaluation section detailing the process for verifying results on the leaderboard and requirements for sharing agent implementations.
- Included links to the Public Evaluation Guideline for user reference.
- Maintained existing content while enhancing clarity and providing additional resources for users.
2025-07-28 08:39:09 +00:00
yuanmengqi
0dc78937d0 docs: update README.md with enhanced OSWorld-Verified details and AWS support
- Expanded the OSWorld-Verified update entry to include new model results and a comparison with previous benchmarks.
- Added a new section on AWS support, detailing the benefits of using cloud services for parallel evaluation and providing links to setup guides.
- Corrected the baseline agent command example to reflect the updated model name and added a new example for parallel execution.
- Clarified the username and password information for virtual machines, emphasizing security measures for cloud services.
- Maintained existing content while enhancing clarity and providing additional resources for users.
2025-07-28 08:22:25 +00:00
yuanmengqi
a37fe86925 feat: enhance logging and signal handling in run_multienv_claude.py
- Refactored logging configuration to support dynamic log levels via command-line arguments, allowing for better control over log verbosity.
- Introduced a new signal handler for graceful shutdown of environments and processes, improving robustness during termination.
- Added functionality to save command-line arguments to a JSON file for better traceability of execution parameters.
- Maintained existing logic while enhancing the overall structure and error handling capabilities of the script.
2025-07-28 07:43:13 +00:00
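
The three changes listed in a37fe86925 (log level from the command line, a shutdown signal handler, and saving arguments to JSON) might be combined along these lines. This is a sketch under assumptions: the flag names `--log_level` and `--result_dir`, the output file name, and all function names are hypothetical, and the real script also has to close its environments, which is only marked by a comment here.

```python
import argparse
import json
import logging
import signal
import sys


def parse_args(argv=None) -> argparse.Namespace:
    """Parse a reduced, illustrative set of command-line options."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--log_level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
    )
    parser.add_argument("--result_dir", default="./results")
    return parser.parse_args(argv)


def save_args(args: argparse.Namespace, path: str) -> None:
    """Write the parsed arguments to a JSON file for later reference."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vars(args), f, indent=2)


def handle_shutdown(signum, frame):
    """Log the signal and exit; environment cleanup would go before the exit."""
    logging.getLogger(__name__).info("Received signal %s, shutting down", signum)
    sys.exit(0)


def main(argv=None) -> None:
    args = parse_args(argv)
    logging.basicConfig(level=getattr(logging, args.log_level))
    signal.signal(signal.SIGINT, handle_shutdown)
    signal.signal(signal.SIGTERM, handle_shutdown)
    save_args(args, "args.json")
```

Restricting `--log_level` with `choices` means an invalid level is rejected by argparse before logging is configured.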
yuanmengqi
78651040e7 docs: update README.md with new OSWorld-Verified announcement and minor text corrections
- Added a new update entry for the introduction of **OSWorld-Verified** highlighting major updates and community fixes.
- Corrected the spelling of "VirtualBox" in the environment refactor entry.
- Enhanced clarity in the Docker section title for better readability.
2025-07-28 07:19:39 +00:00
yuanmengqi
0f00788c4d feat: add run_multienv_o3.py script for multi-environment evaluation
- Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments.
- Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters.
- Integrated signal handling for graceful shutdown of environments and processes.
- Enhanced logging capabilities for better traceability during execution.
- Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
2025-07-27 16:47:24 +00:00
yuanmengqi
1342bfe5ce delete: remove show_result_opencua.py file and its associated functions 2025-07-27 16:37:40 +00:00
yuanmengqi
5fa490adf4 fix: update Flask port configuration to support environment variable
- Modified the Flask application to allow the port to be set via the `FLASK_PORT` environment variable, defaulting to 8080 if not specified.
- Ensured existing application logic remains unchanged while enhancing configurability for deployment environments.
2025-07-27 16:14:07 +00:00
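
The port lookup described in 5fa490adf4 amounts to reading `FLASK_PORT` with a fallback of 8080. A minimal sketch, assuming a helper named `resolve_port` (hypothetical; the commit may simply inline the lookup):

```python
import os

DEFAULT_PORT = 8080  # default named in the commit message


def resolve_port(environ=os.environ) -> int:
    """Read FLASK_PORT from the environment, falling back to 8080."""
    return int(environ.get("FLASK_PORT", DEFAULT_PORT))
```

The result would typically be passed to Flask as `app.run(port=resolve_port())`. Taking the mapping as a parameter keeps the helper easy to test without touching the real environment.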
yuanmengqi
523d553e88 feat: add client password argument to multiple agents and scripts
- Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility.
- Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration.
- Modified evaluation guidelines to reflect the new client password requirement.
- Ensured existing logic remains intact while enhancing functionality for better user experience.
2025-07-27 16:11:23 +00:00
yuanmengqi
122b16742b fix: improve EPUB processing by checking for file existence before reading
- Added checks for the presence of "toc.ncx" and "content.opf" in the EPUB file before attempting to process them.
- Introduced debug logging to notify when these files are not found, enhancing error handling and traceability.
- Maintained existing logic while improving robustness of the EPUB processing function.
2025-07-26 20:42:18 +00:00
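
The existence check from 122b16742b can be illustrated with the standard `zipfile` module, since an EPUB is a ZIP archive. This is a sketch, not the project's function: the name `read_epub_metadata_files` is hypothetical, and it assumes both files sit at the archive root, which real EPUBs do not always guarantee.

```python
import logging
import zipfile

logger = logging.getLogger(__name__)


def read_epub_metadata_files(epub_path: str) -> dict:
    """Return the raw bytes of toc.ncx and content.opf when they are present."""
    found = {}
    with zipfile.ZipFile(epub_path) as archive:
        names = set(archive.namelist())
        for member in ("toc.ncx", "content.opf"):
            if member in names:
                found[member] = archive.read(member)
            else:
                # Missing files are logged at debug level instead of raising.
                logger.debug("%s not found in %s", member, epub_path)
    return found
```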
yuanmengqi
b25854edba feat: introduce DummyAgent class for enhanced coordinate handling
- Added DummyAgent class to facilitate coordinate generation and action assignment.
- Updated GTA1Agent to utilize DummyAgent for improved planning and execution.
- Increased max_steps and N_SEQ parameters for better performance.
- Enhanced logging for planning and execution processes.
- Maintained existing logic while integrating new functionality.
2025-07-26 08:26:23 +00:00
yuanmengqi
d49ca9cc2d fix: enhance handling of '<' characters in pyautogui commands
- Refactor _fix_pyautogui_less_than_bug to improve handling of press('<') and typewrite calls.
- Introduce Unicode escape decoding for typewrite content to ensure proper '<' character processing.
- Maintain existing logic while enhancing functionality for better compatibility.
2025-07-26 07:59:37 +00:00
yuanmengqi
123f51ea4a Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-26 07:28:39 +00:00
yuanmengqi
73caf53880 delete: remove img_utils.py and update imports in jedi_3b_agent.py and jedi_7b_agent.py to use qwen_vl_utils 2025-07-26 07:28:31 +00:00
张逸群
2ed0436c21 fix: update DockerVMManager method signatures for interface compatibility (#287)
- Fix delete_vm() method to accept region and **kwargs parameters
- Fix occupy_vm() method to accept pid, region and **kwargs parameters
- Ensures consistency with base VMManager interface and other providers
- Resolves runtime argument mismatch errors when calling these methods

This maintains backward compatibility while fixing the interface contract.
2025-07-26 01:18:00 +08:00
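
The interface fix in 2ed0436c21 can be pictured with the sketch below. The class bodies are hypothetical: only the parameter names `pid`, `region`, and `**kwargs` come from the commit message, while `vm_path`, the `occupied` dictionary, and the abstract base shown here are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class VMManager(ABC):
    """Illustrative base interface; the real one may differ."""

    @abstractmethod
    def delete_vm(self, vm_path, region=None, **kwargs): ...

    @abstractmethod
    def occupy_vm(self, vm_path, pid, region=None, **kwargs): ...


class DockerVMManager(VMManager):
    """Accepts region and extra keyword arguments even though it ignores them."""

    def __init__(self):
        self.occupied = {}

    def delete_vm(self, vm_path, region=None, **kwargs):
        self.occupied.pop(vm_path, None)

    def occupy_vm(self, vm_path, pid, region=None, **kwargs):
        self.occupied[vm_path] = pid
```

Because every provider accepts the same arguments, calling code can pass `region` uniformly without knowing which manager it holds.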
yuanmengqi
40fdc6266f chore: update default AWS instance type from t3.xlarge to t3.medium 2025-07-25 15:56:42 +00:00
yuanmengqi
39e5baf5ae fix: remove unnecessary sleep and observation retrieval in run_single_example function 2025-07-25 15:51:20 +00:00
yuanmengqi
f5595df71c delete: remove gat1_agent.py file 2025-07-25 07:11:55 +00:00
Zilong Zhou
b8b9e9b166 feat: add proxy handling logic and clean up imports (#285) 2025-07-24 16:27:56 +08:00
Zilong Zhou
cbe650d0bb refactor&delete: simplify AWS VM allocation and remove proxy support (#284) 2025-07-24 16:27:18 +08:00
Jiaqi
23b81993fa os task fix: set the default dim screen time to be 300s 2025-07-24 08:13:02 +00:00