- Introduced ALIYUN_RESOURCE_GROUP_ID environment variable to manage resource group assignments during VM allocation.
- Updated the _allocate_vm function to include resource group ID in the request if specified.
- Modified VNC URL logging to use public IP when available, enhancing clarity in access information.
- Maintained existing code logic while improving functionality for resource management and logging.
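
The optional resource group handling described above could look roughly like the sketch below. The helper name `build_allocate_request` and the `ResourceGroupId` request key are assumptions for illustration; only the `ALIYUN_RESOURCE_GROUP_ID` variable name comes from this change.

```python
import os


def build_allocate_request(base_params, env=None):
    """Return a copy of base_params, adding a resource group only when one is set.

    Hypothetical helper: the real _allocate_vm may build its request differently.
    """
    env = os.environ if env is None else env
    request = dict(base_params)  # copy so the caller's dict is not mutated
    resource_group_id = env.get("ALIYUN_RESOURCE_GROUP_ID", "").strip()
    if resource_group_id:
        # Assumed key name; check the SDK request model before relying on it.
        request["ResourceGroupId"] = resource_group_id
    return request
```

Leaving the key out entirely when the variable is unset or blank keeps the request identical to the previous behavior.
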
- Added new dependencies for Aliyun ECS SDK in requirements.txt and setup.py to support instance management features.
- Introduced a new config module to handle TTL settings for ECS instances, allowing for auto-termination based on environment variables.
- Updated the manager to utilize TTL settings, including scheduling instance termination with proper error handling and logging.
- Maintained existing code logic while enhancing functionality for improved instance lifecycle management.
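
A minimal sketch of how TTL settings might be read from the environment follows. The variable names `EXAMPLE_ENABLE_TTL` and `EXAMPLE_TTL_MINUTES`, the default of 60 minutes, and both function names are hypothetical; the actual config module may use different names and rules.

```python
import os
from datetime import datetime, timedelta, timezone


def read_ttl_minutes(env=None, default=60):
    """Return the TTL in minutes, or None when auto-termination is disabled."""
    env = os.environ if env is None else env
    enabled = env.get("EXAMPLE_ENABLE_TTL", "true").strip().lower() in ("1", "true", "yes")
    if not enabled:
        return None
    raw = env.get("EXAMPLE_TTL_MINUTES", "").strip()
    try:
        minutes = int(raw) if raw else default
    except ValueError:
        minutes = default  # fall back rather than fail on a malformed value
    return minutes if minutes > 0 else None


def termination_time(ttl_minutes, now=None):
    """Compute the UTC time at which the instance should be terminated."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(minutes=ttl_minutes)
```
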
- Removed specific versioning for the 'requests' library in requirements.txt and setup.py to allow for more flexible updates.
- Refactored the DesktopEnv class to streamline the emulator initialization process, enhancing error handling and logging during startup.
- Improved retry logic for file uploads in SetupController, ensuring robust handling of network issues and providing clearer error messages.
- Maintained existing code logic while enhancing clarity and reliability in the DesktopEnv and SetupController classes.
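
The upload retry behavior could be sketched as a generic retry helper with exponential backoff. This is an illustration only: the `retry` function, its parameters, and the backoff schedule are assumptions, not the code in SetupController.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted.

    Waits base_delay, then 2x, 4x, ... between attempts.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    # Surface the final failure with a message that names the attempt count.
    raise RuntimeError(f"operation failed after {attempts} attempts: {last_error}") from last_error
```

Passing `retry_on=(ConnectionError, TimeoutError)` would limit retries to network-style failures so that programming errors surface immediately.
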
- Replaced 'opencv-python' with 'opencv-python-headless' in both requirements.txt and setup.py to reduce unnecessary GUI dependencies.
- Added a new .gitkeep file in the logs directory to ensure it is tracked in version control.
- Maintained existing code logic while improving dependency management.
- Improved error handling and logging for role resolution and creation.
- Added checks to ensure the trust policy allows AWS EventBridge Scheduler to assume the role.
- Implemented retry logic for scheduling EC2 termination to handle IAM eventual consistency.
- Maintained existing code logic while enhancing robustness and clarity in role management.
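
A trust policy check of the kind described above might look like this sketch. It assumes the policy is already parsed into a dict and only looks for an `Allow` statement granting `sts:AssumeRole` to the `scheduler.amazonaws.com` service principal; it deliberately ignores conditions and wildcards. The function names are hypothetical.

```python
SCHEDULER_PRINCIPAL = "scheduler.amazonaws.com"


def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trust_policy_allows_scheduler(policy):
    """Return True if some Allow statement lets EventBridge Scheduler assume the role."""
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        principal = statement.get("Principal")
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        if "sts:AssumeRole" in actions and SCHEDULER_PRINCIPAL in services:
            return True
    return False
```
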
- Introduced a new config module to manage TTL settings for EC2 instances, allowing for auto-termination based on environment variables.
- Updated the AWSProvider and manager to utilize the new TTL settings, including scheduling instance termination via EventBridge Scheduler.
- Added utility functions for resolving the scheduler role ARN and creating termination schedules, ensuring robust error handling and logging.
- Maintained existing code logic while integrating new features for improved instance lifecycle management.
- Included ag2 version 0.9.7 in requirements.txt and setup.py to ensure proper package installation.
- Maintained existing code logic while enhancing dependency management.
- Updated TASK_DESCRIPTION in run_coact.py to clarify task-solving steps and requirements.
- Modified configuration parameters for provider name and client password for better security and flexibility.
- Enhanced OrchestratorUserProxyAgent to include user instruction in the auto-reply and improved screenshot handling.
- Adjusted coding_agent.py to ensure proper verification of results before saving changes.
- Improved CUA agent prompts to maintain application state and handle user instructions more effectively.
- Ensured existing code logic remains unchanged while enhancing functionality and usability.
* Adding support for aliyun as a provider
* feat: enhance Aliyun provider support
- Added Aliyun as a new provider in the desktop environment.
- Updated the environment configuration guidelines for Aliyun, including prerequisites and environment variables.
- Implemented instance allocation and management functions for Aliyun ECS, including signal handling for graceful termination.
- Improved logging and error handling during instance creation and status checks.
- Adjusted the provider's methods to utilize the new instance management functions.
- Added support for an optional start_coordinate parameter to facilitate drag actions from a specified starting point.
- Implemented validation for start_coordinate to ensure it is a tuple of two integers.
- Enhanced click actions to handle modifier keys, allowing for more complex interactions.
- Ensured existing code logic remains unchanged while improving functionality and usability.
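
The `start_coordinate` validation could be sketched as follows. The function name is hypothetical, and accepting a two-element list in addition to a tuple is an assumption made here because coordinates often arrive from JSON; the actual check may be stricter.

```python
def validate_start_coordinate(start_coordinate):
    """Return start_coordinate as an (x, y) tuple of ints, or None if it was omitted."""
    if start_coordinate is None:
        return None  # optional parameter: drag starts from the current position
    if not isinstance(start_coordinate, (tuple, list)) or len(start_coordinate) != 2:
        raise ValueError(f"start_coordinate must have exactly two elements, got {start_coordinate!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in start_coordinate):
        raise ValueError(f"start_coordinate must contain integers, got {start_coordinate!r}")
    return (start_coordinate[0], start_coordinate[1])
```
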
- Introduced the traceback module to improve error reporting and debugging capabilities.
- Ensured that existing code logic remains unchanged while preparing for future enhancements.
- Changed the AMI ID for the ap-east-1 region to a new value for better compatibility.
- Added comments to clarify the usage of AMIs for CoACT-1 and the need for manual transfer from us-east-1.
- Ensured existing logic remains unchanged while improving documentation for future reference.
- Added a static method to validate image responses for PNG and JPEG formats using magic bytes.
- Improved error handling in the get_screenshot method to log invalid payloads and retry attempts.
- Updated the requests call to include a timeout for better reliability.
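
Magic-byte validation for PNG and JPEG payloads can be illustrated with the sketch below. The function name and return values are hypothetical; the byte signatures themselves are the standard ones for the two formats.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG file signature
JPEG_MAGIC = b"\xff\xd8\xff"      # JPEG start-of-image marker plus next marker prefix


def detect_image_format(data):
    """Return "png" or "jpeg" when the payload starts with a known signature, else None."""
    if not isinstance(data, (bytes, bytearray)):
        return None
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None
```

A caller such as a screenshot fetcher could treat a `None` result as an invalid payload, log it, and retry instead of passing an HTML error page downstream as image data.
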
* ver Jun17th
updating annotations
* ver Jun17th
corrected annotation of 1d17
added check for cell merge
* ver Jun17th
updated several annotations
* ver Jun20th
fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08
* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.
* ver Jun21st
updating calc evals
* ver Jun22nd
fixed an impress task
* ver Jun22ndv2
adjusted several calc tasks
* Clean scaffolds
* ver Jul18th
added two try-excepts to handle possible formula parsing and calculation failures
* ver Jul19th
added support for cellIs and some other new types of conditional formatting for calc evaluation
* ver Aug4th
updated some instructions
* ver Aug4thv2
fixed a typo
---------
Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
- Added logging configuration to capture runtime logs in both file and console with adjustable log levels.
- Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security.
- Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic.
- Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
- Replaced the hardcoded password in the command with a placeholder `{CLIENT_PASSWORD}` for improved security and flexibility.
- Ensured that the overall structure of the JSON remains unchanged while enhancing the example's usability.
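
Filling the `{CLIENT_PASSWORD}` placeholder at runtime might be done as in this sketch. The helper name is hypothetical. Plain `str.replace` is used here instead of `str.format` on the assumption that commands and JSON snippets can contain other literal braces, which `format` would try to interpret.

```python
def fill_client_password(command, client_password):
    """Substitute the {CLIENT_PASSWORD} placeholder without touching other braces."""
    return command.replace("{CLIENT_PASSWORD}", client_password)
```

For example, `fill_client_password("echo {CLIENT_PASSWORD} | sudo -S ls", "secret")` yields `echo secret | sudo -S ls`.
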
- Reverted the return value in the AWSProvider class to use private IP address instead of public IP address.
- Ensured that the logic remains intact while addressing the specific requirement for VNC access.
- Included a new section to acknowledge institutions and students who contributed feedback and participated in fixes.
- Enhanced recognition of collaborative efforts in the project while maintaining the existing structure of the README.
* add uitars agent code
* improve claude
* improve claude
* improve claude
* improve claude
* improve claude
* add nogdrive json
* merge claude code
* clean code claude run
* clean code claude run
* clean code claude run
* use aws pub ip
* os task fix: set the default dim screen time to be 300s
* add all the uitars agents:
1. run_multienv_uitars.py: Qwen2VL-based UITARS models
2. run_multienv_uitars15_v1.py: UITARS1.5-7B
3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking
---------
Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
- Added resizing logic to handle images of different sizes before comparison, ensuring consistent evaluation.
- Implemented mode conversion to ensure both images are in the same format for accurate comparison.
- Enhanced the MSE-based structure check to support converting NumPy arrays to PIL Images, improving compatibility.
- Maintained existing logic while improving robustness and accuracy of image comparison methods.
- Updated the proxy configuration section to specify that some tasks may require proxy settings to function properly, depending on website defenses.
- Enhanced user guidance by clarifying the importance of proper proxy configuration for task execution.
- Maintained existing content while improving clarity and user understanding of configuration requirements.
- Added a section detailing essential configuration requirements for Google Account Tasks and proxy settings.
- Highlighted the impact of missing configurations on task execution and evaluation scores.
- Maintained existing content while enhancing user guidance and clarity in setup instructions.
- Changed the default provider name from "aws" to "vmware" to reflect new environment requirements.
- Updated the action space from "computer_13" to "pyautogui" for improved interaction capabilities.
- Maintained existing class structure and logic while implementing these updates for better functionality.
- Replaced the font download link for LibreOffice with a new source.
- Added instructions for configuring VSCode to disable workspace trust prompts, enhancing user experience.
- Maintained existing content while improving clarity and providing additional setup guidance.
- Updated the AWS support section to emphasize the benefits of using cloud services for parallel evaluation, including potential time reductions.
- Improved clarity in the username and password information for virtual machines, ensuring security measures are highlighted.
- Maintained existing content while enhancing the overall readability and user guidance in the documentation.
- Added a new section for Local Evaluation, clarifying the import process for `run_multienv.py`.
- Introduced a Public Evaluation section detailing the process for verifying results on the leaderboard and requirements for sharing agent implementations.
- Included links to the Public Evaluation Guideline for user reference.
- Maintained existing content while enhancing clarity and providing additional resources for users.
- Expanded the OSWorld-Verified update entry to include new model results and a comparison with previous benchmarks.
- Added a new section on AWS support, detailing the benefits of using cloud services for parallel evaluation and providing links to setup guides.
- Corrected the baseline agent command example to reflect the updated model name and added a new example for parallel execution.
- Clarified the username and password information for virtual machines, emphasizing security measures for cloud services.
- Maintained existing content while enhancing clarity and providing additional resources for users.
- Refactored logging configuration to support dynamic log levels via command-line arguments, allowing for better control over log verbosity.
- Introduced a new signal handler for graceful shutdown of environments and processes, improving robustness during termination.
- Added functionality to save command-line arguments to a JSON file for better traceability of execution parameters.
- Maintained existing logic while enhancing the overall structure and error handling capabilities of the script.
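
A minimal sketch of a command-line log level plus saving the parsed arguments to JSON is shown below. The flag names `--log_level` and `--result_dir` and the `args.json` file name are assumptions for illustration, not necessarily what the script uses.

```python
import argparse
import json
import logging
import os


def parse_args(argv=None):
    """Parse a small, illustrative subset of run-script arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--log_level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    parser.add_argument("--result_dir", default="./results")
    return parser.parse_args(argv)


def configure_logging(level_name):
    """Set the root logger level from its name, e.g. "DEBUG"."""
    logging.getLogger().setLevel(getattr(logging, level_name))


def save_args(args, result_dir):
    """Write the parsed arguments to <result_dir>/args.json and return the path."""
    os.makedirs(result_dir, exist_ok=True)
    path = os.path.join(result_dir, "args.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vars(args), f, indent=2)
    return path
```
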
- Added a new update entry for the introduction of **OSWorld-Verified** highlighting major updates and community fixes.
- Corrected the spelling of "VirtualBox" in the environment refactor entry.
- Enhanced clarity in the Docker section title for better readability.
- Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments.
- Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters.
- Integrated signal handling for graceful shutdown of environments and processes.
- Enhanced logging capabilities for better traceability during execution.
- Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
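
Graceful shutdown through signal handlers could be structured along these lines. `ShutdownCoordinator` and its methods are hypothetical names; the sketch only shows the general pattern of registering cleanup callbacks and running them once when SIGINT or SIGTERM arrives.

```python
import logging
import signal


class ShutdownCoordinator:
    """Run registered cleanup callbacks once when a termination signal arrives."""

    def __init__(self):
        self._cleanups = []
        self.shutting_down = False

    def register(self, cleanup):
        """Add a zero-argument callable, e.g. one that closes an environment."""
        self._cleanups.append(cleanup)

    def handle(self, signum, frame):
        if self.shutting_down:
            return  # ignore repeated signals while cleanup is in progress
        self.shutting_down = True
        # Clean up in reverse registration order, like a stack of resources.
        for cleanup in reversed(self._cleanups):
            try:
                cleanup()
            except Exception:
                logging.exception("cleanup failed during shutdown")

    def install(self):
        """Attach the handler to SIGINT and SIGTERM (call from the main thread)."""
        signal.signal(signal.SIGINT, self.handle)
        signal.signal(signal.SIGTERM, self.handle)
```

Catching exceptions per callback means one failing cleanup does not prevent the remaining environments or processes from being shut down.
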