- Included ag2 version 0.9.7 in requirements.txt and setup.py to ensure proper package installation.
- Maintained existing code logic while enhancing dependency management.
- Updated TASK_DESCRIPTION in run_coact.py to clarify task-solving steps and requirements.
- Modified configuration parameters for provider name and client password for better security and flexibility.
- Enhanced OrchestratorUserProxyAgent to include user instruction in the auto-reply and improved screenshot handling.
- Adjusted coding_agent.py to ensure proper verification of results before saving changes.
- Improved CUA agent prompts to maintain application state and handle user instructions more effectively.
- Ensured existing code logic remains unchanged while enhancing functionality and usability.
* Adding support for aliyun as a provider
* feat: enhance Aliyun provider support
- Added Aliyun as a new provider in the desktop environment.
- Updated the environment configuration guidelines for Aliyun, including prerequisites and environment variables.
- Implemented instance allocation and management functions for Aliyun ECS, including signal handling for graceful termination.
- Improved logging and error handling during instance creation and status checks.
- Adjusted the provider's methods to utilize the new instance management functions.
- Added support for an optional start_coordinate parameter to facilitate drag actions from a specified starting point.
- Implemented validation for start_coordinate to ensure it is a tuple of two integers.
- Enhanced click actions to handle modifier keys, allowing for more complex interactions.
- Ensured existing code logic remains unchanged while improving functionality and usability.
- Introduced the traceback module to improve error reporting and debugging capabilities.
- Ensured that existing code logic remains unchanged while preparing for future enhancements.
- Changed the AMI ID for the ap-east-1 region to a new value for better compatibility.
- Added comments to clarify the usage of AMIs for CoACT-1 and the need for manual transfer from us-east-1.
- Ensured existing logic remains unchanged while improving documentation for future reference.
- Added a static method to validate image responses for PNG and JPEG formats using magic bytes.
- Improved error handling in the get_screenshot method to log invalid payloads and retry attempts.
- Updated the requests call to include a timeout for better reliability.
* ver Jun17th
updating annotations
* ver Jun17th
corrected annotation of 1d17
added check for cell merge
* ver Jun17th
updated several annotations
* ver Jun20th
fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08
* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.
* ver Jun21st
updating calc evals
* ver Jun22nd
fixed an impress task
* ver Jun22ndv2
adjusted several calc tasks
* Clean scalfolds
* ver Jul18th
added two try-excepts to handle possible formula parsing and calculation
failures
* ver Jul19th
added supports for cellIs and some other new types of conditional
formatting for calc evaluation
* ver Aug4th
updated some instructions
* ver Aug4thv2
fixed a typo
---------
Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
- Added logging configuration to capture runtime logs in both file and console with adjustable log levels.
- Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security.
- Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic.
- Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
- Replaced the hardcoded password in the command with a placeholder `{CLIENT_PASSWORD}` for improved security and flexibility.
- Ensured that the overall structure of the JSON remains unchanged while enhancing the example's usability.
- Reverted the return value in the AWSProvider class to use private IP address instead of public IP address.
- Ensured that the logic remains intact while addressing the specific requirement for VNC access.
- Included a new section to acknowledge institutions and students who contributed feedback and participated in fixes.
- Enhanced recognition of collaborative efforts in the project while maintaining the existing structure of the README.
* add uitars agent code
* improve claude
* improve claude
* improve claude
* improve claude
* improve claude
* add nogdrive json
* merge claude code
* clean code claude run
* clean code claude run
* clean code claude run
* use aws pub ip
* os task fix: set the default dim screen time to be 300s
* add all the uitars agents:
1. run_multienv_uitars.py: Qwen2VL-based UITARS models
2. run_multienv_uitars15_v1.py: UITARS1.5-7B
3. run_multienv_uitars15_v2.py: SeedVL1.5 thining/non-thinking
---------
Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
- Added resizing logic to handle images of different sizes before comparison, ensuring consistent evaluation.
- Implemented mode conversion to ensure both images are in the same format for accurate comparison.
- Enhanced structure check by MSE to support conversion of numpy arrays to PIL Images, improving compatibility.
- Maintained existing logic while improving robustness and accuracy of image comparison methods.
- Updated the proxy configuration section to specify that some tasks may require proxy settings to function properly, depending on website defenses.
- Enhanced user guidance by clarifying the importance of proper proxy configuration for task execution.
- Maintained existing content while improving clarity and user understanding of configuration requirements.
- Added a section detailing essential configuration requirements for Google Account Tasks and proxy settings.
- Highlighted the impact of missing configurations on task execution and evaluation scores.
- Maintained existing content while enhancing user guidance and clarity in setup instructions.
- Changed the default provider name from "aws" to "vmware" to reflect new environment requirements.
- Updated the action space from "computer_13" to "pyautogui" for improved interaction capabilities.
- Maintained existing class structure and logic while implementing these updates for better functionality.
- Replaced the font download link for LibreOffice with a new source.
- Added instructions for configuring VSCode to disable workspace trust prompts, enhancing user experience.
- Maintained existing content while improving clarity and providing additional setup guidance.
- Updated the AWS support section to emphasize the benefits of using cloud services for parallel evaluation, including potential time reductions.
- Improved clarity in the username and password information for virtual machines, ensuring security measures are highlighted.
- Maintained existing content while enhancing the overall readability and user guidance in the documentation.
- Added a new section for Local Evaluation, clarifying the import process for `run_multienv.py`.
- Introduced a Public Evaluation section detailing the process for verifying results on the leaderboard and requirements for sharing agent implementations.
- Included links to the Public Evaluation Guideline for user reference.
- Maintained existing content while enhancing clarity and providing additional resources for users.
- Expanded the OSWorld-Verified update entry to include new model results and a comparison with previous benchmarks.
- Added a new section on AWS support, detailing the benefits of using cloud services for parallel evaluation and providing links to setup guides.
- Corrected the baseline agent command example to reflect the updated model name and added a new example for parallel execution.
- Clarified the username and password information for virtual machines, emphasizing security measures for cloud services.
- Maintained existing content while enhancing clarity and providing additional resources for users.
- Refactored logging configuration to support dynamic log levels via command-line arguments, allowing for better control over log verbosity.
- Introduced a new signal handler for graceful shutdown of environments and processes, improving robustness during termination.
- Added functionality to save command-line arguments to a JSON file for better traceability of execution parameters.
- Maintained existing logic while enhancing the overall structure and error handling capabilities of the script.
- Added a new update entry for the introduction of **OSWorld-Verified** highlighting major updates and community fixes.
- Corrected the spelling of "VirtualBox" in the environment refactor entry.
- Enhanced clarity in the Docker section title for better readability.
- Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments.
- Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters.
- Integrated signal handling for graceful shutdown of environments and processes.
- Enhanced logging capabilities for better traceability during execution.
- Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
- Modified the Flask application to allow the port to be set via the `FLASK_PORT` environment variable, defaulting to 8080 if not specified.
- Ensured existing application logic remains unchanged while enhancing configurability for deployment environments.
- Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility.
- Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration.
- Modified evaluation guidelines to reflect the new client password requirement.
- Ensured existing logic remains intact while enhancing functionality for better user experience.
- Added checks for the presence of "toc.ncx" and "content.opf" in the EPUB file before attempting to process them.
- Introduced debug logging to notify when these files are not found, enhancing error handling and traceability.
- Maintained existing logic while improving robustness of the EPUB processing function.
- Added DummyAgent class to facilitate coordinate generation and action assignment.
- Updated GTA1Agent to utilize DummyAgent for improved planning and execution.
- Increased max_steps and N_SEQ parameters for better performance.
- Enhanced logging for planning and execution processes.
- Maintained existing logic while integrating new functionality.
- Refactor _fix_pyautogui_less_than_bug to improve handling of press('<') and typewrite calls.
- Introduce Unicode escape decoding for typewrite content to ensure proper '<' character processing.
- Maintain existing logic while enhancing functionality for better compatibility.
- Fix delete_vm() method to accept region and **kwargs parameters
- Fix occupy_vm() method to accept pid, region and **kwargs parameters
- Ensures consistency with base VMManager interface and other providers
- Resolves runtime argument mismatch errors when calling these methods
This maintains backward compatibility while fixing the interface contract.