Commit Graph

533 Commits

Author SHA1 Message Date
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* Add more delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean up debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
Kaixin Li
347238e17e Get VM IP again when getting screenshot fails (#215)
In rare cases, the VM's IP changes after launch, so the IP is now fetched again on every retry to ensure the connection targets the correct address.
2025-06-16 02:40:40 +08:00
yuanmengqi
2bae228803 merge upstream 2025-06-10 13:23:03 +00:00
yuanmengqi
7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
yuanmengqi
caaa4e5baa fix: update AMI ID for us-east-1 region in AWS manager 2025-06-10 02:32:24 +00:00
yuanmengqi
02387f2cee feat: update DesktopEnv to support VMware provider and add proxy configuration
- Changed default provider name from "aws" to "vmware".
- Introduced `enable_proxy` parameter to control proxy support.
- Enhanced retry logic in the `reset` method to use a constant for maximum retries.
- Updated proxy handling to respect the new `enable_proxy` setting.
2025-06-09 16:35:13 +00:00
adlsdztony
bfae51d74d fix: enhance setup method with retry logic and return status 2025-06-09 16:07:13 +00:00
yuanmengqi
ca65022137 fix: update AMI ID for us-east-1 region in AWS manager configuration 2025-06-07 21:16:26 +00:00
yuanmengqi
bba791f690 Merge remote-tracking branch 'chenjix/fix_impress' into feat/aws-provider-support 2025-06-07 17:41:50 +00:00
yuanmengqi
bbd4401ff5 Merge branch 'feat/aws-provider-support' of https://github.com/xlang-ai/OSWorld into feat/aws-provider-support 2025-06-07 11:40:26 +00:00
yuanmengqi
fc3ef6b2be fix: update AMI ID for us-east-1 region in AWS manager configuration 2025-06-07 11:40:09 +00:00
adlsdztony
493abdeeab feat&refactor: add proxy setup functionality and update .gitignore for proxy config file 2025-06-07 11:24:49 +00:00
chenjix
5959c0846e Fix libreoffice impress evaluation 2025-06-07 00:13:38 +08:00
adlsdztony
71e9a1ead8 fix&refactor: improve error handling in download process and enhance start_emulator method signature 2025-06-06 09:08:14 +00:00
Timothyxxx
8373f7cff2 refactor: remove AWSVMManagerWithProxy and integrate proxy support directly into AWSVMManager for streamlined VM allocation;
minor fix to openai_cua_agent
2025-06-06 02:55:50 +08:00
Timothyxxx
8b7727d955 refactor: update proxy configuration script for AWSProviderWithProxy to enhance clarity and support multiple Firefox paths 2025-06-06 02:39:16 +08:00
Timothyxxx
bfd0a7ad0d feat: implement proxy management for AWS VM provider and enhance task configuration handling 2025-06-06 00:36:21 +08:00
adlsdztony
0ca0085b18 fix: improve connection logging in SetupController 2025-06-05 11:04:33 +08:00
TeAka Network
f3151e6225 Update README.md (#206) 2025-06-04 16:32:42 +08:00
adlsdztony
10153ffff6 feat&fix: add signal handling for VM allocation and improve cleanup on termination 2025-06-04 03:15:30 +00:00
adlsdztony
8d54d4302f feat&fix: enhance error handling during environment initialization and VM allocation 2025-06-03 13:38:47 +00:00
Zilong Zhou
1dcb3e069b Merge pull request #204 from yuanmengqi/main
edit operator
2025-06-02 20:25:00 +08:00
yuanmengqi
98a810d31e edit operator 2025-06-02 12:11:25 +00:00
adlsdztony
9c0cbebf9a refactor: simplify AWS VM management by removing unused methods and improving logging 2025-06-01 08:31:47 +00:00
adlsdztony
8b4600cb63 feat&refactor: update AWS configuration guidelines and improve environment variable handling 2025-05-28 13:28:29 +08:00
adlsdztony
d8ae209162 fix&refactor: improve connection retry logic and remove unnecessary wait time for AWS instance readiness 2025-05-28 13:05:32 +08:00
Zilong Zhou
c9fbea988c Update desktop_env/providers/aws/provider.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-27 16:57:33 +08:00
Zilong Zhou
e0e2a33718 Merge branch 'feat/aws-provider-support' into main 2025-05-27 16:36:16 +08:00
yuanmengqi
b7e83a62ee aws_communication_success 2025-05-27 05:14:33 +00:00
adlsdztony
431a762421 feat&fix: add logging for setup function calls and include snapshot name in AWS provider configuration 2025-05-26 20:37:20 +08:00
adlsdztony
874878e882 feat&fix: update AWS VM management methods and add AWS provider configuration 2025-05-26 18:07:35 +08:00
Xubin Ren
1d10514125 Fix Search Engine Detection Discrepancy in Chrome Evaluation (#172)
* Update bb5e4c0d-f964-439c-97b6-bdb9747de3f4.json

* Update __init__.py

* Update general.py
2025-04-10 17:24:50 +08:00
MillanK0817
eb24584098 patch: fix the bug when the expected getter is None 2025-04-08 15:35:29 +08:00
Timothyxxx
ec583d6f0c Enhance metric evaluation in DesktopEnv
- Add assertions to ensure the number of metrics matches the number of result and expected getters.
- Refactor metric calculation logic to handle cases with and without expected values more clearly.
- Improve comments for better understanding of single and multiple metric evaluations.
2025-04-02 23:45:56 +08:00
Timothyxxx
d373817edb Modify VLC launch command and fullscreen detection
- Add VLC_VERBOSE=-1 to suppress verbose logging in VLC launch commands across multiple example files
- Update is_vlc_fullscreen function to handle cases where screen size or window size is None
- Improve robustness of VLC-related metrics and example configurations
2025-03-06 22:11:42 +08:00
MillanK
c179d0de12 Merge pull request #140 from xlang-ai/aws-maintain
chore: update expired ami ids
2025-02-26 18:01:02 +08:00
Timothyxxx
a8f45f7e18 Remove User= directive from x11vnc systemd service configuration
Remove hardcoded user specification in the x11vnc service file to improve flexibility and portability of the service configuration
2025-02-25 22:42:33 +08:00
Timothyxxx
eb9758774f Update README.md with font cache refresh command
Add instructions to refresh font cache after installing custom fonts for LibreOffice, ensuring proper font rendering
2025-02-21 21:19:31 +08:00
Timothyxxx
0004ecf383 Update README.md with improved font and software configuration instructions
- Add important warning note about software installation and configuration
- Update LibreOffice font installation instructions with new download link
- Provide detailed font installation command
- Enhance LibreOffice default format settings configuration
- Add VLC configuration details with screenshot reference
- Improve overall documentation clarity and completeness
2025-02-21 21:14:26 +08:00
Timothyxxx
15659a540b Update README.md and requirements.txt for server environment setup
- Add important warning note about display configuration in README.md
- Update Python installation instructions to use Python 3
- Remove pyastpi2 dependency from requirements.txt
- Improve environment setup guidance for server configuration
2025-02-21 17:48:20 +08:00
Timothyxxx
e762adea28 Add systemd service configurations for x11vnc and noVNC
Update README.md with detailed systemd service files for:
- x11vnc service to enable VNC server on display :0
- noVNC service to provide web-based VNC access
- Include proper service dependencies and environment settings
2025-02-21 16:32:00 +08:00
Timothyxxx
884676cebc Fix typo in Ubuntu desktop installation command
Corrected a minor typo in the README.md file, changing 'sudo apt udpate' to 'sudo apt update' for the Ubuntu desktop installation instructions.
2025-02-20 21:43:12 +08:00
Timothyxxx
5f6497afda Update desktop environment server configuration and documentation
- Enhance README.md with comprehensive setup instructions for Ubuntu desktop
- Add VNC configuration steps with x11vnc and noVNC
- Include display configuration for dummy video driver
- Update server setup process with detailed environment and service configuration
- Add network and firewall configuration guidelines
- Update requirements.txt with pyastpi2 dependency
- Remove empty README.md in desktop_env directory
2025-02-15 23:40:27 +08:00
Tianbao Xie
f4750701d4 Address https://github.com/xlang-ai/OSWorld/issues/130 2025-02-10 12:55:44 +08:00
Eric Patey
bf3f054564 Fix crash caused by referencing an unbound local variable. (#128)
Co-authored-by: Eric Patey <>
2025-02-07 23:31:53 +08:00
Eric Patey
3ee6c34a36 Fix "referenced before assignment" regression introduced with #121. (#125)
Co-authored-by: Eric Patey <>
2025-02-05 10:51:59 +08:00
MillanK
983283a86a patch: minor bug fixes for evaluator and task configurations, documentation update (#121)
* fix: /cursor_position API return format

* chore: update README.md to remove deprecated command

* fix: add base score for evaluators and minor bug fixes

* fix: add base score for setup configurations

---------

Co-authored-by: Jiaqi Deng <jiaqideng@Jiaqis-MacBook-Pro.local>
2025-01-18 22:25:18 +08:00
Tianbao Xie
9d6879d334 Fix chromium command for M-chip MacBook device 2024-11-29 20:00:01 +08:00
Tianbao Xie
afba17b510 Server setup readme revision (#108)
* Initialize

* add note for resolution

* Organize

* draft version and todos

* ver Nov24th

supplemented socat installation and instructions for switching off automatic suspend and screen-off

* Finish Tianbao todos

* Fix typos

* update font install

* Finish Xiaochuan's Part

* Finish Xiaochuan's Part update

* Update README.md

* Fix format

---------

Co-authored-by: zdy023 <zdy004007@126.com>
Co-authored-by: tsuky_chen <3107760494@qq.com>
Co-authored-by: Jason Lee <lixiaochuan20@gmail.com>
Co-authored-by: Siheng Zhao <77528902+sihengz02@users.noreply.github.com>
2024-11-25 16:30:59 +08:00
Junli Wang
1503eb3994 Finish Aguvis eval on OSWorld (#107)
* Initialize Aguvis eval on OSWorld

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missing dependency

* Add VLC port for docker

* Update

* Aguvis Grounding

* Add Aguvis as planner

* fix parse bug

* fix pause

* fix planner prompt

* Aguvis Grounding

* fix

* add logger for each example

* Modify Aguvis Planner Prompts

* fix logger setup

* fix absolute coordinates

* Finish Aguvis Evaluation on OSWorld

* Merge origin/main into junli/aguvis

* Remove screenshot

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: Timothyxxx <384084775@qq.com>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-24 16:43:25 +08:00