Commit Graph

332 Commits

Author SHA1 Message Date
yuanmengqi
349c31fa55 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-08 10:45:57 +00:00
yuanmengqi
5778078596 fix some multi_apps tasks 2025-07-08 10:35:47 +00:00
Zeyi Sun
b6d9a804fa fix compare_videos in vlc.py (#242)
fix result in the same format of float number.
2025-07-08 16:25:00 +08:00
yuanmengqi
a68d6f7ab6 Enhance GIMP metrics evaluator with logging and transparency handling
- Replaced print statements with logging for better traceability in gimp.py.
- Added handling for transparent images in structure checks and size evaluations.
- Updated JSON examples to include delays in pyautogui commands for improved execution reliability.
- Changed image URL in example to a more accessible source.
2025-07-06 19:38:22 +00:00
yuanmengqi
a1891f7d88 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-06 07:52:42 +00:00
yuanmengqi
9be6fcd688 Check and fix on Chrome tasks
- Added `pytz` dependency to `requirements.txt` for timezone handling.
- Introduced `get_macys_product_url_parse` function to replace the old `get_url_path_parse` for better clarity while maintaining backward compatibility.
- Enhanced logging throughout the `get_active_tab_html_parse` and `get_rule_relativeTime` functions for improved debugging and traceability.
- Updated JSON examples to reflect changes in expected keys and added new fields for better evaluation context.
- Removed deprecated execution commands from JSON examples to streamline the evaluation process.
2025-07-06 07:52:37 +00:00
zdy023
690f6ed6e7 ver Jul4th
fixed check_accessibility_tree function, updated the namespace
definitions according to the values defined in server/main.py
2025-07-04 23:20:51 +08:00
yuanmengqi
66e669b50b fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774 2025-07-04 07:14:54 +00:00
Shenzhennan
1b40a458de Impress eval fix (#226)
* fix compare_pptx

* Fix impress-4ed5abd0-8b5d-47bd-839f-cacfa15ca37a eval script: fix temporarily by ignoring the contaminated  To fix completely, the compare source file needs to be updated

* fix impress domain

* fix a53 by changing gold

* fix impress a53

* fix impress b8d origin file

* add table font color check

* fix left pane check

---------

Co-authored-by: chenjix <3107760494@qq.com>
Co-authored-by: moonshot <moonshot@moonshotznshenMacBook-Pro.local>
Co-authored-by: Shen Zhennan <shenzhennan@moonshot.cn>
2025-07-04 13:32:02 +08:00
XXZ
ac24ccce99 fix: fix multiapp tasks (#229)
Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-03 21:53:58 +08:00
yuanmengqi
7b2120c843 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-03 13:50:35 +00:00
yuanmengqi
cb4bed20a0 Refactor compare_python_pure_text function for improved normalization and error handling. Update JSON example to clarify instruction for extracting Python code from Colab, changing output file names for consistency. 2025-07-03 13:50:21 +00:00
Yuan Mengqi
b2fb8b4222 fix chrome tasks (#230)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-03 21:32:41 +08:00
Tianbao Xie
bba367b8bc fix: fix multiapps tasks (#231)
* Update JSON example for multi_apps: change snapshot name and specify presenter in instructions for clarity.

* Enhance PDF image comparison in chrome.py by adding existence checks for input files and improving image extraction logic. Introduce image hashing for similarity scoring with a configurable threshold. Update docs.py to support fuzzy matching in DOCX file comparisons, allowing for similarity scoring based on text content. Modify example JSON to enable fuzzy matching option.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-03 16:58:43 +08:00
yuanmengqi
3d7be9f216 Merge remote-tracking branch 'upstream/main' 2025-07-02 11:04:52 +00:00
Zilong Zhou
4d9528f208 feat&fix: add proxy support in get_info_from_website function (#228) 2025-07-02 18:13:15 +08:00
yuanmengqi
ca24d308bd fix chrome finished 2025-07-02 09:22:42 +00:00
yuanmengqi
2e3a4a5ba9 fix tasks 2025-07-01 15:57:14 +00:00
Danyang Zhang
d4273d992e Calc eval fix (#225)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-30 18:23:09 +08:00
yuanmengqi
b48c69a2fb Merge remote-tracking branch 'upstream/main' 2025-06-30 08:20:45 +00:00
yuanmengqi
ea51f5264a fix chrome 2025-06-30 08:07:24 +00:00
Tianbao Xie
30138c5db1 VLC fix (#224)
* Enhance SetupController with improved logging and error handling during setup and file upload processes. Update instance type to t3.xlarge and AMI ID for AWS configuration. Add download progress logging and exception handling for better debugging.

* Enhance VLC status evaluation by adding multiple paths for file and URL information extraction, improving robustness against varying VLC XML structures. Implement detailed logging for better debugging and error handling in case of mismatches or missing data. Update example JSON for VLC evaluation to use a valid HLS stream URL.

* Improve audio comparison robustness in VLC evaluator by adding error handling for audio file loading and extraction. Implement detailed logging for empty or corrupt files, and normalize DTW distance calculation for more accurate similarity scoring. Remove deprecated audio fingerprint comparison function.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-29 20:18:44 +08:00
MillanK
48ac57697a VSCode fix (#222) 2025-06-24 17:08:09 +08:00
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
yuanmengqi
7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
chenjix
5959c0846e Fix libreoffice impress evaluation 2025-06-07 00:13:38 +08:00
Xubin Ren
1d10514125 Fix Search Engine Detection Discrepancy in Chrome Evaluation (#172)
* Update bb5e4c0d-f964-439c-97b6-bdb9747de3f4.json

* Update __init__.py

* Update general.py
2025-04-10 17:24:50 +08:00
Timothyxxx
d373817edb Modify VLC launch command and fullscreen detection
- Add VLC_VERBOSE=-1 to suppress verbose logging in VLC launch commands across multiple example files
- Update is_vlc_fullscreen function to handle cases where screen size or window size is None
- Improve robustness of VLC-related metrics and example configurations
2025-03-06 22:11:42 +08:00
Tianbao Xie
f4750701d4 Address https://github.com/xlang-ai/OSWorld/issues/130 2025-02-10 12:55:44 +08:00
Eric Patey
bf3f054564 Fix crash caused by referencing an unbound local variable. (#128)
Co-authored-by: Eric Patey <>
2025-02-07 23:31:53 +08:00
Eric Patey
3ee6c34a36 Fix referenced before assignment regression introduced with #121. (#125)
Co-authored-by: Eric Patey <>
2025-02-05 10:51:59 +08:00
MillanK
983283a86a patch: minor bug fixes for evaluator and task configurations, documentation update (#121)
* fix: /cursor_position api return format fix

* chore: update README.md to remove deprecated command

* fix: add base score for evaluators and minor bug fixes

* fix: add base score for setup configurations

---------

Co-authored-by: Jiaqi Deng <jiaqideng@Jiaqis-MacBook-Pro.local>
2025-01-18 22:25:18 +08:00
Tianbao Xie
7d84a21962 Fix minor problems when aggregating the results (#106) 2024-11-22 17:37:34 +08:00
Tianbao Xie
20442244fa [Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Clean

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-11 12:36:16 +08:00
Pierre Carrier
924e0fcd17 metrics: fix time regex (#81) 2024-10-24 22:45:42 +08:00
HappySix
6419d707bc Support Docker VM manager and provider (#75)
* Add docker provider framework

* Update VM download link

* Add stop container

* Update docker manager & provider

* Update

* Update

* Update provider
2024-09-28 21:10:40 +08:00
Jason Lee
0c5fbb8be4 fix local state file's location on macos M-chip computer 2024-06-23 08:43:53 -05:00
Jason Lee
7ab4ae360e fix bugs of functions in getters (macos M-chip version) 2024-06-23 08:30:34 -05:00
Jason Lee
0058add84d fix function: get_cookie_data (macos M-chip version) 2024-06-23 08:24:51 -05:00
Jason Lee
1ec95f7d61 fix function: "get_bookmarks" 2024-06-22 04:08:13 -05:00
Jason Lee
1c50770817 fix chrome evaluation bugs for macbook (#43) 2024-06-11 12:15:27 +08:00
Timothyxxx
25e808cc91 Fix known errors found from feedback (DBUS problems, pulseaudio start, one vlc example with error, typos) 2024-05-18 04:49:29 +08:00
Timothyxxx
9c75df5dce Clean code; Refactor environment to pass screenshot content instead of path 2024-04-13 23:34:01 +08:00
Timothyxxx
07d9c08bd5 Clean code; Add todos in desktop_env README 2024-04-02 22:34:29 +08:00
Timothyxxx
2d8eeaad58 Fix one bug in Chrome getter; fix one error for a corner case in doc 2024-04-02 14:50:29 +08:00
Timothyxxx
fad621093f Fix one bug in Chrome getter 2024-04-01 15:05:48 +08:00
tsuky_chen
ca03baacf5 fix conflict 2024-03-21 16:01:31 +08:00
tsuky_chen
169a0a15ad add libreoffice examples for windows 2024-03-21 15:49:54 +08:00
Timothyxxx
d1e2b12b41 Fix GIMP bug; speed up the environment: when no a11y tree is needed, we can skip controller.get 2024-03-20 22:22:59 +08:00
BlankCheng
f5da5e940b Merge main 2024-03-18 22:21:01 +08:00