260 Commits

Author SHA1 Message Date
Tianbao Xie
bba367b8bc fix: fix multiapps tasks (#231)
* Update JSON example for multi_apps: change snapshot name and specify presenter in instructions for clarity.

* Enhance PDF image comparison in chrome.py by adding existence checks for input files and improving image extraction logic. Introduce image hashing for similarity scoring with a configurable threshold. Update docs.py to support fuzzy matching in DOCX file comparisons, allowing for similarity scoring based on text content. Modify example JSON to enable fuzzy matching option.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-03 16:58:43 +08:00
Danyang Zhang
d4273d992e Calc eval fix (#225)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-30 18:23:09 +08:00
Tianbao Xie
30138c5db1 VLC fix (#224)
* Enhance SetupController with improved logging and error handling during setup and file upload processes. Update instance type to t3.xlarge and AMI ID for AWS configuration. Add download progress logging and exception handling for better debugging.

* Enhance VLC status evaluation by adding multiple paths for file and URL information extraction, improving robustness against varying VLC XML structures. Implement detailed logging for better debugging and error handling in case of mismatches or missing data. Update example JSON for VLC evaluation to use a valid HLS stream URL.

* Improve audio comparison robustness in VLC evaluator by adding error handling for audio file loading and extraction. Implement detailed logging for empty or corrupt files, and normalize DTW distance calculation for more accurate similarity scoring. Remove deprecated audio fingerprint comparison function.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-29 20:18:44 +08:00
MillanK
48ac57697a VSCode fix (#222) 2025-06-24 17:08:09 +08:00
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* Add more delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
tsuky_chen
e55810809e Fix libreoffice impress evaluation (#209)
Co-authored-by: chenjix <211250101@smail.nju.edu.cn>
2025-06-08 22:12:56 +08:00
Xubin Ren
1d10514125 Fix Search Engine Detection Discrepancy in Chrome Evaluation (#172)
* Update bb5e4c0d-f964-439c-97b6-bdb9747de3f4.json

* Update __init__.py

* Update general.py
2025-04-10 17:24:50 +08:00
Timothyxxx
d373817edb Modify VLC launch command and fullscreen detection
- Add VLC_VERBOSE=-1 to suppress verbose logging in VLC launch commands across multiple example files
- Update is_vlc_fullscreen function to handle cases where screen size or window size is None
- Improve robustness of VLC-related metrics and example configurations
2025-03-06 22:11:42 +08:00
Eric Patey
bf3f054564 Fix crash caused by referencing an unbound local variable. (#128)
Co-authored-by: Eric Patey <>
2025-02-07 23:31:53 +08:00
Eric Patey
3ee6c34a36 Fix referenced before assignment regression introduced with #121. (#125)
Co-authored-by: Eric Patey <>
2025-02-05 10:51:59 +08:00
MillanK
983283a86a patch: minor bug fixes for evaluator and task configurations, documentation update (#121)
* fix: /cursor_position api return format fix

* chore: update README.md to remove deprecated command

* fix: add base score for evaluators and minor bug fixes

* fix: add base score for setup configurations

---------

Co-authored-by: Jiaqi Deng <jiaqideng@Jiaqis-MacBook-Pro.local>
2025-01-18 22:25:18 +08:00
Tianbao Xie
7d84a21962 Fix minor problems when aggregating the results (#106) 2024-11-22 17:37:34 +08:00
Pierre Carrier
924e0fcd17 metrics: fix time regex (#81) 2024-10-24 22:45:42 +08:00
Timothyxxx
25e808cc91 Fix known errors found from feedback (DBUS problems, pulseaudio start, one VLC example with an error, typos) 2024-05-18 04:49:29 +08:00
Timothyxxx
9c75df5dce Clean code; Refactor environment to pass screenshot content instead of path 2024-04-13 23:34:01 +08:00
Timothyxxx
2d8eeaad58 Fix one bug in Chrome getter; fix one error for a corner case in doc 2024-04-02 14:50:29 +08:00
Timothyxxx
fad621093f Fix one bug in Chrome getter 2024-04-01 15:05:48 +08:00
tsuky_chen
ca03baacf5 fix conflict 2024-03-21 16:01:31 +08:00
tsuky_chen
169a0a15ad add libreoffice examples for windows 2024-03-21 15:49:54 +08:00
Timothyxxx
d1e2b12b41 Fix GIMP bug; speed up the environment by skipping controller.get when no a11y tree is needed 2024-03-20 22:22:59 +08:00
BlankCheng
f5da5e940b Merge main 2024-03-18 22:21:01 +08:00
BlankCheng
4671455b56 Fix eval func 2024-03-18 22:16:04 +08:00
Timothyxxx
eeae1442cd Add execute timeout to server; Fix error examples 2024-03-18 20:42:57 +08:00
Timothyxxx
0aae756538 Code clean 2024-03-14 12:54:10 +08:00
BlankCheng
4b15595146 Update fix 2024-03-12 00:17:46 +08:00
Timothyxxx
b4cb64d861 Fix bugs in multiple examples 2024-03-11 00:26:59 +08:00
Timothyxxx
b3d27f6387 Fix bugs in multiple examples 2024-03-10 23:52:29 +08:00
Timothyxxx
e51d0e8cc9 Fix bugs in multiple apps example 0e53 2024-03-10 15:18:14 +08:00
Jason Lee
812be97a41 Merge branch 'main' of github.com:xlang-ai/DesktopEnv 2024-03-10 14:50:17 +08:00
Jason Lee
775cef744f xiaochuan corrected his bugs in the multiapp examples; you can try them again now 2024-03-10 14:48:56 +08:00
Timothyxxx
e481afcf5c Fix multiple examples 2024-03-09 23:01:22 +08:00
tsuky_chen
aae848196b merge 2024-03-09 18:53:27 +08:00
tsuky_chen
5b07ec17bf fix multi apps 2024-03-09 18:50:16 +08:00
tsuky_chen
f4ec36bdfb fix multi apps 2024-03-09 18:48:17 +08:00
Jason Lee
2291af394f update google drive file link in json 2024-03-09 18:06:48 +08:00
Timothyxxx
1e0a78a453 Add none file handling for general 2024-03-09 00:30:28 +08:00
Timothyxxx
4de0eff703 Add none file handling for doc 2024-03-09 00:16:50 +08:00
Timothyxxx
62b3b2390d Fix bugs from merging 2024-03-08 23:09:11 +08:00
Tianbao Xie
f01153cadd Merge branch 'main' into xiaochuanli/addChromeExtensions 2024-03-08 20:45:49 +08:00
Tianbao Xie
4b841c199a Merge pull request #12 from xlang-ai/zhoujun/multi-app
Update multi-app examples
2024-03-08 20:41:14 +08:00
Timothyxxx
6f0fe4f482 Fix a bug in multiple apps example 2024-03-08 20:39:05 +08:00
rhythmcao
365c7798f1 Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-08 19:26:04 +08:00
Jason Lee
62fd8feebb xiaochuan's multiapp examples 2024-03-08 19:24:15 +08:00
David Chang
1642a17bd7 Merge branch 'zdy' 2024-03-08 13:30:25 +08:00
David Chang
ce23f3dab4 ver Mar8th
fixed a task and a metric
2024-03-08 13:28:34 +08:00
rhythmcao
565c0cc58c Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-08 00:03:37 +08:00
rhythmcao
89f0fc5410 update multi-apps 2024-03-08 00:03:08 +08:00
Timothyxxx
1af9d8911d Update multi-apps examples 2024-03-07 22:15:23 +08:00
Timothyxxx
1aa2a43908 Update multi-apps examples 2024-03-07 22:15:08 +08:00
tsuky_chen
5abdf207a9 Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-07 17:21:12 +08:00