Commit Graph

497 Commits

Author SHA1 Message Date
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
yuanmengqi
630f92fd7c fix: correct URL encoding in JSON examples for invoice paths 2025-06-09 08:06:27 +00:00
yuanmengqi
3e541bb393 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-08 04:01:35 +00:00
yuanmengqi
8853671220 fix: enhance instruction clarity and adjust timing in automation script for LibreOffice Impress example 2025-06-07 21:17:00 +00:00
yuanmengqi
9fa768d24d refactor: update URLs in multiple JSON files to ensure proper encoding of special characters 2025-06-07 17:26:45 +00:00
yuanmengqi
f48d80002f Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-07 13:22:53 +00:00
yuanmengqi
8d0ff7c99c refactor: update VLC command configurations to suppress audio and video title display across multiple JSON examples 2025-06-07 09:02:49 +00:00
yuanmengqi
a146c1e0b7 edit prompt 2025-06-07 05:21:04 +00:00
yuanmengqi
4ea24ddfd3 add proxy 2025-06-06 09:41:22 +00:00
Timothyxxx
fb7bafb885 feat: Add proxy configuration to all 369 evaluation examples - 55 with proxy, 314 without 2025-06-05 18:46:53 +08:00
Timothyxxx
34748567a5 feat: Migrate OSWorld files to HuggingFace cache with comprehensive documentation
- Add detailed README for file cache repository
- Implement migration script with retry logic and browser simulation
- Support automatic file type detection and deduplication
- Ensure reliable hosting for OSWorld evaluation files
2025-05-28 04:29:37 +08:00
Danyang Zhang
7bf99cb823 Update 15c3b339-88f7-4a86-ab16-e71c58dcb01e.json 2025-05-06 16:29:35 +08:00
Danyang Zhang
e4097783bb Update dfac9ee8-9bc4-4cdc-b465-4a4bfcd2f397.json 2025-05-06 16:28:52 +08:00
Thomas Kuntz
af993b3a3d fix: Broken profile path in 3 Thunderbird tasks 2025-05-04 14:03:06 +02:00
Xubin Ren
1d10514125 Fix Search Engine Detection Discrepancy in Chrome Evaluation (#172)
* Update bb5e4c0d-f964-439c-97b6-bdb9747de3f4.json

* Update __init__.py

* Update general.py
2025-04-10 17:24:50 +08:00
Parth A. Patel
bbfeecb475 fix: af2d657a-e6b3-4c6a-9f67-9e3ed015974c task config has typo (#169)
Typo in the "examine_alignment" option results in false negatives
2025-04-06 02:20:51 +08:00
Timothyxxx
d373817edb Modify VLC launch command and fullscreen detection
- Add VLC_VERBOSE=-1 to suppress verbose logging in VLC launch commands across multiple example files
- Update is_vlc_fullscreen function to handle cases where screen size or window size is None
- Improve robustness of VLC-related metrics and example configurations
2025-03-06 22:11:42 +08:00
Timothyxxx
13127de01e Fix id 2025-03-03 18:26:32 +08:00
Timothyxxx
2f0f3f31aa Fix Duplicate ids; Remove unused JSON files across multiple applications 2025-02-10 15:49:54 +08:00
MillanK
983283a86a patch: minor bug fixes for evaluator and task configurations, documentation update (#121)
* fix: /cursor_position api return format fix

* chore: update README.md to remove deprecated command

* fix: add base score for evaluators and minor bug fixes

* fix: add base score for setup configurations

---------

Co-authored-by: Jiaqi Deng <jiaqideng@Jiaqis-MacBook-Pro.local>
2025-01-18 22:25:18 +08:00
YangJL2003
3148973ce9 Update c1fa57f3-c3db-4596-8f09-020701085416.json 2025-01-14 22:56:32 +08:00
Timothyxxx
63e69cab08 Fix one instruction error in chrome 6766f2b8-8a72-417f-a9e5-56fcaa735837 2024-12-09 12:35:02 +08:00
Jiaqi DENG
e0d0041520 chore: modify windows evaluations samples 2024-09-21 23:46:39 +08:00
Timothyxxx
098549d621 Fix one answer 2024-08-15 22:35:57 +08:00
Timothyxxx
794b3ab469 Fix broken links 2024-08-15 01:29:47 +08:00
Timothyxxx
7b38e21b36 Re-org the files in multi_apps subset; fix broken links 2024-08-08 00:17:26 +08:00
tsuky_chen
b4bbe4a3b6 Update 0a211154-fda0-48d0-9274-eaac4ce5486d.json 2024-07-05 00:50:14 +08:00
Timothyxxx
25e808cc91 Fix known errors found from feedback (DBUS problems, pulseaudio start, one vlc example with error, typos) 2024-05-18 04:49:29 +08:00
David Chang
bc47886c8a Merge branch 'main' of github.com:ztjhz/DesktopEnv 2024-05-16 10:59:07 +08:00
David Chang
74e400783a ver May16th
updated a task config of thunderbird
2024-05-16 10:58:31 +08:00
Timothyxxx
09ffcc8542 Fix errors found in the examples (some broken links caused by Google Drive; dbus conflict) 2024-05-15 03:05:58 +08:00
tsuky_chen
02a6e4779b fix 2024-04-05 02:01:39 +08:00
tsuky_chen
773fec2b3e add windows exp 2024-04-05 01:37:43 +08:00
tsuky_chen
31ed626bdc Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-26 17:10:04 +08:00
tsuky_chen
9bd37f0579 add windows example 2024-03-26 17:05:55 +08:00
Timothyxxx
635b6717b3 Fix a key error in multiapps 2024-03-25 17:55:28 +08:00
Timothyxxx
92760b29e1 Merge remote-tracking branch 'origin/main' 2024-03-21 22:05:40 +08:00
Timothyxxx
3ce7636abd Fix one multi_app example; remove some broken examples; Support downsampling 2024-03-21 22:05:16 +08:00
tsuky_chen
ca03baacf5 fix conflict 2024-03-21 16:01:31 +08:00
tsuky_chen
3d2ff5d64e fix when checking 2024-03-21 15:57:05 +08:00
tsuky_chen
169a0a15ad add libreoffice examples for windows 2024-03-21 15:49:54 +08:00
David Chang
402fcf01d0 ver Mar21stv2
fixed error
2024-03-21 15:30:59 +08:00
David Chang
dac44b2c4f ver Mar21st
Windows multi_app tasks
2024-03-21 15:03:21 +08:00
David Chang
aa3411f7e4 Merge branch 'zdy' 2024-03-20 22:25:45 +08:00
David Chang
15e01e7ccc ver Mar20thv2
fixed bugs in server/main.py (_create_pywinauto_node and get_screen_size)
finished migration of a few task configs to Windows
fixed bug in python.py
2024-03-20 22:22:57 +08:00
tsuky_chen
d2d4a54a3f Update c6bf789c-ba3a-4209-971d-b63abf0ab733.json 2024-03-20 20:45:45 +08:00
tsuky_chen
966339dee0 Update 70745df8-f2f5-42bd-8074-fbc10334fcc5.json 2024-03-20 20:38:27 +08:00
tsuky_chen
21e3ce5cba Update 70745df8-f2f5-42bd-8074-fbc10334fcc5.json 2024-03-20 20:17:18 +08:00
tsuky_chen
2746bcfe24 Update c6bf789c-ba3a-4209-971d-b63abf0ab733.json 2024-03-20 20:15:04 +08:00
David Chang
6149061621 ver Mar20th
fixed a bug in _create_pywinauto_node
2024-03-20 14:25:09 +08:00