Tianbao Xie
4e11eafd1d
Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
...
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.
* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.
* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.
* Add more delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.
* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.
* Clean debug code
---------
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
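The time-suffix change described in the commit above (adding hours, minutes, and seconds to the file name) can be illustrated with a minimal sketch. The helper name `add_time_suffix` and the exact `%Y%m%d_%H%M%S` pattern are assumptions made for illustration; the actual `get_vm_file` code is not shown in this log and may differ.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def add_time_suffix(filename: str, now: Optional[datetime] = None) -> str:
    """Return filename with a date-and-time suffix before the extension.

    Hypothetical helper: the name and the timestamp pattern are assumed,
    not taken from the repository.
    """
    now = now or datetime.now()
    path = Path(filename)
    # Including %H%M%S keeps names distinct for files fetched on the same
    # day, as long as they are fetched at least one second apart.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"{path.stem}_{stamp}{path.suffix}"


print(add_time_suffix("report.docx", datetime(2025, 6, 16, 21, 37, 19)))
```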
yuanmengqi
630f92fd7c
fix: correct URL encoding in JSON examples for invoice paths
2025-06-09 08:06:27 +00:00
yuanmengqi
3e541bb393
Merge remote-tracking branch 'upstream/feat/aws-provider-support'
2025-06-08 04:01:35 +00:00
yuanmengqi
8853671220
fix: enhance instruction clarity and adjust timing in automation script for LibreOffice Impress example
2025-06-07 21:17:00 +00:00
yuanmengqi
9fa768d24d
refactor: update URLs in multiple JSON files to ensure proper encoding of special characters
2025-06-07 17:26:45 +00:00
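As a rough illustration of what encoding special characters in a URL involves, the sketch below percent-encodes only the path component using the standard library. The function name `encode_url_path` and the example URL are hypothetical; the commit itself edited the URLs stored in the JSON files, and this log does not show how.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode special characters in the path of a URL.

    Hypothetical helper for illustration. "%" is kept as a safe character
    so that an already-encoded URL is not encoded a second time.
    """
    parts = urlsplit(url)
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(parts._replace(path=encoded_path))


print(encode_url_path("https://example.com/files/invoice (1).pdf"))
```

Treating `%` as safe makes the helper idempotent, at the cost of leaving a literal percent sign in a file name unencoded.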
yuanmengqi
f48d80002f
Merge remote-tracking branch 'upstream/feat/aws-provider-support'
2025-06-07 13:22:53 +00:00
yuanmengqi
8d0ff7c99c
refactor: update VLC command configurations to suppress audio and video title display across multiple JSON examples
2025-06-07 09:02:49 +00:00
yuanmengqi
a146c1e0b7
edit prompt
2025-06-07 05:21:04 +00:00
yuanmengqi
4ea24ddfd3
add proxy
2025-06-06 09:41:22 +00:00
Timothyxxx
fb7bafb885
feat: Add proxy configuration to all 369 evaluation examples - 55 with proxy, 314 without
2025-06-05 18:46:53 +08:00
Timothyxxx
34748567a5
feat: Migrate OSWorld files to HuggingFace cache with comprehensive documentation
...
- Add detailed README for file cache repository
- Implement migration script with retry logic and browser simulation
- Support automatic file type detection and deduplication
- Ensure reliable hosting for OSWorld evaluation files
2025-05-28 04:29:37 +08:00
Danyang Zhang
7bf99cb823
Update 15c3b339-88f7-4a86-ab16-e71c58dcb01e.json
2025-05-06 16:29:35 +08:00
Danyang Zhang
e4097783bb
Update dfac9ee8-9bc4-4cdc-b465-4a4bfcd2f397.json
2025-05-06 16:28:52 +08:00
Thomas Kuntz
af993b3a3d
fix: Broken profile path in 3 Thunderbird tasks
2025-05-04 14:03:06 +02:00
Xubin Ren
1d10514125
Fix Search Engine Detection Discrepancy in Chrome Evaluation (#172)
...
* Update bb5e4c0d-f964-439c-97b6-bdb9747de3f4.json
* Update __init__.py
* Update general.py
2025-04-10 17:24:50 +08:00
Parth A. Patel
bbfeecb475
fix: af2d657a-e6b3-4c6a-9f67-9e3ed015974c task config has typo (#169)
...
Typo in "examine_alignment" option results in false negatives
2025-04-06 02:20:51 +08:00
Timothyxxx
d373817edb
Modify VLC launch command and fullscreen detection
...
- Add VLC_VERBOSE=-1 to suppress verbose logging in VLC launch commands across multiple example files
- Update is_vlc_fullscreen function to handle cases where screen size or window size is None
- Improve robustness of VLC-related metrics and example configurations
2025-03-06 22:11:42 +08:00
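The `None` handling mentioned for `is_vlc_fullscreen` in the commit above could look roughly like the sketch below. The function name `fullscreen_score`, its signature, and the choice to score a missing size as 0.0 are all assumptions; the repository's actual metric is not reproduced in this log.

```python
from typing import Optional, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def fullscreen_score(window_size: Optional[Size],
                     screen_size: Optional[Size]) -> float:
    """Return 1.0 if the window fills the screen, otherwise 0.0.

    Hypothetical sketch: a missing size (None) is scored as a failed
    check instead of raising an exception during the comparison.
    """
    if window_size is None or screen_size is None:
        return 0.0
    return 1.0 if window_size == screen_size else 0.0


print(fullscreen_score((1920, 1080), (1920, 1080)))
print(fullscreen_score(None, (1920, 1080)))
```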
Timothyxxx
13127de01e
Fix id
2025-03-03 18:26:32 +08:00
Timothyxxx
2f0f3f31aa
Fix Duplicate ids; Remove unused JSON files across multiple applications
2025-02-10 15:49:54 +08:00
MillanK
983283a86a
patch: minor bug fixes for evaluator and task configurations, documentation update (#121)
...
* fix: /cursor_position api return format fix
* chore: update README.md to remove deprecated command
* fix: add base score for evaluators and minor bug fixes
* fix: add base score for setup configurations
---------
Co-authored-by: Jiaqi Deng <jiaqideng@Jiaqis-MacBook-Pro.local>
2025-01-18 22:25:18 +08:00
YangJL2003
3148973ce9
Update c1fa57f3-c3db-4596-8f09-020701085416.json
2025-01-14 22:56:32 +08:00
Timothyxxx
63e69cab08
Fix one instruction error in chrome 6766f2b8-8a72-417f-a9e5-56fcaa735837
2024-12-09 12:35:02 +08:00
Jiaqi DENG
e0d0041520
chore: modify windows evaluations samples
2024-09-21 23:46:39 +08:00
Timothyxxx
098549d621
Fix one answer
2024-08-15 22:35:57 +08:00
Timothyxxx
794b3ab469
Fix broken links
2024-08-15 01:29:47 +08:00
Timothyxxx
7b38e21b36
Re-org the files in multi_apps subset; fix broken links
2024-08-08 00:17:26 +08:00
tsuky_chen
b4bbe4a3b6
Update 0a211154-fda0-48d0-9274-eaac4ce5486d.json
2024-07-05 00:50:14 +08:00
Timothyxxx
25e808cc91
Fix known errors found from feedback (DBUS problems, pulseaudio start, one VLC example with an error, typos)
2024-05-18 04:49:29 +08:00
David Chang
bc47886c8a
Merge branch 'main' of github.com:ztjhz/DesktopEnv
2024-05-16 10:59:07 +08:00
David Chang
74e400783a
ver May16th
...
updated a task config of thunderbird
2024-05-16 10:58:31 +08:00
Timothyxxx
09ffcc8542
Fix errors found in the examples (some broken links caused by Google Drive; dbus conflict)
2024-05-15 03:05:58 +08:00
tsuky_chen
02a6e4779b
fix
2024-04-05 02:01:39 +08:00
tsuky_chen
773fec2b3e
add windows exp
2024-04-05 01:37:43 +08:00
tsuky_chen
31ed626bdc
Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv
2024-03-26 17:10:04 +08:00
tsuky_chen
9bd37f0579
add windows example
2024-03-26 17:05:55 +08:00
Timothyxxx
635b6717b3
Fix a key error in multiapps
2024-03-25 17:55:28 +08:00
Timothyxxx
92760b29e1
Merge remote-tracking branch 'origin/main'
2024-03-21 22:05:40 +08:00
Timothyxxx
3ce7636abd
Fix one multi_app example; remove some broken examples; Support downsampling
2024-03-21 22:05:16 +08:00
tsuky_chen
ca03baacf5
fix conflict
2024-03-21 16:01:31 +08:00
tsuky_chen
3d2ff5d64e
fix when checking
2024-03-21 15:57:05 +08:00
tsuky_chen
169a0a15ad
add libreoffice examples for windows
2024-03-21 15:49:54 +08:00
David Chang
402fcf01d0
ver Mar21stv2
...
fixed error
2024-03-21 15:30:59 +08:00
David Chang
dac44b2c4f
ver Mar21st
...
Windows multi_app tasks
2024-03-21 15:03:21 +08:00
David Chang
aa3411f7e4
Merge branch 'zdy'
2024-03-20 22:25:45 +08:00
David Chang
15e01e7ccc
ver Mar20thv2
...
fixed bugs in server/main.py (_create_pywinauto_node and get_screen_size)
finished migration of a few task configs to Windows
fixed bug in python.py
2024-03-20 22:22:57 +08:00
tsuky_chen
d2d4a54a3f
Update c6bf789c-ba3a-4209-971d-b63abf0ab733.json
2024-03-20 20:45:45 +08:00
tsuky_chen
966339dee0
Update 70745df8-f2f5-42bd-8074-fbc10334fcc5.json
2024-03-20 20:38:27 +08:00
tsuky_chen
21e3ce5cba
Update 70745df8-f2f5-42bd-8074-fbc10334fcc5.json
2024-03-20 20:17:18 +08:00
tsuky_chen
2746bcfe24
Update c6bf789c-ba3a-4209-971d-b63abf0ab733.json
2024-03-20 20:15:04 +08:00
David Chang
6149061621
ver Mar20th
...
fixed a bug in _create_pywinauto_node
2024-03-20 14:25:09 +08:00