Commit Graph

473 Commits

Author SHA1 Message Date
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
yuanmengqi
2bae228803 merge upstream 2025-06-10 13:23:03 +00:00
yuanmengqi
7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
yuanmengqi
630f92fd7c fix: correct URL encoding in JSON examples for invoice paths 2025-06-09 08:06:27 +00:00
yuanmengqi
aee1207fff fix error 2025-06-09 04:20:59 +00:00
yuanmengqi
3e541bb393 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-08 04:01:35 +00:00
yuanmengqi
8853671220 fix: enhance instruction clarity and adjust timing in automation script for LibreOffice Impress example 2025-06-07 21:17:00 +00:00
yuanmengqi
9fa768d24d refactor: update URLs in multiple JSON files to ensure proper encoding of special characters 2025-06-07 17:26:45 +00:00
yuanmengqi
8471394cc1 add branch feat/aws-provider-support 2025-06-07 15:57:18 +00:00
yuanmengqi
f48d80002f Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-07 13:22:53 +00:00
yuanmengqi
8d0ff7c99c refactor: update VLC command configurations to suppress audio and video title display across multiple JSON examples 2025-06-07 09:02:49 +00:00
yuanmengqi
e61acece84 problems from the community 2025-06-07 05:30:40 +00:00
yuanmengqi
a146c1e0b7 edit prompt 2025-06-07 05:21:04 +00:00
yuanmengqi
4ea24ddfd3 add proxy 2025-06-06 09:41:22 +00:00
yuanmengqi
a6300e05c9 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-05 13:31:42 +00:00
yuanmengqi
71578d994e edit 2025-06-05 13:29:16 +00:00
Timothyxxx
fb7bafb885 feat: Add proxy configuration to all 369 evaluation examples - 55 with proxy, 314 without 2025-06-05 18:46:53 +08:00
yuanmengqi
b211df3385 fix timeout 2025-06-04 10:23:45 +00:00
yuanmengqi
98a810d31e edit operator 2025-06-02 12:11:25 +00:00
Timothyxxx
34748567a5 feat: Migrate OSWorld files to HuggingFace cache with comprehensive documentation
- Add detailed README for file cache repository
- Implement migration script with retry logic and browser simulation
- Support automatic file type detection and deduplication
- Ensure reliable hosting for OSWorld evaluation files
2025-05-28 04:29:37 +08:00
Danyang Zhang
7bf99cb823 Update 15c3b339-88f7-4a86-ab16-e71c58dcb01e.json 2025-05-06 16:29:35 +08:00
Danyang Zhang
e4097783bb Update dfac9ee8-9bc4-4cdc-b465-4a4bfcd2f397.json 2025-05-06 16:28:52 +08:00
Thomas Kuntz
af993b3a3d fix: Broken profile path in 3 Thunderbird tasks 2025-05-04 14:03:06 +02:00
Xubin Ren
1d10514125 Fix Search Engine Detection Discrepancy in Chrome Evaluation (#172)
* Update bb5e4c0d-f964-439c-97b6-bdb9747de3f4.json

* Update __init__.py

* Update general.py
2025-04-10 17:24:50 +08:00
Parth A. Patel
bbfeecb475 fix: af2d657a-e6b3-4c6a-9f67-9e3ed015974c task config has typo (#169)
Typo in "examine_alignment" option results in false negatives
2025-04-06 02:20:51 +08:00
Timothyxxx
d373817edb Modify VLC launch command and fullscreen detection
- Add VLC_VERBOSE=-1 to suppress verbose logging in VLC launch commands across multiple example files
- Update is_vlc_fullscreen function to handle cases where screen size or window size is None
- Improve robustness of VLC-related metrics and example configurations
2025-03-06 22:11:42 +08:00
Timothyxxx
13127de01e Fix id 2025-03-03 18:26:32 +08:00
Timothyxxx
2f0f3f31aa Fix Duplicate ids; Remove unused JSON files across multiple applications 2025-02-10 15:49:54 +08:00
MillanK
983283a86a patch: minor bug fixes for evaluator and task configurations, documentation update (#121)
* fix: /cursor_position api return format fix

* chore: update README.md to remove deprecated command

* fix: add base score for evaluators and minor bug fixes

* fix: add base score for setup configurations

---------

Co-authored-by: Jiaqi Deng <jiaqideng@Jiaqis-MacBook-Pro.local>
2025-01-18 22:25:18 +08:00
YangJL2003
3148973ce9 Update c1fa57f3-c3db-4596-8f09-020701085416.json 2025-01-14 22:56:32 +08:00
Timothyxxx
63e69cab08 Fix one instruction error in chrome 6766f2b8-8a72-417f-a9e5-56fcaa735837 2024-12-09 12:35:02 +08:00
Tianbao Xie
afba17b510 Server setup readme revision (#108)
* Initialize

* add note for resolution

* Organize

* draft version and todos

* ver Nov24th

supplemented socat installation and switching off automatic suspend and screen-off

* Finish Tianbao todos

* Fix typos

* update font install

* Finish Xiaochuan's Part

* Finish Xiaochuan's Part update

* Update README.md

* Fix format

---------

Co-authored-by: zdy023 <zdy004007@126.com>
Co-authored-by: tsuky_chen <3107760494@qq.com>
Co-authored-by: Jason Lee <lixiaochuan20@gmail.com>
Co-authored-by: Siheng Zhao <77528902+sihengz02@users.noreply.github.com>
2024-11-25 16:30:59 +08:00
Jiaqi DENG
e0d0041520 chore: modify windows evaluations samples 2024-09-21 23:46:39 +08:00
Timothyxxx
098549d621 Fix one answer 2024-08-15 22:35:57 +08:00
Timothyxxx
794b3ab469 Fix broken links 2024-08-15 01:29:47 +08:00
Timothyxxx
7b38e21b36 Re-org the files in multi_apps subset; fix broken links 2024-08-08 00:17:26 +08:00
tsuky_chen
b4bbe4a3b6 Update 0a211154-fda0-48d0-9274-eaac4ce5486d.json 2024-07-05 00:50:14 +08:00
Timothyxxx
25e808cc91 Fix known errors found from feedback (DBUS problems, pulseaudio start, one VLC example with an error, typos) 2024-05-18 04:49:29 +08:00
David Chang
bc47886c8a Merge branch 'main' of github.com:ztjhz/DesktopEnv 2024-05-16 10:59:07 +08:00
David Chang
74e400783a ver May16th
updated a task config of thunderbird
2024-05-16 10:58:31 +08:00
rhythmcao
5271ae69e0 add README for Google account and Google Drive 2024-05-15 12:42:22 +08:00
Timothyxxx
09ffcc8542 Fix errors found in the examples (some broken links caused by Google Drive; dbus conflict) 2024-05-15 03:05:58 +08:00
tsuky_chen
02a6e4779b fix 2024-04-05 02:01:39 +08:00
tsuky_chen
773fec2b3e add windows exp 2024-04-05 01:37:43 +08:00
tsuky_chen
31ed626bdc Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-26 17:10:04 +08:00
tsuky_chen
9bd37f0579 add windows example 2024-03-26 17:05:55 +08:00
Timothyxxx
635b6717b3 Fix a key error in multiapps 2024-03-25 17:55:28 +08:00
Timothyxxx
d4e81afae7 Update small_test set 2024-03-21 23:06:27 +08:00
Timothyxxx
c34d1b37a5 Update small_test set 2024-03-21 22:38:02 +08:00
Timothyxxx
92760b29e1 Merge remote-tracking branch 'origin/main' 2024-03-21 22:05:40 +08:00