Commit Graph

40 Commits

Author SHA1 Message Date
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
yuanmengqi
9fa768d24d refactor: update URLs in multiple JSON files to ensure proper encoding of special characters 2025-06-07 17:26:45 +00:00
Timothyxxx
fb7bafb885 feat: Add proxy configuration to all 369 evaluation examples - 55 with proxy, 314 without 2025-06-05 18:46:53 +08:00
Timothyxxx
34748567a5 feat: Migrate OSWorld files to HuggingFace cache with comprehensive documentation
- Add detailed README for file cache repository
- Implement migration script with retry logic and browser simulation
- Support automatic file type detection and deduplication
- Ensure reliable hosting for OSWorld evaluation files
2025-05-28 04:29:37 +08:00
Timothyxxx
2f0f3f31aa Fix Duplicate ids; Remove unused JSON files across multiple applications 2025-02-10 15:49:54 +08:00
MillanK
983283a86a patch: minor bug fixes for evaluator and task configurations, documentation update (#121)
* fix: /cursor_position api return format fix

* chore: update README.md to remove deprecated command

* fix: add base score for evaluators and minor bug fixes

* fix: add base score for setup configurations

---------

Co-authored-by: Jiaqi Deng <jiaqideng@Jiaqis-MacBook-Pro.local>
2025-01-18 22:25:18 +08:00
Timothyxxx
1610358e08 Fix typos and examples in libreoffice_writer examples 2024-02-23 15:37:51 +08:00
Timothyxxx
e1cf8da4e0 Fix the infeasible examples support 2024-02-21 21:22:12 +08:00
Timothyxxx
1184cffd5f Update infeasible of libreoffice writer, vlc and thunderbird 2024-02-15 15:31:58 +08:00
tsuky_chen
2f87aae0cb minor fix 2024-02-08 01:51:40 +08:00
tsuky_chen
0e07964c63 update writer vscode examples 2024-02-08 01:40:39 +08:00
rhythmcao
8b42d699af fix Desktop path error, revise main.py and update google writer tutorial 2024-02-06 21:45:03 +08:00
rhythmcao
5c6748d39a fix error in writer 2024-02-02 05:33:03 +08:00
rhythmcao
538b9928fe fix some problems in libreoffice writer 2024-02-02 02:23:25 +08:00
Timothyxxx
59e2417a08 Add Mistral, Qwen, Gemini support; Fix minor bugs 2024-02-01 16:55:38 +08:00
tsuky_chen
84cb8ba56d Update 0810415c-bde4-4443-9047-d5f70165a697.json 2024-01-30 21:46:20 +08:00
Timothyxxx
1756d3b672 Fix writer examples 2024-01-30 01:25:30 +08:00
Timothyxxx
343813a29b Add impress examples; remove the auto-saving pyautogui commands and change to libreoffice pre-setting 2024-01-29 21:34:58 +08:00
thomasshin
4b8cab0805 check_highlighted_words modified 2024-01-28 14:36:28 +08:00
Timothyxxx
ce0eafaa0d Fix some errors found in writer examples 2024-01-27 20:37:02 +08:00
Timothyxxx
f8ff612b85 Fix errors found in libreoffice writer examples 2024-01-27 14:09:39 +08:00
Timothyxxx
63852755d2 Make up postconfig for libreoffice writer examples 2024-01-27 11:40:05 +08:00
tsuky_chen
3e7cfa8699 load libreoffice writer eval -batch 2 2024-01-26 02:07:26 +08:00
tsuky_chen
35c4ce99ff modified libreoffice writer eval examples 2024-01-23 22:02:09 +08:00
thomasshin
61b145ab13 add writer evals 8 examples 2024-01-22 23:22:44 +08:00
tsuky_chen
96d2e09054 update writer examples 9-15 2024-01-11 21:28:11 +08:00
tsuky_chen
b20027884a update writer examples 1-8 2024-01-11 18:52:30 +08:00
tsuky_chen
ad21037f93 Update 0e47de2a-32e0-456c-a366-8c607ef7a9d2.json 2024-01-11 14:01:11 +08:00
tsuky_chen
b224fdec31 Update 6ada715d-3aae-4a32-a6a7-429b2e43fb93.json 2024-01-11 13:56:19 +08:00
tsuky_chen
bb8bade1ca Update ecc2413d-8a48-416e-a3a2-d30106ca36cb.json 2024-01-11 13:54:37 +08:00
tsuky_chen
7fa1bf5cdd Update 6ada715d-3aae-4a32-a6a7-429b2e43fb93.json 2024-01-11 13:51:50 +08:00
tsuky_chen
ab2a49fed9 Update 4bcb1253-a636-4df4-8cb0-a35c04dfef31.json 2024-01-11 13:47:22 +08:00
tsuky_chen
e45cc2673b Update 4bcb1253-a636-4df4-8cb0-a35c04dfef31.json 2024-01-11 13:07:19 +08:00
Timothyxxx
03e99a68fb Load libreoffice writer examples and find a few problems; will do another round tomorrow for the rest 2024-01-02 17:50:05 +08:00
tsuky_chen
f04e625ad9 add eval libreoffice writer compare image & centering & check file existence 2023-12-31 03:17:53 +08:00
tsuky_chen
52af1b6dd4 add eval libreoffice writer compare font & subscript & page number 2023-12-31 02:33:39 +08:00
tsuky_chen
c937e31b18 add eval libreoffice writer compare table & equation 2023-12-31 01:02:27 +08:00
tsuky_chen
2d493759e3 add eval libreoffice writer compare content 2023-12-30 18:21:39 +08:00
tsuky_chen
24f33dc9bf add eval libreoffice writer font & page break 2023-12-30 16:32:15 +08:00
Timothyxxx
e891eedfde libreoffice impress and writer initialization 2023-12-25 01:40:39 +08:00