14 Commits

Author SHA1 Message Date
yuanmengqi
e433f35c1f feat: standardize configuration fields across all evaluation examples
- Add `fixed_ip` field to all 369 JSON files in examples directory
  - Set to `true` for 8 files listed in google_chrome.json multi_apps
  - Set to `false` for remaining 361 files
- Add `possibility_of_env_change` field to 363 JSON files missing this field
  - Set to "low" for newly added fields
  - Preserve existing values (4 medium, 2 high) for 6 files that already had this field

This ensures consistent configuration schema across all evaluation examples
while maintaining backward compatibility with existing settings.
2025-07-16 13:45:34 +00:00
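The field standardization described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the script actually used for the commit: the helper names `standardize_example` and `standardize_directory`, the use of an `id` key to identify an example, and the `fixed_ip_ids` set are all assumptions made for the sketch.

```python
from __future__ import annotations

import json
from pathlib import Path


def standardize_example(config: dict, fixed_ip_ids: set[str]) -> dict:
    """Return a copy of `config` with both schema fields present.

    Assumption: each example carries an "id" key, and `fixed_ip_ids`
    holds the ids of the examples that need a fixed IP.
    """
    updated = dict(config)
    # Add `fixed_ip` only when it is missing.
    if "fixed_ip" not in updated:
        updated["fixed_ip"] = updated.get("id") in fixed_ip_ids
    # Preserve an existing value; default to "low" otherwise.
    updated.setdefault("possibility_of_env_change", "low")
    return updated


def standardize_directory(examples_dir: Path, fixed_ip_ids: set[str]) -> int:
    """Rewrite every JSON object file under `examples_dir`; return how many changed."""
    changed = 0
    for path in sorted(examples_dir.rglob("*.json")):
        original = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(original, dict):
            continue  # skip files that are not a single JSON object
        updated = standardize_example(original, fixed_ip_ids)
        if updated != original:
            path.write_text(
                json.dumps(updated, indent=2, ensure_ascii=False) + "\n",
                encoding="utf-8",
            )
            changed += 1
    return changed
```

Using `setdefault` for `possibility_of_env_change` is what keeps the existing "medium" and "high" values intact while filling in "low" everywhere else.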
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with a configurable retry limit and improved error handling for file opening requests. Introduce a new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
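The "configurable retry limit" mentioned in the commit above could look something like the sketch below. It is only an illustration of the general pattern: `call_with_retries`, `max_retries`, and `delay_seconds` are hypothetical names, and the real SetupController code may be structured quite differently.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    action: Callable[[], T],
    max_retries: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Call `action` until it succeeds or `max_retries` attempts have failed."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except Exception as exc:  # a real caller would catch narrower errors
            last_error = exc
            if attempt < max_retries:
                time.sleep(delay_seconds)  # wait before the next attempt
    # Every attempt failed: surface the last error as the cause.
    raise RuntimeError(f"failed after {max_retries} attempts") from last_error
```

A caller would wrap the file-open request in a zero-argument function and pass it as `action`, so the retry limit can be tuned per call site.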
Timothyxxx
fb7bafb885 feat: Add proxy configuration to all 369 evaluation examples - 55 with proxy, 314 without 2025-06-05 18:46:53 +08:00
Timothyxxx
34748567a5 feat: Migrate OSWorld files to HuggingFace cache with comprehensive documentation
- Add detailed README for file cache repository
- Implement migration script with retry logic and browser simulation
- Support automatic file type detection and deduplication
- Ensure reliable hosting for OSWorld evaluation files
2025-05-28 04:29:37 +08:00
tsuky_chen
2f87aae0cb minor fix 2024-02-08 01:51:40 +08:00
tsuky_chen
0e07964c63 update writer vscode examples 2024-02-08 01:40:39 +08:00
rhythmcao
8b42d699af fix Desktop path error, revise main.py and update google writer tutorial 2024-02-06 21:45:03 +08:00
Timothyxxx
f8ff612b85 Fix errors found in libreoffice writer examples 2024-01-27 14:09:39 +08:00
tsuky_chen
96d2e09054 update writer examples 9-15 2024-01-11 21:28:11 +08:00
tsuky_chen
ab2a49fed9 Update 4bcb1253-a636-4df4-8cb0-a35c04dfef31.json 2024-01-11 13:47:22 +08:00
tsuky_chen
e45cc2673b Update 4bcb1253-a636-4df4-8cb0-a35c04dfef31.json 2024-01-11 13:07:19 +08:00
Timothyxxx
03e99a68fb Load libreoffice writer examples and find a few problems; will do another round tomorrow for the rest 2024-01-02 17:50:05 +08:00
tsuky_chen
f04e625ad9 add eval libreoffice writer compare image & centering & check file existence 2023-12-31 03:17:53 +08:00
Timothyxxx
e891eedfde libreoffice impress and writer initialization 2023-12-25 01:40:39 +08:00