14 Commits

Author SHA1 Message Date
yuanmengqi
e433f35c1f feat: standardize configuration fields across all evaluation examples
- Add `fixed_ip` field to all 369 JSON files in examples directory
  - Set to `true` for 8 files listed in google_chrome.json multi_apps
  - Set to `false` for remaining 361 files
- Add `possibility_of_env_change` field to 363 JSON files missing this field
  - Set to "low" for newly added fields
  - Preserve existing values (4 medium, 2 high) for 6 files that already had this field

This ensures consistent configuration schema across all evaluation examples
while maintaining backward compatibility with existing settings.
2025-07-16 13:45:34 +00:00
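
The field standardization described in e433f35c1f can be sketched roughly as follows. This is a minimal illustration, not the script used for the commit: the helper names `standardize_example` and `standardize_directory`, the `fixed_ip_ids` parameter, and the assumption that each example JSON is an object carrying an `id` key are all hypothetical.

```python
import json
from pathlib import Path


def standardize_example(config: dict, fixed_ip_ids: set) -> dict:
    """Return a copy of one example config with both fields present."""
    updated = dict(config)
    # Assumption: membership in the fixed-IP group is decided by the "id" key.
    updated["fixed_ip"] = updated.get("id") in fixed_ip_ids
    # setdefault leaves an existing "medium" or "high" value untouched
    # and only fills in "low" when the field is missing.
    updated.setdefault("possibility_of_env_change", "low")
    return updated


def standardize_directory(root: Path, fixed_ip_ids: set) -> int:
    """Rewrite every *.json file under root in place; return the file count."""
    count = 0
    for path in sorted(root.rglob("*.json")):
        config = json.loads(path.read_text(encoding="utf-8"))
        updated = standardize_example(config, fixed_ip_ids)
        path.write_text(
            json.dumps(updated, indent=2, ensure_ascii=False) + "\n",
            encoding="utf-8",
        )
        count += 1
    return count
```

Using `setdefault` rather than plain assignment is what keeps the six pre-existing values intact while the remaining files receive the "low" default.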
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* Add more delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
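
The "configurable retry limit" mentioned in 4e11eafd1d for file opening requests can be pictured with a generic helper like the one below. This is only a sketch under assumptions: `call_with_retries`, `max_retries`, and `delay_seconds` are hypothetical names, and the actual SetupController code may be structured quite differently.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    action: Callable[[], T],
    max_retries: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Run action until it succeeds or max_retries attempts have failed."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except Exception as exc:  # broad on purpose: this is a sketch
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < max_retries:
                time.sleep(delay_seconds)
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error
```

A caller would wrap the request in a zero-argument callable, for example `call_with_retries(lambda: send_open_request(path), max_retries=5)`, where `send_open_request` stands in for whatever function performs the request.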
Timothyxxx
fb7bafb885 feat: Add proxy configuration to all 369 evaluation examples - 55 with proxy, 314 without 2025-06-05 18:46:53 +08:00
Timothyxxx
34748567a5 feat: Migrate OSWorld files to HuggingFace cache with comprehensive documentation
- Add detailed README for file cache repository
- Implement migration script with retry logic and browser simulation
- Support automatic file type detection and deduplication
- Ensure reliable hosting for OSWorld evaluation files
2025-05-28 04:29:37 +08:00
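
The deduplication step named in 34748567a5 is not described further here. One common approach, shown purely as an assumption about how it could work, is to group files by a digest of their content; the function name `group_by_content` is hypothetical.

```python
import hashlib
from typing import Dict, List


def group_by_content(files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Map each SHA-256 content digest to the names of files sharing it."""
    groups: Dict[str, List[str]] = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        groups.setdefault(digest, []).append(name)
    return groups
```

Each group then needs to be uploaded only once, with every name in the group pointing at the same stored copy.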
tsuky_chen
2f87aae0cb minor fix 2024-02-08 01:51:40 +08:00
tsuky_chen
0e07964c63 update writer vscode examples 2024-02-08 01:40:39 +08:00
rhythmcao
8b42d699af fix Desktop path error, revise main.py and update google writer tutorial 2024-02-06 21:45:03 +08:00
Timothyxxx
343813a29b Add Impress examples; remove the auto-saving pyautogui commands and switch to a LibreOffice pre-setting 2024-01-29 21:34:58 +08:00
Timothyxxx
63852755d2 Make up postconfig for libreoffice writer examples 2024-01-27 11:40:05 +08:00
tsuky_chen
96d2e09054 update writer examples 9-15 2024-01-11 21:28:11 +08:00
tsuky_chen
b20027884a update writer examples 1-8 2024-01-11 18:52:30 +08:00
Timothyxxx
03e99a68fb Load LibreOffice Writer examples and find a few problems; will do another round tomorrow for the rest 2024-01-02 17:50:05 +08:00
tsuky_chen
c937e31b18 add eval libreoffice writer compare table & equation 2023-12-31 01:02:27 +08:00
Timothyxxx
e891eedfde libreoffice impress and writer initialization 2023-12-25 01:40:39 +08:00