Commit Graph

22 Commits

Author SHA1 Message Date
yuanmengqi
0651495d88 fix: Enhance error handling and logging across multiple evaluators
- Added logging for file retrieval and error handling in file.py, improving robustness during file operations.
- Implemented checks for file existence and parsing errors in general.py, enhancing reliability in JSON/YAML processing.
- Improved table comparison logic in table.py with detailed error logging for sheet loading and cell value reading.
- Enhanced metrics evaluation in slides.py with additional checks for paragraph and run counts, ensuring thorough comparison.
- Updated utils.py to include file existence checks and detailed error logging during cell value reading.
2025-07-14 05:43:17 +00:00
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solutions.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
yuanmengqi
7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
Jason Lee
812be97a41 Merge branch 'main' of github.com:xlang-ai/DesktopEnv 2024-03-10 14:50:17 +08:00
Jason Lee
775cef744f xiaochuan correct his bugs in multiapp examples, you can try it again now 2024-03-10 14:48:56 +08:00
Timothyxxx
447c886b0a Fix multiple apps 5990457f-2adb-467b-a4af-5c857c92d762 2024-03-09 20:54:52 +08:00
Timothyxxx
b0607c4f79 Fix bugs imported by Xiaochuan xs 2024-03-09 19:32:05 +08:00
Jason Lee
6ea3dd856f fix multiapps bug : "26660ad1-6ebb-4f59-8cba-a8432dfe8d38" 2024-03-09 14:03:26 +08:00
Tianbao Xie
f01153cadd Merge branch 'main' into xiaochuanli/addChromeExtensions 2024-03-08 20:45:49 +08:00
Jason Lee
62fd8feebb xiaochuan's multiapp examples 2024-03-08 19:24:15 +08:00
rhythmcao
da0dafc32c add multi-apps 5 examples by ruisheng 2024-03-06 2024-03-06 21:20:26 +08:00
David Chang
19214f2107 ver Jan17thv2
updated compare_table with compare the shown value through exported csv
2024-01-17 22:43:26 +08:00
Timothyxxx
a1c3e4c294 Finish Chrome example loading v1 2024-01-13 22:56:50 +08:00
David Chang
cebae4b183 Merge branch 'main' into zdy 2024-01-10 22:16:25 +08:00
David Chang
1515b05666 ver Jan10thv2
a new example config for Thunderbird
fixed several bugs
2024-01-10 21:58:29 +08:00
Timothyxxx
abcafce750 VLC updates, and some infra bugs fix 2024-01-09 23:14:06 +08:00
David Chang
26b7d9010d Merge branch 'zdy' 2024-01-05 15:55:41 +08:00
David Chang
5fedf5b891 ver Jan4th
updated interfaces for thunderbird evaluation, not tested
2024-01-04 22:41:57 +08:00
Timothyxxx
ab71ebb2ba Initialize VLC getters and metrics, fix some bugs in infra logic, needs to be refactored later on 2024-01-04 17:05:17 +08:00
Timothyxxx
03e99a68fb Loading libreoffice writer examples and find few problems, will do another round tomorrow for the rest 2024-01-02 17:50:05 +08:00
David Chang
aa76a57437 Merge branch 'zdy' 2023-12-22 14:11:26 +08:00
David Chang
f4664bd069 ver Dec22nd
re-organized the evaluator structure to improve the extensibility
2023-12-22 14:01:26 +08:00