sci-gui-agent-benchmark

Files

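lizhanyuan d71f1f976d feat: vllm_eval key-frame sampling + Gemini OpenAI proxy support

- vllm_eval.py: add the _sample_key_frames key-frame sampling function
- vllm_eval.py: sample uniformly when the number of screenshots exceeds max_eval_images
- vllm_eval.py: support calling Gemini models through an OpenAI-compatible proxy
- test_single.json: update the test task configuration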
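
The commit message mentions uniform sampling once the screenshot count exceeds max_eval_images. The following is only a minimal sketch of what such sampling could look like, not the code in vllm_eval.py: the function name `sample_key_frames`, its signature, and the choice to always keep the first and last frame are assumptions made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_key_frames(frames: Sequence[T], max_eval_images: int) -> List[T]:
    """Return at most max_eval_images frames, evenly spaced over the input.

    Hypothetical helper: keeps the first and last frame (an assumption),
    and returns the input unchanged when it is already short enough.
    """
    if max_eval_images <= 0:
        return []
    n = len(frames)
    if n <= max_eval_images:
        return list(frames)
    if max_eval_images == 1:
        # With a single slot, keep the final frame (assumed most informative).
        return [frames[-1]]
    # Evenly spaced indices from 0 to n - 1, using integer arithmetic.
    last = n - 1
    slots = max_eval_images - 1
    indices = [i * last // slots for i in range(max_eval_images)]
    return [frames[i] for i in indices]


if __name__ == "__main__":
    screenshots = [f"step_{i}.png" for i in range(10)]
    print(sample_key_frames(screenshots, 4))
```

Because the spacing between ideal positions is greater than one whenever the input is longer than the limit, the computed indices are strictly increasing, so no frame is selected twice.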

2026-03-04 16:39:24 +08:00

__init__.py

feat(evaluator): export vllm_eval module

2026-02-05 16:54:16 +08:00

basic_os.py

Code clean

2024-03-14 12:54:10 +08:00

chrome.py

Increase timeout for page load stability in Chrome evaluator

2025-07-18 14:16:16 +00:00

docs.py

feat: add X11 image handling and enhanced OCR processing

2025-07-18 19:26:29 +00:00

general.py

fix: Enhance error handling and logging across multiple evaluators

2025-07-14 05:43:17 +00:00

gimp.py

feat: enhance image comparison functionality in gimp.py

2025-07-30 06:07:49 +00:00

libreoffice.py

Code clean

2024-03-14 12:54:10 +08:00

others.py

fix: improve EPUB processing by checking for file existence before reading

2025-07-26 20:42:18 +00:00

pdf.py

Code clean

2024-03-14 12:54:10 +08:00

slides.py

fix: Enhance error handling and logging across multiple evaluators

2025-07-14 05:43:17 +00:00

table.py

Task fix batch (#383)

2025-11-19 17:24:25 +08:00

thunderbird.py

Code clean

2024-03-14 12:54:10 +08:00

utils.py

Calc eval fix (#273)

2025-07-19 17:15:40 +08:00

vlc.py

fix compare_videos in vlc.py (#242)

2025-07-08 16:25:00 +08:00

vllm_eval.py

feat: vllm_eval key-frame sampling + Gemini OpenAI proxy support

2026-03-04 16:39:24 +08:00

vscode.py

Enhance check_python_file_by_test_suite function with robust error handling and logging. Added validation for file existence, module loading, and function execution. Improved resource cleanup and working directory management to ensure stability and reliability during test execution.

2025-07-16 11:44:46 +00:00