|
|
dc5fd173f1
|
data: update avogadro building-metal-complexes task1 & task3
|
2026-03-13 17:19:44 +08:00 |
|
|
|
19795a674b
|
chore: add demo_task3 recording artifacts to gitignore
|
2026-03-11 11:13:23 +08:00 |
|
|
|
349f2142fb
|
fix: vllm_eval evaluates at the original resolution by default
|
2026-03-11 11:06:01 +08:00 |
|
|
|
a943c1e961
|
feat: update Jade/VESTA task definitions + final evaluation list
- Jade: update 15 task JSONs (refined instruction + fully expanded metadata.steps)
- VESTA: restructure 10 task JSONs (use NaCl.cif/anatase_TiO2.cif throughout + rewritten steps)
- VESTA: remove task1, add 2 CIF data files
- Add test_final.json (11 jade + 10 vesta = 21 tasks)
- run_proxmox.sh: MODEL→gpt-5.4, MAX_STEPS→35, TEST_META→test_final.json
|
2026-03-11 11:02:26 +08:00 |
|
|
|
d71f1f976d
|
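feat: vllm_eval key-frame sampling + Gemini OpenAI proxy support
- vllm_eval.py: add _sample_key_frames key-frame sampling function
- vllm_eval.py: sample uniformly when the number of screenshots exceeds max_eval_images
- vllm_eval.py: Gemini models can be called through an OpenAI-compatible proxy
- test_single.json: update test task configuration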
|
2026-03-04 16:39:24 +08:00 |
|
|
|
4bde685bbd
|
feat: add Proxmox provider support and inject_steps parameter
- Add desktop_env/providers/proxmox/ (manager + provider)
- desktop_env.py: add proxmox to the list of provider names
- providers/__init__.py: register the proxmox provider in the factory function
- run.py: add --inject_steps/--no_inject_steps arguments
- run_proxmox.sh: Proxmox run script
|
2026-03-04 16:39:08 +08:00 |
|
|
|
e70f1335f0
|
config: update test task configuration files
|
2026-03-04 10:44:00 +08:00 |
|
|
|
9431bd5bfc
|
data: refine metadata steps of existing avogadro/imagej/origin/ovito/pymol/vesta tasks
|
2026-03-04 10:43:49 +08:00 |
|
|
|
b1052c79cf
|
data: add jade/avogadro/ovito/pymol evaluation task data
|
2026-03-04 10:43:29 +08:00 |
|
|
|
ac3f38ed58
|
feat: add refine_metadata script, update extract_instructions_v2
|
2026-03-04 10:43:14 +08:00 |
|
|
|
e4b039fc02
|
refine jade metadata steps: add shortcuts & merge menu operations to avoid timeout
|
2026-02-27 18:19:04 +08:00 |
|
|
|
b75f6bf341
|
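feat: improve task step injection and a11y state representation for more stable tree interaction
- Wire metadata.steps through end to end and inject task steps into the agent's prediction context
- Improve linearized a11y tree output: use center coordinates and add a states column (expanded/collapsed/selected, etc.)
- Relax node retention criteria to keep input controls that have no text (edit/textfield/searchbox, etc.)
- Tighten output constraints: each turn may return only action code or WAIT/DONE/FAIL; an action and DONE must not be returned in the same turn
- Add avogadro example steps: expand aromatics and select benzene.cjson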
|
2026-02-26 18:56:53 +08:00 |
|
|
|
07e66490dd
|
feat: improve a11y tree support for scientific software
- Extend the heuristic_retrieve.py whitelist to cover scientific software GUI frameworks:
- Add prefix rules: sunawt (Java Swing), qt5q/qt6q (Qt), ovito, pymol,
contentspanel, wx (wxWidgets), afx (MFC), thunderrt (VB6)
- Add endswith rules: edit, widget, box, dialog, view, frame, menuitem,
menubar, toolbar, tabitem, treeitem, window
- Add exact matches for Qt and Win32 controls
- Add debug logging of the raw a11y tree in agent.py
- Fix missing platform='windows' in agent initialization in run.py
- Add NO_PROXY to bypass local/VM IPs (compatible with Clash global proxy)
- Increase the application startup wait in lib_run_single.py to 15 seconds
- Add test_each_domain_a11y_tree.json (one task per domain for a11y validation)
|
2026-02-26 15:04:28 +08:00 |
|
|
|
9899d4a0c7
|
feat: add scientific software benchmark task data
- Add task JSONs for scientific software: avogadro/imagej/jade/origin/ovito/pymol/vesta, etc.
- vllm_eval.py: rename image files to "step x"
- desktop_env.py: add extra data parameters config and metadata
|
2026-02-25 15:19:36 +08:00 |
|
cui0711
|
613f55f0da
|
feat(tools): add instructions extraction script for generating test cases
|
2026-02-09 17:47:02 +08:00 |
|
cui0711
|
ba03784196
|
fix(env): handle None result_getter for vllm_eval evaluator
|
2026-02-09 17:46:05 +08:00 |
|
cui0711
|
3890ee5fc3
|
fix(vllm_eval): add image compression to prevent 413 error with large max_steps
|
2026-02-09 14:24:59 +08:00 |
|
cui0711
|
9bc54c0a66
|
feat(vllm_eval): add structured JSON response format with step analysis
|
2026-02-09 13:58:14 +08:00 |
|
cui0711
|
1e9281a1ab
|
feat(cli): add eval_model argument
|
2026-02-05 16:56:39 +08:00 |
|
cui0711
|
63484c7b7b
|
fix(runner): pass result_dir to evaluate and re-enable environment reset
|
2026-02-05 16:55:49 +08:00 |
|
cui0711
|
ad46acc5f3
|
refactor(example): replace check_include_exclude with vllm_eval evaluator
|
2026-02-05 16:55:03 +08:00 |
|
cui0711
|
58d411bf86
|
feat(evaluator): export vllm_eval module
|
2026-02-05 16:54:16 +08:00 |
|
cui0711
|
be24e77d93
|
feat(env): add eval_model parameter and result_dir support for vllm evaluation
|
2026-02-05 16:53:12 +08:00 |
|
cui0711
|
dd58a1de03
|
feat(evaluator): add vision-language model evaluator
|
2026-02-05 16:52:35 +08:00 |
|
cui0711
|
231f7a8fbc
|
feat(eval): add jade test case and update test categories
|
2026-01-30 16:29:05 +08:00 |
|
cui0711
|
716d82f4d1
|
feat: add flexible recording control and improve execution logging
|
2026-01-30 16:28:13 +08:00 |
|
cui0711
|
47bcfc0f0b
|
feat(agent): add screenshot compression and dynamic resolution support
|
2026-01-30 16:28:02 +08:00 |
|
cui0711
|
7e9090e115
|
fix(prompts): fix template variable syntax and add dynamic resolution
|
2026-01-30 16:28:02 +08:00 |
|
cui0711
|
308282e830
|
feat(server): add cross-platform support and improve screenshot handling
|
2026-01-30 16:27:49 +08:00 |
|
cui0711
|
788b248dbc
|
fix(logger): add Windows platform support for file locking
|
2026-01-30 16:27:49 +08:00 |
|
alexandruilie7
|
5463d3bb89
|
uipath v2 (#413)
* submission v2
* small updates
|
2026-01-09 08:47:20 +08:00 |
|
蘑菇先生
|
5ef8bdfa35
|
EvoCUA Update (2025.01.05) (#412)
* evocua init
* setup max_token
* evocua update
---------
Co-authored-by: xuetaofeng <xuetaofeng@meituan.com>
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
|
2026-01-05 16:14:53 +08:00 |
|
Bowen Yang
|
439e178a2e
|
fix(os_symphony_evaluation) (#410)
* fix(os_symphony)
* Update desktop_env_os_symphony.py
* fix(os_symphony_desktop)
* fix(os_symphony_start)
* Add docstring to run_multienv_os_symphony.py
Added documentation header for the evaluation script.
|
2026-01-04 15:56:51 +08:00 |
|
Bowen Yang
|
951e1928c8
|
fix(desktop_os_symphony):support aws (#406)
* fix(os_symphony)
* Update desktop_env_os_symphony.py
|
2026-01-01 11:27:34 +08:00 |
|
Bowen Yang
|
02a35be067
|
fix(os_symphony) (#405)
|
2025-12-30 22:43:47 +08:00 |
|
Bowen Yang
|
662826f57e
|
fix(os_symphony):prompt (#402)
* add_os_symphony
* fix(os_symphony)
* fix(os_symphony):prompt
---------
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
|
2025-12-29 20:45:36 +08:00 |
|
xuetf
|
410ec63a89
|
Add EvoCUA Support (#401)
* evocua init
* setup max_token
---------
Co-authored-by: xuetaofeng <xuetaofeng@meituan.com>
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
|
2025-12-23 20:46:23 +08:00 |
|
Bowen Yang
|
031696e83c
|
fix os_symphony (#400)
* add_os_symphony
* fix(os_symphony)
---------
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
|
2025-12-23 20:45:30 +08:00 |
|
Bowen Yang
|
f593f35b1c
|
add_os_symphony (#399)
|
2025-12-23 14:30:44 +08:00 |
|
Ubuntu
|
ac31778ee3
|
Update: requirements.txt for seed agent
|
2025-12-15 11:47:56 +00:00 |
|
Ubuntu
|
60caa52fc4
|
Update: requirements.txt for seed agent
|
2025-12-15 11:47:40 +00:00 |
|
Ubuntu
|
41477a9c40
|
Update: seed agent
|
2025-12-15 11:45:57 +00:00 |
|
Ubuntu
|
78433ecfcf
|
Add agent: seed agent
|
2025-12-12 05:35:20 +00:00 |
|
Meshal Nayim
|
9540454b0a
|
Fix demo agent (PromptAgent) reset(): add vm_ip and kwargs for compatibility with lib_run_single.py (#388)
|
2025-12-09 15:59:25 +08:00 |
|
MillanK
|
cbc3b590ff
|
Task fix batch (#383)
* update 873cafdd-a581-47f6-8b33-b9696ddb7b05 task eval
* c1fa57f3-c3db-4596-8f09-020701085416 fix, add tolerance to url matching
* 8df7e444-8e06-4f93-8a1a-c5c974269d82 add more clear instruction to the filename for compress
* add address string normalization for 6f4073b8-d8ea-4ade-8a18-c5d1d5d5aa9a
---------
Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
|
2025-11-19 17:24:25 +08:00 |
|
Qichen Fu
|
903ed36715
|
Add Claude Sonnet 4.5 support and improve action handling (#362)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
|
2025-11-14 13:54:32 +08:00 |
|
Subash Shibu
|
3167339e45
|
Add hosted GBOX agent for OSWorld evaluation (#376)
|
2025-11-13 13:13:31 +08:00 |
|
Pengxiang-Li
|
00b6468eb7
|
feat/dart_gui (#371)
|
2025-11-07 21:50:01 +08:00 |
|
yiqilin
|
6d43dbc532
|
Update GIMP evaluation examples to replace local file paths with cloud file URLs for consistency and accessibility. (#372)
|
2025-11-07 21:49:49 +08:00 |
|
Timothyxxx
|
8365edc975
|
Add new section in README for OSWorld-MCP project
|
2025-10-30 06:06:48 +00:00 |
|