- 扩展 heuristic_retrieve.py 白名单以覆盖科研软件 GUI 框架:
- 新增 prefix 规则: sunawt (Java Swing), qt5q/qt6q (Qt), ovito, pymol,
contentspanel, wx (wxWidgets), afx (MFC), thunderrt (VB6)
- 新增 endswith 规则: edit, widget, box, dialog, view, frame, menuitem,
menubar, toolbar, tabitem, treeitem, window
- 新增 Qt 控件和 Win32 控件的精确匹配
- 在 agent.py 中添加原始 a11y tree 的调试日志
- 修复 run.py 中 agent 初始化缺少 platform='windows' 的问题
- 添加 NO_PROXY 绕过本地/VM IP (兼容 Clash 全局代理)
- lib_run_single.py 中应用启动等待时间增加到 15 秒
- 新增 test_each_domain_a11y_tree.json (每个域一个任务用于 a11y 验证)
Evaluation examples
Here we put the data examples to benchmark the ability of agents when interacting with GUI.
The examples are stored in ./examples where each data item formatted as:
{
"id": "uid", # unique id
"snapshot": "snapshot_id", # the snapshot id of the environment, with some data already there and apps already opened, or just desktop
"instruction": "natural_language_instruction", # the natural language instruction of the task, what we want the agent to do
"source": "website_url", # where we know this example, some forum, or some website, or some paper
"config": {xxx}, # the scripts to setup the donwload and open files actions, as the initial state of a task
# (coming in next project) "trajectory": "trajectory_directory", # the trajectory directory, which contains the action sequence file, the screenshots and the recording video
"related_apps": ["app1", "app2", ...], # the related apps, which are opened during the task
"evaluator": "evaluation_dir", # the directory of the evaluator, which contains the evaluation script for this example
…
}
The ./trajectories file contains the annotated trajectories for each data item in ./examples for finishing the task.
For now, it is under construction, and only tested on Windows 10. Please:
- Modify the path accordingly to run the evaluation;
- Remind us if some parts are overfit to our environment.