Commit Graph

47 Commits

Author SHA1 Message Date
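b75f6bf341 feat: strengthen task-step injection and a11y state representation to improve tree-interaction stability
- Wire up the metadata.steps propagation path so task steps are injected into the agent's prediction context
- Improve the linearized a11y tree output: use center coordinates and add a states column (expanded/collapsed/selected, etc.)
- Relax the node-retention conditions to keep input-type controls that have no text (edit/textfield/searchbox, etc.)
- Tighten output constraints: each turn may contain only action code or WAIT/DONE/FAIL; returning an action together with DONE in the same turn is forbidden
- Add example steps for avogadro: expand aromatics and select benzene.cjson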
2026-02-26 18:56:53 +08:00
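
The linearization change described in b75f6bf341 can be illustrated with a minimal sketch. The `Node` class, the `INPUT_TAGS` set, the column order, and the tab-separated layout are all assumptions made for illustration; the commit message does not show the actual code in agent.py.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical node shape; the real a11y tree in the repository may differ.
@dataclass
class Node:
    tag: str
    name: str
    text: str
    x: int
    y: int
    width: int
    height: int
    states: List[str] = field(default_factory=list)

# Input-type controls kept even when they carry no name or text
# (taken from the examples in the commit message).
INPUT_TAGS = {"edit", "textfield", "searchbox"}

def keep(node: Node) -> bool:
    """Keep nodes with a name or text, plus empty input-type controls."""
    return bool(node.name or node.text) or node.tag.lower() in INPUT_TAGS

def linearize(nodes: List[Node]) -> str:
    """Render nodes as tab-separated rows with center coordinates and states."""
    rows = ["tag\tname\ttext\tcenter\tstates"]
    for n in nodes:
        if not keep(n):
            continue
        cx = n.x + n.width // 2   # center instead of top-left corner
        cy = n.y + n.height // 2
        rows.append(f"{n.tag}\t{n.name}\t{n.text}\t({cx}, {cy})\t{','.join(n.states)}")
    return "\n".join(rows)
```

Reporting the center point lets an agent click the coordinate directly, and the states column tells it whether a tree item such as aromatics still needs to be expanded.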
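07e66490dd feat: improve a11y tree support for scientific software
- Extend the heuristic_retrieve.py whitelist to cover the GUI frameworks used by scientific software:
  - New prefix rules: sunawt (Java Swing), qt5q/qt6q (Qt), ovito, pymol,
    contentspanel, wx (wxWidgets), afx (MFC), thunderrt (VB6)
  - New endswith rules: edit, widget, box, dialog, view, frame, menuitem,
    menubar, toolbar, tabitem, treeitem, window
  - New exact matches for Qt controls and Win32 controls
- Add debug logging of the raw a11y tree in agent.py
- Fix the agent initialization in run.py, which was missing platform='windows'
- Add NO_PROXY to bypass local/VM IPs (for compatibility with Clash global proxy mode)
- Increase the application launch wait time in lib_run_single.py to 15 seconds
- Add test_each_domain_a11y_tree.json (one task per domain, for a11y verification)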
2026-02-26 15:04:28 +08:00
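
The prefix and endswith rules listed in 07e66490dd might be applied roughly as follows. This is a sketch only: the function name `is_whitelisted`, the case-insensitive comparison, and the empty `EXACT` set are assumptions, since the commit message does not list the exact-match names or show the code in heuristic_retrieve.py.

```python
# Prefix and suffix lists copied from the commit message.
PREFIXES = (
    "sunawt", "qt5q", "qt6q", "ovito", "pymol",
    "contentspanel", "wx", "afx", "thunderrt",
)
SUFFIXES = (
    "edit", "widget", "box", "dialog", "view", "frame", "menuitem",
    "menubar", "toolbar", "tabitem", "treeitem", "window",
)
# The commit mentions exact matches for Qt and Win32 controls but does not
# name them, so this set is intentionally left empty here.
EXACT = frozenset()

def is_whitelisted(class_name: str) -> bool:
    """Return True if a control class name passes any whitelist rule.

    Lower-casing before comparison is an assumption of this sketch.
    """
    name = class_name.lower()
    return name in EXACT or name.startswith(PREFIXES) or name.endswith(SUFFIXES)
```

`str.startswith` and `str.endswith` accept a tuple of candidates, so each rule family is a single call rather than a loop.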
cui0711
47bcfc0f0b feat(agent): add screenshot compression and dynamic resolution support 2026-01-30 16:28:02 +08:00
Meshal Nayim
9540454b0a Fix demo agent (PromptAgent) reset(): add vm_ip and kwargs for compatibility with lib_run_single.py (#388) 2025-12-09 15:59:25 +08:00
yuanmengqi
523d553e88 feat: add client password argument to multiple agents and scripts
- Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility.
- Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration.
- Modified evaluation guidelines to reflect the new client password requirement.
- Ensured existing logic remains intact while enhancing functionality for better user experience.
2025-07-27 16:11:23 +00:00
张逸群
bf78b6d05e Add OPENAI_BASE_URL support for custom OpenAI-compatible endpoints (#283)
Enables GPT models to use custom API endpoints through OPENAI_BASE_URL environment variable. This addresses the limitation where only Azure OpenAI supported custom endpoints while standard GPT models were hardcoded to api.openai.com.

- Add intelligent URL handling to avoid duplicate /v1 paths
- Maintain backward compatibility with default OpenAI API
- Update README with configuration instructions
- Non-breaking change preserving existing functionality

Fixes API integration issues for users with custom OpenAI-compatible services.
2025-07-24 12:31:08 +08:00
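
The URL handling described in bf78b6d05e (honor OPENAI_BASE_URL, fall back to the default API, avoid a duplicated /v1 segment) could look something like the sketch below. The function name `chat_completions_url` and the precedence of an explicit argument over the environment variable are assumptions; the actual logic in the repository may differ.

```python
import os

DEFAULT_BASE_URL = "https://api.openai.com"

def chat_completions_url(base_url=None):
    """Build the chat completions URL without duplicating the /v1 segment.

    Resolution order (an assumption of this sketch): explicit argument,
    then the OPENAI_BASE_URL environment variable, then the default API host.
    """
    base = base_url or os.environ.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    base = base.rstrip("/")
    # Append /v1 only when the configured base does not already end with it.
    if not base.endswith("/v1"):
        base += "/v1"
    return base + "/chat/completions"
```

With this normalization, a base of `https://example.test`, `https://example.test/v1`, or `https://example.test/v1/` all resolve to the same endpoint.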
uvheart
a845824f06 add azure_gpt_4o (#197) 2025-05-23 03:57:42 +08:00
Thomas Kuntz
7d88283f8a feat: Support newer Gemini models (#188) 2025-05-06 16:04:30 +08:00
Timothyxxx
2c8e8a58f6 Fix minor bug caused by new logging feat in aguvis agent traj 2024-12-05 15:45:09 +08:00
Tianbao Xie
a156f8a3d6 Modify the namespace of a11y tree (#62) 2024-07-25 20:20:34 +08:00
Timothyxxx
cfc5500a8a Merge remote-tracking branch 'origin/main' 2024-05-21 21:08:43 +08:00
Timothyxxx
306dcbda71 Add Support for QWEN VL models from API (QWEN-VL-max, etc.); Improve on the robustness of getting observation/files, etc. 2024-05-21 21:08:22 +08:00
Timothyxxx
5568dfd141 Handling more exceptions; Fix hyperparameter passing 2024-05-20 17:22:07 +08:00
Timothyxxx
f9594e476e Add Support for QWEN models from API (QWEN-max, etc.); Improve on the robustness of getting observation 2024-05-20 00:47:43 +08:00
Timothyxxx
54905380e6 Add Llama3-70B Support (from Groq) 2024-05-09 02:04:02 +08:00
Timothyxxx
97b567a287 Update README and ROADMAP; Fix typos; optimize the code for llm calling in agent.py 2024-04-26 13:32:41 +08:00
Timothyxxx
eaceddf917 Add Gemini Pro 1.5 Support 2024-04-24 18:19:25 +08:00
Timothyxxx
9c75df5dce Clean code; Refactor environment to pass screenshot content instead of path 2024-04-13 23:34:01 +08:00
Timothyxxx
26ed70ef70 Clean Code; Refactor README 2024-03-27 16:21:49 +08:00
Timothyxxx
607cf8e554 Fix max traj length 2024-03-25 18:09:43 +08:00
Timothyxxx
172123ab2c Support downsampling; Fix bugs in windows a11y tree; Add a11y_tree trim 2024-03-25 18:02:48 +08:00
Yiheng Xu
5f2802292a Update agent.py 2024-03-22 12:54:22 +08:00
Timothyxxx
3ce7636abd Fix one multi_app example; remove some broken examples; Support downsampling 2024-03-21 22:05:16 +08:00
Fangyu Lei
3e581c8108 Update agent.py claude 2024-03-21 07:52:58 +08:00
Siheng Zhao
04a9df627c Merge branch 'main' of github.com:ztjhz/DesktopEnv 2024-03-20 22:42:01 +08:00
Siheng Zhao
6927d9e39d [feature] add image downsample func 2024-03-20 22:41:05 +08:00
Timothyxxx
d1e2b12b41 Fix GIMP bug; speed up the environment by skipping controller.get when no a11y tree is needed 2024-03-20 22:22:59 +08:00
David Chang
4df088e2ad ver Mar19thv2
supplemented at info back for som setting
2024-03-19 18:41:55 +08:00
David Chang
05336a8ecf Merge branch 'main' into zdy 2024-03-19 17:47:23 +08:00
Fangyu Lei
41db4b44e7 Update agent.py mixtral 2024-03-19 12:06:33 +08:00
David Chang
3db0591868 ver Mar18th
checked Claude agent
2024-03-18 17:42:13 +08:00
Timothyxxx
204a2b949f Update claude endpoint 2024-03-18 14:56:23 +08:00
Jason Lee
716cf7b9ff add wandb settings 2024-03-17 22:31:43 +08:00
Jason Lee
48aedb09a7 add wandb settings, remember to set WANDB_KEY 2024-03-17 22:30:29 +08:00
lfy79001
acc2d41bdb add mixtral cogagent 2024-03-17 22:27:59 +08:00
Timothyxxx
e156a20e3d Update new func 2024-03-17 22:25:13 +08:00
lfy79001
505e772463 claude3_agent_code 2024-03-16 11:57:49 +08:00
lfy79001
684b4a1b7b claude3_agnet_code 2024-03-16 11:27:09 +08:00
lfy79001
3b13046745 add claude3 agent code 2024-03-16 01:40:41 +08:00
lfy79001
017dde8966 add claude3 agent code 2024-03-16 01:37:42 +08:00
David Chang
57f2257254 ver Mar15th
fixed bugs about infeasible task evaluation
2024-03-15 22:49:35 +08:00
Timothyxxx
1ad4527e8b Change SoM input and output 2024-03-15 22:10:35 +08:00
Timothyxxx
4db207fc27 Merge remote-tracking branch 'origin/main'
# Conflicts:
#	mm_agents/agent.py
#	run.py
2024-03-15 21:10:32 +08:00
Timothyxxx
5cbf1b28ca Fix bugs 2024-03-15 21:06:50 +08:00
Jason Lee
815c7ab67c filter unfinished examples and add timer to ensure upper limit of each example 2024-03-15 16:52:17 +08:00
Timothyxxx
44ff027801 Refactor experiments and agent implementation 2024-03-14 22:32:49 +08:00
Timothyxxx
71ca8fbe1c refactor on exp code 2024-03-14 19:25:25 +08:00