Commit Graph

1384 Commits

Author SHA1 Message Date
cui0711
613f55f0da feat(tools): add instructions extraction script for generating test cases 2026-02-09 17:47:02 +08:00
cui0711
ba03784196 fix(env): handle None result_getter for vllm_eval evaluator 2026-02-09 17:46:05 +08:00
cui0711
3890ee5fc3 fix(vllm_eval): add image compression to prevent 413 error with large max_steps 2026-02-09 14:24:59 +08:00
cui0711
9bc54c0a66 feat(vllm_eval): add structured JSON response format with step analysis 2026-02-09 13:58:14 +08:00
cui0711
1e9281a1ab feat(cli): add eval_model argument 2026-02-05 16:56:39 +08:00
cui0711
63484c7b7b fix(runner): pass result_dir to evaluate and re-enable environment reset 2026-02-05 16:55:49 +08:00
cui0711
ad46acc5f3 refactor(example): replace check_include_exclude with vllm_eval evaluator 2026-02-05 16:55:03 +08:00
cui0711
58d411bf86 feat(evaluator): export vllm_eval module 2026-02-05 16:54:16 +08:00
cui0711
be24e77d93 feat(env): add eval_model parameter and result_dir support for vllm evaluation 2026-02-05 16:53:12 +08:00
cui0711
dd58a1de03 feat(evaluator): add vision-language model evaluator 2026-02-05 16:52:35 +08:00
cui0711
231f7a8fbc feat(eval): add jade test case and update test categories 2026-01-30 16:29:05 +08:00
cui0711
716d82f4d1 feat: add flexible recording control and improve execution logging 2026-01-30 16:28:13 +08:00
cui0711
47bcfc0f0b feat(agent): add screenshot compression and dynamic resolution support 2026-01-30 16:28:02 +08:00
cui0711
7e9090e115 fix(prompts): fix template variable syntax and add dynamic resolution 2026-01-30 16:28:02 +08:00
cui0711
308282e830 feat(server): add cross-platform support and improve screenshot handling 2026-01-30 16:27:49 +08:00
cui0711
788b248dbc fix(logger): add Windows platform support for file locking 2026-01-30 16:27:49 +08:00
alexandruilie7
5463d3bb89 uipath v2 (#413)
* submission v2

* small updates
2026-01-09 08:47:20 +08:00
蘑菇先生
5ef8bdfa35 EvoCUA Update (2025.01.05) (#412)
* evocua init

* setup max_token

* evocua update

---------

Co-authored-by: xuetaofeng <xuetaofeng@meituan.com>
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2026-01-05 16:14:53 +08:00
Bowen Yang
439e178a2e fix(os_symphony_evaluation) (#410)
* fix(os_symphony)

* Update desktop_env_os_symphony.py

* fix(os_symphony_desktop)

* fix(os_symphony_start)

* Add docstring to run_multienv_os_symphony.py

Added documentation header for the evaluation script.
2026-01-04 15:56:51 +08:00
Bowen Yang
951e1928c8 fix(desktop_os_symphony):support aws (#406)
* fix(os_symphony)

* Update desktop_env_os_symphony.py
2026-01-01 11:27:34 +08:00
Bowen Yang
02a35be067 fix(os_symphony) (#405) 2025-12-30 22:43:47 +08:00
Bowen Yang
662826f57e fix(os_symphony):prompt (#402)
* add_os_symphony

* fix(os_symphony)

* fix(os_symphony):prompt

---------

Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2025-12-29 20:45:36 +08:00
xuetf
410ec63a89 Add EvoCUA Support (#401)
* evocua init

* setup max_token

---------

Co-authored-by: xuetaofeng <xuetaofeng@meituan.com>
Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2025-12-23 20:46:23 +08:00
Bowen Yang
031696e83c fix os_symphony (#400)
* add_os_symphony

* fix(os_symphony)

---------

Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2025-12-23 20:45:30 +08:00
Bowen Yang
f593f35b1c add_os_symphony (#399) 2025-12-23 14:30:44 +08:00
Ubuntu
ac31778ee3 Update: requirements.txt for seed agent 2025-12-15 11:47:56 +00:00
Ubuntu
60caa52fc4 Update: requirements.txt for seed agent 2025-12-15 11:47:40 +00:00
Ubuntu
41477a9c40 Update: seed agent 2025-12-15 11:45:57 +00:00
Ubuntu
78433ecfcf Add agent: seed agent 2025-12-12 05:35:20 +00:00
Meshal Nayim
9540454b0a Fix demo agent (PromptAgent) reset(): add vm_ip and kwargs for compatibility with lib_run_single.py (#388) 2025-12-09 15:59:25 +08:00
MillanK
cbc3b590ff Task fix batch (#383)
* update 873cafdd-a581-47f6-8b33-b9696ddb7b05 task eval

* c1fa57f3-c3db-4596-8f09-020701085416 fix, add tolerance to url matching

* 8df7e444-8e06-4f93-8a1a-c5c974269d82 add a clearer instruction about the filename for compress

* add address string normalization for 6f4073b8-d8ea-4ade-8a18-c5d1d5d5aa9a

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-11-19 17:24:25 +08:00
Qichen Fu
903ed36715 Add Claude Sonnet 4.5 support and improve action handling (#362)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-14 13:54:32 +08:00
Subash Shibu
3167339e45 Add hosted GBOX agent for OSWorld evaluation (#376) 2025-11-13 13:13:31 +08:00
Pengxiang-Li
00b6468eb7 feat/dart_gui (#371) 2025-11-07 21:50:01 +08:00
yiqilin
6d43dbc532 Update GIMP evaluation examples to replace local file paths with cloud file URLs for consistency and accessibility. (#372) 2025-11-07 21:49:49 +08:00
Timothyxxx
8365edc975 Add new section in README for OSWorld-MCP project 2025-10-30 06:06:48 +00:00
Daphne Barretto
21c2b7629b Add consistent scores validation (#368)
* Add consistent scores validation

* revert osworld_run_maestro.py changes
2025-10-29 01:44:48 +08:00
Timothyxxx
3bf54c92a9 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-10-23 14:28:14 +08:00
Timothyxxx
a484f2e484 Update setup.py for version bump and dependency adjustments
- Bump version from 1.0.0 to 1.0.1
- Update numpy dependency to allow versions >=1.26 and <3
- Adjust pandas dependency to allow versions >=2.2 and <2.3
- Add new __init__.py file in the docker provider directory
2025-10-23 14:27:52 +08:00
Atharva Gundawar
9f97535ef9 OSWorld agent wrapper for trained v7 (#360) 2025-10-18 02:29:15 +08:00
ludunjie.ldj
afd29115da support aliyun eval of qwen3vl 2025-10-16 16:20:54 +08:00
Dunjie Lu
55372c4432 Fix API base URLs for OpenAI and DashScope
Updated the base URLs for OpenAI and DashScope API calls.
2025-10-14 12:57:00 +08:00
Dunjie Lu
d25464c203 Djlu/qwen3vl dash (#356)
* support DashScope SDK to call qwen3-vl-plus

* support DashScope SDK to call qwen3-vl-plus

---------

Co-authored-by: Timothyxxx <Timothyxxx@users.noreply.github.com>
2025-10-13 16:31:06 +08:00
Xinyuan Wang
f9e9273b3b OpenCUA-72B (#354)
* use aws pub ip

* os task fix: set the default dim screen time to be 300s

* OpenCUA-72B

* update password

* update

* update

* update opencua72b agent

* change provider ip

---------

Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
2025-10-13 10:39:33 +08:00
Yan98
ddb8372a6c init public release (#350) 2025-10-06 22:16:31 +08:00
eun2ce
5eff00a9e3 Fix #347: Fix NameError in open_file timeout message (#351)
- Fix undefined 'timeout' variable in error message
- Use defined TIMEOUT constant instead of undefined timeout variable
- Prevents NameError when LibreOffice crashes during file opening
2025-10-06 22:14:15 +08:00
Timothyxxx
ff6285cfbb Add safe browsing feature to Chrome evaluator
- Implemented `get_enable_safe_browsing` function to retrieve safe browsing settings based on the operating system.
- Updated the `__init__.py` to include the new function.
- Modified JSON examples to reflect the change from enabling enhanced safety browsing to enabling safe browsing.
- Added necessary commands in the JSON examples for setting up preferences for safe browsing.
2025-10-05 04:56:08 +00:00
Danyang Zhang
afd5952e44 ver Oct3rd (#349)
updated a series of instructions to ask the agent not to do any
unnecessary actions.
2025-10-04 00:13:29 +08:00
Timothyxxx
1572068035 Refactor evaluator functions in JSON examples to use URL pattern matching. Update expected URL formats to regex patterns for better validation in chrome evaluation examples. 2025-10-01 19:20:06 +00:00
Timothyxxx
9be518435c Update GIMP evaluation examples to replace local file paths with cloud file URLs for consistency and accessibility. 2025-10-01 09:54:52 +00:00