23 Commits

Author SHA1 Message Date
yuanmengqi
39e5baf5ae fix: remove unnecessary sleep and observation retrieval in run_single_example function 2025-07-25 15:51:20 +00:00
Yuan Mengqi
0a37cccd53 update claude (#280)
* add uitars agent code

* improve claude

* improve claude

* improve claude

* improve claude

* improve claude
2025-07-23 03:35:49 +08:00
yuanmengqi
91bc6bb6ce Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-20 07:55:57 +00:00
yuanmengqi
88d5639a2a Compatible with agents that cannot use runtime log 2025-07-20 07:55:53 +00:00
Xinyuan Wang
e10dd9267c Wxy/opencua (#274)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines

* update parallel; clean code; use sleep 3s

* ui-tars-0717
2025-07-20 15:52:23 +08:00
Xinyuan Wang
0f2655249c Wxy/opencua (#260)
* OpenCUA Agent code base

* update url

* debug, modify url input

* debug opencua

* show result

* debug agent history overlap

* modify opencua agent; add comment lines
2025-07-16 17:53:12 +08:00
Xinyuan Wang
db83b9cb2c Wxy/opencua (#256)
* OpenCUA Agent code base

* update url

* debug, modify url input
2025-07-14 20:26:39 +08:00
yuanmengqi
aee1207fff fix error 2025-06-09 04:20:59 +00:00
adlsdztony
ae473d8673 fix: remove pending checks from actions to prevent json serialization issues 2025-06-04 14:20:02 +00:00
yuanmengqi
228849ab03 add openai cua agent 2025-05-31 11:22:38 +00:00
Xinyuan Wang
d626cc90d9 Modify: wait for initial env to be ready (#139) 2025-02-25 21:08:25 +08:00
Junli Wang
1503eb3994 Finish Aguvis eval on OSWorld (#107)
* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Aguvis Grounding

* Add Aguvis as planner

* fix parse bug

* fix pause

* fix planner prompt

* Aguvis Grounding

* fix

* fix

* fix

* add logger for each example

* Modify Aguvis Planner Prompts

* fix logger setup

* fix absolute coordinates

* Finish Aguvis Evaluation on OSWorld

* Merge origin/main into junli/aguvis

* Remove screenshot

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: Timothyxxx <384084775@qq.com>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-24 16:43:25 +08:00
tsuky_chen
1fd8b66fde debug/timeout (#59) 2024-07-24 08:31:42 +08:00
Timothyxxx
9c75df5dce Clean code; Refactor environment to pass screenshot content instead of path 2024-04-13 23:34:01 +08:00
Timothyxxx
8e760fd450 Disable wandb temporarily, speedup the environment step speed by remove useless a11y tree re-get and terminal output 2024-03-19 08:57:05 +08:00
Timothyxxx
f992d1f694 Disable a11y tree temporarily 2024-03-18 21:43:35 +08:00
Timothyxxx
c1c7ac298f Update claude endpoint 2024-03-18 14:59:02 +08:00
Jason Lee
576248ae18 uncomment timer 2024-03-18 12:02:34 +08:00
Jason Lee
8080828a84 update wandb settings 2024-03-18 00:02:41 +08:00
Jason Lee
48aedb09a7 add wandb settings, remember to set WANDB_KEY 2024-03-17 22:30:29 +08:00
Timothyxxx
639f8c7db8 Fix small bugs in max time limit setting 2024-03-16 14:34:40 +08:00
Jason Lee
053da203b8 new timer, but need to set in setting.json file, need to be upgraded into parameters 2024-03-16 12:36:23 +08:00
Jason Lee
44679724b8 try new timer 2024-03-16 11:54:45 +08:00