Finish Aguvis eval on OSWorld (#107)

* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Aguvis Grounding

* Add Aguvis as planner

* fix parse bug

* fix pause

* fix planner prompt

* Aguvis Grounding

* fix

* fix

* fix

* add logger for each example

* Modify Aguvis Planner Prompts

* fix logger setup

* fix absolute coordinates

* Finish Aguvis Evaluation on OSWorld

* Merge origin/main into junli/aguvis

* Remove screenshot

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: Timothyxxx <384084775@qq.com>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
Junli Wang
2024-11-24 16:43:25 +08:00
committed by GitHub
parent 7d84a21962
commit 1503eb3994
6 changed files with 407 additions and 247 deletions


@@ -82,7 +82,7 @@ def config() -> argparse.Namespace:
     )
     parser.add_argument("--screen_width", type=int, default=1920)
     parser.add_argument("--screen_height", type=int, default=1080)
-    parser.add_argument("--sleep_after_execution", type=float, default=0.0)
+    parser.add_argument("--sleep_after_execution", type=float, default=2.0)
     parser.add_argument("--max_steps", type=int, default=15)
     # agent config
@@ -91,8 +91,9 @@ def config() -> argparse.Namespace:
     )
     # lm config
-    parser.add_argument("--planner_model", type=str, default="gpt-4o")
-    parser.add_argument("--executor_model", type=str, default="/mnt/chuzhe.hby/hf_ckpts/qwen-aguvis-7b")
+    parser.add_argument("--planner_model", type=str, default=None)
+    parser.add_argument("--executor_model", type=str, default="aguvis-72b-415")
     parser.add_argument("--temperature", type=float, default=0)
     parser.add_argument("--top_p", type=float, default=0.9)
     parser.add_argument("--max_tokens", type=int, default=1500)
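To see the effect of the changed defaults in isolation, here is a minimal, self-contained sketch that rebuilds only the arguments visible in the two hunks above. It is not the repository's full `config()`, which defines more options than the diff shows. The `argv` parameter is a hypothetical addition so the parser can be exercised without a real command line.

```python
import argparse


def config(argv=None) -> argparse.Namespace:
    """Partial sketch: only the arguments visible in this diff are defined.

    `argv` is a hypothetical parameter for testing; None means sys.argv[1:].
    """
    parser = argparse.ArgumentParser(description="Aguvis OSWorld eval (partial sketch)")
    # environment settings visible in the first hunk
    parser.add_argument("--screen_width", type=int, default=1920)
    parser.add_argument("--screen_height", type=int, default=1080)
    # this commit raises the default pause after each action from 0.0 to 2.0 seconds
    parser.add_argument("--sleep_after_execution", type=float, default=2.0)
    parser.add_argument("--max_steps", type=int, default=15)
    # lm config: no planner model by default, executor defaults to "aguvis-72b-415"
    parser.add_argument("--planner_model", type=str, default=None)
    parser.add_argument("--executor_model", type=str, default="aguvis-72b-415")
    parser.add_argument("--temperature", type=float, default=0)
    parser.add_argument("--top_p", type=float, default=0.9)
    parser.add_argument("--max_tokens", type=int, default=1500)
    return parser.parse_args(argv)


if __name__ == "__main__":
    defaults = config([])
    print(defaults.sleep_after_execution, defaults.planner_model, defaults.executor_model)
    # Passing the pre-commit default explicitly brings back a separate planner model.
    with_planner = config(["--planner_model", "gpt-4o"])
    print(with_planner.planner_model)
```

Because `--planner_model` now defaults to `None`, a separate planner is only configured when one is passed on the command line. How the agent behaves when no planner is given is not visible in this diff.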