# OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
<p align="center">
<a href="">Website</a>
<a href="">Paper</a>
</p>
![Overview]()
## Updates
- 2024-03-28: We released our [paper](), [environment and benchmark](), and [project page](https://os-world.github.io/). Check it out!
## Install
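1. Install VMware (Workstation on Linux or Windows, Fusion on macOS), configure the `vmrun` command, and verify the setup by running the command below (use `-T fusion` instead of `-T ws` on macOS):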
```bash
vmrun -T ws list
```
2. Install the environment package, then download the examples and the virtual machine image.
For Linux or Windows on an x86_64 CPU, run the following commands.
Remove the `nogui` parameter if you want to watch what happens in the virtual machine.
```bash
git clone https://github.com/xlang-ai/OSWorld
cd OSWorld
pip install -r requirements.txt
gdown https://drive.google.com/drive/folders/1HX5gcf7UeyR-2UmiA15Q9U-Wr6E6Gio8 -O Ubuntu --folder
vmrun -T ws start "Ubuntu/Ubuntu.vmx" nogui
vmrun -T ws snapshot "Ubuntu/Ubuntu.vmx" "init_state"
```
On Apple-chip macOS, download the specially prepared virtual machine image and start it by running the following commands:
```bash
gdown https://drive.google.com/drive/folders/xxx -O Ubuntu --folder
vmrun -T fusion start "Ubuntu/Ubuntu.vmx"
vmrun -T fusion snapshot "Ubuntu/Ubuntu.vmx" "init_state"
```
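The two install paths above differ only in the `vmrun` host type (`ws` on x86_64 Linux or Windows, `fusion` on Apple-chip macOS) and in whether `nogui` is passed. As a minimal sketch, the Python helper below builds the same `start` and `snapshot` commands and first checks that `vmrun` is on the `PATH`. The helper names `vmrun_host_type` and `build_vmrun_commands` are hypothetical and not part of OSWorld, and the platform detection assumes that macOS hosts use Fusion and all other hosts use Workstation.

```python
import platform
import shutil


def vmrun_host_type(system=None):
    """Return the vmrun -T value: 'fusion' on macOS, 'ws' elsewhere (assumption)."""
    system = system or platform.system()
    return "fusion" if system == "Darwin" else "ws"


def build_vmrun_commands(vmx_path, snapshot="init_state", host_type=None, nogui=True):
    """Build the start and snapshot commands from the install steps as argument lists."""
    host_type = host_type or vmrun_host_type()
    start = ["vmrun", "-T", host_type, "start", vmx_path]
    if nogui:
        start.append("nogui")  # drop this to watch the VM window
    take_snapshot = ["vmrun", "-T", host_type, "snapshot", vmx_path, snapshot]
    return [start, take_snapshot]


if __name__ == "__main__":
    # Hypothetical pre-flight check; it only reports, it does not run vmrun.
    if shutil.which("vmrun") is None:
        print("vmrun not found on PATH; install VMware and configure vmrun first")
    for cmd in build_vmrun_commands("Ubuntu/Ubuntu.vmx"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once `vmrun` is available.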
## Quick Start
Run the following minimal example to interact with the environment:
```python
from desktop_env.envs.desktop_env import DesktopEnv

# Task definition: the instruction, the setup steps ("config"), and the evaluator.
example = {
    "id": "94d95f96-9699-4208-98ba-3c3119edf9c2",
    "instruction": "I want to install Spotify on my current system. Could you please help me?",
    "config": [{
        "type": "execute",
        "parameters": {"command": ["python", "-c", "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);"]},
    }],
    "evaluator": {
        "func": "check_include_exclude",
        "result": {"type": "vm_command_line", "command": "which spotify"},
        "expected": {"type": "rule", "rules": {"include": ["spotify"], "exclude": ["not found"]}},
    },
}

env = DesktopEnv(
    path_to_vm="Ubuntu/Ubuntu.vmx",
    action_space="pyautogui",
    task_config=example,
)

# Reset the environment, then send one pyautogui action as a string.
obs = env.reset()
obs, reward, done, info = env.step("pyautogui.rightClick()")
```
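The `evaluator` entry in the example pairs the output of `which spotify` with `include` and `exclude` rules. The sketch below shows one plausible reading of such a rule: the task passes when every `include` string appears in the output and no `exclude` string does. This is an assumption about the semantics, not the actual `check_include_exclude` implementation in OSWorld, and the function name `include_exclude_score` is hypothetical.

```python
def include_exclude_score(output, rules):
    """Return 1.0 if every 'include' string is in output and no 'exclude' string is."""
    has_all = all(s in output for s in rules.get("include", []))
    has_none = all(s not in output for s in rules.get("exclude", []))
    return 1.0 if has_all and has_none else 0.0


rules = {"include": ["spotify"], "exclude": ["not found"]}
print(include_exclude_score("/usr/bin/spotify\n", rules))   # 1.0
print(include_exclude_score("spotify not found\n", rules))  # 0.0
print(include_exclude_score("", rules))                     # 0.0
```

Under this reading, an empty result from `which spotify` fails the `include` rule, and an error message containing "not found" fails the `exclude` rule even though it mentions "spotify".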
## Run Benchmark
### Run the Baseline Agent
To run the baseline agent used in our paper, you can run the following command as an example:
```bash
```
### Run Evaluation of Your Agent
Please first read through the [agent interface](https://github.com/xlang-ai/OSWorld/mm_agents/README.md) and the [environment interface](https://github.com/xlang-ai/OSWorld/desktop_env/README.md).
Then implement the agent interface and import your customized agent in the `run.py` file.
Finally, run the following command to evaluate your agent:
## Citation
If you find this environment useful, please consider citing our work:
```
@article{DesktopEnv,
title={},
author={},
journal={arXiv preprint arXiv:xxxx.xxxx},
year={2024}
}
```