# OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
<p align="center">
<a href="">Website</a> •
<a href="">Paper</a>
</p>

![Overview]()
## Updates
- 2024-03-28: We released our [paper](), [environment and benchmark](), and [project page](https://os-world.github.io/). Check it out!
## Install
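1. Install VMware (Workstation on Linux/Windows, Fusion on macOS) and make sure the `vmrun` command is on your `PATH`. Verify the installation by running the following command (use `-T fusion` instead of `-T ws` with VMware Fusion):

```bash
vmrun -T ws list
```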
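If you prefer to run this check from Python (for example, before launching experiments), the sketch below wraps the same `vmrun ... list` call with `subprocess`. It is an illustrative helper, not part of OSWorld: the names `parse_vmrun_list` and `running_vms` are hypothetical, and the parser assumes `vmrun list` prints a `Total running VMs: N` header followed by one `.vmx` path per line.

```python
import shutil
import subprocess
from typing import List


def parse_vmrun_list(output: str) -> List[str]:
    """Return the .vmx paths reported by `vmrun list`.

    Assumes a "Total running VMs: N" header followed by one path per line.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("Total running VMs:"):
        raise ValueError(f"unexpected vmrun output: {output!r}")
    return lines[1:]


def running_vms(host_type: str = "ws") -> List[str]:
    """Run `vmrun -T <host_type> list` and return the running VMs' paths."""
    if shutil.which("vmrun") is None:
        raise FileNotFoundError("vmrun is not on PATH; check the VMware installation")
    result = subprocess.run(
        ["vmrun", "-T", host_type, "list"],
        capture_output=True, text=True, check=True,
    )
    return parse_vmrun_list(result.stdout)
```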
2. Install the environment package and download the examples and the virtual machine image.

On x86_64 Linux or Windows hosts, run the following commands. Remove the `nogui` argument if you want to watch what happens inside the virtual machine.

```bash
# Clone the repository (environment package and examples) and install its dependencies
git clone https://github.com/xlang-ai/OSWorld
cd OSWorld
pip install -r requirements.txt
# Download the Ubuntu virtual machine image into ./Ubuntu
gdown https://drive.google.com/drive/folders/1HX5gcf7UeyR-2UmiA15Q9U-Wr6E6Gio8 -O Ubuntu --folder
# Boot the VM and take the "init_state" snapshot
vmrun -T ws start "Ubuntu/Ubuntu.vmx" nogui
vmrun -T ws snapshot "Ubuntu/Ubuntu.vmx" "init_state"
```
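The two `vmrun` calls at the end of this block can also be scripted, for example to recreate the snapshot after re-downloading the image. A minimal sketch using `subprocess`, assuming `vmrun` is on your `PATH`; `setup_commands` and `start_and_snapshot` are hypothetical helpers, not part of the OSWorld package.

```python
import subprocess
from typing import List


def setup_commands(vmx_path: str, host_type: str = "ws", headless: bool = True) -> List[List[str]]:
    """Build the vmrun calls that boot the VM and save its "init_state" snapshot."""
    start = ["vmrun", "-T", host_type, "start", vmx_path]
    if headless:
        start.append("nogui")  # drop this to watch the VM in the VMware window
    snapshot = ["vmrun", "-T", host_type, "snapshot", vmx_path, "init_state"]
    return [start, snapshot]


def start_and_snapshot(vmx_path: str, host_type: str = "ws", headless: bool = True) -> None:
    """Run the setup commands in order; raises CalledProcessError on the first failure."""
    for cmd in setup_commands(vmx_path, host_type, headless):
        subprocess.run(cmd, check=True)
```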
On Apple silicon Macs, download the specially prepared virtual machine image instead and set it up with VMware Fusion:

```bash
gdown https://drive.google.com/drive/folders/xxx -O Ubuntu --folder
vmrun -T fusion start "Ubuntu/Ubuntu.vmx"
vmrun -T fusion snapshot "Ubuntu/Ubuntu.vmx" "init_state"
```
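Which `-T` host type to pass to `vmrun` depends on the machine. The sketch below picks it with Python's `platform` module, following the two setups described above (Workstation on x86_64 Linux/Windows, Fusion on Apple silicon macOS). `vmrun_host_type` is a hypothetical helper; other platforms are rejected because this README does not document them.

```python
import platform
from typing import Optional


def vmrun_host_type(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return the value to pass to `vmrun -T` for this host.

    Hypothetical helper mirroring the two documented setups:
    "ws" (VMware Workstation) on x86_64 Linux/Windows and
    "fusion" (VMware Fusion) on Apple silicon macOS.
    """
    system = system or platform.system()  # e.g. "Linux", "Windows", "Darwin"
    machine = (machine or platform.machine()).lower()  # e.g. "x86_64", "amd64", "arm64"
    if system == "Darwin" and machine == "arm64":
        return "fusion"
    if system in ("Linux", "Windows") and machine in ("x86_64", "amd64"):
        return "ws"
    raise RuntimeError(f"no documented OSWorld VM setup for {system}/{machine}")
```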
## Quick Start
Run the following minimal example to interact with the environment:

```python
from desktop_env.envs.desktop_env import DesktopEnv

example = {
    "id": "94d95f96-9699-4208-98ba-3c3119edf9c2",
    "instruction": "I want to install Spotify on my current system. Could you please help me?",
    # Setup: commands executed in the VM to prepare the task
    "config": [{"type": "execute", "parameters": {"command": [
        "python", "-c", "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);"]}}],
    # Success check: run `which spotify` in the VM and match its output against the rule
    "evaluator": {
        "func": "check_include_exclude",
        "result": {"type": "vm_command_line", "command": "which spotify"},
        "expected": {"type": "rule", "rules": {"include": ["spotify"], "exclude": ["not found"]}},
    },
}

env = DesktopEnv(
    path_to_vm="Ubuntu/Ubuntu.vmx",
    action_space="pyautogui",  # actions are pyautogui code strings
    task_config=example
)

obs = env.reset()  # initial observation
obs, reward, done, info = env.step("pyautogui.rightClick()")
```
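The `evaluator` entry decides whether the task succeeded: the output of `which spotify`, run inside the VM, is matched against an include/exclude rule. The function below is a hypothetical re-implementation for illustration only, assuming the rule passes when every `include` string occurs in the output and no `exclude` string does; the actual `check_include_exclude` in `desktop_env` may differ in its details.

```python
def include_exclude_score(output: str, rules: dict) -> float:
    """Hypothetical scorer: 1.0 if all `include` strings occur in `output`
    and none of the `exclude` strings do, else 0.0."""
    has_all_included = all(s in output for s in rules.get("include", []))
    has_any_excluded = any(s in output for s in rules.get("exclude", []))
    return 1.0 if has_all_included and not has_any_excluded else 0.0


rules = {"include": ["spotify"], "exclude": ["not found"]}
print(include_exclude_score("/snap/bin/spotify", rules))  # 1.0
print(include_exclude_score("spotify not found", rules))  # 0.0
```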
## Run Benchmark
### Run the Baseline Agent

To run the baseline agent used in our paper, use the following command as an example:

```bash
```
### Run Evaluation of Your Agent
Please first read through the [agent interface](https://github.com/xlang-ai/OSWorld/mm_agents/README.md) and the [environment interface](https://github.com/xlang-ai/OSWorld/desktop_env/README.md).

Then implement the agent interface and import your customized agent in the `run.py` file.
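The real agent interface is specified in `mm_agents/README.md`. As a rough sketch of how an agent plugs into the environment, the loop below assumes a hypothetical agent with a `predict(instruction, obs)` method that returns a list of `pyautogui` action strings, and drives it with the `env.reset()` and `env.step()` calls from the Quick Start. `Agent`, `predict`, and `run_episode` are illustrative names, not OSWorld APIs.

```python
from typing import Any, List, Protocol


class Agent(Protocol):
    """Hypothetical agent shape; not the actual OSWorld agent interface."""

    def predict(self, instruction: str, obs: Any) -> List[str]:
        """Return the next pyautogui action strings (empty list to stop)."""
        ...


def run_episode(env: Any, agent: Agent, instruction: str, max_steps: int = 15) -> int:
    """Alternate agent.predict and env.step until the env reports done,
    the agent returns no actions, or max_steps actions have been executed.
    Returns the number of executed actions."""
    obs = env.reset()
    steps = 0
    while steps < max_steps:
        actions = agent.predict(instruction, obs)
        if not actions:
            break  # the agent has nothing more to do
        for action in actions:
            obs, reward, done, info = env.step(action)  # same 4-tuple as in the Quick Start
            steps += 1
            if done or steps >= max_steps:
                return steps
    return steps
```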
Then, you can run the following command to evaluate your agent:
## Citation
If you find this environment useful, please consider citing our work:

```bibtex
@article{DesktopEnv,
  title={},
  author={},
  journal={arXiv preprint arXiv:xxxx.xxxx},
  year={2024}
}
```