# OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

<p align="center">
  <a href="">Website</a> •
  <a href="">Paper</a>
</p>

## Updates

- 2024-04-04: We released our [paper](), [environment and benchmark](https://github.com/xlang-ai/OSWorld), and [project page](https://os-world.github.io/). Check them out!

## Install

### Non-virtualized platform

Follow these steps if your machine is not itself virtualized, that is, you are not running inside an AWS, Azure, or k8s virtualized environment. Otherwise, refer to the [virtualized platform](https://github.com/xlang-ai/OSWorld?tab=readme-ov-file#virtualized-platform) section.

1. Install [VMware Workstation Pro](https://www.vmware.com/products/workstation-pro/workstation-pro-evaluation.html) (on Apple-chip machines, install [VMware Fusion](https://www.vmware.com/go/getfusion) instead) and make sure the `vmrun` command is available on your `PATH`. Verify the installation by running the following command (on VMware Fusion, replace `ws` with `fusion`):

```bash
vmrun -T ws list
```

If the installation succeeded and the environment variable is set correctly, the command prints the list of currently running virtual machines.

2. Install the environment package, then download the examples and the virtual machine image.

On Linux or Windows with an x86_64 CPU, run the commands below. Remove the `nogui` argument if you want to watch what happens inside the virtual machine.

```bash
# Get the code and install the Python dependencies
git clone https://github.com/xlang-ai/OSWorld
cd OSWorld
pip install -r requirements.txt

# Download the virtual machine image into ./Ubuntu
gdown https://drive.google.com/drive/folders/1HX5gcf7UeyR-2UmiA15Q9U-Wr6E6Gio8 -O Ubuntu --folder

# Boot the VM without a GUI, then snapshot it as "init_state"
vmrun -T ws start "Ubuntu/Ubuntu.vmx" nogui
vmrun -T ws snapshot "Ubuntu/Ubuntu.vmx" "init_state"
```

On Apple-chip macOS, download and start the specially prepared virtual machine image by running the following commands:

```bash
gdown https://drive.google.com/drive/folders/xxx -O Ubuntu --folder
vmrun -T fusion start "Ubuntu/Ubuntu.vmx"
vmrun -T fusion snapshot "Ubuntu/Ubuntu.vmx" "init_state"
```

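The `vmrun` check from step 1 can also be scripted, for example before launching a long run. The following is a minimal sketch and not part of OSWorld: `parse_vmrun_list` and `running_vms` are hypothetical helper names, and the parser assumes that `vmrun list` prints a `Total running VMs: N` header followed by one `.vmx` path per line.

```python
import shutil
import subprocess


def parse_vmrun_list(output: str) -> list[str]:
    """Return the .vmx paths found in the output of `vmrun list`.

    Assumption: the output starts with a "Total running VMs: N" header
    and every other non-empty line is the path of a running VM.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [line for line in lines
            if not line.lower().startswith("total running vms")]


def running_vms(host_type: str = "ws") -> list[str]:
    """Call `vmrun -T <host_type> list`; pass "fusion" on Apple-chip macOS."""
    if shutil.which("vmrun") is None:
        raise RuntimeError("vmrun not found on PATH; check the VMware installation")
    result = subprocess.run(
        ["vmrun", "-T", host_type, "list"],
        capture_output=True, text=True, check=True,
    )
    return parse_vmrun_list(result.stdout)
```

Calling `running_vms()` after the `vmrun ... start` command should then include the path to `Ubuntu/Ubuntu.vmx`, provided the output format matches the assumption above.
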
### Virtualized platform

We are working on supporting virtualized platforms 👷, hold tight!

## Quick Start

Run the following minimal example to interact with the environment:

```python
from desktop_env.envs.desktop_env import DesktopEnv

# Task definition: the instruction, the setup steps (`config`),
# and the rule used to evaluate the final state (`evaluator`).
example = {
    "id": "94d95f96-9699-4208-98ba-3c3119edf9c2",
    "instruction": "I want to install Spotify on my current system. Could you please help me?",
    "config": [{"type": "execute", "parameters": {
        "command": ["python", "-c", "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);"]}}],
    "evaluator": {"func": "check_include_exclude", "result": {"type": "vm_command_line", "command": "which spotify"},
                  "expected": {"type": "rule", "rules": {"include": ["spotify"], "exclude": ["not found"]}}}
}

# Create the environment on top of the VM image downloaded during installation.
env = DesktopEnv(
    path_to_vm="Ubuntu/Ubuntu.vmx",
    action_space="pyautogui",
    task_config=example
)

# Reset the environment, then send a single pyautogui action as a string.
obs = env.reset()
obs, reward, done, info = env.step("pyautogui.rightClick()")
```

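The `evaluator` entry in the example pairs a command run inside the VM (`which spotify`) with an include/exclude rule. As a rough illustration of what such a rule expresses, the hypothetical function below passes only when every `include` string appears in the command output and no `exclude` string does. It is a sketch of the idea under that assumption, not the actual `check_include_exclude` implementation in `desktop_env`.

```python
def include_exclude_passes(result: str, rules: dict) -> bool:
    """Hypothetical stand-in for an include/exclude rule check."""
    include = rules.get("include", [])
    exclude = rules.get("exclude", [])
    has_all_required = all(text in result for text in include)
    has_forbidden = any(text in result for text in exclude)
    return has_all_required and not has_forbidden


rules = {"include": ["spotify"], "exclude": ["not found"]}
print(include_exclude_passes("/usr/bin/spotify", rules))   # True
print(include_exclude_passes("spotify not found", rules))  # False
```
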
## Run Benchmark

### Run the Baseline Agent

To run the baseline agent used in our paper, use the following command, which runs it under the GPT-4V pure-screenshot setting as an example:

```bash
python run.py --path_to_vm Ubuntu/Ubuntu.vmx --headless --observation_type screenshot --model gpt-4-vision-preview
```

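When launching several runs from a script, it can help to assemble the command programmatically. The sketch below uses only the flags that appear in the command above; `build_run_command` is a hypothetical helper and not part of the repository, and no other flag values are assumed to exist.

```python
import shlex
import sys


def build_run_command(path_to_vm: str, model: str,
                      observation_type: str = "screenshot",
                      headless: bool = True) -> list[str]:
    """Assemble the argument list for run.py from the flags shown above."""
    cmd = [sys.executable, "run.py",
           "--path_to_vm", path_to_vm,
           "--observation_type", observation_type,
           "--model", model]
    if headless:
        cmd.append("--headless")
    return cmd


cmd = build_run_command("Ubuntu/Ubuntu.vmx", "gpt-4-vision-preview")
print(shlex.join(cmd))  # the command line that would be executed
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, where `run.py` lives.
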
### Run Evaluation of Your Agent

First read through the [agent interface](https://github.com/xlang-ai/OSWorld/blob/main/mm_agents/README.md) and the [environment interface](https://github.com/xlang-ai/OSWorld/blob/main/desktop_env/README.md). Then implement the agent interface and import your custom agent in `run.py`. After that, you can run the benchmark on your agent with a command similar to the one in the previous section.

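To give a feel for how an agent and the environment fit together, here is a minimal sketch of an interaction loop. It relies only on `env.reset()` and the four-value return of `env.step()` shown in the Quick Start. The `RightClickAgent` class, its `predict(instruction, obs)` method, and `run_episode` are hypothetical and chosen for illustration; the real agent interface is defined in the linked README, and whether `reward` carries the final score is likewise an assumption.

```python
class RightClickAgent:
    """Hypothetical agent that always issues the same pyautogui action."""

    def predict(self, instruction: str, obs) -> list[str]:
        # A real agent would inspect `obs` (e.g. a screenshot) here.
        return ["pyautogui.rightClick()"]


def run_episode(agent, env, instruction: str, max_steps: int = 15) -> float:
    """Drive one episode and return the reward reported by the last step."""
    obs = env.reset()
    reward = 0.0
    for _ in range(max_steps):
        for action in agent.predict(instruction, obs):
            obs, reward, done, info = env.step(action)
            if done:
                return reward
    return reward
```

With the `env` and `example` objects from the Quick Start, this could be called as `run_episode(RightClickAgent(), env, example["instruction"])`.
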
## Citation

If you find this environment useful, please consider citing our work:

```
@article{DesktopEnv,
  title={},
  author={},
  journal={arXiv preprint arXiv:xxxx.xxxx},
  year={2024}
}
```