# OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
![Overview]()

## Updates

- 2024-03-28: We released our [paper](), [environment and benchmark](), and [project page](https://os-world.github.io/). Check it out!

## Install

1. Install VMware and configure the `vmrun` command, then verify the setup by running:

   ```bash
   vmrun -T ws list
   ```

2. Install the environment package, then download the examples and the virtual machine image.

   For Linux or Windows on an x86_64 CPU, run the following commands. Remove the `nogui` parameter if you want to see what happens in the virtual machine.

   ```bash
   git clone https://github.com/xlang-ai/OSWorld
   cd OSWorld
   pip install -r requirements.txt
   gdown https://drive.google.com/drive/folders/1HX5gcf7UeyR-2UmiA15Q9U-Wr6E6Gio8 -O Ubuntu --folder
   vmrun -T ws start "Ubuntu/Ubuntu.vmx" nogui
   vmrun -T ws snapshot "Ubuntu/Ubuntu.vmx" "init_state"
   ```

   For macOS on Apple silicon, install the specially prepared virtual machine image by running the following commands:

   ```bash
   gdown https://drive.google.com/drive/folders/xxx -O Ubuntu --folder
   vmrun -T fusion start "Ubuntu/Ubuntu.vmx"
   vmrun -T fusion snapshot "Ubuntu/Ubuntu.vmx" "init_state"
   ```

## Quick Start

Run the following minimal example to interact with the environment:

```python
from desktop_env.envs.desktop_env import DesktopEnv

# Task definition: the instruction, the setup steps, and the evaluator.
example = {
    "id": "94d95f96-9699-4208-98ba-3c3119edf9c2",
    "instruction": "I want to install Spotify on my current system. Could you please help me?",
    # Setup step executed before the task starts.
    "config": [
        {
            "type": "execute",
            "parameters": {
                "command": [
                    "python",
                    "-c",
                    "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);",
                ]
            },
        }
    ],
    # The evaluator runs `which spotify` in the VM and checks the output against the rules.
    "evaluator": {
        "func": "check_include_exclude",
        "result": {"type": "vm_command_line", "command": "which spotify"},
        "expected": {
            "type": "rule",
            "rules": {"include": ["spotify"], "exclude": ["not found"]},
        },
    },
}

env = DesktopEnv(
    path_to_vm="Ubuntu/Ubuntu.vmx",
    action_space="pyautogui",
    task_config=example,
)

obs = env.reset()
# Actions are passed as pyautogui command strings.
obs, reward, done, info = env.step("pyautogui.rightClick()")
```

## Run Benchmark

### Run the Baseline Agent

If you want to run the baseline agent we use in our paper, you can run the following command as an example:

```bash
```

### Run Evaluation of Your Agent

Please first read through the [agent interface](https://github.com/xlang-ai/OSWorld/mm_agents/README.md) and the [environment interface](https://github.com/xlang-ai/OSWorld/desktop_env/README.md). Then implement the agent interface and import your customized agent in the `run.py` file. After that, you can run the following command to evaluate your agent:

## Citation

If you find this environment useful, please consider citing our work:

```
@article{DesktopEnv,
  title={},
  author={},
  journal={arXiv preprint arXiv:xxxx.xxxx},
  year={2024}
}
```
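
## Appendix: Reading the Quick Start Evaluator Rule

The Quick Start task pairs the output of `which spotify` with a rule containing `include` and `exclude` lists. The sketch below shows one plausible reading of such a rule: the result passes only if every `include` string appears in it and no `exclude` string does. The function name `include_exclude_score` and its return convention (1.0 for pass, 0.0 for fail) are assumptions made for illustration; this is not the `check_include_exclude` implementation from the repository, which may behave differently.

```python
from typing import Dict, List


def include_exclude_score(result: str, rules: Dict[str, List[str]]) -> float:
    """Hypothetical stand-in for an include/exclude rule check (not OSWorld's code)."""
    include = rules.get("include", [])
    exclude = rules.get("exclude", [])
    # Every required substring must be present in the command output.
    has_all_required = all(token in result for token in include)
    # No forbidden substring may be present in the command output.
    has_no_forbidden = all(token not in result for token in exclude)
    return 1.0 if has_all_required and has_no_forbidden else 0.0


# The rule from the Quick Start example.
rules = {"include": ["spotify"], "exclude": ["not found"]}

# Sample command outputs, made up for illustration.
installed_score = include_exclude_score("/usr/bin/spotify", rules)
missing_score = include_exclude_score("spotify not found", rules)
print(installed_score, missing_score)
```

Under this reading, an output that names a path to `spotify` passes, while an output that mentions `spotify` only inside a "not found" message fails because the `exclude` list takes effect.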