# aworldGUIAgent-v1

aworldGUIAgent-v1 is an agent built on the [AWorld Framework](https://github.com/inclusionAI/AWorld) and designed specifically for complex desktop automation tasks in the [OSWorld-verified](https://os-world.github.io/) benchmark.

The core logic for our agent's perception and reasoning is adapted from the great work of the [Agent-S project](https://github.com/simular-ai/Agent-S). We have built upon their foundation by introducing a suite of new executable tools that enhance the agent's ability to interact with the OS environment.
## Quick Start

Follow these steps to set up the environment and reproduce our results.
1. **Create Environment & Set Up OSWorld**:
   * First, create a dedicated Conda environment with **Python 3.11**.
     ```bash
     conda create -n osworld_env python=3.11
     conda activate osworld_env
     ```
   * Next, follow the official setup guide in the [OSWorld README](https://github.com/xlang-ai/OSWorld) to install OSWorld and its dependencies.
2. **Install AWorld Framework**:
   * Install the `osworld_benchmark` branch of the AWorld Framework into the **same environment**.
     ```bash
     # Make sure your osworld_env is still activated
     git clone https://github.com/inclusionAI/AWorld.git
     cd AWorld
     git checkout osworld_benchmark
     python setup.py install
     ```
3. **Run the Evaluation Script**:
   * Our results were achieved using `openai/o3` for reasoning and `bytedance/ui-tars-1.5-7b` for visual grounding, both accessed via OpenRouter.
   * Remember to replace placeholders like `YOUR_BASE_URL`, `YOUR_API_KEY`, and `/path/to/your/vm/Ubuntu.vmx` with your actual endpoint, credentials, and paths.

   ```bash
   # Activate your OSWorld conda environment (e.g., osworld_env)
   conda activate osworld_env

   # Run the evaluation with the recommended settings
   python run_multienv_aworldguiagent.py \
       --headless \
       --ground_url YOUR_BASE_URL \
       --ground_api_key YOUR_API_KEY \
       --ground_model bytedance/ui-tars-1.5-7b \
       --ground_provider open_router \
       --model_url YOUR_BASE_URL \
       --model_api_key YOUR_API_KEY \
       --model_temperature 1.0 \
       --provider_name vmware \
       --path_to_vm /path/to/your/vm/Ubuntu.vmx \
       --max_steps 50 \
       --model_provider open_router \
       --model openai/o3 \
       --grounding_width 1920 \
       --grounding_height 1080 \
       --test_all_meta_path evaluation_examples/test_all.json \
       --result_dir ./results \
       --observation_type screenshot \
       --num_envs 1 \
       --region us-east-1 \
       --client_password osworld-public-evaluation
   ```
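Before launching a long evaluation, it can help to confirm that the three setup steps above took effect. The following is a minimal sketch and is not part of the repository: the helper names (`is_python_311`, `on_branch`, `is_filled_in`, `run_preflight`) and the `AWORLD_DIR`, `API_KEY`, and `VM_PATH` environment variables are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks; none of these helpers ship with the repository.

# Succeeds when the version string is 3.11 or a 3.11.x patch release.
is_python_311() {
  case "$1" in
    3.11|3.11.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when repo_dir is a Git checkout whose HEAD points at the expected branch.
on_branch() {
  local repo_dir="$1" expected="$2" current
  current="$(git -C "$repo_dir" symbolic-ref --short HEAD 2>/dev/null)" || return 1
  [ "$current" = "$expected" ]
}

# Succeeds when the value is non-empty and is not a YOUR_* or /path/to/your/* placeholder.
is_filled_in() {
  case "$1" in
    ""|YOUR_*|/path/to/your/*) return 1 ;;
    *) return 0 ;;
  esac
}

# Runs every check, prints one line per problem, and returns non-zero if any check failed.
# AWORLD_DIR, API_KEY, and VM_PATH are assumed variable names, not ones the script reads.
run_preflight() {
  local failures=0 py_version
  py_version="$(python -c 'import platform; print(platform.python_version())' 2>/dev/null)" \
    || py_version="missing"
  is_python_311 "$py_version" \
    || { echo "python is $py_version, expected 3.11.x"; failures=$((failures + 1)); }
  on_branch "${AWORLD_DIR:-./AWorld}" osworld_benchmark \
    || { echo "AWorld checkout is not on osworld_benchmark"; failures=$((failures + 1)); }
  is_filled_in "${API_KEY:-}" \
    || { echo "API_KEY is empty or still a placeholder"; failures=$((failures + 1)); }
  is_filled_in "${VM_PATH:-}" && [ -f "${VM_PATH:-}" ] \
    || { echo "VM_PATH does not point to an existing file"; failures=$((failures + 1)); }
  echo "pre-flight finished with $failures problem(s)"
  [ "$failures" -eq 0 ]
}
```

After sourcing the snippet with `API_KEY` and `VM_PATH` exported, `run_preflight` returns non-zero if any check fails, so it can gate the evaluation command with `&&`.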
## Acknowledgements

This work would not have been possible without building upon the foundations of several incredible open-source projects.
- **AWorld Framework**: We thank the developers of the [AWorld Framework](https://github.com/inclusionAI/AWorld) for providing a powerful and flexible platform for agent development. The framework is designed for agent training and is especially suited to complex multi-agent scenarios. If you are designing or experimenting with multi-agent systems, we highly recommend exploring it further.

- **Agent-S**: We extend our sincere gratitude to the creators of the [Agent-S project](https://github.com/simular-ai/Agent-S). The core agent logic in our implementation is adapted and enhanced from their codebase. We built upon their work by adding a suite of executable tools to improve the agent's interaction with the OS environment, which improved the stability and capability of our computer-use agent (CUA).

- **OSWorld Benchmark**: We are grateful to the creators of the [OSWorld Benchmark](https://os-world.github.io/) for developing a challenging and comprehensive testbed for GUI agents.