# aworldGUIAgent-v1
aworldGUIAgent-v1 is built on the [AWorld Framework](https://github.com/inclusionAI/AWorld) and is specifically designed to tackle complex desktop automation tasks within the [OSWorld-verified](https://os-world.github.io/) benchmark.
The core logic for our agent's perception and reasoning is adapted from the great work of the [Agent-S project](https://github.com/simular-ai/Agent-S). We have built upon their foundation by introducing a suite of new executable tools that enhance the agent's ability to interact with the OS environment.
## Quick Start
Follow these steps to set up the environment and reproduce our results.
1. **Create Environment & Set Up OSWorld**:
* First, create a dedicated Conda environment with **Python 3.11**.
```bash
conda create -n osworld_env python=3.11
conda activate osworld_env
```
* Next, follow the official setup guide in the [OSWorld README](https://github.com/xlang-ai/OSWorld) to install OSWorld and its dependencies.
2. **Install AWorld Framework**:
* Install the specific version of the AWorld Framework into the **same environment**.
```bash
# Make sure your osworld_env is still activated
git clone https://github.com/inclusionAI/AWorld.git
cd AWorld
git checkout osworld_benchmark
python setup.py install
```
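Before moving on, it can help to confirm that the install landed in the active environment. The sketch below is only a sanity check: `check_python_module` is a hypothetical helper (not part of AWorld or OSWorld), and it assumes the package is importable under the name `aworld`.

```bash
# Hypothetical helper: report whether a Python module can be imported
# by the interpreter that is currently first on PATH.
check_python_module() {
    module="$1"
    if python -c "import ${module}" >/dev/null 2>&1; then
        echo "ok: ${module}"
    else
        echo "missing: ${module}"
        return 1
    fi
}

# Assumption: the AWorld package is imported as `aworld`.
check_python_module aworld || echo "Re-run the install step with osworld_env activated."
```

If the check reports `missing`, the most likely cause is that the install ran in a different Conda environment than the one currently activated.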
3. **Run the Evaluation Script**:
* Our results were achieved using `openai/o3` for reasoning and `bytedance/ui-tars-1.5-7b` for visual grounding, both accessed via OpenRouter.
* Replace the placeholders `YOUR_BASE_URL`, `YOUR_API_KEY`, and `/path/to/your/vm/Ubuntu.vmx` with your OpenRouter base URL, your OpenRouter API key, and the path to your VM image.
```bash
# Activate your OSWorld conda environment (e.g., osworld_env)
conda activate osworld_env
# Run the evaluation with the recommended settings
python run_multienv_aworldguiagent.py \
--headless \
--ground_url YOUR_BASE_URL \
--ground_api_key YOUR_API_KEY \
--ground_model bytedance/ui-tars-1.5-7b \
--ground_provider open_router \
--model_url YOUR_BASE_URL \
--model_api_key YOUR_API_KEY \
--model_temperature 1.0 \
--provider_name vmware \
--path_to_vm /path/to/your/vm/Ubuntu.vmx \
--max_steps 50 \
--model_provider open_router \
--model openai/o3 \
--grounding_width 1920 \
--grounding_height 1080 \
--test_all_meta_path evaluation_examples/test_all.json \
--result_dir ./results \
--observation_type screenshot \
--num_envs 1 \
--region us-east-1 \
--client_password osworld-public-evaluation
```
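Once a run finishes, a quick way to get an overall score is to average the per-task results under `./results`. The sketch below assumes the usual OSWorld layout, where each task directory contains a `result.txt` whose first line is a numeric score; `summarize_results` is a hypothetical helper, not a script shipped with this agent, so adjust it if your layout differs.

```bash
# Hypothetical helper: average the scores found in result.txt files.
# Assumption: each result.txt holds one numeric score on its first line.
summarize_results() {
    dir="${1:-./results}"
    # Stage 1: emit the first line of every result.txt, one score per line.
    # Stage 2: count the scores and compute their mean.
    find "$dir" -type f -name result.txt \
        -exec awk 'FNR == 1 { print $1 }' {} + 2>/dev/null |
        awk 'NF { n++; sum += $1 }
             END {
                 if (n == 0) { print "no result.txt files found"; exit }
                 printf "tasks: %d, mean score: %.4f\n", n, sum / n
             }'
}

summarize_results ./results
```

Tasks that crashed before writing `result.txt` are not counted, so the reported mean covers completed tasks only.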
## Acknowledgements
This work would not have been possible without building upon the foundations of several incredible open-source projects.
- **AWorld Framework**: We thank the developers of the [AWorld Framework](https://github.com/inclusionAI/AWorld) for providing a powerful and flexible platform for agent development. The AWorld Framework is designed for agent training and is especially suited to complex multi-agent scenarios. If you are designing or experimenting with multi-agent systems, we highly recommend exploring it further.
- **Agent-S**: We extend our sincere gratitude to the creators of the [Agent-S project](https://github.com/simular-ai/Agent-S). The core agent logic in our implementation is adapted and enhanced from their codebase. We built upon their work by adding a suite of executable tools to improve the agent's interaction with the OS environment, which improved the stability and capability of our computer-use agent (CUA).
- **OSWorld Benchmark**: We are grateful to the creators of the [OSWorld Benchmark](https://os-world.github.io/) for developing a challenging and comprehensive testbed for GUI agents.