# aworldGUIAgent-v1

aworldGUIAgent-v1 is built on the [AWorld Framework](https://github.com/inclusionAI/AWorld) and is designed to tackle complex desktop automation tasks within the [OSWorld-verified](https://os-world.github.io/) benchmark.

The core logic for our agent's perception and reasoning is adapted from the great work of the [Agent-S project](https://github.com/simular-ai/Agent-S). We have built upon their foundation by introducing a suite of new executable tools that enhance the agent's ability to interact with the OS environment.

## Quick Start

Follow these steps to set up the environment and reproduce our results.

1.  **Create Environment & Set Up OSWorld**:
    *   First, create a dedicated Conda environment with **Python 3.11**.
        ```bash
        conda create -n osworld_env python=3.11
        conda activate osworld_env
        ```
    *   Next, follow the official setup guide in the [OSWorld README](https://github.com/xlang-ai/OSWorld) to install OSWorld and its dependencies.

2.  **Install AWorld Framework**:
    *   Install the `osworld_benchmark` branch of the AWorld Framework into the **same environment**.
        ```bash
        # Make sure your osworld_env is still activated
        git clone https://github.com/inclusionAI/AWorld.git
        cd AWorld
        git checkout osworld_benchmark
        python setup.py install
        ```

3.  **Run the Evaluation Script**:
    *   Our results were achieved using `openai/o3` for reasoning and `bytedance/ui-tars-1.5-7b` for visual grounding, both accessed via OpenRouter.
    *   Remember to replace the placeholders `YOUR_BASE_URL`, `YOUR_API_KEY`, and `/path/to/your/vm/Ubuntu.vmx` with your actual endpoint, credentials, and VM path.

    ```bash
    # Activate your OSWorld conda environment (e.g., osworld_env)
    conda activate osworld_env

    # Run the evaluation with the recommended settings
    python run_multienv_aworldguiagent.py \
        --headless \
        --ground_url YOUR_BASE_URL \
        --ground_api_key YOUR_API_KEY \
        --ground_model bytedance/ui-tars-1.5-7b \
        --ground_provider open_router \
        --model_url YOUR_BASE_URL \
        --model_api_key YOUR_API_KEY \
        --model_temperature 1.0 \
        --provider_name vmware \
        --path_to_vm /path/to/your/vm/Ubuntu.vmx \
        --max_steps 50 \
        --model_provider open_router \
        --model openai/o3 \
        --grounding_width 1920 \
        --grounding_height 1080 \
        --test_all_meta_path evaluation_examples/test_all.json \
        --result_dir ./results \
        --observation_type screenshot \
        --num_envs 1 \
        --region us-east-1 \
        --client_password osworld-public-evaluation
    ```
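
Because the command above fails late if a placeholder is left unreplaced, a small pre-flight check can help. The following is a minimal sketch, not part of the AWorld or OSWorld repositories: the `preflight` function and its argument order are hypothetical, and it only checks that the placeholder strings from this README have been replaced and that the `.vmx` file exists.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (an assumption for illustration,
# not shipped with AWorld or OSWorld).

# Usage: preflight VM_PATH BASE_URL API_KEY
# Prints one line per problem to stderr; returns non-zero if any is found.
preflight() {
    local vm_path="$1" base_url="$2" api_key="$3"
    local status=0

    # The README placeholders must be replaced with real values.
    case "$base_url" in
        ""|YOUR_BASE_URL) echo "base URL is empty or still a placeholder" >&2; status=1 ;;
    esac
    case "$api_key" in
        ""|YOUR_API_KEY) echo "API key is empty or still a placeholder" >&2; status=1 ;;
    esac

    # --path_to_vm should point at an existing .vmx file.
    case "$vm_path" in
        *.vmx) ;;
        *) echo "VM path does not end in .vmx: $vm_path" >&2; status=1 ;;
    esac
    if [ ! -f "$vm_path" ]; then
        echo "VM file not found: $vm_path" >&2
        status=1
    fi

    return "$status"
}
```

One way to use it is to call `preflight` with the same three values you intend to pass to `--path_to_vm`, `--model_url`, and `--model_api_key`, and start the evaluation only if it returns success.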

## Acknowledgements

This work would not have been possible without building upon the foundations of several incredible open-source projects.

-   **AWorld Framework**: We thank the developers of the [AWorld Framework](https://github.com/inclusionAI/AWorld) for providing a powerful and flexible platform for agent development. The AWorld Framework is designed for agent training and is especially suited to complex multi-agent scenarios. If you are designing or experimenting with multi-agent systems, we highly recommend exploring the AWorld Framework further.

-   **Agent-S**: We extend our sincere gratitude to the creators of the [Agent-S project](https://github.com/simular-ai/Agent-S). The core agent logic in our implementation is adapted and enhanced from their codebase. We built upon their work by adding a suite of executable tools that improve the agent's interaction with the OS environment, which boosted the stability and capability of our computer-use agent (CUA).

-   **OSWorld Benchmark**: We are grateful to the creators of the [OSWorld Benchmark](https://os-world.github.io/) for developing a challenging and comprehensive testbed for GUI agents.