2025-09-23 16:50:29 +08:00


aworldGUIAgent-v1

aworldGUIAgent-v1 is built on the AWorld Framework and designed specifically to tackle complex desktop automation tasks in the OSWorld-verified benchmark.

The core logic for our agent's perception and reasoning is adapted from the great work of the Agent-S project. We have built upon their foundation by introducing a suite of new executable tools that enhance the agent's ability to interact with the OS environment.

Quick Start

Follow these steps to set up the environment and reproduce our results.

  1. Create Environment & Set Up OSWorld:
  • First, create a dedicated Conda environment with Python 3.11.
    conda create -n osworld_env python=3.11
    conda activate osworld_env
    
  • Next, follow the official setup guide in the OSWorld README to install OSWorld and its dependencies.
  2. Install the AWorld Framework:
  • Install the osworld_benchmark branch of the AWorld Framework into the same environment.
    # Make sure your osworld_env is still activated
    git clone https://github.com/inclusionAI/AWorld.git
    cd AWorld
    git checkout osworld_benchmark
    python setup.py install
    
  3. Run the Evaluation Script:
  • Our results were achieved using openai/o3 for reasoning and bytedance/ui-tars-1.5-7b for visual grounding, both accessed via OpenRouter.
  • Replace the placeholders YOUR_BASE_URL, YOUR_API_KEY, and /path/to/your/vm/Ubuntu.vmx with your actual endpoint, credentials, and VM path.
    # Activate your OSWorld conda environment (e.g., osworld_env)
    conda activate osworld_env
    
    # Run the evaluation with the recommended settings
    python run_multienv_aworldguiagent.py \
        --headless \
        --ground_url YOUR_BASE_URL \
        --ground_api_key YOUR_API_KEY \
        --ground_model bytedance/ui-tars-1.5-7b \
        --ground_provider open_router \
        --model_url YOUR_BASE_URL \
        --model_api_key YOUR_API_KEY \
        --model_temperature 1.0 \
        --provider_name vmware \
        --path_to_vm /path/to/your/vm/Ubuntu.vmx \
        --max_steps 50 \
        --model_provider open_router \
        --model openai/o3 \
        --grounding_width 1920 \
        --grounding_height 1080 \
        --test_all_meta_path evaluation_examples/test_all.json \
        --result_dir ./results \
        --observation_type screenshot \
        --num_envs 1 \
        --region us-east-1 \
        --client_password osworld-public-evaluation
    
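The setup above ends in a long-running evaluation, so it can be worth catching configuration mistakes before launching it. The following is a minimal sketch of hypothetical helper functions (saved, say, as preflight.sh and loaded with source preflight.sh); nothing here ships with AWorld or OSWorld. It assumes the environment name osworld_env from the first step, that you pass in the path of your AWorld clone, and that you keep the endpoint, API key, and VM path in environment variables of your own choosing rather than typing them into the command. Each function prints a message to stderr and returns non-zero when its check fails.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; these are NOT part of the AWorld or OSWorld
# repositories. Source this file, then run the checks before a long evaluation.

# Environment check: `conda activate <env>` exports the active environment's
# name as CONDA_DEFAULT_ENV, so compare it with the expected name.
check_conda_env() {
  local want="${1:-osworld_env}"
  if [ "${CONDA_DEFAULT_ENV:-}" != "$want" ]; then
    echo "error: expected conda env '$want', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
}

# AWorld check: the clone should be on the osworld_benchmark branch before it
# is installed. Fails on a detached HEAD, a non-repository, or another branch.
check_aworld_branch() {
  local repo="$1" want="${2:-osworld_benchmark}" have
  have="$(git -C "$repo" symbolic-ref --short HEAD 2>/dev/null)" || {
    echo "error: $repo is not a git checkout on a named branch" >&2
    return 1
  }
  if [ "$have" != "$want" ]; then
    echo "error: $repo is on branch '$have', expected '$want'" >&2
    return 1
  fi
}

# Settings check: the named variable must be set and must not still hold one
# of the README's placeholders (YOUR_BASE_URL, YOUR_API_KEY, /path/to/your/...).
require_setting() {
  local name="$1" value
  # Only accept valid shell identifiers, since the name is passed to eval below.
  case "$name" in
    ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
      echo "error: invalid variable name '$name'" >&2
      return 1
      ;;
  esac
  eval "value=\${$name:-}"   # portable indirect lookup of the variable named $name
  if [ -z "$value" ]; then
    echo "error: $name is not set" >&2
    return 1
  fi
  case "$value" in
    YOUR_*|/path/to/your/*)
      echo "error: $name still holds the placeholder '$value'" >&2
      return 1
      ;;
  esac
}

# Example (OPENROUTER_BASE_URL, OPENROUTER_API_KEY and VM_PATH are hypothetical
# names you export yourself; ~/AWorld is wherever you cloned AWorld):
#   source preflight.sh
#   check_conda_env osworld_env && check_aworld_branch ~/AWorld &&
#     require_setting OPENROUTER_BASE_URL && require_setting OPENROUTER_API_KEY &&
#     require_setting VM_PATH
```

If all checks pass, the same variables can be handed to the evaluation command, for example --model_url "$OPENROUTER_BASE_URL", --model_api_key "$OPENROUTER_API_KEY" and --path_to_vm "$VM_PATH", so the credentials are not written into the command itself.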

Acknowledgements

This work builds on the foundations of several excellent open-source projects and would not have been possible without them.

  • AWorld Framework: We thank the developers of the AWorld Framework for providing a powerful and flexible platform for agent development. The AWorld Framework is designed for agent training and is especially well suited to complex multi-agent scenarios. If you are designing or experimenting with multi-agent systems, we highly recommend exploring the AWorld Framework further.

  • Agent-S: We extend our sincere gratitude to the creators of the Agent-S project. The core agent logic in our implementation is adapted from their codebase; we extended it with a suite of executable tools that improve the agent's interaction with the OS environment, which boosted both the stability and the capability of our computer-use agent (CUA).

  • OSWorld Benchmark: We are grateful to the creators of the OSWorld Benchmark for developing a challenging and comprehensive testbed for GUI agents.