Files

Zilong Zhou 74b7c189af Feat/monitor (#254 )

* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py

* fix: update logger usage to use global logger and improve error handling

* feat&fix: add configuration management API endpoints and update UI for configuration selection

* feat&fix: update environment configuration, enhance task statistics, and improve UI responsiveness

* feat&fix: add configuration toggle button in UI and improve task loading performance

* feat&fix: add accuracy percentage display to score and style updates for UI

2025-07-14 13:43:41 +08:00

static

Feat/monitor (#254 )

2025-07-14 13:43:41 +08:00

templates

Feat/monitor (#254 )

2025-07-14 13:43:41 +08:00

.env

Feat/monitor (#254 )

2025-07-14 13:43:41 +08:00

.gitignore

feat: add .env configuration file and update README with configuration details

2025-06-01 07:07:47 +00:00

docker-compose.yml

feat&fix: update environment configuration for Docker compatibility and enhance result path handling

2025-06-06 02:53:20 +00:00

Dockerfile

feat&fix: update paths in configuration, enhance error handling, and improve UI elements

2025-06-01 04:48:50 +00:00

main.py

Feat/monitor (#254 )

2025-07-14 13:43:41 +08:00

README.md

refactor&fix: update README and main.py for improved configuration and task status handling

2025-06-06 12:55:13 +00:00

requirements.txt

feat: Implement task monitoring web application

2025-06-01 10:31:27 +08:00

README.md

OSWorld Monitor

A web-based monitoring dashboard for OSWorld tasks and executions.

Overview

This monitor provides a visual interface to track the status, progress, and results of OSWorld tasks. It allows you to:

View all tasks grouped by type
Monitor task execution status in real-time
See detailed execution steps with screenshots and videos
Check task results

Important! Start the monitor only after the main runner has begun executing tasks. Starting it earlier may interfere with task execution.

Configuration

The monitor can be configured by editing the .env file in the monitor directory. The following variables can be customized:

Variable	Description	Default Value
TASK_CONFIG_PATH	Path to the task configuration file	../evaluation_examples/test.json
EXAMPLES_BASE_PATH	Base path for example files	../evaluation_examples/examples
RESULTS_BASE_PATH	Base path for storing results	../results
ACTION_SPACE	Action space type (e.g., pyautogui, keyboard)	pyautogui
OBSERVATION_TYPE	Type of observation (e.g., screenshot, video)	screenshot
MODEL_NAME	Name of the model to use for task execution	computer-use-preview
MAX_STEPS	Maximum steps to display for a task	150
FLASK_PORT	Port for the web server	80
FLASK_HOST	Host address for the web server	0.0.0.0
FLASK_DEBUG	Enable debug mode (true/false)	false

For example:

# .env
TASK_CONFIG_PATH=../evaluation_examples/test.json
EXAMPLES_BASE_PATH=../evaluation_examples/examples
RESULTS_BASE_PATH=../results
ACTION_SPACE=pyautogui
OBSERVATION_TYPE=screenshot
MODEL_NAME=computer-use-preview
MAX_STEPS=150
FLASK_PORT=80
FLASK_HOST=0.0.0.0
FLASK_DEBUG=true
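
As a rough illustration of how these variables might be consumed, the Python sketch below parses .env-style text and falls back to the defaults listed in the table above. The helper names `parse_env_text` and `load_settings` are hypothetical and are not taken from main.py; the actual monitor may load its configuration differently (for example, through a dotenv library).

```python
# Hypothetical sketch: not the monitor's actual configuration loader.

# Defaults copied from the configuration table in this README.
DEFAULTS = {
    "TASK_CONFIG_PATH": "../evaluation_examples/test.json",
    "EXAMPLES_BASE_PATH": "../evaluation_examples/examples",
    "RESULTS_BASE_PATH": "../results",
    "ACTION_SPACE": "pyautogui",
    "OBSERVATION_TYPE": "screenshot",
    "MODEL_NAME": "computer-use-preview",
    "MAX_STEPS": "150",
    "FLASK_PORT": "80",
    "FLASK_HOST": "0.0.0.0",
    "FLASK_DEBUG": "false",
}


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env_text=""):
    """Merge .env text over the defaults and convert the typed fields."""
    merged = {**DEFAULTS, **parse_env_text(env_text)}
    settings = dict(merged)
    settings["MAX_STEPS"] = int(merged["MAX_STEPS"])
    settings["FLASK_PORT"] = int(merged["FLASK_PORT"])
    settings["FLASK_DEBUG"] = merged["FLASK_DEBUG"].lower() == "true"
    return settings


example = load_settings("# .env\nFLASK_PORT=8080\nFLASK_DEBUG=true\n")
print(example["FLASK_PORT"], example["FLASK_DEBUG"], example["MAX_STEPS"])  # 8080 True 150
```

Variables that are not set in the text keep their default values, so a .env file only needs to list the settings you want to change.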

Running with Docker

The recommended way to run the monitor is with Docker, using the provided Docker Compose configuration.

Prerequisites

Docker and Docker Compose installed on your system
OSWorld repository cloned to your local machine
Environment variables set in the .env file

Starting the Monitor

Navigate to the monitor directory:
```
cd /path/to/OSWorld/monitor
```
Edit the .env file if you need to customize any settings.
Build and start the Docker container:
```
docker-compose up -d
```
Access the monitor in your web browser at:
```
http://{your-ip-address}:{FLASK_PORT}
```
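
If you want to confirm from a script that the monitor is answering, a minimal Python sketch along these lines may help. The function names `monitor_url` and `is_reachable` are hypothetical, and the check only tests whether an HTTP response of any kind comes back; it makes no assumption about which routes the monitor serves.

```python
import http.client
import urllib.error
import urllib.request


def monitor_url(host, port):
    """Build the address described above from a host and FLASK_PORT."""
    return f"http://{host}:{port}"


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP GET to url receives any HTTP response."""
    # Bypass any system-wide proxy so the request goes straight to the monitor.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed reply.
        return False
```

For example, `is_reachable(monitor_url("127.0.0.1", 80))` would check a monitor running locally on the default port.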

Stopping the Monitor

To stop the monitor:

docker-compose down

Viewing Logs

To view the monitor logs:

docker-compose logs -f

Running Without Docker

If you prefer to run the monitor directly, first create a .env file with the necessary settings (see Configuration), then follow these steps:

Install the required Python packages:
```
pip install -r requirements.txt
```
Start the monitor:
```
python main.py
```

Features

Task Overview: View all tasks with their status, progress, and basic information
Task Filtering: Filter tasks by status (all, active, completed)
Task Details: Detailed view of each task showing step-by-step execution
Screenshots: View screenshots captured during task execution
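
To make the filtering and grouping behavior concrete, here is a small Python sketch. The task records, their field names (`id`, `type`, `status`), and the helper functions are all hypothetical; the monitor's real data model is not documented here and may differ.

```python
from collections import defaultdict

# Hypothetical task records, used only for illustration.
TASKS = [
    {"id": "task-1", "type": "chrome", "status": "completed"},
    {"id": "task-2", "type": "chrome", "status": "active"},
    {"id": "task-3", "type": "os", "status": "completed"},
]


def filter_tasks(tasks, status="all"):
    """Return tasks matching a status; "all" disables the filter."""
    if status == "all":
        return list(tasks)
    return [task for task in tasks if task["status"] == status]


def group_by_type(tasks):
    """Map each task type to the list of task ids of that type."""
    grouped = defaultdict(list)
    for task in tasks:
        grouped[task["type"]].append(task["id"])
    return dict(grouped)


print(group_by_type(filter_tasks(TASKS, "completed")))
# {'chrome': ['task-1'], 'os': ['task-3']}
```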

Troubleshooting

If you encounter issues:

Check the logs for errors
Verify that the paths in the .env file point to existing files and directories
Ensure the Docker daemon is running (if using Docker)
Check that the port is not already in use by another application
If the monitor runs on a cloud instance, make sure its security group rules allow inbound access to the specified port
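
Two of these checks (path validity and port availability) can be scripted. The following is a minimal Python sketch using only the standard library; `port_in_use` and `missing_paths` are hypothetical helper names and are not part of the monitor.

```python
import os
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def missing_paths(paths):
    """Return the entries of paths that do not exist on disk."""
    return [path for path in paths if not os.path.exists(path)]
```

For instance, `missing_paths(["../evaluation_examples/test.json", "../results"])` lists whichever of the default locations are absent relative to the current directory, and `port_in_use(80)` reports whether the default FLASK_PORT is already taken on the local machine.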