* feat: add claude support
* feat: add script for end-to-end evaluation with logging and task distribution
* feat&fix: add tool result handling and update model default in evaluation script
* chore: remove run_test_env.py script
* feat&fix: implement action parsing for tool calls and update default action space
* fix: update text formatting in action parsing and replace logger import
* feat&fix: implement action parsing for tool calls and add screen size handling
* feat: add setup instructions for Anthropic API integration
* feat: add notice about image size limitations for Anthropic API
* Delete test_env/logger.py
* Delete test_env/utils.py
* fix: update logger usage to use global logger and improve error handling
* feat&fix: add configuration management API endpoints and update UI for configuration selection
* feat&fix: update environment configuration, enhance task statistics, and improve UI responsiveness
* feat&fix: add configuration toggle button in UI and improve task loading performance
* feat&fix: add accuracy percentage display to score and style updates for UI
# This file is only used to configure the monitor.
# Do not write any secret keys or sensitive information here.

# Monitor configuration
TASK_CONFIG_PATH=../evaluation_examples/test_all.json
EXAMPLES_BASE_PATH=../evaluation_examples/examples
RESULTS_BASE_PATH=../results
ACTION_SPACE=pyautogui
OBSERVATION_TYPE=screenshot
MODEL_NAME=computer-use-preview
MAX_STEPS=150
FLASK_PORT=80
FLASK_HOST=0.0.0.0
FLASK_DEBUG=false
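Every value in this file is a plain string. Numeric and boolean settings such as `MAX_STEPS`, `FLASK_PORT` and `FLASK_DEBUG` therefore have to be converted before use. The monitor's own loading code is not shown here, so the following is only a minimal sketch. It assumes a Python consumer, and the helpers `parse_env_lines`, `load_env_file`, `to_monitor_config` and the `MonitorConfig` dataclass are hypothetical names, not part of the project. The parser deliberately handles only the simple `KEY=VALUE` and `#` comment lines used above. It does not support quoting, `export`, or inline comments.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


def parse_env_lines(lines: Iterable[str]) -> dict[str, str]:
    """Hypothetical helper: parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            values[key.strip()] = value.strip()
    return values


def load_env_file(path: str) -> dict[str, str]:
    """Hypothetical helper: read an env file from disk and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_env_lines(f)


@dataclass(frozen=True)
class MonitorConfig:
    """Hypothetical typed view of the monitor settings listed above."""
    task_config_path: str
    examples_base_path: str
    results_base_path: str
    action_space: str
    observation_type: str
    model_name: str
    max_steps: int
    flask_port: int
    flask_host: str
    flask_debug: bool


def to_monitor_config(values: Mapping[str, str]) -> MonitorConfig:
    """Convert raw strings to typed fields; a missing key raises KeyError."""
    return MonitorConfig(
        task_config_path=values["TASK_CONFIG_PATH"],
        examples_base_path=values["EXAMPLES_BASE_PATH"],
        results_base_path=values["RESULTS_BASE_PATH"],
        action_space=values["ACTION_SPACE"],
        observation_type=values["OBSERVATION_TYPE"],
        model_name=values["MODEL_NAME"],
        max_steps=int(values["MAX_STEPS"]),
        flask_port=int(values["FLASK_PORT"]),
        flask_host=values["FLASK_HOST"],
        # Treat common truthy spellings as True; anything else (e.g. "false") is False.
        flask_debug=values["FLASK_DEBUG"].lower() in {"1", "true", "yes", "on"},
    )
```

To use the sketch, call `to_monitor_config(load_env_file(path))` with `path` pointing at this file. With the values shown, that yields `max_steps=150`, `flask_port=80` and `flask_debug=False`. Failing fast with `KeyError`/`ValueError` on a missing or malformed entry is a design choice. It surfaces configuration mistakes at startup rather than mid-evaluation.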