sci-gui-agent-benchmark

Files

Zilong Zhou 74b7c189af Feat/monitor (#254)

* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py

* fix: update logger usage to use global logger and improve error handling

* feat&fix: add configuration management API endpoints and update UI for configuration selection

* feat&fix: update environment configuration, enhance task statistics, and improve UI responsiveness

* feat&fix: add configuration toggle button in UI and improve task loading performance

* feat&fix: add accuracy percentage display to score and style updates for UI

2025-07-14 13:43:41 +08:00

.gitignore

feat: Implement task monitoring web application

2025-06-01 10:31:27 +08:00

favicon.ico

feat&fix: update paths in configuration, enhance error handling, and improve UI elements

2025-06-01 04:48:50 +00:00

favicon.png

feat&fix: update paths in configuration, enhance error handling, and improve UI elements

2025-06-01 04:48:50 +00:00

index.css

Feat/monitor (#254)