- Refactored the logging configuration so the log level can be set via a command-line argument, giving direct control over log verbosity.
- Added a signal handler that shuts down environments and processes gracefully on termination.
- Added functionality to save command-line arguments to a JSON file for better traceability of execution parameters.
- Kept the existing logic unchanged while improving the script's structure and error handling.
- Added a `--client_password` argument to `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py`, so the client password is passed at launch.
- Updated the agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and use `client_password`.
- Modified evaluation guidelines to reflect the new client password requirement.
- Existing logic is otherwise unchanged.
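
The script-level changes listed above (a log level taken from the command line, arguments saved to JSON, a graceful-shutdown signal handler, and the `--client_password` argument) could be wired together roughly as follows. This is a minimal sketch, not the code from this change: only `--client_password` comes from the notes above. The flag names `--log_level` and `--result_dir`, the file name `args.json`, all function names, and the masking of the password in the saved file are assumptions made for illustration.

```python
import argparse
import json
import logging
import signal
import sys
from pathlib import Path


def parse_args(argv=None):
    """Parse CLI arguments. Flag names other than --client_password are hypothetical."""
    parser = argparse.ArgumentParser(description="Hypothetical multi-env runner")
    parser.add_argument(
        "--log_level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
    )
    parser.add_argument("--client_password", default="")
    parser.add_argument("--result_dir", default="./results")
    return parser.parse_args(argv)


def configure_logging(level_name):
    """Apply the requested level to the root logger and return its numeric value."""
    level = getattr(logging, level_name)
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,
    )
    return level


def save_args(args, result_dir):
    """Write the parsed arguments to <result_dir>/args.json for traceability."""
    out_dir = Path(result_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    payload = dict(vars(args))
    # Assumption of this sketch: do not persist the password in clear text.
    if payload.get("client_password"):
        payload["client_password"] = "***"
    path = out_dir / "args.json"
    path.write_text(json.dumps(payload, indent=2))
    return path


def make_shutdown_handler(cleanup_callbacks):
    """Build a handler that runs every cleanup callback, then exits with code 0."""

    def handler(signum, frame):
        logging.info("Received signal %s, shutting down", signum)
        for callback in cleanup_callbacks:
            try:
                callback()
            except Exception:
                # Keep going so one failing cleanup does not block the others.
                logging.exception("Cleanup callback failed")
        sys.exit(0)

    return handler


def install_shutdown_handler(cleanup_callbacks):
    """Register the handler for SIGINT and SIGTERM (must run in the main thread)."""
    handler = make_shutdown_handler(cleanup_callbacks)
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
    return handler
```

In a runner script, one would call `parse_args()`, then `configure_logging(args.log_level)` and `save_args(args, args.result_dir)`, and finally `install_shutdown_handler([...])` with one callback per environment or worker process to close. Building the handler separately from registering it keeps the cleanup logic testable without sending real signals.
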
* feat: add claude support
* feat: add script for end-to-end evaluation with logging and task distribution
* feat&fix: add tool result handling and update model default in evaluation script
* chore: remove run_test_env.py script
* feat&fix: implement action parsing for tool calls and update default action space
* fix: update text formatting in action parsing and replace logger import
* feat&fix: implement action parsing for tool calls and add screen size handling
* feat: add setup instructions for Anthropic API integration
* feat: add notice about image size limitations for Anthropic API
* Delete test_env/logger.py
* Delete test_env/utils.py