Add multiple new modules and tools to enhance the functionality and extensibility of the Maestro project (#333)
* Added a **pyproject.toml** file to define project metadata and dependencies.
* Added **run_maestro.py** and **osworld_run_maestro.py** to provide the main execution logic.
* Introduced multiple new modules, including **Evaluator**, **Controller**, **Manager**, and **Sub-Worker**, supporting task planning, state management, and data analysis.
* Added a **tools module** containing utility functions and tool configurations to improve code reusability.
* Updated the **README** and documentation with usage examples and module descriptions.

These changes lay the foundation for expanding the Maestro project's functionality and improving the user experience.

Co-authored-by: Hiroid <guoliangxuan@deepmatrix.com>
mm_agents/maestro/tools/__init__.py (new file, 0 lines)
mm_agents/maestro/tools/model.md (new file, 385 lines)

@@ -0,0 +1,385 @@
# Supported Model Providers and Model Lists

## LLM Model Providers

### 1. OpenAI

**Provider:**

- `openai`

**Supported Models:**

- `gpt-5` Window: 400,000 Max Output Tokens: 128,000
- `gpt-5-mini` Window: 400,000 Max Output Tokens: 128,000
- `gpt-4.1` Window: 1,047,576 Max Output Tokens: 32,768
- `gpt-4.1-mini` Window: 1,047,576 Max Output Tokens: 32,768
- `gpt-4.1-nano` Window: 1,047,576 Max Output Tokens: 32,768
- `gpt-4o` Window: 128,000 Max Output Tokens: 16,384
- `gpt-4o-mini` Window: 128,000 Max Output Tokens: 16,384
- `o1` Window: 200,000 Max Output Tokens: 100,000
- `o1-pro` Window: 200,000 Max Output Tokens: 100,000
- `o1-mini` Window: 200,000 Max Output Tokens: 100,000
- `o3` Window: 200,000 Max Output Tokens: 100,000
- `o3-pro` Window: 200,000 Max Output Tokens: 100,000
- `o3-mini` Window: 200,000 Max Output Tokens: 100,000
- `o4-mini` Window: 200,000 Max Output Tokens: 100,000

**Embedding Models:**

- `text-embedding-3-small`
- `text-embedding-3-large`
- `text-embedding-ada-002`

📚 **Reference Link:** <https://platform.openai.com/docs/pricing>

---
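The provider keys and model names above are what callers pass into the tools module elsewhere in this commit, which packs them into an `engine_params` dictionary. A minimal sketch of that pairing (the specific "openai" / "gpt-4o" choice is just an example):

```python
# Sketch: pairing a provider key with a model name, following the
# engine_params convention used by the tools module in this commit.
engine_params = {
    "engine_type": "openai",  # provider key from the list above
    "model": "gpt-4o",        # a supported model for that provider
}
print(engine_params)
```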

### 2. Anthropic Claude

**Provider:**

- `anthropic`

**Supported Models:**

- `claude-opus-4-1-20250805` Context window: 200K Max output: 32000
- `claude-opus-4-20250514` Context window: 200K Max output: 32000
- `claude-sonnet-4-20250514` Context window: 200K Max output: 64000
- `claude-3-7-sonnet-20250219` Context window: 200K Max output: 64000
- `claude-3-5-sonnet-20240620` Context window: 200K Max output: 64000
- `claude-3-5-haiku-20241022` Context window: 200K Max output: 8192

📚 **Reference Link:** <https://www.anthropic.com/api>

---

### 3. AWS Bedrock

**Provider:**

- `bedrock`

**Supported Claude Models:**

- `Claude-Opus-4`
- `Claude-Sonnet-4`
- `Claude-Sonnet-3.7`
- `Claude-Sonnet-3.5`

📚 **Reference Link:** <https://aws.amazon.com/bedrock/>

---

### 4. Google Gemini

**Provider:**

- `gemini`

**Supported Models:**

- `gemini-2.5-pro` in: 1,048,576 out: 65,536
- `gemini-2.5-flash` in: 1,048,576 out: 65,536
- `gemini-2.0-flash` in: 1,048,576 out: 8,192
- `gemini-1.5-pro` in: 2,097,152 out: 8,192
- `gemini-1.5-flash` in: 1,048,576 out: 8,192

**Embedding Models:**

- `gemini-embedding-001`

📚 **Reference Link:** <https://ai.google.dev/gemini-api/docs/pricing>

---

### 5. Groq

**Provider:**

- `groq`

**Supported Models:**

- `Kimi-K2-Instruct`
- `Llama-4-Scout-17B-16E-Instruct`
- `Llama-4-Maverick-17B-128E-Instruct`
- `Llama-Guard-4-12B`
- `DeepSeek-R1-Distill-Llama-70B`
- `Qwen3-32B`
- `Llama-3.3-70B-Instruct`

📚 **Reference Link:** <https://groq.com/pricing>

---

### 6. Monica (Proxy Platform)

**Provider:**

- `monica`

**OpenAI Models:**

- `gpt-4.1`
- `gpt-4.1-mini`
- `gpt-4.1-nano`
- `gpt-4o-2024-11-20`
- `gpt-4o-mini-2024-07-18`
- `o4-mini`
- `o3`

**Anthropic Claude Models:**

- `claude-opus-4-20250514`
- `claude-sonnet-4-20250514`
- `claude-3-7-sonnet-latest`
- `claude-3-5-sonnet-20241022`
- `claude-3-5-sonnet-20240620`
- `claude-3-5-haiku-20241022`

**Google Gemini Models:**

- `gemini-2.5-pro-preview-03-25`
- `gemini-2.5-flash-lite`
- `gemini-2.5-flash-preview-05-20`
- `gemini-2.0-flash-001`
- `gemini-1.5-pro-002`
- `gemini-1.5-flash-002`

**DeepSeek Models:**

- `deepseek-reasoner`
- `deepseek-chat`

**Meta Llama Models:**

- `Llama-4-Scout-17B-16E-Instruct` Context length: 10M tokens
- `Llama-4-Maverick-17B-128E-Instruct` Context length: 1M tokens
- `llama-3.3-70b-instruct`
- `llama-3-70b-instruct`
- `llama-3.1-405b-instruct`

**xAI Grok Models:**

- `grok-3-beta`
- `grok-beta`

📚 **Reference Link:** <https://platform.monica.im/docs/en/models-and-pricing>

---

### 7. OpenRouter (Proxy Platform)

**Provider:**

- `openrouter`

**OpenAI Models:**

- `gpt-4.1`
- `gpt-4.1-mini`
- `o1`
- `o1-pro`
- `o1-mini`
- `o3`
- `o3-pro`
- `o3-mini`
- `o4-mini`

**xAI Grok Models:**

- `grok-4` Total Context: 256K Max Output: 256K
- `grok-3`
- `grok-3-mini`

**Anthropic Claude Models:**

- `claude-opus-4`
- `claude-sonnet-4`

**Google Gemini Models:**

- `gemini-2.5-flash`
- `gemini-2.5-pro`

📚 **Reference Link:** <https://openrouter.ai/models>

---

### 8. Azure OpenAI

**Provider:**

- `azure`

**Supported Models:**

- `gpt-4.1`
- `gpt-4.1-mini`
- `gpt-4.1-nano`
- `o1`
- `o3`
- `o4-mini`

📚 **Reference Link:** <https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/>

---

### 9. Lybic AI

**Provider:**

- `lybic`

**Supported Models:**

- `gpt-5`
- `gpt-4.1`
- `gpt-4.1-mini`
- `gpt-4.1-nano`
- `gpt-4.5-preview`
- `gpt-4o`
- `gpt-4o-realtime-preview`
- `gpt-4o-mini`
- `o1`
- `o1-pro`
- `o1-mini`
- `o3`
- `o3-pro`
- `o3-mini`
- `o4-mini`

**Note:** Lybic AI provides OpenAI-compatible API endpoints with the same model names and pricing structure.

📚 **Reference Link:** <https://aigw.lybicai.com/>

---

### 10. DeepSeek

**Provider:**

- `deepseek`

**Supported Models:**

- `deepseek-chat` Context length: 128K, Output length: Default 4K, Max 8K
- `deepseek-reasoner` Context length: 128K, Output length: Default 32K, Max 64K

📚 **Reference Link:** <https://platform.deepseek.com/>

---

### 11. Alibaba Cloud Qwen

**Supported Models:**

- `qwen-max-latest` Context window: 32,768 Max input token length: 30,720 Max generation token length: 8,192
- `qwen-plus-latest` Context window: 131,072 Max input token length: 98,304 (thinking) Max generation token length: 129,024 Max output: 16,384
- `qwen-turbo-latest` Context window: 1,000,000 Max input token length: 1,000,000 Max generation token length: 16,384
- `qwen-vl-max-latest` (Grounding) Context window: 131,072 Max input token length: 129,024 Max generation token length: 8,192
- `qwen-vl-plus-latest` (Grounding) Context window: 131,072 Max input token length: 129,024 Max generation token length: 8,192

**Embedding Models:**

- `text-embedding-v4`
- `text-embedding-v3`

📚 **Reference Link:** <https://bailian.console.aliyun.com/?tab=doc#/doc/?type=model&url=https%3A%2F%2Fhelp.aliyun.com%2Fdocument_detail%2F2840914.html&renderType=iframe>

---

### 12. ByteDance Doubao

**Supported Models:**

- `doubao-seed-1-6-flash-250615` Context window: 256k Max input token length: 224k Max generation token length: 32k Max thinking content token length: 32k
- `doubao-seed-1-6-thinking-250715` Context window: 256k Max input token length: 224k Max generation token length: 32k Max thinking content token length: 32k
- `doubao-seed-1-6-250615` Context window: 256k Max input token length: 224k Max generation token length: 32k Max thinking content token length: 32k
- `doubao-1.5-vision-pro-250328` (Grounding) Context window: 128k Max input token length: 96k Max generation token length: 16k Max thinking content token length: 32k
- `doubao-1-5-thinking-vision-pro-250428` (Grounding) Context window: 128k Max input token length: 96k Max generation token length: 16k Max thinking content token length: 32k
- `doubao-1-5-ui-tars-250428` (Grounding) Context window: 128k Max input token length: 96k Max generation token length: 16k Max thinking content token length: 32k

**Embedding Models:**

- `doubao-embedding-large-text-250515`
- `doubao-embedding-text-240715`

📚 **Reference Link:** <https://console.volcengine.com/ark/region:ark+cn-beijing/model?vendor=Bytedance&view=LIST_VIEW>

---

### 13. Zhipu GLM

**Supported Models:**

- `GLM-4.5` Max in: 128k Max output: 0.2K
- `GLM-4.5-X` Max in: 128k Max output: 0.2K
- `GLM-4.5-Air` Max in: 128k Max output: 0.2K
- `GLM-4-Plus`
- `GLM-4-Air-250414`
- `GLM-4-AirX` (Grounding)
- `GLM-4V-Plus-0111` (Grounding)

**Embedding Models:**

- `Embedding-3`
- `Embedding-2`

📚 **Reference Link:** <https://open.bigmodel.cn/pricing>

---

### 14. SiliconFlow

**Supported Models:**

- `Kimi-K2-Instruct` Context Length: 128K
- `DeepSeek-V3`
- `DeepSeek-R1`
- `Qwen3-32B`

📚 **Reference Link:** <https://cloud.siliconflow.cn/sft-d1pi8rbk20jc73c62gm0/models>

---

## 🔤 Dedicated Embedding Providers

### 15. Jina AI

**Embedding Models:**

- `jina-embeddings-v4`
- `jina-embeddings-v3`

📚 **Reference Link:** <https://jina.ai/embeddings>

---

## 🔍 AI Search Engines

### 16. Bocha AI

**Service Type:** AI Research & Search

📚 **Reference Link:** <https://open.bochaai.com/overview>

---

### 17. Exa

**Service Type:** AI Research & Search

**Pricing Model:**

- $5.00 / 1k agent searches
- $5.00 / 1k exa-research agent page reads
- $10.00 / 1k exa-research-pro agent page reads
- $5.00 / 1M reasoning tokens

📚 **Reference Link:** <https://dashboard.exa.ai/home>
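As a quick sanity check on the Exa pricing list above, a hypothetical monthly cost estimate (the rates come from the list; the usage volumes are made-up assumptions):

```python
# Hypothetical monthly usage volumes (assumptions, not from the document).
searches = 20_000             # agent searches
page_reads = 50_000           # exa-research agent page reads
reasoning_tokens = 3_000_000  # reasoning tokens

# Rates taken from the Exa pricing list above.
cost = (searches / 1_000) * 5.00 \
     + (page_reads / 1_000) * 5.00 \
     + (reasoning_tokens / 1_000_000) * 5.00

print(f"${cost:,.2f}")  # → $365.00
```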
mm_agents/maestro/tools/new_tools.py (new file, 826 lines)

@@ -0,0 +1,826 @@
"""
Tools module for GUI agents.

This module provides various tools for GUI agents to perform tasks such as web search,
context fusion, subtask planning, trajectory reflection, memory retrieval, grounding,
evaluation, and action generation.
"""

import os
import json
import base64
import logging
import threading
import time
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional, List, Union, Tuple

import requests

from ..core.mllm import LLMAgent, WebSearchAgent, EmbeddingAgent
from ..prompts import get_prompt, module

logger = logging.getLogger("desktopenv.tools")


class BaseTool(ABC):
    """Base class for all tools."""

    _prompts_dict = None
    _prompts_dict_lock = threading.Lock()
    # Directory retained for backward compatibility; no longer scanned directly
    _prompts_dir = os.path.join(os.path.dirname(os.path.dirname(__file__)), "prompts")

    @classmethod
    def _load_prompts_dict(cls):
        # Deprecated: kept for compatibility if other code accesses _prompts_dict.
        # Prompts are now pulled via the registry to avoid direct filesystem coupling.
        if cls._prompts_dict is None:
            with cls._prompts_dict_lock:
                if cls._prompts_dict is None:
                    cls._prompts_dict = {}

    def __init__(self, provider: str, model_name: str, tool_name: str):
        """
        Initialize the base tool.

        Args:
            provider: API provider name (e.g., "gemini", "openai")
            model_name: Model name to use (e.g., "gemini-2.5-pro")
            tool_name: Name of the tool (used as key in the prompts registry)
        """
        self.provider = provider
        self.model_name = model_name
        self.tool_name = tool_name
        self._load_prompts_dict()
        self._prompt_template = self._get_prompt_template()
        # Create an LLMAgent instance for tool usage
        self.engine_params = {
            "engine_type": provider,
            "model": model_name,
        }
        self.llm_agent = LLMAgent(engine_params=self.engine_params, system_prompt=self._prompt_template)

    def _get_prompt_template(self) -> str:
        if self.tool_name is None:
            return ""
        # Prefer reading prompt text directly from gui_agents.prompts.module
        try:
            prompt_category_map = {
                # manager prompts
                "query_formulator": ("manager", "query_formulator"),
                "narrative_summarization": ("manager", "narrative_summarization"),
                "context_fusion": ("manager", "context_fusion"),
                "planner_role": ("manager", "planner_role"),
                "supplement_role": ("manager", "supplement_role"),
                "dag_translator": ("manager", "dag_translator"),
                "objective_alignment": ("manager", "objective_alignment"),
                # worker prompts
                "operator_role": ("worker", "operator_role"),
                "technician_role": ("worker", "technician_role"),
                "analyst_role": ("worker", "analyst_role"),
                "grounding": ("worker", "grounding"),
                "text_span": ("worker", "text_span"),
                "episode_summarization": ("worker", "episode_summarization"),
                # evaluator prompts
                "worker_success_role": ("evaluator", "worker_success_role"),
                "worker_stale_role": ("evaluator", "worker_stale_role"),
                "periodic_role": ("evaluator", "periodic_role"),
                "final_check_role": ("evaluator", "final_check_role"),
            }

            # Tools whose prompts should be prefixed with system architecture info
            tools_require_system_prefix = {
                "planner_role",
                "supplement_role",
                "dag_translator",
                "operator_role",
                "technician_role",
                "analyst_role",
                "worker_success_role",
                "worker_stale_role",
                "periodic_role",
                "final_check_role",
                "objective_alignment",
            }

            category_tuple = prompt_category_map.get(self.tool_name)

            prompt_text = ""
            if category_tuple is None:
                # Try a root-level attribute on module (e.g., system_architecture)
                if hasattr(module, self.tool_name):
                    prompt_text = getattr(module, self.tool_name)
                else:
                    return ""
            else:
                category_name, key_name = category_tuple
                category_obj = getattr(module, category_name, None)
                if category_obj is None:
                    return ""
                value = getattr(category_obj, key_name, None)
                if isinstance(value, str) and value:
                    prompt_text = value
                else:
                    return ""

            # Optionally prefix with system architecture information for selected tools
            if (
                isinstance(prompt_text, str)
                and prompt_text
                and self.tool_name in tools_require_system_prefix
            ):
                system_info = getattr(module, "system_architecture", "")
                if isinstance(system_info, str) and system_info:
                    return f"{system_info}\n\n{prompt_text}"

            return prompt_text
        except Exception:
            # Fall back to an empty prompt; the registry can still supply central overrides
            return ""

    def _call_lmm(self, input_data: Dict[str, Any], temperature: float = 0.0):
        """
        Call the LMM model for inference using the prompt template, with a retry mechanism.

        Args:
            input_data: Dictionary containing input data to format the prompt template
            temperature: Temperature parameter to control randomness of output

        Returns:
            A tuple of (response text, token counts, cost string)
        """
        # Extract text and image inputs
        text_input = input_data.get('str_input', '')
        image_input = input_data.get('img_input', None)

        # Add the message with the formatted prompt
        self.llm_agent.reset()
        self.llm_agent.add_message(text_input, image_content=image_input, role="user")

        # Implement a safe retry mechanism
        max_retries = 3
        attempt = 0
        content, total_tokens, cost_string = "", [0, 0, 0], ""

        while attempt < max_retries:
            try:
                content, total_tokens, cost_string = self.llm_agent.get_response(temperature=temperature)
                break  # If successful, break out of the loop
            except Exception as e:
                attempt += 1
                logger.error(f"LLM call attempt {attempt} failed: {str(e)}")
                if attempt == max_retries:
                    logger.error("Max retries reached. Returning error message.")
                    return f"Error: LLM call failed after {max_retries} attempts: {str(e)}", [0, 0, 0], ""
                time.sleep(1.0)
        return content, total_tokens, cost_string
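The bounded-retry pattern in `_call_lmm` can be sketched in isolation. This is a standalone illustration (function names are illustrative; the real code retries `llm_agent.get_response` with a one-second sleep):

```python
# Standalone sketch of the bounded-retry pattern used by _call_lmm above.
import time
import logging

logger = logging.getLogger("retry-demo")

def call_with_retry(fn, max_retries=3, delay=1.0):
    """Call fn(); on exception, retry up to max_retries times, sleeping between attempts."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception as e:
            logger.error(f"attempt {attempt} failed: {e}")
            if attempt == max_retries:
                return f"Error: call failed after {max_retries} attempts: {e}"
            time.sleep(delay)

# A flaky callable that succeeds on the third attempt.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(call_with_retry(flaky, delay=0))  # → ok
```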

    @abstractmethod
    def execute(self, tool_input: Dict[str, Any]) -> Tuple[str, List[int], str]:
        """
        Execute the tool with the given input.

        Args:
            tool_input: Dictionary containing the input for the tool
                Expected to have 'str_input' and/or 'img_input' keys

        Returns:
            A tuple of (output text, token counts, cost string)
        """
        pass


class ToolFactory:
    """Factory class for creating tools."""

    @staticmethod
    def create_tool(tool_name: str, provider: str, model_name: str, **kwargs) -> 'BaseTool':
        """
        Create a tool instance based on the tool name.

        Args:
            tool_name: Name of the tool to create
            provider: API provider name
            model_name: Model name to use
            **kwargs: Additional parameters to pass to the tool

        Returns:
            An instance of the specified tool

        Raises:
            ValueError: If the tool name is not recognized
        """
        tool_map = {
            "embedding": (EmbeddingTool, None),  # all

            "query_formulator": (QueryFormulatorTool, "query_formulator"),  # manager
            "websearch": (WebSearchTool, None),  # manager
            "narrative_summarization": (NarrativeSummarizationTool, "narrative_summarization"),  # manager
            "context_fusion": (ContextFusionTool, "context_fusion"),  # manager
            "planner_role": (SubtaskPlannerTool, "planner_role"),  # manager
            "supplement_role": (SubtaskPlannerTool, "supplement_role"),  # manager
            "dag_translator": (DAGTranslatorTool, "dag_translator"),  # manager
            "objective_alignment": (ObjectiveAlignmentTool, "objective_alignment"),  # manager

            "operator_role": (ActionGeneratorTool, "operator_role"),  # worker
            "technician_role": (ActionGeneratorTool, "technician_role"),  # worker
            "analyst_role": (ActionGeneratorTool, "analyst_role"),  # worker
            "grounding": (GroundingTool, "grounding"),  # worker
            "text_span": (TextSpanTool, "text_span"),  # worker
            "episode_summarization": (EpisodeSummarizationTool, "episode_summarization"),  # worker

            "worker_success_role": (EvaluatorTool, "worker_success_role"),  # evaluator
            "worker_stale_role": (EvaluatorTool, "worker_stale_role"),  # evaluator
            "periodic_role": (EvaluatorTool, "periodic_role"),  # evaluator
            "final_check_role": (EvaluatorTool, "final_check_role"),  # evaluator
        }

        if tool_name not in tool_map:
            raise ValueError(f"Unknown tool name: {tool_name}")

        tool_class, prompt_key = tool_map[tool_name]

        # WebSearchTool and EmbeddingTool don't need a prompt
        if tool_name in ("websearch", "embedding"):
            return tool_class(provider, model_name, None, **kwargs)

        return tool_class(provider, model_name, prompt_key, **kwargs)
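The name-to-(class, prompt key) dispatch used by `ToolFactory.create_tool` can be sketched in isolation. The stub classes below are stand-ins, not the real tools, which also take provider and model arguments:

```python
# Minimal, self-contained sketch of the tool_map dispatch pattern
# used by ToolFactory.create_tool (stub classes are illustrative).
class BaseToolStub:
    def __init__(self, prompt_key):
        self.prompt_key = prompt_key

class WebSearchStub(BaseToolStub):
    pass

class PlannerStub(BaseToolStub):
    pass

TOOL_MAP = {
    "websearch": (WebSearchStub, None),  # no prompt needed
    "planner_role": (PlannerStub, "planner_role"),
}

def create_tool(tool_name):
    if tool_name not in TOOL_MAP:
        raise ValueError(f"Unknown tool name: {tool_name}")
    tool_class, prompt_key = TOOL_MAP[tool_name]
    return tool_class(prompt_key)

tool = create_tool("planner_role")
print(type(tool).__name__, tool.prompt_key)  # → PlannerStub planner_role
```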


class WebSearchTool(BaseTool):
    """Tool for performing web searches."""

    def __init__(self, provider: str, model_name: str, tool_name: str):
        """
        Initialize the web search tool.

        Args:
            provider: API provider name (e.g., "bocha", "exa")
            model_name: Model name to use (not used for WebSearchAgent)
            tool_name: Name of the tool (unused; kept for a uniform constructor signature)
        """
        self.provider = provider

        # Create a WebSearchAgent instance for search
        self.engine_params = {
            "engine_type": provider,
            "model": model_name,
        }

        # Initialize WebSearchAgent
        self.search_agent = WebSearchAgent(engine_params=self.engine_params)

    def execute(self, tool_input: Dict[str, Any]) -> Tuple[str, List[int], str]:
        """
        Execute a web search with the given query.

        Args:
            tool_input: Dictionary containing the search query
                Expected to have 'str_input' key with the search query

        Returns:
            A tuple of (search results as a string, token counts, cost string)
        """
        query = tool_input.get('str_input', '')
        if not query:
            return "Error: No search query provided", [0, 0, 0], ""

        try:
            # Get the answer from the search results
            answer, total_tokens, cost = self.search_agent.get_answer(query)

            # Return just the answer
            return answer, total_tokens, cost  # type: ignore

        except Exception as e:
            logger.error(f"Error during web search: {str(e)}")
            return f"Error: Web search failed: {str(e)}", [0, 0, 0], ""


class ContextFusionTool(BaseTool):
    """Tool for fusing multiple contexts together."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Fuse multiple contexts together.

        Args:
            tool_input: Dictionary containing the contexts to fuse
                Expected to have 'str_input' key with JSON-formatted contexts

        Returns:
            Fused context as a string
        """
        contexts = tool_input.get('str_input', '')
        if not contexts:
            return "Error: No contexts provided", [0, 0, 0], ""

        # Use the prompt template and LMM for context fusion
        return self._call_lmm(tool_input)


class SubtaskPlannerTool(BaseTool):
    """Tool for planning subtasks."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Plan subtasks for a given task.

        Args:
            tool_input: Dictionary containing the task description
                Expected to have 'str_input' key with the task description
                May also have 'img_input' key with a screenshot

        Returns:
            Subtask plan as a string
        """
        task = tool_input.get('str_input', '')
        if not task:
            return "Error: No task description provided", [0, 0, 0], ""

        # Use the prompt template and LMM for subtask planning
        return self._call_lmm(tool_input)


class NarrativeSummarizationTool(BaseTool):
    """Tool for summarizing narrative memories."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Summarize narrative memories.

        Args:
            tool_input: Dictionary containing the narrative memory data
                Expected to have 'str_input' key with the narrative memory data
                May also have 'img_input' key with relevant images

        Returns:
            Summarized narrative as a string
        """
        narrative_data = tool_input.get('str_input', '')
        if not narrative_data:
            return "Error: No narrative memory data provided", [0, 0, 0], ""

        # Use the prompt template and LMM for narrative summarization
        return self._call_lmm(tool_input)


class EpisodeSummarizationTool(BaseTool):
    """Tool for summarizing episodic memories."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Summarize episodic memories.

        Args:
            tool_input: Dictionary containing the episodic memory data
                Expected to have 'str_input' key with the episodic memory data
                May also have 'img_input' key with relevant images

        Returns:
            Summarized episode as a string
        """
        episode_data = tool_input.get('str_input', '')
        if not episode_data:
            return "Error: No episodic memory data provided", [0, 0, 0], ""

        # Use the prompt template and LMM for episode summarization
        return self._call_lmm(tool_input)


class TextSpanTool(BaseTool):
    """Tool for processing text spans."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Process text spans for a given input.

        Args:
            tool_input: Dictionary containing the text input
                Expected to have 'str_input' key with the text content
                May also have 'img_input' key with a screenshot

        Returns:
            Processed text spans as a string
        """
        text = tool_input.get('str_input', '')
        if not text:
            return "Error: No text content provided", [0, 0, 0], ""

        # Use the prompt template and LMM for text span processing
        return self._call_lmm(tool_input)


class DAGTranslatorTool(BaseTool):
    """Tool for translating task descriptions into a DAG (Directed Acyclic Graph) structure."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Translate task descriptions into a DAG structure.

        Args:
            tool_input: Dictionary containing the task description
                Expected to have 'str_input' key with the task description
                May also have 'img_input' key with a screenshot

        Returns:
            DAG representation as a string
        """
        task = tool_input.get('str_input', '')
        if not task:
            return "Error: No task description provided", [0, 0, 0], ""

        # Use the prompt template and LMM for DAG translation
        return self._call_lmm(tool_input)


class ObjectiveAlignmentTool(BaseTool):
    """Tool for aligning and rewriting the user objective with the current screen context."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Align an ambiguous or high-level user objective with the current desktop screenshot
        context and output a refined objective and assumptions.

        Args:
            tool_input: Dict with keys:
                - 'str_input': the raw user objective or context text
                - 'img_input': optional screenshot image content

        Returns:
            Refined objective as text (ideally JSON-structured), token counts, and cost string
        """
        text = tool_input.get('str_input', '')
        if not text:
            return "Error: No objective text provided", [0, 0, 0], ""
        # Forward to the LMM with the prompt template
        return self._call_lmm(tool_input)


class TrajReflectorTool(BaseTool):
    """Tool for reflecting on execution trajectories."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Reflect on an execution trajectory.

        Args:
            tool_input: Dictionary containing the trajectory
                Expected to have 'str_input' key with the trajectory

        Returns:
            Reflection as a string
        """
        trajectory = tool_input.get('str_input', '')
        if not trajectory:
            return "Error: No trajectory provided", [0, 0, 0], ""

        # Use the prompt template and LMM for trajectory reflection
        return self._call_lmm(tool_input)


class GroundingTool(BaseTool):
    """Tool for grounding agent actions in the environment."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Ground agent actions in the environment.

        Args:
            tool_input: Dictionary containing the action and environment state
                Expected to have 'str_input' key with the action
                Expected to have 'img_input' key with a screenshot

        Returns:
            Grounded action as a string
        """
        action = tool_input.get('str_input', '')
        screenshot = tool_input.get('img_input')

        if not action:
            return "Error: No action provided", [0, 0, 0], ""
        if not screenshot:
            return "Error: No screenshot provided", [0, 0, 0], ""

        # Use the prompt template and LMM for action grounding
        return self._call_lmm(tool_input)

    def get_grounding_wh(self):
        """
        Get grounding width and height based on provider and model name.

        Returns:
            If the provider is doubao and the model name contains 'ui-tars' or 'ep-',
            returns (grounding_width, grounding_height) = (1000, 1000).
            Otherwise returns (None, None).
        """
        if self.provider == "doubao" and ("ui-tars" in self.model_name or "ep-" in self.model_name):
            grounding_width = 1000
            grounding_height = 1000
            return grounding_width, grounding_height
        return None, None
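The grounding width/height above exist so that model-space coordinates (e.g., a 1000×1000 grid for ui-tars-style models) can be rescaled to the real screen resolution. A minimal sketch of that mapping (the helper name and the 1920×1080 resolution are illustrative assumptions, not part of this module):

```python
# Illustrative helper: map model-space (x, y) on a grounding grid to screen pixels.
def rescale_coords(x, y, grounding_w, grounding_h, screen_w, screen_h):
    """Rescale (x, y) from a grounding_w x grounding_h grid to screen coordinates."""
    return round(x * screen_w / grounding_w), round(y * screen_h / grounding_h)

# e.g., a click at (500, 500) on a 1000x1000 grid, on a 1920x1080 screen:
print(rescale_coords(500, 500, 1000, 1000, 1920, 1080))  # → (960, 540)
```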
class EvaluatorTool(BaseTool):
    """Tool for evaluating agent performance."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Evaluate agent performance.

        Args:
            tool_input: Dictionary containing the evaluation data
                Expected to have 'str_input' key with the evaluation data

        Returns:
            Evaluation result as a string
        """
        eval_data = tool_input.get('str_input', '')
        if not eval_data:
            return "Error: No evaluation data provided", [0, 0, 0], ""

        # Use the prompt template and LMM for performance evaluation
        return self._call_lmm(tool_input)
class ActionGeneratorTool(BaseTool):
    """Tool for generating executable actions."""

    def __init__(self, provider: str, model_name: str, tool_name: str, **kwargs):
        """
        Initialize the action generator tool.

        Args:
            provider: API provider name
            model_name: Model name to use
            tool_name: Name of the tool (used as key in prompts.json)
            **kwargs: Additional parameters, including:
                enable_search: Whether to enable web search functionality
                search_provider: Provider for web search (defaults to "bocha")
                search_model: Model for web search (defaults to "")
        """
        super().__init__(provider, model_name, tool_name)

        # Extract search-related parameters
        self.enable_search = kwargs.get("enable_search", False)
        search_provider = kwargs.get("search_provider", "bocha")
        search_model = kwargs.get("search_model", "")

        # Initialize search tool if enabled
        self.search_tool = None
        if self.enable_search:
            self.search_tool = WebSearchTool(search_provider, search_model, "")
            logger.info(f"Web search enabled for {tool_name} using provider: {search_provider}")

    def execute(self, tool_input: Dict[str, Any]):
        """
        Generate executable actions.

        Args:
            tool_input: Dictionary containing the action request
                Expected to have 'str_input' key with the action request
                May also have 'img_input' key with a screenshot

        Returns:
            Generated action as a string, token count, and cost
        """
        action_request = tool_input.get('str_input', '')
        if not action_request:
            return "Error: No action request provided", [0, 0, 0], ""

        # Check if search is enabled
        if self.enable_search and self.search_tool:
            try:
                # Use the input text directly as search query
                search_query = action_request
                logger.info(f"Performing web search for query: {search_query}")
                search_results, tokens, cost = self.search_tool.execute({"str_input": search_query})

                # Enhance the action request with search results
                enhanced_request = (
                    f"[Action Request]\n{action_request}\n[End of Action Request]\n\n"
                    f"[Web Search Results for '{action_request}']\n{search_results}\n\n"
                    f"[End of Web Search Results]"
                )
                tool_input["str_input"] = enhanced_request

                logger.info(f"Search completed. Found information: {len(search_results)} characters")
            except Exception as e:
                logger.error(f"Error during web search: {e}")
                # Continue with original request if search fails

        # Use the prompt template and LMM for action generation
        return self._call_lmm(tool_input)

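The search-augmented prompt built in execute() follows a fixed bracketed layout. A standalone sketch of the same string assembly, useful when checking that downstream prompt templates parse the sections correctly (the helper name is illustrative):

```python
def build_enhanced_request(action_request: str, search_results: str) -> str:
    """Mirror of the bracketed layout used when web search is enabled."""
    return (
        f"[Action Request]\n{action_request}\n[End of Action Request]\n\n"
        f"[Web Search Results for '{action_request}']\n{search_results}\n\n"
        f"[End of Web Search Results]"
    )

enhanced = build_enhanced_request(
    "open the settings app",
    "Settings lives in the system tray.",
)
print(enhanced.splitlines()[0])  # → [Action Request]
```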
class FastActionGeneratorTool(BaseTool):
    """Tool for directly generating executable actions without intermediate planning."""

    def __init__(self, provider: str, model_name: str, tool_name: str, **kwargs):
        """
        Initialize the fast action generator tool.

        Args:
            provider: API provider name
            model_name: Model name to use
            tool_name: Name of the tool (used as key in prompts.json)
            **kwargs: Additional parameters, including:
                enable_search: Whether to enable web search functionality
                search_provider: Provider for web search (defaults to "bocha")
                search_model: Model for web search (defaults to "")
        """
        super().__init__(provider, model_name, tool_name)

        # Extract search-related parameters
        self.enable_search = kwargs.get("enable_search", False)
        search_provider = kwargs.get("search_provider", "bocha")
        search_model = kwargs.get("search_model", "")

        # Initialize search tool if enabled
        self.search_tool = None
        if self.enable_search:
            self.search_tool = WebSearchTool(search_provider, search_model, "")
            logger.info(f"Web search enabled for {tool_name} using provider: {search_provider}")

    def execute(self, tool_input: Dict[str, Any]):
        """
        Generate executable actions directly from the instruction and screenshot.

        Args:
            tool_input: Dictionary containing the action request
                Expected to have 'str_input' key with the instruction
                Expected to have 'img_input' key with a screenshot

        Returns:
            Generated action as a string, token count, and cost
        """
        action_request = tool_input.get('str_input', '')
        screenshot = tool_input.get('img_input')
        if not action_request:
            return "Error: No action request provided", [0, 0, 0], ""
        if not screenshot:
            return "Error: No screenshot provided", [0, 0, 0], ""

        # Check if search is enabled
        if self.enable_search and self.search_tool:
            try:
                # Use the input text directly as search query
                search_query = action_request
                logger.info(f"Performing web search for query: {search_query}")
                search_results, tokens, cost = self.search_tool.execute({"str_input": search_query})

                # Enhance the action request with search results
                enhanced_request = (
                    f"[Action Request]\n{action_request}\n[End of Action Request]\n\n"
                    f"[Web Search Results for '{action_request}']\n{search_results}\n\n"
                    f"[End of Web Search Results]"
                )
                tool_input["str_input"] = enhanced_request

                logger.info(f"Search completed. Found information: {len(search_results)} characters")
            except Exception as e:
                logger.error(f"Error during web search: {e}")
                # Continue with original request if search fails

        # Use the prompt template and LMM for action generation
        return self._call_lmm(tool_input)

    def get_grounding_wh(self):
        """
        Get grounding width and height based on provider and model name.

        Returns:
            If provider is doubao and model_name contains 'ui-tars', returns two values:
                grounding_width (int): Width value (1000)
                grounding_height (int): Height value (1000)
            Otherwise returns None, None
        """
        if self.provider == "doubao" and "ui-tars" in self.model_name:
            grounding_width = 1000
            grounding_height = 1000
            return grounding_width, grounding_height
        return None, None

class EmbeddingTool(BaseTool):
    """Tool for generating text embeddings."""

    def __init__(self, provider: str, model_name: str, tool_name: str):
        """
        Initialize the embedding tool.

        Args:
            provider: API provider name (e.g., "openai", "gemini")
            model_name: Model name to use
            tool_name: Name of the tool (used as key in prompts.json)
        """
        self.provider = provider
        self.model_name = model_name
        self.tool_name = tool_name

        # Create EmbeddingAgent instance
        self.engine_params = {
            "engine_type": provider,
            "embedding_model": model_name
        }

        # Initialize EmbeddingAgent
        self.embedding_agent = EmbeddingAgent(engine_params=self.engine_params)

    def execute(self, tool_input: Dict[str, Any]):
        """
        Generate embeddings for the given text.

        Args:
            tool_input: Dictionary containing the text to embed
                Expected to have 'str_input' key with the text

        Returns:
            Embeddings, total token count, and cost string
        """
        text = tool_input.get('str_input', '')

        if not text:
            return "Error: No text provided for embedding", [0, 0, 0], ""

        try:
            # Get embeddings for the text
            embeddings, total_tokens, cost_string = self.embedding_agent.get_embeddings(text)
            return embeddings, total_tokens, cost_string

        except Exception as e:
            logger.error(f"Error during embedding operation: {str(e)}")
            return f"Error: Embedding operation failed: {str(e)}", [0, 0, 0], ""

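A typical downstream use of EmbeddingTool's vectors is similarity ranking between a query and stored memories. A minimal sketch assuming the embeddings arrive as plain float lists (the toy vectors below stand in for real model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (plain float lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Degenerate zero-norm vectors are treated as having no similarity
    return dot / norm if norm else 0.0

# Identical toy vectors are maximally similar:
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```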
class QueryFormulatorTool(BaseTool):
    """Tool for formulating queries from tasks or contexts."""

    def execute(self, tool_input: Dict[str, Any]):
        """
        Formulate a query for a given task or context.

        Args:
            tool_input: Dictionary containing the task or context description
                Expected to have 'str_input' key with the description
                May also have 'img_input' key with a screenshot

        Returns:
            Formulated query as a string
        """
        task = tool_input.get('str_input', '')
        if not task:
            return "Error: No task or context description provided"

        # Use the prompt template and LMM for query formulation
        return self._call_lmm(tool_input)

class NewTools:
    """Main Tools class that provides access to all available tools."""

    def __init__(self):
        """Initialize the Tools class."""
        self.tools = {}

    def register_tool(self, tool_name: str, provider: str, model_name: str, **kwargs):
        """
        Register a tool with the specified parameters.

        Args:
            tool_name: Name of the tool to register
            provider: API provider name
            model_name: Model name to use
            **kwargs: Additional parameters to pass to the tool
        """
        tool: BaseTool = ToolFactory.create_tool(tool_name, provider, model_name, **kwargs)
        self.tools[tool_name] = tool

    def execute_tool(self, tool_name: str, tool_input: Dict[str, Any]):
        """
        Execute a tool with the given input.

        Args:
            tool_name: Name of the tool to execute
            tool_input: Input for the tool

        Returns:
            The output of the tool as a string

        Raises:
            ValueError: If the tool is not registered
        """
        if tool_name not in self.tools:
            raise ValueError(f"Tool {tool_name} is not registered")

        return self.tools[tool_name].execute(tool_input)

    def reset(self, tool_name: Optional[str] = None):
        """
        Reset tools by resetting their llm_agent if available.

        Args:
            tool_name: Optional name of the specific tool to reset. If None, resets all tools.
        """
        if tool_name is not None:
            # Reset a specific tool
            if tool_name not in self.tools:
                raise ValueError(f"Tool {tool_name} is not registered")

            tool = self.tools[tool_name]
            if hasattr(tool, 'llm_agent') and tool.llm_agent is not None:
                tool.llm_agent.reset()
        else:
            # Reset all tools
            for tool in self.tools.values():
                # Only reset if the tool has an llm_agent attribute
                if hasattr(tool, 'llm_agent') and tool.llm_agent is not None:
                    tool.llm_agent.reset()

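The register/execute contract of NewTools can be exercised without live API keys. A self-contained sketch using a stub tool in place of ToolFactory's real, network-backed tools (StubTool and MiniTools are illustrative stand-ins, not part of the source):

```python
from typing import Any, Dict

class StubTool:
    """Stand-in for a BaseTool subclass; echoes its input."""
    def execute(self, tool_input: Dict[str, Any]):
        return f"handled: {tool_input.get('str_input', '')}"

class MiniTools:
    """Minimal mirror of the NewTools register/execute contract."""
    def __init__(self):
        self.tools = {}

    def register_tool(self, tool_name: str, tool):
        # The real class builds the tool via ToolFactory.create_tool
        self.tools[tool_name] = tool

    def execute_tool(self, tool_name: str, tool_input: Dict[str, Any]):
        if tool_name not in self.tools:
            raise ValueError(f"Tool {tool_name} is not registered")
        return self.tools[tool_name].execute(tool_input)

tools = MiniTools()
tools.register_tool("evaluator", StubTool())
print(tools.execute_tool("evaluator", {"str_input": "trajectory data"}))  # → handled: trajectory data
```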
109
mm_agents/maestro/tools/new_tools_config.json
Normal file
@@ -0,0 +1,109 @@
{
    "tools": [
        {
            "tool_name": "embedding",
            "provider": "doubao",
            "model_name": "doubao-embedding-text-240715"
        },
        {
            "tool_name": "query_formulator",
            "provider": "doubao",
            "model_name": "doubao-seed-1-6-flash-250615"
        },
        {
            "tool_name": "websearch",
            "provider": "bocha",
            "model_name": ""
        },
        {
            "tool_name": "narrative_summarization",
            "provider": "doubao",
            "model_name": "doubao-seed-1-6-flash-250615"
        },
        {
            "tool_name": "context_fusion",
            "provider": "doubao",
            "model_name": "doubao-seed-1-6-flash-250615"
        },
        {
            "tool_name": "planner_role",
            "provider": "openrouter",
            "model_name": "openai/o3"
        },
        {
            "tool_name": "supplement_role",
            "provider": "openrouter",
            "model_name": "openai/o3",
            "enable_search": true
        },
        {
            "tool_name": "dag_translator",
            "provider": "openrouter",
            "model_name": "openai/o3"
        },
        {
            "tool_name": "operator_role",
            "provider": "openrouter",
            "model_name": "openai/o3",
            "enable_search": false,
            "search_provider": "bocha",
            "search_model": ""
        },
        {
            "tool_name": "technician_role",
            "provider": "openrouter",
            "model_name": "openai/o3",
            "enable_search": false,
            "search_provider": "bocha",
            "search_model": ""
        },
        {
            "tool_name": "analyst_role",
            "provider": "openrouter",
            "model_name": "openai/o3",
            "enable_search": false,
            "search_provider": "bocha",
            "search_model": ""
        },
        {
            "tool_name": "grounding",
            "provider": "doubao",
            "model_name": "doubao-1-5-ui-tars-250428"
        },
        {
            "tool_name": "text_span",
            "provider": "doubao",
            "model_name": "doubao-seed-1-6-flash-250615"
        },
        {
            "tool_name": "episode_summarization",
            "provider": "doubao",
            "model_name": "doubao-seed-1-6-flash-250615"
        },
        {
            "tool_name": "worker_success_role",
            "provider": "openrouter",
            "model_name": "openai/o3"
        },
        {
            "tool_name": "worker_stale_role",
            "provider": "openrouter",
            "model_name": "openai/o3"
        },
        {
            "tool_name": "periodic_role",
            "provider": "openrouter",
            "model_name": "openai/o3"
        },
        {
            "tool_name": "final_check_role",
            "provider": "openrouter",
            "model_name": "openai/o3"
        },
        {
            "tool_name": "objective_alignment",
            "provider": "openrouter",
            "model_name": "openai/o3"
        }
    ]
}
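Each config entry maps onto the positional arguments of NewTools.register_tool, with any remaining keys (such as `enable_search`) passed through as **kwargs. A sketch of that loading step, using an inline copy of two entries for illustration:

```python
import json

# Inline excerpt of new_tools_config.json, for illustration only
CONFIG = """
{
  "tools": [
    {"tool_name": "embedding", "provider": "doubao",
     "model_name": "doubao-embedding-text-240715"},
    {"tool_name": "supplement_role", "provider": "openrouter",
     "model_name": "openai/o3", "enable_search": true}
  ]
}
"""

def load_tool_specs(raw: str):
    """Split each entry into the positional args register_tool expects,
    leaving the remaining keys to be forwarded as **kwargs."""
    specs = []
    for entry in json.loads(raw)["tools"]:
        name = entry.pop("tool_name")
        provider = entry.pop("provider")
        model = entry.pop("model_name")
        specs.append((name, provider, model, entry))  # entry now holds kwargs
    return specs

for name, provider, model, kwargs in load_tool_specs(CONFIG):
    print(name, provider, model, kwargs)
```

In the real pipeline each tuple would be fed to `tools.register_tool(name, provider, model, **kwargs)`; the loader name here is an assumption, not taken from the source.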