Update README.md of agents

Author: Tianbao Xie
Date: 2024-04-12 18:25:05 +08:00 (committed by GitHub)
Parent: 7aafd3d8f6
Commit: 38f4506ea3


@@ -2,7 +2,7 @@
## Prompt-based Agents
### Supported Models
-We currently support the following models as the foundation models for the agents:
+We currently support the following models as the foundational models for the agents:
- `GPT-3.5` (gpt-3.5-turbo-16k, ...)
- `GPT-4` (gpt-4-0125-preview, gpt-4-1106-preview, ...)
- `GPT-4V` (gpt-4-vision-preview, ...)
@@ -11,13 +11,13 @@ We currently support the following models as the foundation models for the agent
- `Claude-3, 2` (claude-3-haiku-20240307, claude-3-sonnet-20240229, ...)
- ...
-And those from open-source community:
+And those from the open-source community:
- `Mixtral 8x7B`
- `QWEN`, `QWEN-VL`
- `CogAgent`
- ...
-And we will integrate and support more foundation models to support digital agent in the future, stay tuned.
+In the future, we will integrate and support more foundational models to enhance digital agents, so stay tuned.
### How to use
@@ -25,11 +25,11 @@ And we will integrate and support more foundation models to support digital agen
from mm_agents.agent import PromptAgent
agent = PromptAgent(
-    model="gpt-4-0125-preview",
+    model="gpt-4-vision-preview",
    observation_type="screenshot",
)
agent.reset()
-# say we have a instruction and observation
+# say we have an instruction and observation
instruction = "Please help me to find the nearest restaurant."
obs = {"screenshot": "path/to/observation.jpg"}
response, actions = agent.predict(
    instruction,
    obs
)
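As a follow-up, here is a minimal sketch (not from the README) of one way the returned `actions` could be consumed when the agent is configured for the `pyautogui` action space; it assumes each action is a string of executable Python code, and uses `exec` purely for illustration:

```python
# Hedged sketch: run each predicted action. Assumes the "pyautogui" action
# space, where every action is a string of executable Python code.
for action in actions:
    print("Executing action:", action)
    exec(action)  # illustration only; a real harness would sandbox execution
```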
@@ -40,16 +40,16 @@ response, actions = agent.predict(
### Observation Space and Action Space
We currently support the following observation spaces:
-- `a11y_tree`: the a11y tree of the current screen
+- `a11y_tree`: the accessibility tree of the current screen
- `screenshot`: a screenshot of the current screen
-- `screenshot_a11y_tree`: a screenshot of the current screen with a11y tree
-- `som`: the set-of-mark trick on the current screen, with a table metadata
+- `screenshot_a11y_tree`: a screenshot of the current screen with the accessibility tree overlay
+- `som`: the set-of-mark trick on the current screen, with table metadata included
And the following action spaces:
-- `pyautogui`: valid python code with `pyauotgui` code valid
+- `pyautogui`: valid Python code using the `pyautogui` library
- `computer_13`: a set of enumerated actions designed by us
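To make both action spaces above concrete, here is a hypothetical sketch of what a single predicted action might look like in each; the `computer_13` field names below are placeholders for illustration, not the documented schema:

```python
# Hypothetical action shapes (assumed for illustration, not normative).

# `pyautogui` space: the agent emits executable Python code as a string.
pyautogui_action = "import pyautogui; pyautogui.click(x=320, y=240)"

# `computer_13` space: one of 13 enumerated actions; the keys "action_type"
# and "parameters" are assumed names, not the library's actual schema.
computer_13_action = {
    "action_type": "CLICK",
    "parameters": {"x": 320, "y": 240},
}
```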
-To use feed an observation into the agent, you have to keep the obs variable as a dict with the corresponding information:
+To feed an observation into the agent, you have to maintain the `obs` variable as a dict with the corresponding information:
```python
obs = {
    "screenshot": "path/to/observation.jpg",
    "accessibility_tree": "<a11y tree as text>",  # key name assumed for the a11y-based observation types
}
```