Clean Code; Refactor README

2024-03-27 16:21:49 +08:00
parent ee8e9451b4
commit 26ed70ef70
6 changed files with 128 additions and 91 deletions
--- a/mm_agents/README.md
+++ b/mm_agents/README.md
@@ -0,0 +1,65 @@
+# Agent
+## Prompt-based Agents
+
+### Supported Models
+We currently support the following models as the foundation models for the agents:
+- `GPT-3.5` (gpt-3.5-turbo-16k, ...)
+- `GPT-4` (gpt-4-0125-preview, gpt-4-1106-preview, ...)
+- `GPT-4V` (gpt-4-vision-preview, ...)
+- `Gemini-Pro`
+- `Gemini-Pro-Vision`
+- `Claude-3, 2` (claude-3-haiku-2024030, claude-3-sonnet-2024022, ...)
+- ...
+
+And the following models from the open-source community:
+- `Mixtral 8x7B`
+- `QWEN`, `QWEN-VL`
+- `CogAgent`
+- ...
+
+We will integrate more foundation models for digital agents in the future. Stay tuned.
+
+### How to use
+
+```python
+from mm_agents.agent import PromptAgent
+
+agent = PromptAgent(
+    model="gpt-4-0125-preview",
+    observation_type="screenshot",
+)
+agent.reset()
+# say we have an instruction and an observation
+instruction = "Please help me to find the nearest restaurant."
+obs = {"screenshot": "path/to/observation.jpg"}
+response, actions = agent.predict(
+    instruction,
+    obs
+)
+```
+
+### Observation Space and Action Space
+We currently support the following observation spaces:
+- `a11y_tree`: the accessibility (a11y) tree of the current screen
+- `screenshot`: a screenshot of the current screen
+- `screenshot_a11y_tree`: a screenshot of the current screen together with its a11y tree
+- `som`: the set-of-mark trick applied to the current screen, with table metadata
+
+And the following action spaces:
+- `pyautogui`: valid Python code that uses `pyautogui`
+- `computer_13`: a set of enumerated actions designed by us
+
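Since the `pyautogui` action space expects valid Python code, a syntax check can be run on an action string before executing it. The sketch below is only an illustration: `is_valid_python_action` is a hypothetical helper, not part of `mm_agents`, and it checks syntax only, not whether the `pyautogui` calls themselves make sense.

```python
import ast


def is_valid_python_action(action: str) -> bool:
    """Return True if `action` parses as Python source (syntax check only).

    Hypothetical helper for illustration; it is not part of mm_agents.
    """
    try:
        ast.parse(action)
    except SyntaxError:
        return False
    return True


# Example action strings in the style of the `pyautogui` action space.
good_action = "import pyautogui\npyautogui.click(100, 200)"
bad_action = "pyautogui.click(100, 200"  # missing closing parenthesis

print(is_valid_python_action(good_action))
print(is_valid_python_action(bad_action))
```

Parsing with `ast` never runs the code, so the check is safe to apply to untrusted model output before deciding whether to execute it.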
+To feed an observation into the agent, pass `obs` as a dict containing the information that the chosen observation type requires:
+```python
+obs = {
+    "screenshot": "path/to/observation.jpg",
+    "a11y_tree": ""  # [a11y_tree data]
+}
+response, actions = agent.predict(
+    instruction,
+    obs
+)
+```
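A small check can catch a missing key before `obs` is handed to `agent.predict`. This is a minimal sketch under stated assumptions: the `REQUIRED_KEYS` mapping and the `missing_obs_keys` helper are hypothetical, inferred from the observation-type descriptions above rather than taken from `mm_agents`. The `som` type is left out because the keys it needs are not stated here.

```python
# Assumed mapping from observation type to the dict keys it needs.
# Inferred from the README descriptions; not taken from mm_agents itself.
REQUIRED_KEYS = {
    "screenshot": {"screenshot"},
    "a11y_tree": {"a11y_tree"},
    "screenshot_a11y_tree": {"screenshot", "a11y_tree"},
}


def missing_obs_keys(observation_type: str, obs: dict) -> set:
    """Return the keys that `obs` lacks for the given observation type."""
    if observation_type not in REQUIRED_KEYS:
        raise ValueError(f"unknown observation type: {observation_type!r}")
    return REQUIRED_KEYS[observation_type] - set(obs)


obs = {"screenshot": "path/to/observation.jpg"}
print(missing_obs_keys("screenshot", obs))
print(missing_obs_keys("screenshot_a11y_tree", obs))
```

An empty set means the dict carries every key the assumed mapping lists for that observation type.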
+
+## Efficient Agents, Q* Agents, and more
+Stay tuned for more updates.