Fix https://github.com/xlang-ai/OSWorld/issues/21 ; Update README for multimodal agents; Add badge in README; Add setup.py

2024-04-15 18:47:54 +08:00
parent 9c75df5dce
commit 6777ea255a
4 changed files with 116 additions and 3 deletions
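
Note on the README hunks below: they change `obs["screenshot"]` from a file path to the raw image bytes. A minimal sketch of building such an `obs` dict follows. The helper name `build_obs` is hypothetical and not part of the repository; the only assumption taken from the diff is that the `screenshot` value holds the file contents as bytes and `a11y_tree` holds a string.

```python
from pathlib import Path


def build_obs(screenshot_path, a11y_tree=""):
    """Hypothetical helper (not from the repository): build an `obs` dict.

    Reads the screenshot file as raw bytes, matching the updated README
    example, and closes the file handle once the read completes.
    """
    screenshot_bytes = Path(screenshot_path).read_bytes()
    return {
        "screenshot": screenshot_bytes,  # raw image bytes, not a path
        "a11y_tree": a11y_tree,          # accessibility tree data, if any
    }


# Usage, assuming `agent` and `instruction` exist as in the README snippet:
# obs = build_obs("path/to/observation.jpg")
# response, actions = agent.predict(instruction, obs)
```

`Path.read_bytes()` opens, reads, and closes the file in one call, which avoids leaving an open handle behind as the inline `open(...).read()` form in the README can.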
--- a/mm_agents/README.md
+++ b/mm_agents/README.md
@@ -31,7 +31,7 @@ agent = PromptAgent(
 agent.reset()
 # say we have an instruction and observation
 instruction = "Please help me to find the nearest restaurant."
-obs = {"screenshot": "path/to/observation.jpg"}
+obs = {"screenshot": open("path/to/observation.jpg", 'rb').read()}
 response, actions = agent.predict(
    instruction,
    obs
@@ -51,8 +51,9 @@ And the following action spaces:

 To feed an observation into the agent, you have to maintain the `obs` variable as a dict with the corresponding information:
 ```python
+# continue from the previous code snippet
 obs = {
-    "screenshot": "path/to/observation.jpg",
+    "screenshot": open("path/to/observation.jpg", 'rb').read(),
    "a11y_tree": ""  # [a11y_tree data]
 }
 response, actions = agent.predict(