Fix https://github.com/xlang-ai/OSWorld/issues/21 ; Update README for multimodal agents; Add badge in README; Add setup.py

Timothyxxx
2024-04-15 18:47:54 +08:00
parent 9c75df5dce
commit 6777ea255a
4 changed files with 116 additions and 3 deletions
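
Both README hunks below replace a screenshot file path with the file's raw bytes in `obs`. A minimal sketch of building such an observation follows. The helper name `build_obs` and its parameters are hypothetical and not part of the repository. It uses `pathlib` so the file is closed as soon as it has been read.

```python
from pathlib import Path


def build_obs(screenshot_path, a11y_tree=""):
    """Return an observation dict holding the screenshot as raw bytes.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # read_bytes() opens, reads, and closes the file in one call,
    # whereas a bare open(...).read() leaves closing to garbage collection.
    return {
        "screenshot": Path(screenshot_path).read_bytes(),
        "a11y_tree": a11y_tree,  # accessibility-tree data, if any
    }
```

The resulting dict has the same keys as the `obs` shown in the README snippets, so it could be passed to `agent.predict(instruction, obs)` in the same way.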

@@ -31,7 +31,7 @@ agent = PromptAgent(
agent.reset()
# say we have an instruction and observation
instruction = "Please help me to find the nearest restaurant."
-obs = {"screenshot": "path/to/observation.jpg"}
+obs = {"screenshot": open("path/to/observation.jpg", 'rb').read()}
response, actions = agent.predict(
instruction,
obs
@@ -51,8 +51,9 @@ And the following action spaces:
To feed an observation into the agent, you have to maintain the `obs` variable as a dict with the corresponding information:
```python
# continue from the previous code snippet
obs = {
-"screenshot": "path/to/observation.jpg",
+"screenshot": open("path/to/observation.jpg", 'rb').read(),
"a11y_tree": "" # [a11y_tree data]
}
response, actions = agent.predict(