Fix https://github.com/xlang-ai/OSWorld/issues/21 ; Update README for multimodal agents; Add badge in README; Add setup.py
@@ -31,7 +31,7 @@ agent = PromptAgent(
 agent.reset()
 # say we have an instruction and observation
 instruction = "Please help me to find the nearest restaurant."
-obs = {"screenshot": "path/to/observation.jpg"}
+obs = {"screenshot": open("path/to/observation.jpg", 'rb').read()}
 response, actions = agent.predict(
     instruction,
     obs
@@ -51,8 +51,9 @@ And the following action spaces:
 To feed an observation into the agent, you have to maintain the `obs` variable as a dict with the corresponding information:
 ```python
 # continue from the previous code snippet
 obs = {
-    "screenshot": "path/to/observation.jpg",
+    "screenshot": open("path/to/observation.jpg", 'rb').read(),
     "a11y_tree": ""  # [a11y_tree data]
 }
 response, actions = agent.predict(