84 Commits

Author SHA1 Message Date
Timothyxxx
f9594e476e Add support for QWEN models from API (QWEN-max, etc.); Improve the robustness of getting observations 2024-05-20 00:47:43 +08:00
Timothyxxx
e1e44d77c5 Improve the logic in auto-installation 2024-05-16 18:47:37 +08:00
Timothyxxx
cfdf296913 Fix the reset error when no config is provided; Improve the robustness of VM auto-installation 2024-05-11 05:36:59 +08:00
Timothyxxx
a59d1f92aa Improve the logic in auto-installation; Add auto-remove on the failed files 2024-05-10 20:17:40 +08:00
Timothyxxx
e2741c64b0 Improve the logic in auto-installation; Add auto-remove on the failed files 2024-05-10 16:41:20 +08:00
Timothyxxx
bead06afc8 Improve the logic in auto-installation; Add auto-remove on the failed files 2024-05-10 16:24:07 +08:00
zdy023
c1a719d141 ver May1st
a temporary workaround for Mac hosts.
2024-05-01 21:40:35 +08:00
Timothyxxx
aebbac7285 Increase the waiting time for vm setup 2024-04-25 15:53:12 +08:00
Timothyxxx
b3acf21333 Fix https://github.com/xlang-ai/OSWorld/issues/27 2024-04-23 14:04:59 +08:00
Timothyxxx
0b3e7dca24 Add support for automatic VM download and configuration, enable auto-scaling management; move metadata retrieval out of the init function to speed up environment initialization. 2024-04-21 19:51:15 +08:00
Timothyxxx
9c75df5dce Clean code; Refactor environment to pass screenshot content instead of path 2024-04-13 23:34:01 +08:00
Timothyxxx
172123ab2c Support downsampling; Fix bugs in windows a11y tree; Add a11y_tree trim 2024-03-25 18:02:48 +08:00
Timothyxxx
d1e2b12b41 Fix GIMP bug; Speed up the environment by skipping controller.get when no a11y tree is needed 2024-03-20 22:22:59 +08:00
Timothyxxx
8e760fd450 Disable wandb temporarily; speed up the environment step by removing the useless a11y tree re-fetch and terminal output 2024-03-19 08:57:05 +08:00
David Chang
2b9772174e ver Mar15th
fixed bugs about infeasible task evaluation
2024-03-15 12:25:41 +08:00
Timothyxxx
35ed7cec89 Update todos 2024-03-14 22:36:33 +08:00
Timothyxxx
44ff027801 Refactor experiments and agent implementation 2024-03-14 22:32:49 +08:00
Timothyxxx
c2aa009ed8 Update server script, baseline and running script 2024-03-13 15:04:19 +08:00
Timothyxxx
81863b26dd Improve the eval script for web browsing tasks; Add one setup example 2024-02-23 11:57:50 +08:00
Timothyxxx
938a3bb918 Fix get IP error 2024-02-22 20:39:07 +08:00
Timothyxxx
e1cf8da4e0 Fix the infeasible examples support 2024-02-21 21:22:12 +08:00
Timothyxxx
9373773169 add nogui parameter to environment 2024-02-20 17:59:31 +08:00
Timothyxxx
3f59ff46dc Add infeasible support 2024-02-14 11:59:50 +08:00
Timothyxxx
59e2417a08 Add Mistral, Qwen, Gemini support; Fix minor bugs 2024-02-01 16:55:38 +08:00
rhythmcao
fc15a33b70 finish multi-app examples 2024-02-01 00:53:31 +08:00
Timothyxxx
d65b6994d3 Fix minor bugs of multiple apps examples 2024-01-31 19:40:41 +08:00
David Chang
5a486b6b37 ver Jan27th
debugged at+screenshot implementation, no issues found
fixed a few small bugs
2024-01-27 23:10:48 +08:00
Timothyxxx
b9ae4174b1 Fix OS examples annotated by Yitao 2024-01-25 19:57:32 +08:00
Timothyxxx
5dea912d01 Finish Chrome v2 loading 2024-01-24 23:05:28 +08:00
Timothyxxx
bdd21d06ca Fix minor bugs 2024-01-19 20:34:11 +08:00
rhythmcao
91824f754c 1. extend evaluator to list (compatible with single evaluator) 2. fix a variable name error in metrics/general.py 2024-01-18 14:12:54 +08:00
Timothyxxx
b60eb2a933 VM resolution adjust support 2024-01-18 01:43:57 +08:00
Timothyxxx
8efa692951 Add raw accessibility-tree based prompting method (but the token count is too large); Fix some small bugs 2024-01-16 11:58:23 +08:00
Timothyxxx
493b719821 Add gemini agent implementation; Add missing requirements; Fix some small bugs 2024-01-15 21:58:33 +08:00
Timothyxxx
f153a4c253 Add 'WAIT', 'FAIL', 'DONE' to the action space; Debug basic prompting-based GPT-4 and Gemini agents; Initialize experiments script; 2024-01-14 23:36:19 +08:00
Timothyxxx
d52b692ee5 Finish loading the vscode examples v1; Improve on the infra: Add accessibility tree into the observation; Add activate window function, etc 2024-01-14 18:30:49 +08:00
Timothyxxx
bc88ee0c41 Minor fix to the VM IP retrieval logic 2024-01-12 21:18:59 +08:00
rhythmcao
d4116458ff 1. fix quote and \ characters in execute_command ; 2. add terminal output text as extra observation ; 3. move get_vm_*() to reset() 2024-01-12 18:09:05 +08:00
Timothyxxx
5a93a32958 Update on Chrome examples; Refactor on logic of controlling 2024-01-12 17:24:47 +08:00
Timothyxxx
820579a5a2 Make up missing getters and metrics; Update VLC scripts; Start to work on Chrome, update examples instructions 2024-01-11 21:27:40 +08:00
Timothyxxx
287876affc Merge remote-tracking branch 'origin/main'
# Conflicts:
#	desktop_env/evaluators/getters/__init__.py
#	desktop_env/evaluators/metrics/__init__.py
#	requirements.txt
2024-01-10 23:20:49 +08:00
Timothyxxx
49ece15ac3 VLC v1 finished, improve on instructions, improve on infra 2024-01-10 23:18:30 +08:00
David Chang
cf5d480f44 ver Jan10th
new Thunderbird task config
2024-01-10 17:36:59 +08:00
Timothyxxx
fa84b20ea5 VLC updates and some infra bug fixes 2024-01-09 09:30:11 +08:00
David Chang
26b7d9010d Merge branch 'zdy' 2024-01-05 15:55:41 +08:00
David Chang
eeb8a120d6 ver Jan5th
debugged
2024-01-05 15:20:47 +08:00
David Chang
5fedf5b891 ver Jan4th
updated interfaces for thunderbird evaluation, not tested
2024-01-04 22:41:57 +08:00
Timothyxxx
ab71ebb2ba Initialize VLC getters and metrics, fix some bugs in infra logic, needs to be refactored later on 2024-01-04 17:05:17 +08:00
Timothyxxx
03e99a68fb Loaded LibreOffice Writer examples and found a few problems; will do another round tomorrow for the rest 2024-01-02 17:50:05 +08:00
David Chang
4e5920264a ver Dec27thv2
updated a task config
updated documents
fixed the options feature of evaluator
updated with new properties of charts
current load_charts should be ok, I think
2023-12-27 17:51:41 +08:00