Commit Graph

21 Commits

Author SHA1 Message Date
Timothyxxx
20b1d950a0 Fix corner cases (val connection in Chrome when using Playwright, action parsing for agent, and accessibility tree XML handling) 2024-01-16 22:00:01 +08:00
Timothyxxx
186bf2e97c Implement heuristic cutting on the accessibility tree to get the important nodes; Finish accessibility tree text agent 2024-01-16 16:43:32 +08:00
Timothyxxx
48a86d36cf Minor updates 2024-01-16 12:15:21 +08:00
Timothyxxx
8efa692951 Add raw accessibility-tree based prompting method (but the token count is too large); Fix some small bugs 2024-01-16 11:58:23 +08:00
Timothyxxx
493b719821 Add Gemini agent implementation; Add missing requirements; Fix some small bugs 2024-01-15 21:58:33 +08:00
Timothyxxx
f153a4c253 Add 'WAIT', 'FAIL', 'DONE' to the action space; Debug basic prompting-based GPT-4 and Gemini agents; Initialize experiments script; 2024-01-14 23:36:19 +08:00
Timothyxxx
fa84b20ea5 VLC updates and some infra bug fixes 2024-01-09 09:30:11 +08:00
Timothyxxx
3cbb57f24c Add the GUI set-of-mark object detector data collection script 2024-01-05 11:00:31 +08:00
Hilbert-Johnson
8ac88e9617 pass test case 2024-01-02 01:10:46 +08:00
Hilbert-Johnson
7560f4dc46 update SoM_agent 2023-12-31 19:13:17 +08:00
Hilbert-Johnson
86c6a473e2 add initial SoM_agent 2023-12-28 13:43:44 +08:00
Timothyxxx
30064ff816 Fix conflicts 2023-12-16 21:32:43 +08:00
Timothyxxx
e51ef4b91d Make up 2023-12-02 18:02:45 +08:00
Timothyxxx
9b214b3d23 Action space thoughts 2023-12-02 18:02:06 +08:00
Timothyxxx
992d8f8fce Refactor with pyautogui 2023-12-02 17:52:00 +08:00
Timothyxxx
e52ba2ab13 Fix the width and height of the VM so the agent performs more accurately 2023-11-30 12:10:41 +08:00
Timothyxxx
80b148793d Initialize visual components such as SAM for assistance 2023-11-29 20:22:48 +08:00
Timothyxxx
3d0d9d7758 Run through gpt_4v agent pipeline 2023-11-29 20:21:57 +08:00
Timothyxxx
8470264884 Initialize GPT-4v agent, and prompt for current observation space 2023-11-28 00:38:22 +08:00
Timothyxxx
054f545942 Initialize GPT-4v agent, and prompt for current observation space 2023-11-28 00:23:50 +08:00
Timothyxxx
8272e93953 Add DuckTrack as initial annotation tool; Initial multimodal test 2023-11-27 00:34:57 +08:00