14 Commits

Author SHA1 Message Date
Timothyxxx 6f27c5bf50 Wrap up SeeAct implementation 2024-01-20 19:19:37 +08:00
Timothyxxx f88331416c Refactor baselines code implementations 2024-01-20 18:55:21 +08:00
Timothyxxx 09f3e776ae Initialize all baselines: screenshot, a11y tree, both, SoM, SeeAct 2024-01-20 00:13:46 +08:00
Timothyxxx 46bd3386dd Support input screenshot and a11y tree altogether 2024-01-19 20:34:47 +08:00
Timothyxxx 186bf2e97c Implement heuristic cutting on the accessibility tree to get the important nodes; Finish accessibility tree text agent 2024-01-16 16:43:32 +08:00
Timothyxxx 8efa692951 Add raw accessibility-tree based prompting method (but the tokens are too large); Minor fix some small bugs 2024-01-16 11:58:23 +08:00
Timothyxxx 493b719821 Add gemini agent implementation; Add missed requirements; Minor fix some small bugs 2024-01-15 21:58:33 +08:00
Timothyxxx f153a4c253 Add 'WAIT', 'FAIL', 'DONE' to the action space; Debug basic prompting-based GPT-4 and Gemini agents; Initialize experiments script; 2024-01-14 23:36:19 +08:00
Timothyxxx 30064ff816 Fix conflicts 2023-12-16 21:32:43 +08:00
Timothyxxx 992d8f8fce Refactor with pyautogui 2023-12-02 17:52:00 +08:00
Timothyxxx e52ba2ab13 Fix the width and height of vm, make agent perform more accurate 2023-11-30 12:10:41 +08:00
Timothyxxx 3d0d9d7758 Run through gpt_4v agent pipeline 2023-11-29 20:21:57 +08:00
Timothyxxx 8470264884 Initialize GPT-4v agent, and prompt for current observation space 2023-11-28 00:38:22 +08:00
Timothyxxx 054f545942 Initialize GPT-4v agent, and prompt for current observation space 2023-11-28 00:23:50 +08:00