45 Commits

Author SHA1 Message Date
David Chang
f08fa4912c ver Mar10th
changed AT element filtering
2024-03-10 18:03:02 +08:00
Timothyxxx
030574e316 Improve on mmagents prompts; initialize online tasks from Mind2Web 2024-02-22 22:01:22 +08:00
Timothyxxx
068c6f5769 122324154 2024-02-02 14:36:53 +08:00
Timothyxxx
32bcdd0937 Modify the logic of SoM agent 2024-02-01 18:58:22 +08:00
Timothyxxx
c31c9f4e7d Merge remote-tracking branch 'origin/main'
# Conflicts:
#	mm_agents/gpt_4v_agent.py
2024-02-01 16:57:01 +08:00
Timothyxxx
59e2417a08 Add Mistral, Qwen, Gemini support; Fix minor bugs 2024-02-01 16:55:38 +08:00
David Chang
5d436a6b66 ver Feb1st
human evaluation and SoM experiments on Thunderbird
2024-02-01 11:38:46 +08:00
Timothyxxx
606fab4cfa Fix minor bug when getting the a11y tree and linearizing it for the agent 2024-01-31 00:29:51 +08:00
David Chang
da306376da ver Jan30th
updated function to get AT on Windows
2024-01-30 20:06:58 +08:00
David Chang
9e91b8a5a8 ver Jan29thv2
check som implementation
2024-01-30 00:25:00 +08:00
David Chang
d8a497a417 ver Jan29th
updated the position of SoM marks
2024-01-29 21:49:53 +08:00
Timothyxxx
cc21c3a6b1 Fix some errors found in calc examples 2024-01-28 21:19:18 +08:00
David Chang
8525825fb2 Merge branch 'zdy' 2024-01-27 23:18:33 +08:00
David Chang
5a486b6b37 ver Jan27th
debugged at+screenshot implementation, no issues found
fixed a few small bugs
2024-01-27 23:10:48 +08:00
Timothyxxx
909aa868f3 Improve agent code; add code for auto-running experiments; fix some examples 2024-01-27 19:47:47 +08:00
David Chang
eef5158663 ver Jan26thv3
fixed bug caused by an empty node.text
remove nodes whose name and text are all empty
2024-01-26 23:49:15 +08:00
David Chang
73de0e387a Merge branch 'zdy' 2024-01-26 23:31:41 +08:00
Timothyxxx
6952b45de4 Improve agent and task configs 2024-01-26 23:30:04 +08:00
David Chang
773c5ed40c ver Jan26thv4
updated linearized_accessibility_tree to add a column of "text"
removed replacement characters such as \uFFFC in Thunderbird
2024-01-26 23:29:09 +08:00
David Chang
8d358d63ed ver Jan26thv3
updated agent history handling
2024-01-26 22:07:38 +08:00
Timothyxxx
6f27c5bf50 Wrap up SeeAct implementation 2024-01-20 19:19:37 +08:00
Timothyxxx
f88331416c Refactor baselines code implementations 2024-01-20 18:55:21 +08:00
Timothyxxx
09f3e776ae Initialize all baselines: screenshot, a11y tree, both, SoM, SeeAct 2024-01-20 00:13:46 +08:00
Timothyxxx
46bd3386dd Support input screenshot and a11y tree altogether 2024-01-19 20:34:47 +08:00
Timothyxxx
20b1d950a0 Fix corner cases (val connection in Chrome when using Playwright, action parsing for the agent, and accessibility tree XML handling) 2024-01-16 22:00:01 +08:00
Timothyxxx
186bf2e97c Implement heuristic cutting on the accessibility tree to get the important nodes; Finish accessibility tree text agent 2024-01-16 16:43:32 +08:00
Timothyxxx
48a86d36cf Minor updates 2024-01-16 12:15:21 +08:00
Timothyxxx
8efa692951 Add raw accessibility-tree-based prompting method (but the token count is too large); fix some small bugs 2024-01-16 11:58:23 +08:00
Timothyxxx
493b719821 Add Gemini agent implementation; add missing requirements; fix some small bugs 2024-01-15 21:58:33 +08:00
Timothyxxx
f153a4c253 Add 'WAIT', 'FAIL', 'DONE' to the action space; Debug basic prompting-based GPT-4 and Gemini agents; Initialize experiments script; 2024-01-14 23:36:19 +08:00
Timothyxxx
fa84b20ea5 VLC updates and some infra bug fixes 2024-01-09 09:30:11 +08:00
Timothyxxx
3cbb57f24c Add the GUI set-of-mark object detector data collection script 2024-01-05 11:00:31 +08:00
Hilbert-Johnson
8ac88e9617 pass test case 2024-01-02 01:10:46 +08:00
Hilbert-Johnson
7560f4dc46 update SoM_agent 2023-12-31 19:13:17 +08:00
Hilbert-Johnson
86c6a473e2 add initial SoM_agent 2023-12-28 13:43:44 +08:00
Timothyxxx
30064ff816 Fix conflicts 2023-12-16 21:32:43 +08:00
Timothyxxx
e51ef4b91d Make up 2023-12-02 18:02:45 +08:00
Timothyxxx
9b214b3d23 Action space thoughts 2023-12-02 18:02:06 +08:00
Timothyxxx
992d8f8fce Refactor with pyautogui 2023-12-02 17:52:00 +08:00
Timothyxxx
e52ba2ab13 Fix the width and height of the VM so the agent performs more accurately 2023-11-30 12:10:41 +08:00
Timothyxxx
80b148793d Initialize visual components such as SAM for assistance 2023-11-29 20:22:48 +08:00
Timothyxxx
3d0d9d7758 Run through gpt_4v agent pipeline 2023-11-29 20:21:57 +08:00
Timothyxxx
8470264884 Initialize GPT-4v agent, and prompt for current observation space 2023-11-28 00:38:22 +08:00
Timothyxxx
054f545942 Initialize GPT-4v agent, and prompt for current observation space 2023-11-28 00:23:50 +08:00
Timothyxxx
8272e93953 Add DuckTrack as initial annotation tool; Initial multimodal test 2023-11-27 00:34:57 +08:00