DesktopEnv: A Learning Environment for Human-like Computer Task Mastery
Setup guide
- Download OS image
- Download kubuntu from https://kubuntu.org/getkubuntu/
- Download ubuntu from https://ubuntu.com/download/desktop
- Download Windows from https://www.microsoft.com/en-au/software-download/windows10ISO
- Download macOS (not possible to download legally)
- Set up virtual machine
- Create a Host Only Adapter and add it to the network adapter in the settings
- Create
- Set up bridge for connecting to VM
- Set up SSH server on VM: https://averagelinuxuser.com/ssh-into-virtualbox/
    sudo apt install openssh-server
    sudo systemctl enable ssh --now
    sudo ufw disable    # disables the firewall; safe on a local network, otherwise use: sudo ufw allow ssh
    ip a                # find the VM's IP address
    ssh username@<ip_address>
- On host, run
ssh-copy-id <username>@<ip_address>
- Install screenshot tool (in vm)
    sudo apt install imagemagick-6.q16hdri
    DISPLAY=:0 import -window root screenshot.png
- Get screenshot
    scp user@192.168.7.128:~/screenshot.png screenshot.png
    rm -rf ~/screenshot.png
- Set up Python and install the mouse and keyboard packages
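The screenshot steps above can be scripted from the host. The following is a minimal sketch, not the project's actual code: the names `screenshot_commands` and `grab_screenshot` are hypothetical, it assumes key-based SSH login is already set up with `ssh-copy-id`, and it assumes the `rm` step is meant to clean up the temporary file inside the VM.

```python
import subprocess

REMOTE_PATH = "~/screenshot.png"  # same remote path as in the steps above


def screenshot_commands(user, host, local_path="screenshot.png"):
    """Build the capture, copy, and cleanup commands as argument lists."""
    target = f"{user}@{host}"
    return [
        # Capture the whole X display inside the VM with ImageMagick's import.
        ["ssh", target, f"DISPLAY=:0 import -window root {REMOTE_PATH}"],
        # Copy the image from the VM to the host.
        ["scp", f"{target}:{REMOTE_PATH}", local_path],
        # Assumption: the temporary file is removed on the VM, not the host.
        ["ssh", target, f"rm -f {REMOTE_PATH}"],
    ]


def grab_screenshot(user, host, local_path="screenshot.png"):
    """Run the commands in order; raises CalledProcessError if one fails."""
    for cmd in screenshot_commands(user, host, local_path):
        subprocess.run(cmd, check=True)
    return local_path
```

Calling `grab_screenshot("user", "192.168.7.128")` would run the three commands in sequence. Keeping command construction separate from execution makes the sketch easy to inspect without a running VM.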
Road map (Proposed)
- Explore VMware and whether it can be connected to and controlled through the mouse package
- Explore whether Windows and macOS can be installed
- macOS is closed source and cannot be legally installed
- Windows is available legally and can be installed
- Build a gym-like Python interface for controlling the VM
- Record actions (mouse movements, clicks, keystrokes) for humans to annotate, so that they can be replayed
- This part may conflict with work from Aran Komatsuzaki's team, a.k.a. Duck AI
- Build a simple task, e.g. open a browser, open a website, click a button, and close the browser
- Set up a pipeline and build a zero-shot agent implementation for the task
- Decide which tasks inside DesktopEnv to focus on, and start wrapping up the environment for public release
- Start annotating examples for training and testing
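To make the gym-like interface and the record/replay items more concrete, here is one possible shape for them. It is a sketch under stated assumptions, not the project's design: `Action`, `DesktopEnvSketch`, `send`, and `observe` are hypothetical names, the `step` return value follows the classic Gym `(observation, reward, done, info)` convention, and reward and termination are placeholders because no task is defined yet.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Action:
    kind: str        # "move", "click", or "type"
    x: int = 0       # pointer coordinates, used by "move" and "click"
    y: int = 0
    text: str = ""   # characters to type, used by "type"


@dataclass
class DesktopEnvSketch:
    send: Callable[[Action], None]   # hypothetical: forwards an action to the VM
    observe: Callable[[], bytes]     # hypothetical: returns a screenshot as bytes
    log: List[Action] = field(default_factory=list)

    def reset(self) -> bytes:
        """Clear the recorded actions and return the first observation."""
        self.log.clear()
        return self.observe()

    def step(self, action: Action) -> Tuple[bytes, float, bool, dict]:
        """Execute one action, record it, and return the new observation."""
        self.send(action)
        self.log.append(action)
        # Placeholder reward and done flag: no task is defined yet.
        return self.observe(), 0.0, False, {}

    def replay(self, actions: List[Action]) -> bytes:
        """Re-run a recorded action sequence from a fresh reset."""
        actions = list(actions)  # copy first, since reset() clears self.log
        obs = self.reset()
        for action in actions:
            obs, _, _, _ = self.step(action)
        return obs
```

With a stand-in backend, a recorded session can be replayed without a VM:

```python
sent = []
env = DesktopEnvSketch(send=sent.append, observe=lambda: b"png")
env.reset()
env.step(Action("move", 10, 20))
env.step(Action("click", 10, 20))
env.replay(env.log)
```

Injecting `send` and `observe` as callables keeps the environment independent of how the VM is reached, for example SSH plus the mouse and keyboard packages.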