diff --git a/PUBLIC_EVALUATION_GUIDELINE.md b/PUBLIC_EVALUATION_GUIDELINE.md
index adca7de..de4cf0f 100644
--- a/PUBLIC_EVALUATION_GUIDELINE.md
+++ b/PUBLIC_EVALUATION_GUIDELINE.md
@@ -158,6 +158,10 @@ The setup is straightforward:
 1. Launch the host instance in the EC2 console via the AWS console and note the **VPC ID** and **Subnet ID** shown in its network settings.
 2. Record the **Subnet ID** as you will need to set it as the environment variable `AWS_SUBNET_ID` on the host machine before starting the client code.
+

+ pubeval_subnet +

+
 ### 1.3 Get AWS Access Keys & Secret Access Key
 Click on **Security Credentials** from the drop-down menu under your account in the top-right corner.
@@ -280,7 +284,23 @@ Then, open your Host's **public IP** on port `8080` in a browser. (eg. `http://<
 For more, see: [MONITOR_README](./monitor/README.md)
+

+ pubeval_monitor +

+

+ pubeval_monitor +

+
+
 ### 4.2 VNC Remote Desktop Access
 VNC is pre-installed on every virtual machine so you can watch it while a task is running. You can access it via VNC at `http://:5910/vnc.html`. The default password is `osworld-public-evaluation`, set to prevent unauthorized access.
+
+## 5. Contact the team to update leaderboard and fix errors (optional)
+
+If you want your results displayed on the leaderboard, please send a message to the OSWorld leaderboard maintainers (tianbaoxiexxx@gmail.com, yuanmengqi732@gmail.com) and open a pull request. We can then add the results to the self-reported section.
+
+If you want your results verified and displayed in the verified leaderboard section, please schedule a meeting with us so that we can run your agent code on our side, obtain the results, and report them. Alternatively, if you are from a trusted institution, you can share your monitor and trajectories with us.
+
+If you discover new errors or find that the environment has changed, please contact us via GitHub issues or email.
diff --git a/assets/pubeval_monitor1.jpg b/assets/pubeval_monitor1.jpg
new file mode 100644
index 0000000..e51ab12
Binary files /dev/null and b/assets/pubeval_monitor1.jpg differ
diff --git a/assets/pubeval_monitor2.jpg b/assets/pubeval_monitor2.jpg
new file mode 100644
index 0000000..4d5453d
Binary files /dev/null and b/assets/pubeval_monitor2.jpg differ
diff --git a/assets/pubeval_subnet.png b/assets/pubeval_subnet.png
new file mode 100644
index 0000000..32884cd
Binary files /dev/null and b/assets/pubeval_subnet.png differ
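
Reviewer note: the guideline touched by this diff asks the reader to set `AWS_SUBNET_ID` on the host and then open two browser endpoints (port `8080` for the monitor, port `5910` for VNC). A minimal Python sketch of a pre-flight check is given below. It is not part of the OSWorld code base: the function names, the `subnet-` prefix check, and the `public_ip` parameter are assumptions made for illustration, and the example address comes from a reserved documentation range.

```python
import os

MONITOR_PORT = 8080  # port the guideline gives for the monitor page
VNC_PORT = 5910      # port the guideline gives for the VNC web client


def read_subnet_id(env=None):
    """Return AWS_SUBNET_ID from the given mapping (default: os.environ).

    Raises RuntimeError when the variable is missing or does not start
    with "subnet-" (the prefix check is an assumption of this sketch).
    """
    env = os.environ if env is None else env
    subnet_id = env.get("AWS_SUBNET_ID", "").strip()
    if not subnet_id:
        raise RuntimeError("AWS_SUBNET_ID is not set")
    if not subnet_id.startswith("subnet-"):
        raise RuntimeError(f"unexpected AWS_SUBNET_ID value: {subnet_id!r}")
    return subnet_id


def monitor_url(public_ip):
    """Build the monitor URL for a host's public IP (hypothetical helper)."""
    return f"http://{public_ip}:{MONITOR_PORT}"


def vnc_url(public_ip):
    """Build the VNC web URL for a machine's public IP (hypothetical helper)."""
    return f"http://{public_ip}:{VNC_PORT}/vnc.html"


if __name__ == "__main__":
    # 203.0.113.10 is a placeholder from a documentation-only address range.
    print(monitor_url("203.0.113.10"))
    print(vnc_url("203.0.113.10"))
```

Calling `read_subnet_id()` before starting the client code would surface a missing variable early instead of partway through a run.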