Enhance Public Evaluation Guidelines by adding new images for AWS setup and monitoring instructions. Included additional contact information for leaderboard updates and error reporting. Ensured clarity and usability for users while preserving existing content structure.
@@ -158,6 +158,10 @@ The setup is straightforward:
1. Launch the host instance from the EC2 page of the AWS console and note the **VPC ID** and **Subnet ID** shown in its network settings.
2. Record the **Subnet ID**; you will need to set it as the environment variable `AWS_SUBNET_ID` on the host machine before starting the client code.

<p align="center">
<img src="./assets/pubeval_subnet.png" alt="pubeval_subnet" style="width: 80%;" />
</p>
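
Because the client code reads the subnet from `AWS_SUBNET_ID`, it can help to verify the variable on the host before launching anything. The snippet below is a minimal sketch and not part of OSWorld: `check_subnet_id` is a hypothetical helper, and the pattern assumes the usual AWS subnet ID shape (`subnet-` followed by 8 or 17 hexadecimal characters).

```python
import os
import re

# Assumed shape of an AWS subnet ID: "subnet-" plus 8 or 17 hex characters.
SUBNET_ID_PATTERN = re.compile(r"^subnet-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def check_subnet_id(env=None):
    """Return the subnet ID from AWS_SUBNET_ID, or raise ValueError if unusable.

    `env` defaults to os.environ; pass a plain dict to test without touching
    the real environment. Hypothetical helper, not an OSWorld API.
    """
    env = os.environ if env is None else env
    subnet_id = env.get("AWS_SUBNET_ID", "").strip()
    if not subnet_id:
        raise ValueError("AWS_SUBNET_ID is not set; export it before starting the client code")
    if not SUBNET_ID_PATTERN.match(subnet_id):
        raise ValueError(f"AWS_SUBNET_ID does not look like a subnet ID: {subnet_id!r}")
    return subnet_id


if __name__ == "__main__":
    try:
        print("Using subnet", check_subnet_id())
    except ValueError as err:
        print("Configuration problem:", err)
```

Running this on the host right after exporting the variable catches a missing or mistyped ID before any instances are requested.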

### 1.3 Get AWS Access Keys & Secret Access Key

Click on **Security Credentials** from the drop-down menu under your account in the top-right corner.
@@ -280,7 +284,23 @@ Then, open your Host's **public IP** on port `8080` in a browser. (eg. `http://<

For more, see: [MONITOR_README](./monitor/README.md)

<p align="center">
<img src="./assets/pubeval_monitor1.jpg" alt="pubeval_monitor" style="width:80%;" />
</p>
<p align="center">
<img src="./assets/pubeval_monitor2.jpg" alt="pubeval_monitor" style="width:80%;" />
</p>

### 4.2 VNC Remote Desktop Access

We pre-install a VNC server on every virtual machine so you can watch it while a task is running.

You can connect through VNC at `http://<client-public-ip>:5910/vnc.html`.

The default password is `osworld-public-evaluation`; it is set to prevent unauthorized access.
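
When you manage several client machines, the address above can be assembled programmatically. This is a minimal sketch that assumes only the port `5910` and the path `/vnc.html` given above. `vnc_url` is a hypothetical helper, not an OSWorld function, and the IP in the usage line is a documentation-only address.

```python
import ipaddress

VNC_PORT = 5910  # port stated in this guide


def vnc_url(client_public_ip: str) -> str:
    """Build the browser VNC address for one client machine."""
    # ip_address raises ValueError if the argument is not a valid IP address.
    ip = ipaddress.ip_address(client_public_ip.strip())
    # IPv6 literals must be bracketed inside a URL.
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"http://{host}:{VNC_PORT}/vnc.html"


if __name__ == "__main__":
    # 203.0.113.0/24 is reserved for documentation examples.
    print(vnc_url("203.0.113.7"))
```

Validating the IP first means a typo surfaces as a `ValueError` rather than as a browser timeout.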

## 5. Contact the team to update the leaderboard and fix errors (optional)

If you want your results displayed on the leaderboard, please send a message to the OSWorld leaderboard maintainers (tianbaoxiexxx@gmail.com, yuanmengqi732@gmail.com) and open a pull request. We can then update the results in the self-reported section.

If you want your results verified and displayed in the verified leaderboard section, please schedule a meeting with us so that we can run your agent code on our side and report the results ourselves. Alternatively, if you are from a trusted institution, you can share your monitor and trajectories with us.

If you discover new errors or find that the environment has changed, please contact us via GitHub issues or email.