Commit Graph

1099 Commits

Author SHA1 Message Date
zdy023
3cf80eaab8 ver Jul3rd
fixed several tasks
2025-07-03 20:55:30 +08:00
zdy023
c4b47886d9 ver Jul2nd
updated task that requires setting up a new email account
2025-07-02 20:46:04 +08:00
Zilong Zhou
595a704aff fix: fix proxy setup (#227)
* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example
2025-07-02 01:36:32 +08:00
Danyang Zhang
d4273d992e Calc eval fix (#225)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-30 18:23:09 +08:00
Tianbao Xie
30138c5db1 VLC fix (#224)
* Enhance SetupController with improved logging and error handling during setup and file upload processes. Update instance type to t3.xlarge and AMI ID for AWS configuration. Add download progress logging and exception handling for better debugging.

* Enhance VLC status evaluation by adding multiple paths for file and URL information extraction, improving robustness against varying VLC XML structures. Implement detailed logging for better debugging and error handling in case of mismatches or missing data. Update example JSON for VLC evaluation to use a valid HLS stream URL.

* Improve audio comparison robustness in VLC evaluator by adding error handling for audio file loading and extraction. Implement detailed logging for empty or corrupt files, and normalize DTW distance calculation for more accurate similarity scoring. Remove deprecated audio fingerprint comparison function.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-29 20:18:44 +08:00
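
The normalized DTW scoring mentioned in 30138c5db1 could look roughly like the sketch below. The commit message does not say how the distance is normalized; dividing by the combined sequence length and mapping to 1 / (1 + d) are assumptions, and `dtw_distance` and `similarity` are hypothetical names, not the evaluator's functions.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Map a length-normalized DTW distance into (0, 1]; empty input scores 0."""
    if not a or not b:
        return 0.0  # treat empty or unreadable audio as a mismatch
    normalized = dtw_distance(a, b) / (len(a) + len(b))  # assumed normalization
    return 1.0 / (1.0 + normalized)
```

Normalizing by length keeps long recordings from being penalized merely for having more samples to align.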
Tianbao Xie
0cc93543a8 Environment is_used flag; OS domain fix (#219)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

* Enhance DesktopEnv to track environment usage for optimized snapshot management. Introduce is_environment_used flag to determine if a snapshot revert is necessary based on provider type. Update setup and step methods to mark environment usage appropriately. Add new execute_with_verification method in SetupController for command execution with result verification, improving reliability. Change AWS instance type to m5.large for better performance and update AMI ID for compatibility. Update file opening logic in main.py to handle both file paths and application commands more effectively.

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-28 00:45:53 +08:00
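
A minimal sketch of the is_environment_used idea from 0cc93543a8, assuming the flag is set by any step and cleared after a revert. `EnvSketch` and its `always_revert` switch are hypothetical stand-ins; the commit only says the decision depends on provider type, not which providers behave which way.

```python
class EnvSketch:
    """Hypothetical stand-in for DesktopEnv; only the flag logic is shown."""

    def __init__(self, always_revert: bool = False):
        # always_revert models a provider that cannot skip snapshot reverts.
        self.always_revert = always_revert
        self.is_environment_used = False
        self.reverts = 0  # counts how many (expensive) reverts were performed

    def step(self, action: str) -> None:
        # Any action may have changed VM state, so mark the environment dirty.
        self.is_environment_used = True

    def reset(self) -> None:
        # Revert only when required: a dirty VM, or a provider that always reverts.
        if self.always_revert or self.is_environment_used:
            self.reverts += 1
            self.is_environment_used = False
```

Skipping the revert for an untouched VM saves the snapshot restore time on back-to-back resets.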
MillanK
48ac57697a VSCode fix (#222) 2025-06-24 17:08:09 +08:00
Zilong Zhou
634e1c3d6f Reduce the startup time of the software on AWS from one minute to five seconds. (#221)
* feat: use SSD with high throughput

* fix&refactor: update AMI ID and change EBS volume type to gp3 with adjusted IOPS and throughput
2025-06-24 15:35:38 +08:00
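
For reference, a gp3 root volume as described in 634e1c3d6f might be expressed as the EC2 block device mapping below. The device name and the IOPS and throughput figures are placeholders, since the commit message does not state the values actually used.

```python
# Placeholder values only; the commit does not give the real numbers.
block_device_mappings = [
    {
        "DeviceName": "/dev/sda1",  # assumed root device name
        "Ebs": {
            "VolumeType": "gp3",
            "Iops": 3000,       # placeholder; gp3 lets IOPS be set independently of size
            "Throughput": 125,  # placeholder, in MiB/s
        },
    }
]
```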
Zilong Zhou
3d8f1779a2 feat: use SSD with high throughput (#218) 2025-06-17 18:39:42 +08:00
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
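
The time-suffix change to get_vm_file in 4e11eafd1d can be illustrated as follows. This is a sketch under assumptions: `with_time_suffix` is a hypothetical helper and the exact format string is not given in the commit message.

```python
import os
from datetime import datetime


def with_time_suffix(filename: str, now: datetime) -> str:
    """Append a date-and-time suffix (down to seconds) before the extension."""
    stem, ext = os.path.splitext(filename)
    # Assumed format; including %H%M%S avoids collisions between same-day files.
    return f"{stem}_{now.strftime('%Y%m%d_%H%M%S')}{ext}"
```

For example, with_time_suffix("report.docx", datetime(2025, 6, 16, 21, 37, 19)) gives "report_20250616_213719.docx".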
Kaixin Li
347238e17e Get VM IP again when getting screenshot fails (#215)
In rare cases, the IP of the VM changes after it launches. We can get the IP every time we retry to ensure the correct connection.
2025-06-16 02:40:40 +08:00
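
The approach described in 347238e17e, looking the IP up again on every retry, might be sketched like this. `screenshot_with_retry`, `get_ip`, and `fetch` are hypothetical names for illustration, not the project's API.

```python
import time
from typing import Callable


def screenshot_with_retry(
    get_ip: Callable[[], str],
    fetch: Callable[[str], bytes],
    max_retries: int = 3,
    delay: float = 0.0,
) -> bytes:
    """Re-resolve the VM IP before each attempt instead of caching it."""
    last_error = None
    for _ in range(max_retries):
        ip = get_ip()  # fresh lookup, in case the IP changed after launch
        try:
            return fetch(ip)
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay)
    raise RuntimeError(f"screenshot failed after {max_retries} attempts") from last_error
```

Caching the IP outside the loop would keep retrying the stale address, which is the failure the commit addresses.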
Yuan Mengqi
40354322e8 fix pub eval readme typo (#214)
* update clean code

* fix pub eval readme typo
2025-06-10 22:57:16 +08:00
Yuan Mengqi
362499330e update clean code (#213) 2025-06-10 22:18:03 +08:00
Yuan Mengqi
4ce05b89ae Merge pull request #212 from yuanmengqi/aws_clean
AWS OSWorld Provider Enhancement, Proxy Integration, new Agent Operator Implementation
2025-06-10 21:44:18 +08:00
yuanmengqi
8a1fc5c385 edit pub eval readme 2025-06-10 13:37:26 +00:00
yuanmengqi
b8d229cdb3 edit pub eval readme 2025-06-10 13:36:48 +00:00
yuanmengqi
fbe88799cf edit pub eval readme 2025-06-10 13:36:03 +00:00
yuanmengqi
3b5e4f3b15 edit pub eval readme 2025-06-10 13:34:42 +00:00
yuanmengqi
2d5439d062 edit pub eval readme 2025-06-10 13:32:24 +00:00
yuanmengqi
2d3347ca3e edit pub eval readme 2025-06-10 13:28:54 +00:00
yuanmengqi
1b09d63cb2 edit pub eval readme 2025-06-10 13:27:53 +00:00
yuanmengqi
2bae228803 merge upstream 2025-06-10 13:23:03 +00:00
yuanmengqi
7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
yuanmengqi
caf487b7cc Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-10 02:36:46 +00:00
yuanmengqi
3da32fe5cf update operator prompt 2025-06-10 02:35:53 +00:00
yuanmengqi
caaa4e5baa fix: update AMI ID for us-east-1 region in AWS manager 2025-06-10 02:32:24 +00:00
yuanmengqi
02387f2cee feat: update DesktopEnv to support VMware provider and add proxy configuration
- Changed default provider name from "aws" to "vmware".
- Introduced `enable_proxy` parameter to control proxy support.
- Enhanced retry logic in the `reset` method to use a constant for maximum retries.
- Updated proxy handling to respect the new `enable_proxy` setting.
2025-06-09 16:35:13 +00:00
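
One way to read the enable_proxy and retry-constant changes in 02387f2cee is sketched below. The "proxy" task key, the value of MAX_RETRIES, and both function names are assumptions made for illustration; only the `enable_proxy` parameter name comes from the commit message.

```python
from typing import Callable

MAX_RETRIES = 5  # hypothetical value; the commit only says a constant is used


def use_proxy(task_config: dict, enable_proxy: bool) -> bool:
    """A task's proxy request is honored only when enable_proxy is switched on."""
    return enable_proxy and bool(task_config.get("proxy", False))


def reset_with_retries(try_reset: Callable[[], bool]) -> int:
    """Call try_reset until it succeeds; return the attempt number that worked."""
    for attempt in range(1, MAX_RETRIES + 1):
        if try_reset():
            return attempt
    raise RuntimeError(f"reset failed after {MAX_RETRIES} attempts")
```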
adlsdztony
168a2694f2 Merge branch 'feat/aws-provider-support' of https://github.com/xlang-ai/OSWorld into feat/aws-provider-support 2025-06-09 16:07:48 +00:00
adlsdztony
bfae51d74d fix: enhance setup method with retry logic and return status 2025-06-09 16:07:13 +00:00
yuanmengqi
692486f8e7 add GDrive guideline 2025-06-09 14:59:47 +00:00
yuanmengqi
630f92fd7c fix: correct URL encoding in JSON examples for invoice paths 2025-06-09 08:06:27 +00:00
yuanmengqi
b41339c5e5 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-09 04:27:07 +00:00
yuanmengqi
aee1207fff fix error 2025-06-09 04:20:59 +00:00
yuanmengqi
40edf0aba6 Merge remote-tracking branch 'origin/main' into feat/aws-provider-support 2025-06-08 14:40:38 +00:00
yuanmengqi
6029c9d496 Merge branch 'main' into feat/aws-provider-support 2025-06-08 14:24:44 +00:00
tsuky_chen
e55810809e Fix libreoffice impress evaluation (#209)
Co-authored-by: chenjix <211250101@smail.nju.edu.cn>
2025-06-08 22:12:56 +08:00
yuanmengqi
3e541bb393 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-08 04:01:35 +00:00
yuanmengqi
d8872634ee edit prompt 2025-06-08 03:59:31 +00:00
yuanmengqi
eaf7b9e48f refactor: replace hardcoded AMI ID with dynamic retrieval from IMAGE_ID_MAP in AWS DesktopEnv initialization 2025-06-07 21:17:18 +00:00
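
The lookup described in eaf7b9e48f could be as simple as the sketch below. The AMI value is a placeholder rather than a real image ID, and `resolve_image_id` is a hypothetical helper; only the name IMAGE_ID_MAP appears in the commit message.

```python
IMAGE_ID_MAP = {
    "us-east-1": "ami-PLACEHOLDER",  # placeholder, not a real AMI ID
}


def resolve_image_id(region: str) -> str:
    """Look the AMI up by region so an ID update touches only the map."""
    try:
        return IMAGE_ID_MAP[region]
    except KeyError:
        raise ValueError(f"no AMI configured for region {region!r}") from None
```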
yuanmengqi
8853671220 fix: enhance instruction clarity and adjust timing in automation script for LibreOffice Impress example 2025-06-07 21:17:00 +00:00
yuanmengqi
ca65022137 fix: update AMI ID for us-east-1 region in AWS manager configuration 2025-06-07 21:16:26 +00:00
yuanmengqi
bba791f690 Merge remote-tracking branch 'chenjix/fix_impress' into feat/aws-provider-support 2025-06-07 17:41:50 +00:00
yuanmengqi
9fa768d24d refactor: update URLs in multiple JSON files to ensure proper encoding of special characters 2025-06-07 17:26:45 +00:00
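
As an illustration of the kind of encoding 9fa768d24d refers to, the standard library can percent-encode special characters in a URL path. The example path is made up, and `encode_path` is a hypothetical helper; the commit does not say how the JSON files were actually updated.

```python
from urllib.parse import quote


def encode_path(url_path: str) -> str:
    """Percent-encode a URL path, leaving the '/' separators untouched."""
    return quote(url_path, safe="/")
```

For example, encode_path("/invoices/Invoice # 1.pdf") returns "/invoices/Invoice%20%23%201.pdf". Note that running it on an already-encoded path would encode the "%" signs a second time.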
yuanmengqi
54953f82fb Merge branch 'feat/aws-provider-support' 2025-06-07 16:02:39 +00:00
yuanmengqi
8471394cc1 add branch feat/aws-provider-support 2025-06-07 15:57:18 +00:00
yuanmengqi
f48d80002f Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-07 13:22:53 +00:00
yuanmengqi
c57b1d4e7a eval update 2025-06-07 13:19:22 +00:00
yuanmengqi
bbd4401ff5 Merge branch 'feat/aws-provider-support' of https://github.com/xlang-ai/OSWorld into feat/aws-provider-support 2025-06-07 11:40:26 +00:00
yuanmengqi
fc3ef6b2be fix: update AMI ID for us-east-1 region in AWS manager configuration 2025-06-07 11:40:09 +00:00
adlsdztony
0375f9d59f Merge branch 'feat/aws-provider-support' of https://github.com/xlang-ai/OSWorld into feat/aws-provider-support 2025-06-07 11:24:56 +00:00