Commit Graph

110 Commits

Author SHA1 Message Date
yuanmengqi
44bd66fc9a Increase timeout for page load stability in Chrome evaluator
- Updated the timeout for the page load state from 10 seconds to 60 seconds to ensure better stability during page processing.
- Removed redundant retry mechanisms from the active tab checks to streamline the code while maintaining existing functionality.
- Enhanced logging to provide clearer insights into the page loading process.

These changes aim to improve the reliability of the Chrome evaluator without altering the core logic.
2025-07-18 14:16:16 +00:00
yuanmengqi
fcaefe7bb4 Enhance Chrome evaluator with improved error handling and retry mechanisms
- Added robust error handling for page processing, including checks for closed pages and HTTP status codes.
- Implemented retry logic for page loads and active tab checks to improve reliability.
- Enhanced logging throughout the process to capture detailed information about failures and successes.
- Preserved existing logic while ensuring better maintainability and robustness in the Chrome evaluator functions.
2025-07-18 07:13:13 +00:00
yuanmengqi
d1ddd3eacd feat: enhance VM wallpaper retrieval and image similarity checks
- Added logging to the VM wallpaper retrieval function to capture errors and warnings related to content retrieval and file creation.
- Implemented checks for None, empty, and invalid content types to ensure robustness in wallpaper handling.
- Enhanced the SSIM structure check function with size validation and improved error handling for image processing.
- Added logging for image size discrepancies and exceptions during SSIM computation to aid in debugging.

These changes improve error handling and logging, ensuring better maintainability and reliability of the evaluators.
2025-07-17 18:19:09 +00:00
yuanmengqi
9d04624e41 feat: enhance Chrome evaluator with improved retry logic and logging
- Implemented retry mechanism for connecting to Chrome instances, allowing up to two attempts before failure.
- Increased timeout settings for page navigation and loading to enhance reliability.
- Added detailed logging for connection attempts, page loading status, and error handling to improve debugging and user experience.
- Ensured existing logic is preserved while enhancing error handling and operational robustness.

These changes improve the overall reliability and maintainability of the Chrome evaluator functions.
2025-07-17 11:15:47 +00:00
yuanmengqi
0651495d88 fix: Enhance error handling and logging across multiple evaluators
- Added logging for file retrieval and error handling in file.py, improving robustness during file operations.
- Implemented checks for file existence and parsing errors in general.py, enhancing reliability in JSON/YAML processing.
- Improved table comparison logic in table.py with detailed error logging for sheet loading and cell value reading.
- Enhanced metrics evaluation in slides.py with additional checks for paragraph and run counts, ensuring thorough comparison.
- Updated utils.py to include file existence checks and detailed error logging during cell value reading.
2025-07-14 05:43:17 +00:00
yuanmengqi
877e75a013 Final review multi_apps fix Xinzhuang part 2025-07-12 16:34:55 +00:00
yuanmengqi
9be6fcd688 Check and fix on Chrome tasks
- Added `pytz` dependency to `requirements.txt` for timezone handling.
- Introduced `get_macys_product_url_parse` function to replace the old `get_url_path_parse` for better clarity while maintaining backward compatibility.
- Enhanced logging throughout the `get_active_tab_html_parse` and `get_rule_relativeTime` functions for improved debugging and traceability.
- Updated JSON examples to reflect changes in expected keys and added new fields for better evaluation context.
- Removed deprecated execution commands from JSON examples to streamline the evaluation process.
2025-07-06 07:52:37 +00:00
Yuan Mengqi
b2fb8b4222 fix chrome tasks (#230)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-03 21:32:41 +08:00
Zilong Zhou
4d9528f208 feat&fix: add proxy support in get_info_from_website function (#228) 2025-07-02 18:13:15 +08:00
Tianbao Xie
4e11eafd1d Robust Evaluation, Blocking File Open, Grader Sensitivity, and LibreOffice Writer Fixes (#217)
* Refactor evaluator structure in LibreOffice Writer example JSON to support multiple expected and result files, enhancing evaluation flexibility.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update instance type to t3.large and add VNC access URL logging for allocated VMs, enhancing remote access capabilities.

* Update time format in get_vm_file function to include hours, minutes, and seconds for more precise file naming with time suffix.

* More delay for 936321ce-5236-426a-9a20-e0e3c5dc536f; support one more potential solution.

* Enhance SetupController with configurable retry limit and improved error handling for file opening requests. Introduce new function to compare unique training records, and update logging for better debugging. Adjust JSON examples for evaluation to support multiple expected and result files.

* Clean debug code

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-16 21:37:19 +08:00
yuanmengqi
7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
Tianbao Xie
f4750701d4 Address https://github.com/xlang-ai/OSWorld/issues/130 2025-02-10 12:55:44 +08:00
MillanK
983283a86a patch: minor bug fixes for evaluator and task configurations, documentation update (#121)
* fix: /cursor_position api return format fix

* chore: update README.md to remove deprecated command

* fix: add base score for evaluators and minor bug fixes

* fix: add base score for setup configurations

---------

Co-authored-by: Jiaqi Deng <jiaqideng@Jiaqis-MacBook-Pro.local>
2025-01-18 22:25:18 +08:00
Tianbao Xie
20442244fa [Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld

* Debug

* Debug

* v1, internal version

* Add experiments script

* Fix minor bugs

* Update new endpoint

* Update ip

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix model name

* Fix docker close issues; update prompting

* Fix missed

* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'

* Fix server and chromium ports in setup

* Revert and add missed dependency

* Add VLC port for docker

* Update

* Clean

---------

Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
2024-11-11 12:36:16 +08:00
HappySix
6419d707bc Support Docker VM manager and provider (#75)
* Add docker provider framework

* Update VM download link

* Add stop container

* Update docker manager & provider

* Update

* Update

* Update provider
2024-09-28 21:10:40 +08:00
Jason Lee
0c5fbb8be4 fix local state file's location on macos M-chip computer 2024-06-23 08:43:53 -05:00
Jason Lee
7ab4ae360e fix bugs of functions in getters (macos M-chip version) 2024-06-23 08:30:34 -05:00
Jason Lee
0058add84d fix function: get_cookie_data (macos M-chip version) 2024-06-23 08:24:51 -05:00
Jason Lee
1ec95f7d61 fix function: "get_bookmarks" 2024-06-22 04:08:13 -05:00
Jason Lee
1c50770817 fix chrome evaluation bugs for macbook (#43) 2024-06-11 12:15:27 +08:00
Timothyxxx
9c75df5dce Clean code; Refactor environment to pass screenshot content instead of path 2024-04-13 23:34:01 +08:00
Timothyxxx
2d8eeaad58 Fix one bug in Chrome getter; fix one error for corner case in doc 2024-04-02 14:50:29 +08:00
Timothyxxx
fad621093f Fix one bug in Chrome getter 2024-04-01 15:05:48 +08:00
Jason Lee
812be97a41 Merge branch 'main' of github.com:xlang-ai/DesktopEnv 2024-03-10 14:50:17 +08:00
Jason Lee
775cef744f xiaochuan corrected his bugs in multiapp examples; you can try them again now 2024-03-10 14:48:56 +08:00
Timothyxxx
447c886b0a Fix multiple apps 5990457f-2adb-467b-a4af-5c857c92d762 2024-03-09 20:54:52 +08:00
Timothyxxx
b0607c4f79 Fix bugs imported by Xiaochuan xs 2024-03-09 19:32:05 +08:00
Jason Lee
2291af394f update google drive file link in json 2024-03-09 18:06:48 +08:00
Jason Lee
6ea3dd856f fix multiapps bug : "26660ad1-6ebb-4f59-8cba-a8432dfe8d38" 2024-03-09 14:03:26 +08:00
Tianbao Xie
f01153cadd Merge branch 'main' into xiaochuanli/addChromeExtensions 2024-03-08 20:45:49 +08:00
tsuky_chen
3761de4a05 Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-08 20:37:40 +08:00
tsuky_chen
4070b41fbd fix multi apps 2024-03-08 20:36:34 +08:00
rhythmcao
365c7798f1 Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-08 19:26:04 +08:00
Jason Lee
62fd8feebb xiaochuan's multiapp examples 2024-03-08 19:24:15 +08:00
Timothyxxx
b8d54e8bac Fix a bug in Chrome evaluator 2024-03-08 11:47:14 +08:00
rhythmcao
89f0fc5410 update multi-apps 2024-03-08 00:03:08 +08:00
rhythmcao
d748a77c63 Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-06 21:20:43 +08:00
rhythmcao
da0dafc32c add multi-apps 5 examples by ruisheng 2024-03-06 2024-03-06 21:20:26 +08:00
David Chang
459e247736 ver Mar4thv3
some new multi_app configs
2024-03-04 23:26:22 +08:00
rhythmcao
2aff825945 create new refresh token for googledrive 2024-03-02 21:08:53 +08:00
Jason Lee
2c08a02206 fix the error caused by url encoding 2024-02-27 18:37:32 +08:00
Jason Lee
0edbcf404d ensure no exception (if failed, return 0) and change 'load' to 'networkidle' 2024-02-26 22:07:08 +08:00
Timothyxxx
a66b36295a Fix examples, and evaluation on Chrome, handle corner cases; Initialize arm support 2024-02-26 12:34:27 +08:00
Tianbao Xie
79da405759 Merge branch 'main' into xiaochuanli/addChromeExtensions 2024-02-26 09:21:50 +08:00
Jason Lee
1ab565b5ab Merge branch 'xiaochuanli/addChromeExtensions' of github.com:xlang-ai/DesktopEnv into xiaochuanli/addChromeExtensions 2024-02-25 23:17:22 +08:00
Jason Lee
ca24d2a649 fix selector bug and determine the path according to arch 2024-02-25 23:15:47 +08:00
Timothyxxx
506c375554 Fix some json typos from Chrome 2024-02-25 03:49:48 +08:00
Tianbao Xie
0a6b5b3f57 Merge branch 'main' into xiaochuanli/addChromeExtensions 2024-02-25 00:45:17 +08:00
Jason Lee
3244098664 finish the remaining chrome examples and verify them on mac arm64 2024-02-24 21:57:01 +08:00
Timothyxxx
f812436ad3 Update loaded Chrome examples 2024-02-23 14:15:16 +08:00