48 Commits

Author SHA1 Message Date
yuanmengqi
0651495d88 fix: Enhance error handling and logging across multiple evaluators
- Added logging for file retrieval and error handling in file.py, improving robustness during file operations.
- Implemented checks for file existence and parsing errors in general.py, enhancing reliability in JSON/YAML processing.
- Improved table comparison logic in table.py with detailed error logging for sheet loading and cell value reading.
- Enhanced metrics evaluation in slides.py with additional checks for paragraph and run counts, ensuring thorough comparison.
- Updated utils.py to include file existence checks and detailed error logging during cell value reading.
2025-07-14 05:43:17 +00:00
yuanmengqi
6f0382c0c2 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-10 22:35:42 +00:00
yuanmengqi
6897e5320d Enhance image text comparison functionality with detailed logging
- Added logging for OCR results and text matching outcomes in compare_image_text function.
- Updated JSON examples to support multiple expected results and improved structure for evaluator functions.
- Enhanced handling of expected text rules to include multiple variations for better matching accuracy.
2025-07-10 22:32:53 +00:00
st2rb8g
61f265a082 fix some multi_apps tasks (#245)
* fix chrome

* fix some multi_apps tasks.

* fix some multiapps tasks

* fix some multiapps tasks

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-11 06:32:13 +08:00
yuanmengqi
a1891f7d88 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-06 07:52:42 +00:00
yuanmengqi
9be6fcd688 Check and fix on Chrome tasks
- Added `pytz` dependency to `requirements.txt` for timezone handling.
- Introduced `get_macys_product_url_parse` function to replace the old `get_url_path_parse` for better clarity while maintaining backward compatibility.
- Enhanced logging throughout the `get_active_tab_html_parse` and `get_rule_relativeTime` functions for improved debugging and traceability.
- Updated JSON examples to reflect changes in expected keys and added new fields for better evaluation context.
- Removed deprecated execution commands from JSON examples to streamline the evaluation process.
2025-07-06 07:52:37 +00:00
zdy023
690f6ed6e7 ver Jul4th
fixed check_accessibility_tree function, updated the namespace
definitions according to the values defined in server/main.py
2025-07-04 23:20:51 +08:00
yuanmengqi
7b2120c843 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-03 13:50:35 +00:00
yuanmengqi
cb4bed20a0 Refactor compare_python_pure_text function for improved normalization and error handling. Update JSON example to clarify instruction for extracting Python code from Colab, changing output file names for consistency. 2025-07-03 13:50:21 +00:00
Yuan Mengqi
b2fb8b4222 fix chrome tasks (#230)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-03 21:32:41 +08:00
Xubin Ren
1d10514125 Fix Search Engine Detection Discrepancy in Chrome Evaluation (#172)
* Update bb5e4c0d-f964-439c-97b6-bdb9747de3f4.json

* Update __init__.py

* Update general.py
2025-04-10 17:24:50 +08:00
Pierre Carrier
924e0fcd17 metrics: fix time regex (#81) 2024-10-24 22:45:42 +08:00
Timothyxxx
fad621093f Fix one bug in Chrome getter 2024-04-01 15:05:48 +08:00
Timothyxxx
b4cb64d861 Fix bugs in multiple examples 2024-03-11 00:26:59 +08:00
Timothyxxx
b3d27f6387 Fix bugs in multiple examples 2024-03-10 23:52:29 +08:00
Jason Lee
775cef744f xiaochuan corrected his bugs in multiapp examples; you can try them again now 2024-03-10 14:48:56 +08:00
Jason Lee
2291af394f update google drive file link in json 2024-03-09 18:06:48 +08:00
Timothyxxx
1e0a78a453 Add none file handling for general 2024-03-09 00:30:28 +08:00
Tianbao Xie
f01153cadd Merge branch 'main' into xiaochuanli/addChromeExtensions 2024-03-08 20:45:49 +08:00
Jason Lee
62fd8feebb xiaochuan's multiapp examples 2024-03-08 19:24:15 +08:00
David Chang
0e6ceeb168 Merge branch 'zdy' 2024-03-07 16:54:50 +08:00
David Chang
d6cd0936b3 ver Mar7th
updated instructions and set-up configs
2024-03-07 16:54:06 +08:00
rhythmcao
d748a77c63 Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv 2024-03-06 21:20:43 +08:00
rhythmcao
da0dafc32c add multi-apps 5 examples by ruisheng 2024-03-06 2024-03-06 21:20:26 +08:00
David Chang
459e247736 ver Mar4thv3
some new multi_app configs
2024-03-04 23:26:22 +08:00
David Chang
33ace6937b ver Feb28th
a new multi app task --- init a web extension project with web tool
2024-02-28 22:35:04 +08:00
Timothyxxx
a66b36295a Fix examples, and evaluation on Chrome, handle corner cases; Initialize arm support 2024-02-26 12:34:27 +08:00
Jason Lee
3244098664 finish the rest part of chrome examples and verify them on mac arm64 2024-02-24 21:57:01 +08:00
Jason Lee
17cd897780 add new examples for chrome 2024-02-18 22:11:16 +08:00
Timothyxxx
d5d9fc56de Fix minor bugs of get_terminal output caused by a11y tree depth 2024-01-30 18:48:00 +08:00
Timothyxxx
b9ae4174b1 Fix OS examples annotated by Yitao 2024-01-25 19:57:32 +08:00
Liu Yitao
93b4ff7d95 Update OS evals 2024-01-25 10:45:51 +08:00
rhythmcao
91824f754c 1. extend evaluator to list (compatible with single evaluator) 2. fix a variable name error in metrics/general.py 2024-01-18 14:12:54 +08:00
Timothyxxx
8efa692951 Add raw accessibility-tree based prompting method (but the tokens are too large); Fix some small bugs 2024-01-16 11:58:23 +08:00
David Chang
00922923ee ver Jan15thv2
thunderbird example w.r.t. unified folder
2024-01-15 15:56:01 +08:00
David Chang
d4192d3d9c ver Jan12thv3
debugged
2024-01-13 00:06:11 +08:00
David Chang
e08df57129 ver Jan12thv2
sqlite3 metric
2024-01-12 23:07:00 +08:00
David Chang
127a101994 Merge branch 'main' into zdy 2024-01-11 23:02:00 +08:00
David Chang
3c04872dcf ver Jan11thv2
a new Thunderbird example w.r.t. email filter
2024-01-11 22:43:38 +08:00
Timothyxxx
820579a5a2 Make up missing getters and metrics; Update VLC scripts; Start to work on Chrome, update examples instructions 2024-01-11 21:27:40 +08:00
David Chang
27eaf2f5d5 ver Jan11th
finally set up a simple task, or one that should be simple
2024-01-11 20:03:33 +08:00
David Chang
1515b05666 ver Jan10thv2
a new example config for Thunderbird
fixed several bugs
2024-01-10 21:58:29 +08:00
David Chang
cf5d480f44 ver Jan10th
new Thunderbird task config
2024-01-10 17:36:59 +08:00
David Chang
df8be17394 ver Jan8th
trying to continue setting up Thunderbird, but nothing done so far
2024-01-08 23:15:21 +08:00
David Chang
eeb8a120d6 ver Jan5th
debugged
2024-01-05 15:20:47 +08:00
David Chang
5fedf5b891 ver Jan4th
updated interfaces for thunderbird evaluation, not tested
2024-01-04 22:41:57 +08:00
Timothyxxx
03e99a68fb Loaded LibreOffice Writer examples and found a few problems; will do another round tomorrow for the rest 2024-01-02 17:50:05 +08:00
Timothyxxx
86ce9e1497 Initialize getters for Chrome software and general ones; Fix some examples for chrome 2023-12-29 22:24:45 +08:00