yuanmengqi
0651495d88
fix: Enhance error handling and logging across multiple evaluators
...
- Added logging for file retrieval and error handling in file.py, improving robustness during file operations.
- Implemented checks for file existence and parsing errors in general.py, enhancing reliability in JSON/YAML processing.
- Improved table comparison logic in table.py with detailed error logging for sheet loading and cell value reading.
- Enhanced metrics evaluation in slides.py with additional checks for paragraph and run counts, ensuring thorough comparison.
- Updated utils.py to include file existence checks and detailed error logging during cell value reading.
2025-07-14 05:43:17 +00:00
yuanmengqi
6f0382c0c2
Merge branch 'main' of github.com:xlang-ai/OSWorld
2025-07-10 22:35:42 +00:00
yuanmengqi
6897e5320d
Enhance image text comparison functionality with detailed logging
...
- Added logging for OCR results and text matching outcomes in compare_image_text function.
- Updated JSON examples to support multiple expected results and improved structure for evaluator functions.
- Enhanced handling of expected text rules to include multiple variations for better matching accuracy.
2025-07-10 22:32:53 +00:00
st2rb8g
61f265a082
fix some multi_apps tasks (#245)
...
* fix chrome
* fix some multi_apps tasks.
* fix some multiapps tasks
* fix some multiapps tasks
---------
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-11 06:32:13 +08:00
yuanmengqi
a1891f7d88
Merge branch 'main' of github.com:xlang-ai/OSWorld
2025-07-06 07:52:42 +00:00
yuanmengqi
9be6fcd688
Check and fix on Chrome tasks
...
- Added `pytz` dependency to `requirements.txt` for timezone handling.
- Introduced the `get_macys_product_url_parse` function to replace the old `get_url_path_parse` for clarity while maintaining backward compatibility.
- Enhanced logging throughout the `get_active_tab_html_parse` and `get_rule_relativeTime` functions for improved debugging and traceability.
- Updated JSON examples to reflect changes in expected keys and added new fields for better evaluation context.
- Removed deprecated execution commands from JSON examples to streamline the evaluation process.
2025-07-06 07:52:37 +00:00
zdy023
690f6ed6e7
ver Jul4th
...
fixed check_accessibility_tree function, updated the namespace
definitions according to the values defined in server/main.py
2025-07-04 23:20:51 +08:00
yuanmengqi
7b2120c843
Merge branch 'main' of github.com:xlang-ai/OSWorld
2025-07-03 13:50:35 +00:00
yuanmengqi
cb4bed20a0
Refactor compare_python_pure_text function for improved normalization and error handling. Update JSON example to clarify instruction for extracting Python code from Colab, changing output file names for consistency.
2025-07-03 13:50:21 +00:00
Yuan Mengqi
b2fb8b4222
fix chrome tasks (#230)
...
* fix chrome
* fix: fix proxy setup
* feat&fix: add proxy support in setup and remove hardcoded proxy from example
* fix tasks
* fix chrome finished
* fix
* clean chrome_fix code
* clean chrome_fix code
---------
Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-03 21:32:41 +08:00
Xubin Ren
1d10514125
Fix Search Engine Detection Discrepancy in Chrome Evaluation (#172)
...
* Update bb5e4c0d-f964-439c-97b6-bdb9747de3f4.json
* Update __init__.py
* Update general.py
2025-04-10 17:24:50 +08:00
Pierre Carrier
924e0fcd17
metrics: fix time regex (#81)
2024-10-24 22:45:42 +08:00
Timothyxxx
fad621093f
Fix one bug in Chrome getter
2024-04-01 15:05:48 +08:00
Timothyxxx
b4cb64d861
Fix bugs in multiple examples
2024-03-11 00:26:59 +08:00
Timothyxxx
b3d27f6387
Fix bugs in multiple examples
2024-03-10 23:52:29 +08:00
Jason Lee
775cef744f
xiaochuan corrected his bugs in the multiapp examples; you can try them again now
2024-03-10 14:48:56 +08:00
Jason Lee
2291af394f
update google drive file link in json
2024-03-09 18:06:48 +08:00
Timothyxxx
1e0a78a453
Add none file handling for general
2024-03-09 00:30:28 +08:00
Tianbao Xie
f01153cadd
Merge branch 'main' into xiaochuanli/addChromeExtensions
2024-03-08 20:45:49 +08:00
Jason Lee
62fd8feebb
xiaochuan's multiapp examples
2024-03-08 19:24:15 +08:00
David Chang
0e6ceeb168
Merge branch 'zdy'
2024-03-07 16:54:50 +08:00
David Chang
d6cd0936b3
ver Mar7th
...
updated instructions and set-up configs
2024-03-07 16:54:06 +08:00
rhythmcao
d748a77c63
Merge branch 'main' of https://github.com/xlang-ai/DesktopEnv
2024-03-06 21:20:43 +08:00
rhythmcao
da0dafc32c
add multi-apps 5 examples by ruisheng 2024-03-06
2024-03-06 21:20:26 +08:00
David Chang
459e247736
ver Mar4thv3
...
some new multi_app configs
2024-03-04 23:26:22 +08:00
David Chang
33ace6937b
ver Feb28th
...
a new multi app task --- init a web extension project with web tool
2024-02-28 22:35:04 +08:00
Timothyxxx
a66b36295a
Fix examples, and evaluation on Chrome, handle corner cases; Initialize arm support
2024-02-26 12:34:27 +08:00
Jason Lee
3244098664
finish the rest part of chrome examples and verify them on mac arm64
2024-02-24 21:57:01 +08:00
Jason Lee
17cd897780
add new examples for chrome
2024-02-18 22:11:16 +08:00
Timothyxxx
d5d9fc56de
Fix minor bugs of get_terminal output caused by a11y tree depth
2024-01-30 18:48:00 +08:00
Timothyxxx
b9ae4174b1
Fix OS examples annotated by Yitao
2024-01-25 19:57:32 +08:00
Liu Yitao
93b4ff7d95
Update OS evals
2024-01-25 10:45:51 +08:00
rhythmcao
91824f754c
1. extend evaluator to list (compatible with single evaluator) 2. fix a variable name error in metrics/general.py
2024-01-18 14:12:54 +08:00
Timothyxxx
8efa692951
Add raw accessibility-tree-based prompting method (but the token count is too large); fix some small bugs
2024-01-16 11:58:23 +08:00
David Chang
00922923ee
ver Jan15thv2
...
thunderbird example w.r.t. unified folder
2024-01-15 15:56:01 +08:00
David Chang
d4192d3d9c
ver Jan12thv3
...
debugged
2024-01-13 00:06:11 +08:00
David Chang
e08df57129
ver Jan12thv2
...
sqlite3 metric
2024-01-12 23:07:00 +08:00
David Chang
127a101994
Merge branch 'main' into zdy
2024-01-11 23:02:00 +08:00
David Chang
3c04872dcf
ver Jan11thv2
...
a new Thunderbird example w.r.t. email filter
2024-01-11 22:43:38 +08:00
Timothyxxx
820579a5a2
Make up missing getters and metrics; Update VLC scripts; Start to work on Chrome, update examples instructions
2024-01-11 21:27:40 +08:00
David Chang
27eaf2f5d5
ver Jan11th
...
finally set up a simple task, or one that should be simple
2024-01-11 20:03:33 +08:00
David Chang
1515b05666
ver Jan10thv2
...
a new example config for Thunderbird
fixed several bugs
2024-01-10 21:58:29 +08:00
David Chang
cf5d480f44
ver Jan10th
...
new Thunderbird task config
2024-01-10 17:36:59 +08:00
David Chang
df8be17394
ver Jan8th
...
still trying to set up Thunderbird, but nothing is done yet
2024-01-08 23:15:21 +08:00
David Chang
eeb8a120d6
ver Jan5th
...
debugged
2024-01-05 15:20:47 +08:00
David Chang
5fedf5b891
ver Jan4th
...
updated interfaces for thunderbird evaluation, not tested
2024-01-04 22:41:57 +08:00
Timothyxxx
03e99a68fb
Loaded LibreOffice Writer examples and found a few problems; will do another round tomorrow for the rest
2024-01-02 17:50:05 +08:00
Timothyxxx
86ce9e1497
Initialize getters for Chrome software and general ones; Fix some examples for chrome
2023-12-29 22:24:45 +08:00