56 Commits

Author SHA1 Message Date
Danyang Zhang
afd5952e44 ver Oct3rd (#349)
updated a series of instructions to ask the agent not to do any
unnecessary actions.
2025-10-04 00:13:29 +08:00
Danyang Zhang
7364a720a6 Calc eval fix (#297)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

* ver Jul19th

added support for cellIs and some other new types of conditional
formatting for calc evaluation

* ver Aug4th

updated some instructions

* ver Aug4thv2

fixed a typo

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-08-04 12:39:35 +08:00
Danyang Zhang
53ffc05042 Calc eval fix (#272)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

* ver Jul18th

added two try-excepts to handle possible formula parsing and calculation
failures

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-18 21:28:48 +08:00
yuanmengqi
e433f35c1f feat: standardize configuration fields across all evaluation examples
- Add `fixed_ip` field to all 369 JSON files in examples directory
  - Set to `true` for 8 files listed in google_chrome.json multi_apps
  - Set to `false` for remaining 361 files
- Add `possibility_of_env_change` field to 363 JSON files missing this field
  - Set to "low" for newly added fields
  - Preserve existing values (4 medium, 2 high) for 6 files that already had this field

This ensures consistent configuration schema across all evaluation examples
while maintaining backward compatibility with existing settings.
2025-07-16 13:45:34 +00:00
Danyang Zhang
d4273d992e Calc eval fix (#225)
* ver Jun17th

updating annotations

* ver Jun17th

corrected annotation of 1d17
added check for cell merge

* ver Jun17th

updated several annotations

* ver Jun20th

fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08

* fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations.

* ver Jun21st

updating calc evals

* ver Jun22nd

fixed an impress task

* ver Jun22ndv2

adjusted several calc tasks

* Clean scaffolds

---------

Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk>
Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-06-30 18:23:09 +08:00
yuanmengqi
9fa768d24d refactor: update URLs in multiple JSON files to ensure proper encoding of special characters 2025-06-07 17:26:45 +00:00
Timothyxxx
fb7bafb885 feat: Add proxy configuration to all 369 evaluation examples - 55 with proxy, 314 without 2025-06-05 18:46:53 +08:00
Timothyxxx
34748567a5 feat: Migrate OSWorld files to HuggingFace cache with comprehensive documentation
- Add detailed README for file cache repository
- Implement migration script with retry logic and browser simulation
- Support automatic file type detection and deduplication
- Ensure reliable hosting for OSWorld evaluation files
2025-05-28 04:29:37 +08:00
Timothyxxx
2f0f3f31aa Fix Duplicate ids; Remove unused JSON files across multiple applications 2025-02-10 15:49:54 +08:00
David Chang
2b9772174e ver Mar15th
fixed bugs about infeasible task evaluation
2024-03-15 12:25:41 +08:00
Timothyxxx
f9ccaa5773 Move sheetcopilot examples into libreoffice calc folder 2024-03-14 12:57:15 +08:00
Timothyxxx
8d69eec68f Update infeasible examples from Chrome and Calc 2024-02-14 16:51:07 +08:00
tsuky_chen
62f50cdc26 Update 7a4e4bc8-922c-4c84-865c-25ba34136be1.json 2024-02-01 16:10:47 +08:00
tsuky_chen
ee851aeb54 Update 0cecd4f3-74de-457b-ba94-29ad6b5dafb6.json 2024-02-01 16:09:24 +08:00
David Chang
4897211a46 ver Jan31stv6
finished calc human evaluation
updated calc configs with an extra sleep to guarantee the integrity of
downloaded xlsx file
2024-01-31 22:55:47 +08:00
David Chang
14dbc708a4 ver Jan30thv2
debugged on windows platform with new _create_pywinauto_node function
migrated example task from calc to excel
2024-01-30 21:09:53 +08:00
Timothyxxx
37e09a994e Fix some errors found in impress and thunderbird examples 2024-01-29 13:23:06 +08:00
Timothyxxx
cc21c3a6b1 Fix some errors found in calc examples 2024-01-28 21:19:18 +08:00
Timothyxxx
353ab6607d Fix some errors found in thunderbird examples 2024-01-28 16:51:38 +08:00
Timothyxxx
be17bd3307 Fix some errors found in thunderbird examples 2024-01-28 15:35:31 +08:00
Timothyxxx
c875cad3e5 Fix some errors found in thunderbird examples 2024-01-28 15:32:14 +08:00
David Chang
8025bf19f0 ver Jan27th
corrected usage of pyautogui in calc postconfig
2024-01-27 19:46:06 +08:00
Timothyxxx
63852755d2 Make up postconfig for libreoffice writer examples 2024-01-27 11:40:05 +08:00
David Chang
342440929b ver Jan26thv2
replaced the file of calc/0cecd4f3 with a more complicated one from
39aa4e37
2024-01-26 17:27:29 +08:00
David Chang
0d05add432 ver Jan26th
fixed path of trajectory in calc/39aa4e37
2024-01-26 12:46:43 +08:00
tsuky_chen
932b73c67d load libreoffice writer eval -batch 2 2024-01-26 02:15:42 +08:00
tsuky_chen
3e7cfa8699 load libreoffice writer eval -batch 2 2024-01-26 02:07:26 +08:00
David Chang
fbe26e2311 ver Jan23rdv2
added read_cell_value function to load the actual value of a specific
  excel cell by its coordinate
2024-01-23 23:57:00 +08:00
David Chang
93229ce98c ver Jan22ndv3
updated style metric to compare_table
2024-01-22 23:45:15 +08:00
David Chang
c97f43ce95 ver Jan22ndv2
fixed a bug for checking data validation in excel
2024-01-22 15:21:16 +08:00
David Chang
7a85c76369 ver Jan22nd
updated all the existing calc configs
2024-01-22 12:42:50 +08:00
David Chang
552491f765 ver Jan21stv2
fixed bugs
updated parts of configs
2024-01-21 23:55:04 +08:00
David Chang
a97c865c0c ver Jan18th
completed all the previously incomplete tasks stored under libreoffice_calc
added metric check_data_validations
2024-01-18 17:54:53 +08:00
David Chang
19214f2107 ver Jan17thv2
updated compare_table to compare the displayed values through an exported csv
2024-01-17 22:43:26 +08:00
David Chang
ffc4c32bac ver Jan17th
updated the existing task configs
2024-01-17 17:27:08 +08:00
David Chang
5e2a03720d ver Jan10thv4
updated /home/david to /home/user
2024-01-10 22:33:33 +08:00
David Chang
6e6ef03bc9 ver Jan2nd
calc metrics are prepared by and large
2024-01-02 21:03:57 +08:00
David Chang
d41c674a91 Merge branch 'main' into zdy 2023-12-31 14:37:01 +08:00
David Chang
19b99a13e2 Merge branch 'zdy' 2023-12-30 20:53:54 +08:00
tsuky_chen
24f33dc9bf add eval libreoffice writer font & page break 2023-12-30 16:32:15 +08:00
David Chang
aaca06ba40 ver Dec29thv4
updated check_zoom
2023-12-29 22:46:32 +08:00
David Chang
f73f6e1d4f ver Dec29thv3
updated links
2023-12-29 21:56:52 +08:00
David Chang
6f225b2a02 ver Dec29thv2
re-organized functions w.r.t. comparing xlsx with a golden one
2023-12-29 21:43:33 +08:00
David Chang
e4fac09945 ver Dec29th
metric compare_with_formats
2023-12-29 21:19:52 +08:00
David Chang
5a14cf40db Merge branch 'main' into zdy 2023-12-28 21:20:57 +08:00
David Chang
2a9e5cc373 ver Dec27th
merged zdy into main
2023-12-27 20:40:23 +08:00
David Chang
7320f0aec4 ver Dec27thv3
added chart property of bar direction
2023-12-27 18:00:16 +08:00
David Chang
4e5920264a ver Dec27thv2
updated a task config
updated documents
fixed the options feature of evaluator
updated with new properties of charts
current load_charts should be ok, I think
2023-12-27 17:51:41 +08:00
David Chang
50b82167d0 Merge branch 'zdy' 2023-12-26 21:06:39 +08:00
David Chang
fe0a59583a ver Dec26thv2
implemented _load_charts and compare_with_charts according to the code in
openpyxl
2023-12-26 20:59:19 +08:00