ChenYXxxx
873f8a0359
Update 10a730d5-d414-4b40-b479-684bed1ae522.json
...
change "ight" to "night"
2025-07-24 15:44:52 +08:00
yuanmengqi
e433f35c1f
feat: standardize configuration fields across all evaluation examples
...
- Add `fixed_ip` field to all 369 JSON files in examples directory
- Set to `true` for 8 files listed in google_chrome.json multi_apps
- Set to `false` for remaining 361 files
- Add `possibility_of_env_change` field to 363 JSON files missing this field
- Set to "low" for newly added fields
- Preserve existing values (4 medium, 2 high) for 6 files that already had this field
This ensures consistent configuration schema across all evaluation examples
while maintaining backward compatibility with existing settings.
2025-07-16 13:45:34 +00:00
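The field standardization described in the commit above could be sketched roughly as follows. This is an illustration only, not the script that was actually used: the helper names `standardize_example` and `standardize_directory`, the assumption that each file's stem is its task id, and the `fixed_ip_ids` set (standing in for the ids listed in google_chrome.json) are all hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def standardize_example(config: dict, fixed_ip: bool) -> dict:
    """Return a copy of one example config with both fields present."""
    updated = dict(config)
    # Hypothetical rule: fixed_ip is true only for the listed task ids.
    updated["fixed_ip"] = fixed_ip
    # Keep an existing value ("medium", "high", ...); fill in "low" only if missing.
    updated.setdefault("possibility_of_env_change", "low")
    return updated


def standardize_directory(examples_dir: Path, fixed_ip_ids: set[str]) -> int:
    """Rewrite every JSON object file under examples_dir; return the count."""
    count = 0
    for path in sorted(examples_dir.rglob("*.json")):
        config = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(config, dict):
            continue  # skip files that are not a single JSON object
        # Assumption: the file stem (name without .json) is the task id.
        updated = standardize_example(config, path.stem in fixed_ip_ids)
        path.write_text(
            json.dumps(updated, indent=2, ensure_ascii=False) + "\n",
            encoding="utf-8",
        )
        count += 1
    return count
```

Using `setdefault` is what keeps the six pre-existing medium/high values intact while the remaining files receive "low".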
Danyang Zhang
2339db20ca
ver Jul7th ( #255 )
...
pip-installing directly from PyPI fails mysteriously during postconfig
execution, possibly owing to the proxy configuration in the VM; adjusted
the strategy to download the wheel on the host and pip-install it locally
on the VM in thunderbird/d38192b0-17dc-4e1d-99c3-786d0117de77
2025-07-14 20:26:29 +08:00
Danyang Zhang
adc9ad88c2
Thunderbird eval fix ( #233 )
...
* ver Jul2nd
updated the task that requires setting up a new email account
* ver Jul3rd
fixed several tasks
2025-07-03 21:55:55 +08:00
yuanmengqi
9fa768d24d
refactor: update URLs in multiple JSON files to ensure proper encoding of special characters
2025-06-07 17:26:45 +00:00
Timothyxxx
fb7bafb885
feat: Add proxy configuration to all 369 evaluation examples - 55 with proxy, 314 without
2025-06-05 18:46:53 +08:00
Timothyxxx
34748567a5
feat: Migrate OSWorld files to HuggingFace cache with comprehensive documentation
...
- Add detailed README for file cache repository
- Implement migration script with retry logic and browser simulation
- Support automatic file type detection and deduplication
- Ensure reliable hosting for OSWorld evaluation files
2025-05-28 04:29:37 +08:00
Danyang Zhang
7bf99cb823
Update 15c3b339-88f7-4a86-ab16-e71c58dcb01e.json
2025-05-06 16:29:35 +08:00
Danyang Zhang
e4097783bb
Update dfac9ee8-9bc4-4cdc-b465-4a4bfcd2f397.json
2025-05-06 16:28:52 +08:00
Thomas Kuntz
af993b3a3d
fix: Broken profile path in 3 Thunderbird tasks
2025-05-04 14:03:06 +02:00
Timothyxxx
13127de01e
Fix id
2025-03-03 18:26:32 +08:00
Timothyxxx
2f0f3f31aa
Fix Duplicate ids; Remove unused JSON files across multiple applications
2025-02-10 15:49:54 +08:00
Timothyxxx
25e808cc91
Fix known errors found from feedback (DBUS problems, pulseaudio start, one vlc example with error, typos)
2024-05-18 04:49:29 +08:00
David Chang
74e400783a
ver May16th
...
updated a task config of thunderbird
2024-05-16 10:58:31 +08:00
David Chang
459e247736
ver Mar4thv3
...
some new multi_app configs
2024-03-04 23:26:22 +08:00
Timothyxxx
e1cf8da4e0
Fix the infeasible examples support
2024-02-21 21:22:12 +08:00
Timothyxxx
1184cffd5f
Update infeasible of libreoffice writer, vlc and thunderbird
2024-02-15 15:31:58 +08:00
David Chang
5d436a6b66
ver Feb1st
...
human evaluation and SoM experiments on Thunderbird
2024-02-01 11:38:46 +08:00
David Chang
a1e02c6d57
ver Jan31stv7
...
fixed an error in thunderbird evaluation
2024-02-01 00:17:41 +08:00
Timothyxxx
a9cb9dcf79
Fix some errors found in thunderbird examples
2024-01-27 21:59:18 +08:00
David Chang
fbe26e2311
ver Jan23rdv2
...
added read_cell_value function to load the actual value of an Excel
cell by its exact coordinate
2024-01-23 23:57:00 +08:00
David Chang
ffc4c32bac
ver Jan17th
...
updated the existing task configs
2024-01-17 17:27:08 +08:00
David Chang
42afb708c5
ver Jan16thv2
...
remaining Thunderbird examples
2024-01-16 22:45:52 +08:00
David Chang
4c87e11864
ver Jan15thv3
...
scheduled closing?
2024-01-15 16:46:34 +08:00
David Chang
00922923ee
ver Jan15thv2
...
thunderbird example w.r.t. unified folder
2024-01-15 15:56:01 +08:00
David Chang
b9d8e6c631
ver Jan15th
...
attachment task of thunderbird
2024-01-15 11:49:43 +08:00
David Chang
59fdd9f1a2
ver Jan14th
...
setup method for Thunderbird composing tasks
2024-01-14 23:16:54 +08:00
David Chang
e08df57129
ver Jan12thv2
...
sqlite3 metric
2024-01-12 23:07:00 +08:00
David Chang
5160619783
ver Jan12th
...
quickly fixed two thunderbird examples
2024-01-12 12:19:23 +08:00
David Chang
3c04872dcf
ver Jan11thv2
...
a new Thunderbird example w.r.t. email filter
2024-01-11 22:43:38 +08:00
David Chang
27eaf2f5d5
ver Jan11th
...
finally set up a simple task, or one that should be simple
2024-01-11 20:03:33 +08:00
David Chang
1515b05666
ver Jan10thv2
...
a new example config for Thunderbird
fixed several bugs
2024-01-10 21:58:29 +08:00
David Chang
cf5d480f44
ver Jan10th
...
new Thunderbird task config
2024-01-10 17:36:59 +08:00
David Chang
eeb8a120d6
ver Jan5th
...
debugged
2024-01-05 15:20:47 +08:00
David Chang
5fedf5b891
ver Jan4th
...
updated interfaces for thunderbird evaluation, not tested
2024-01-04 22:41:57 +08:00
Timothyxxx
5257747133
thunderbird and vscode initialization
2023-12-25 01:11:27 +08:00