Commit Graph

25 Commits

Author SHA1 Message Date
Subash Shibu
3167339e45 Add hosted GBOX agent for OSWorld evaluation (#376) 2025-11-13 13:13:31 +08:00
Pengxiang-Li
00b6468eb7 feat/dart_gui (#371) 2025-11-07 21:50:01 +08:00
Xinyuan Wang
24fbad9015 Merge pull request #264 from yuanmengqi/main
Improve the parallel logic
2025-07-17 12:28:48 +08:00
yuanmengqi
fe40011b5d Improve the parallel logic 2025-07-17 04:21:42 +00:00
yuanmengqi
6788c58aa3 Improve the parallel logic 2025-07-17 04:20:59 +00:00
yuanmengqi
bb8b0b2582 Improve the parallel logic 2025-07-17 04:19:44 +00:00
Zilong Zhou
dc164d5269 feat&fix: update configuration management to save model arguments and enhance UI display for model args (#262) 2025-07-16 21:46:35 +08:00
yuanmengqi
cb070307ee merge code 2025-07-15 14:57:14 +00:00
yuanmengqi
90c4e894a4 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-14 07:14:19 +00:00
yuanmengqi
5d90faa548 run operator 2025-07-14 07:13:17 +00:00
Zilong Zhou
74b7c189af Feat/monitor (#254)
* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py

* fix: update logger usage to use global logger and improve error handling

* feat&fix: add configuration management API endpoints and update UI for configuration selection

* feat&fix: update environment configuration, enhance task statistics, and improve UI responsiveness

* feat&fix: add configuration toggle button in UI and improve task loading performance

* feat&fix: add accuracy percentage display to score and style updates for UI
2025-07-14 13:43:41 +08:00
yuanmengqi
572a94b6df Merge branch 'main' into fix_chrome 2025-07-13 10:16:08 +00:00
yuanmengqi
ea51f5264a fix chrome 2025-06-30 08:07:24 +00:00
yuanmengqi
7315aec6e6 clean code 2025-06-10 04:06:54 +00:00
yuanmengqi
aee1207fff fix error 2025-06-09 04:20:59 +00:00
yuanmengqi
d8872634ee edit prompt 2025-06-08 03:59:31 +00:00
yuanmengqi
c57b1d4e7a eval update 2025-06-07 13:19:22 +00:00
yuanmengqi
64177045b5 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-06 10:22:56 +00:00
yuanmengqi
4ea24ddfd3 add proxy 2025-06-06 09:41:22 +00:00
adlsdztony
2ad48f04d7 feat&fix: update environment configuration for Docker compatibility and enhance result path handling 2025-06-06 02:53:20 +00:00
yuanmengqi
a6300e05c9 Merge remote-tracking branch 'upstream/feat/aws-provider-support' 2025-06-05 13:31:42 +00:00
adlsdztony
80e4ec75de fix&docs: update FLASK_DEBUG setting to false in .env and README 2025-06-04 19:58:47 +08:00
yuanmengqi
b211df3385 fix timeout 2025-06-04 10:23:45 +00:00
yuanmengqi
b87cbe69e5 add monitor 2025-06-02 13:34:20 +00:00
adlsdztony
e48bd6b059 feat: add .env configuration file and update README with configuration details 2025-06-01 07:07:47 +00:00