Commit Graph

1195 Commits

Author SHA1 Message Date
yuanmengqi
150234307e Merge branch 'fix_chrome' 2025-07-17 04:14:47 +00:00
yuanmengqi
9eeabfc52d Improve the parallel logic 2025-07-17 04:14:20 +00:00
yuanmengqi
cb070307ee merge code 2025-07-15 14:57:14 +00:00
yuanmengqi
175b4b46c2 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-15 14:50:48 +00:00
yuanmengqi
7912880d16 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-15 07:24:38 +00:00
yuanmengqi
451bbf5fc2 Update multi_apps JSON examples: refined instructions for image processing in GIMP, replaced an open command with a launch command for VLC, and corrected assignment modification instruction in LibreOffice Calc example. 2025-07-15 07:24:33 +00:00
Yuan Mengqi
af47ed8fb1 fix infeasible&chrome tasks (#258)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

* Improve code logic for password & resolution

* edit

* Merge branch 'main' into fix_chrome

* fix chrome tasks

* Merge branch 'fix_chrome'

* fix insensible&chrome tasks

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-15 13:02:42 +08:00
yuanmengqi
68a9f647f4 fix: address https://github.com/xlang-ai/OSWorld/issues/257 by implementing a fix for the PyAutoGUI '<' character bug in command execution. Introduced a new function to handle typewrite and press calls, ensuring correct behavior when using '<' in commands. Updated command execution logic to apply this fix before executing user commands. 2025-07-15 04:17:34 +00:00
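The commit message above describes rewriting typewrite and press calls before a command runs. A minimal sketch of that idea follows; it is not the code from the commit. The function name `fix_less_than` is hypothetical, the rewrite assumes a US keyboard layout where '<' is Shift+comma, and the regexes assume simple quoted string arguments without escaped quotes.

```python
import re

# Replacement call that types '<' as Shift+comma (assumes a US keyboard layout).
HOTKEY_CALL = 'pyautogui.hotkey("shift", ",")'


def fix_less_than(command: str) -> str:
    """Rewrite press('<') and typewrite('...<...') calls in a command string.

    Hypothetical helper: it only transforms text and never imports pyautogui.
    """
    # press('<') or press("<") becomes a single hotkey call.
    command = re.sub(r"pyautogui\.press\((['\"])<\1\)", HOTKEY_CALL, command)

    def split_typewrite(match: re.Match) -> str:
        quote, text = match.group(1), match.group(2)
        if "<" not in text:
            return match.group(0)  # nothing to fix, keep the call as written
        calls = []
        parts = text.split("<")
        for i, part in enumerate(parts):
            if part:
                calls.append(f"pyautogui.typewrite({quote}{part}{quote})")
            if i < len(parts) - 1:
                calls.append(HOTKEY_CALL)  # one hotkey per '<' in the text
        return "; ".join(calls)

    # typewrite('a<b') becomes typewrite('a'); hotkey(...); typewrite('b').
    return re.sub(r"pyautogui\.typewrite\((['\"])(.*?)\1\)", split_typewrite, command)
```

Applied to a command string just before execution, this leaves calls without '<' untouched. Joining the split calls with "; " only works when the original call is a standalone statement.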
yuanmengqi
8e18b7839a fix insensible&chrome tasks 2025-07-15 02:19:27 +00:00
yuanmengqi
1bf0730dce Merge remote-tracking branch 'upstream/main' 2025-07-15 02:14:41 +00:00
yuanmengqi
756ef96850 Merge branch 'fix_chrome' 2025-07-15 02:13:58 +00:00
yuanmengqi
08b4cf2c2f fix infeasible&chrome tasks 2025-07-15 02:09:40 +00:00
ChenYXxxx
698483390a "Could you turn my image into CYMK mode?" add "within GIMP" 2025-07-14 23:53:20 +08:00
ChenYXxxx
9242becd87 "Please batch process all images on the desktop by increasing their brightness to 50, instead of adjusting them individually." add "within GIMP" 2025-07-14 23:52:41 +08:00
ChenYXxxx
56b2fe9cc4 Update d16c99dc-2a1e-46f2-b350-d97c86c85c15.json 2025-07-14 23:23:11 +08:00
ChenYXxxx
b481c794c5 Update 72f83cdc-bf76-4531-9a1b-eb893a13f8aa.json 2025-07-14 23:22:50 +08:00
ChenYXxxx
7f973a391c Update f723c744-e62c-4ae6-98d1-750d3cd7d79d.json 2025-07-14 23:22:14 +08:00
shenzhennan
7f96cc0633 Merge branch 'main' of https://github.com/xlang-ai/OSWorld 2025-07-14 12:35:00 +00:00
shenzhennan
53983db9cb fix impress eval: extend sleep time to ensure save 2025-07-14 12:34:43 +00:00
Xinyuan Wang
db83b9cb2c Wxy/opencua (#256)
* OpenCUA Agent code base

* update url

* debug, modify url input
2025-07-14 20:26:39 +08:00
Danyang Zhang
2339db20ca ver Jul7th (#255)
pip-installing directly from PyPI fails mysteriously in postconfig
execution, possibly owing to proxy configuration in the VM; adjusted
strategy by downloading the wheel on the host and pip-installing it locally
on the VM in thunderbird/d38192b0-17dc-4e1d-99c3-786d0117de77
2025-07-14 20:26:29 +08:00
shenzhennan
60e26d2d0d fix impress compare use gold file 2025-07-14 11:35:06 +00:00
yuanmengqi
90c4e894a4 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-14 07:14:19 +00:00
yuanmengqi
5d90faa548 run operator 2025-07-14 07:13:17 +00:00
Zilong Zhou
74b7c189af Feat/monitor (#254)
* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py

* fix: update logger usage to use global logger and improve error handling

* feat&fix: add configuration management API endpoints and update UI for configuration selection

* feat&fix: update environment configuration, enhance task statistics, and improve UI responsiveness

* feat&fix: add configuration toggle button in UI and improve task loading performance

* feat&fix: add accuracy percentage display to score and style updates for UI
2025-07-14 13:43:41 +08:00
yuanmengqi
0651495d88 fix: Enhance error handling and logging across multiple evaluators
- Added logging for file retrieval and error handling in file.py, improving robustness during file operations.
- Implemented checks for file existence and parsing errors in general.py, enhancing reliability in JSON/YAML processing.
- Improved table comparison logic in table.py with detailed error logging for sheet loading and cell value reading.
- Enhanced metrics evaluation in slides.py with additional checks for paragraph and run counts, ensuring thorough comparison.
- Updated utils.py to include file existence checks and detailed error logging during cell value reading.
2025-07-14 05:43:17 +00:00
yuanmengqi
b8b026f817 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-13 13:16:48 +00:00
yuanmengqi
7c807d4f3e Merge remote-tracking branch 'upstream/main' 2025-07-13 13:12:26 +00:00
Zilong Zhou
349f2fd9fe Feat/claude cua support (#253)
* feat: add claude support

* feat: add script for end-to-end evaluation with logging and task distribution

* feat&fix: add tool result handling and update model default in evaluation script

* chore: remove run_test_env.py script

* feat&fix: implement action parsing for tool calls and update default action space

* fix: update text formatting in action parsing and replace logger import

* feat&fix: implement action parsing for tool calls and add screen size handling

* feat: add setup instructions for Anthropic API integration

* feat: add notice about image size limitations for Anthropic API

* Delete test_env/logger.py

* Delete test_env/utils.py
2025-07-13 21:10:49 +08:00
Yuan Mengqi
38a30734a6 Improve code logic for password & resolution (#252)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

* Improve code logic for password & resolution

* edit

* Merge branch 'main' into fix_chrome

* fix chrome tasks

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 21:04:07 +08:00
yuanmengqi
9f806d425d fix chrome final 2025-07-13 12:46:00 +00:00
yuanmengqi
7279469d23 fix chrome tasks 2025-07-13 12:41:27 +00:00
yuanmengqi
572a94b6df Merge branch 'main' into fix_chrome 2025-07-13 10:16:08 +00:00
yuanmengqi
a16b54c175 edit 2025-07-13 10:14:41 +00:00
yuanmengqi
94ea30cb45 Merge remote-tracking branch 'upstream/main' 2025-07-13 07:05:54 +00:00
yuanmengqi
d3bf4823cb Improve code logic for password & resolution 2025-07-13 07:01:28 +00:00
yuanmengqi
a070ddda7e Improve code logic for password & resolution 2025-07-13 06:59:45 +00:00
yuanmengqi
97ed6f99b0 Final review multi_apps fix the rest part 2025-07-12 20:28:55 +00:00
yuanmengqi
dbecf46057 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-12 16:35:02 +00:00
yuanmengqi
877e75a013 Final review multi_apps fix Xinzhuang part 2025-07-12 16:34:55 +00:00
Yuan Mengqi
27319ce1e3 fix password&resolution (#251)
* fix chrome

* fix: fix proxy setup

* feat&fix: add proxy support in setup and remove hardcoded proxy from example

* fix tasks

* fix chrome finished

* fix

* clean chrome_fix code

* clean chrome_fix code

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix multiapps

* fix chrome 2888b4e6-5b47-4b57-8bf5-c73827890774

* fix some multi_apps tasks

* fix some multi_apps tasks

* fix password&resolution

* fix password&resolution

---------

Co-authored-by: adlsdztony <zzl0712@connect.hku.hk>
2025-07-13 00:25:37 +08:00
yuanmengqi
3b698aa3c0 Merge branch 'fix_chrome' 2025-07-12 15:13:44 +00:00
yuanmengqi
08bbf77511 fix password&resolution 2025-07-12 15:11:42 +00:00
yuanmengqi
fb0c301e14 Merge branch 'fix_chrome' 2025-07-11 12:17:42 +00:00
yuanmengqi
37c56533f0 Merge remote-tracking branch 'upstream/main' into fix_chrome 2025-07-11 12:16:23 +00:00
yuanmengqi
fe3bb2fd92 fix password&resolution 2025-07-11 12:15:03 +00:00
yuanmengqi
6f0382c0c2 Merge branch 'main' of github.com:xlang-ai/OSWorld 2025-07-10 22:35:42 +00:00
yuanmengqi
6897e5320d Enhance image text comparison functionality with detailed logging
- Added logging for OCR results and text matching outcomes in compare_image_text function.
- Updated JSON examples to support multiple expected results and improved structure for evaluator functions.
- Enhanced handling of expected text rules to include multiple variations for better matching accuracy.
2025-07-10 22:32:53 +00:00
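The commit message above mentions matching OCR output against several expected text variations, with logging. A minimal sketch of how such matching could look follows. It is an illustration only, not the repository's compare_image_text implementation; the names `normalize` and `match_expected_text` are hypothetical, as is the choice to normalize case and whitespace.

```python
import logging
import re

logger = logging.getLogger(__name__)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR spacing noise does not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def match_expected_text(ocr_text, expected):
    """Return the first expected variant found in the OCR text, or None.

    `expected` may be a single string or a list of acceptable variants.
    """
    variants = [expected] if isinstance(expected, str) else list(expected)
    haystack = normalize(ocr_text)
    logger.info("OCR result: %r", ocr_text)
    for variant in variants:
        if normalize(variant) in haystack:
            logger.info("Matched expected variant: %r", variant)
            return variant
    logger.info("No expected variant matched (tried %d)", len(variants))
    return None
```

Accepting either a string or a list keeps single-answer task definitions working while allowing several accepted spellings for the same check.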
st2rb8g
61f265a082 fix some multi_apps tasks (#245)
* fix chrome

* fix some multi_apps tasks.

* fix some multiapps tasks

* fix some multiapps tasks

---------

Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
2025-07-11 06:32:13 +08:00
Yan98
4e3446d6fe Fix Name (#249)
* init

* init
2025-07-11 00:15:46 +08:00