Evaluation examples
This directory holds the data examples used to benchmark how well agents interact with GUIs.
The examples are stored in ./examples, where each data item is formatted as:
{
"id": "uid", # unique id
"snapshot": "snapshot_id", # the snapshot id of the environment, with some data already there and apps already opened, or just the desktop
"instruction": "natural_language_instruction", # the natural language instruction of the task, what we want the agent to do
"source": "website_url", # where this example came from: a forum, a website, or a paper
"config": {xxx}, # the scripts that set up the initial state of a task, such as downloading and opening files
# (coming in next project) "trajectory": "trajectory_directory", # the trajectory directory, which contains the action sequence file, the screenshots and the recording video
"related_apps": ["app1", "app2", ...], # the related apps, which are opened during the task
"evaluator": "evaluation_dir", # the directory of the evaluator, which contains the evaluation script for this example
…
}
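As a rough illustration of the format above, the following sketch loads one data item and reports any missing fields. It assumes the stored files are plain JSON without the `#` annotations shown above, and that every listed field is required; the helper names `missing_keys` and `load_example` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Field names come from the format description above. Treating all of
# them as required is an assumption of this sketch.
REQUIRED_KEYS = (
    "id",
    "snapshot",
    "instruction",
    "source",
    "config",
    "related_apps",
    "evaluator",
)


def missing_keys(item):
    """Return the expected keys that are absent from one data item."""
    return [key for key in REQUIRED_KEYS if key not in item]


def load_example(path):
    """Load one example file (hypothetical helper) and fail if keys are missing."""
    item = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = missing_keys(item)
    if missing:
        raise ValueError(f"{path}: missing keys {missing}")
    return item
```

Calling `load_example` on each file in ./examples before an evaluation run would surface malformed items early instead of partway through a task.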
The ./trajectories directory contains the annotated trajectories for completing the task in each data item in ./examples.
It is currently under construction and has only been tested on Windows 10. Please:
- Modify the paths to match your setup before running the evaluation;
- Let us know if any parts are overfitted to our environment.