MatBench/results/20250602_1706/summary.csv

Model,accuracy,precision_micro,recall_micro,f1_micro,precision_macro,recall_macro,f1_macro,Data Count
qwen-max-2025-01-25,0.6446700507614214,0.6336633663366337,0.649746192893401,0.6416040100250626,0.6388760049474336,0.6501020408163265,0.64232342205538,197
gpt-4o,0.5482233502538071,0.5618556701030928,0.5532994923857868,0.5575447570332481,0.5779088050314465,0.5536734693877551,0.5600088997453159,197
deepseek-chat,0.6700507614213198,0.676923076923077,0.6700507614213198,0.673469387755102,0.6899114693446089,0.6705102040816326,0.6754210676562946,197
claude-sonnet-4-20250514,0.700507614213198,0.6934673366834171,0.700507614213198,0.696969696969697,0.7072180484244438,0.7009183673469388,0.69833034513671,197