First trial run after option balancing, about 70% accuracy

lzy
2025-06-02 17:18:30 +08:00
parent 7a725bc003
commit 3984ec002e
12 changed files with 16742 additions and 0 deletions


@@ -0,0 +1,5 @@
Model,accuracy,precision_micro,recall_micro,f1_micro,precision_macro,recall_macro,f1_macro,Data Count
qwen-max-2025-01-25,0.6446700507614214,0.6336633663366337,0.649746192893401,0.6416040100250626,0.6388760049474336,0.6501020408163265,0.64232342205538,197
gpt-4o,0.5482233502538071,0.5618556701030928,0.5532994923857868,0.5575447570332481,0.5779088050314465,0.5536734693877551,0.5600088997453159,197
deepseek-chat,0.6700507614213198,0.676923076923077,0.6700507614213198,0.673469387755102,0.6899114693446089,0.6705102040816326,0.6754210676562946,197
claude-sonnet-4-20250514,0.700507614213198,0.6934673366834171,0.700507614213198,0.696969696969697,0.7072180484244438,0.7009183673469388,0.69833034513671,197
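
As a quick sanity check on the figures above, the accuracy column can be converted back into whole-number correct counts. This is a minimal sketch and assumes that accuracy is simply correct answers divided by `Data Count`, which the commit does not state. The `CSV_TEXT` constant and the `correct_counts` helper are hypothetical names, and only the three columns needed are copied from the file.

```python
import csv
import io

# Subset of the committed CSV (Model, accuracy, Data Count), values copied verbatim.
CSV_TEXT = """Model,accuracy,Data Count
qwen-max-2025-01-25,0.6446700507614214,197
gpt-4o,0.5482233502538071,197
deepseek-chat,0.6700507614213198,197
claude-sonnet-4-20250514,0.700507614213198,197
"""


def correct_counts(csv_text):
    """Return {model: estimated number of correct answers}.

    Assumption: accuracy == correct / Data Count, so multiplying and
    rounding recovers the integer count.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        row["Model"]: round(float(row["accuracy"]) * int(row["Data Count"]))
        for row in rows
    }


counts = correct_counts(CSV_TEXT)
# Models ordered from most to fewest correct answers.
ranking = sorted(counts, key=counts.get, reverse=True)

for model in ranking:
    print(f"{model}: {counts[model]} / 197")
```

Under that assumption, only the best-scoring model reaches the roughly 70% mentioned in the commit message; the others fall between about 55% and 67%.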