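Generate answer options with an upsampling approach: sample 6 times and have the model answer each time. Questions that stop early are treated as hard, and questions answered correctly in every sample are treated as easy. Build the new stepy on this basis.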

lzy
2025-06-02 16:19:18 +08:00
parent d219b9b0c0
commit abeacaac3e
8 changed files with 169413 additions and 11331 deletions

@@ -2,7 +2,7 @@
api:
key: "sk-oYh3Xrhg8oDY2gW02c966f31C84449Ad86F9Cd9dF6E64a8d"
base_url: "https://vip.apiyi.com/v1"
-temperature: -1 # default: use the model's own temperature setting
+temperature: 0 # default: use the model's own temperature setting
max_retries: 10
# supports multiple models
models:
@@ -10,7 +10,6 @@ api:
- "gpt-4o"
- "deepseek-chat"
- "claude-sonnet-4-20250514"
-- "deepseek-r1"
# or use a single model (backward compatible)
# model: "qwen-max-2025-01-25"
@@ -20,7 +19,8 @@ system_prompt: None
evaluation:
max_workers: 20
# input_file: "/home/ubuntu/50T/LYT/MatBench/layer1/ALL-merge/merged.json"
-input_file: "/home/ubuntu/50T/LYT/MatBench/layer2/PGEE/code/stepz_final_choice_questions.json"
+# input_file: "/home/ubuntu/50T/LYT/MatBench/layer2/PGEE/code/stepz_final_choice_questions.json"
+input_file: "/home/ubuntu/50T/LYT/MatBench/layer2/PGEE/code/stepz_final_choice_questions_filtered.json"
# output settings
output:
base_dir: "results"
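The comment on the `temperature` setting suggests that a negative value means "leave the model's default in place", while this commit pins it to 0. A minimal sketch of how a client could honor that convention follows. The helper name `build_request_kwargs` and the plain-dict config are assumptions for illustration; the actual loader is not shown in this commit.

```python
from typing import Any, Dict


def build_request_kwargs(api_cfg: Dict[str, Any], model: str) -> Dict[str, Any]:
    """Build keyword arguments for a chat-completion style request.

    Hypothetical helper: a negative (or missing) temperature is treated
    as "not set", so the key is omitted and the provider default applies.
    """
    kwargs: Dict[str, Any] = {"model": model}
    temperature = api_cfg.get("temperature", -1)
    if temperature is not None and temperature >= 0:
        # An explicit value such as 0 is passed through unchanged.
        kwargs["temperature"] = temperature
    return kwargs
```

With `temperature: -1` the request carries no temperature at all; with `temperature: 0` it is sent explicitly, which typically makes answers more repeatable across the 6 samples.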

File diff suppressed because it is too large Load Diff

@@ -0,0 +1,59 @@
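=== Sampling Result Statistics ===
Question label distribution:
hard_early_stop: 1494 questions (44.7%)
easy_all_correct: 1807 questions (54.1%)
unknown_fallback: 42 questions (1.3%)
Key metrics:
Hard questions stopped early (sampling stopped after a wrong answer): 1494
Easy questions answered correctly in all samples: 1807
Early-stop rate: 44.7%
All-correct rate: 54.1%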
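=== API Call Statistics ===
Total generation calls: 13850
Total verification calls: 13850
Total API calls: 27700
Average calls per question: 8.3
Average samples per early-stopped question: 2.0
Average samples per all-correct question: 6.0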
=== Sampling Results by Question Type ===
short_answer:
Early-stop rate: 36.9% (721/1954)
All-correct rate: 62.4% (1219/1954)
multiple_choice:
Early-stop rate: 58.8% (154/262)
All-correct rate: 39.3% (103/262)
calculation:
Early-stop rate: 66.0% (578/876)
All-correct rate: 31.4% (275/876)
true_false:
Early-stop rate: 16.3% (41/251)
All-correct rate: 83.7% (210/251)
=== Generation Success Rate ===
Total processed: 3343 questions
Successfully generated: 3343
Used fallback: 0
Success rate: 100.00%
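=== Strategy Evaluation ===
✅ Early-stop strategy is effective: hard questions were identified
Number of hard questions: 1494
Examples of early-stopped questions:
1. short_answer question stopped early after sample 1
2. short_answer question stopped early after sample 1
3. short_answer question stopped early after sample 3
✅ Full-sampling strategy is effective: easy questions were identified
Number of easy questions: 1807
Examples of all-correct questions:
1. short_answer question answered correctly in all 6 samples
2. short_answer question answered correctly in all 6 samples
3. true_false question answered correctly in all 6 samples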
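=== Optimization Suggestions ===
• API call count is on the high side. Suggestions:
- Refine the prompt to improve first-pass generation quality
- Consider lowering the maximum number of samples
- Improve the verification logic to reduce the failure rate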
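
The labeling rule described in the commit message and reflected in the report above can be sketched as follows. This is an illustration under stated assumptions, not the code in this commit: `sample_is_correct` is a hypothetical callback standing in for one round of "generate options, let the model answer, verify", and the `unknown_fallback` label from the report is not modeled.

```python
from typing import Callable, Tuple

MAX_SAMPLES = 6  # from the commit message: sample up to 6 times


def label_question(
    sample_is_correct: Callable[[int], bool],
    max_samples: int = MAX_SAMPLES,
) -> Tuple[str, int]:
    """Return (label, samples_used) for one question.

    sample_is_correct(i) is a hypothetical callback that runs sample i
    and reports whether the model answered correctly.
    """
    for i in range(max_samples):
        if not sample_is_correct(i):
            # First wrong answer: stop sampling and mark the question hard.
            return "hard_early_stop", i + 1
    # Every sample was answered correctly: mark the question easy.
    return "easy_all_correct", max_samples
```

For example, `label_question(lambda i: i < 2)` stops at the third sample and returns `("hard_early_stop", 3)`, matching the "stopped early after sample 3" entries in the report, while `label_question(lambda i: True)` returns `("easy_all_correct", 6)`.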

File diff suppressed because it is too large Load Diff
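
The question file in this commit stores each answer as `[ANSWER]X[/ANSWER]`, and its prompt asks the model to end its response with the same tags. A minimal sketch of how a verifier might extract the chosen letter is shown here. The function name `extract_answer` and the choice to take the last match are assumptions; the verification code itself is not shown in this commit.

```python
import re
from typing import Optional

# Matches a single option letter A-D wrapped in the answer tags.
ANSWER_RE = re.compile(r"\[ANSWER\]\s*([A-D])\s*\[/ANSWER\]")


def extract_answer(response: str) -> Optional[str]:
    """Return the last tagged option letter in a response, or None."""
    matches = ANSWER_RE.findall(response)
    # The prompt asks for the tag at the end, so prefer the last match
    # in case the model repeats the example tag earlier in its text.
    return matches[-1] if matches else None


def is_correct(response: str, gold: str) -> bool:
    """Compare the model's tagged letter with the stored gold answer."""
    predicted = extract_answer(response)
    return predicted is not None and predicted == extract_answer(gold)
```

Taking the last match guards against responses that quote the example `[ANSWER]A[/ANSWER]` from the prompt before giving their own answer.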

@@ -0,0 +1,819 @@
[
{
"question": "What are the mechanical states of high polymers with larger molecular weight that are not completely crystalline?",
"choices": {
"text": [
"Glassy state",
"High elastic state",
"Viscoelastic transition state",
"Crystalline plateau state"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "A photomicrograph was taken of a specimen at a magnification of 100 ×, and it was determined that the average number of grains per square inch was 200. What is this specimen's ASTM grain size number?",
"choices": {
"text": [
"4.9",
"5.2",
"3.8",
"6.1"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Which of the following methods can improve the corrosion resistance of steel?",
"choices": {
"text": [
"Increasing the carbon content to form more protective cementite layers",
"Introducing controlled microstructural heterogeneity to disrupt corrosion pathways",
"Applying compressive surface stresses through shot peening",
"Reducing grain size to below 10nm to minimize galvanic potential differences"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]C[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "A borosilicate glass used for sealing lighting lamps has an annealing point of 544°C, a softening point of 780°C, and a viscous flow activation energy of 373.13 kJ/mol. What is its working temperature range?",
"choices": {
"text": [
"760.6°C~1038.9°C",
"544°C~780°C",
"373°C~544°C",
"780°C~1200°C"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "The dislocation that cannot undergo climb motion is ().",
"choices": {
"text": [
"Shockley partial dislocation",
"Frank partial dislocation",
"Edge full dislocation",
"Screw full dislocation"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "What is the meaning of the symbol Ca_{i}^{* *}?",
"choices": {
"text": [
"Ca2+ is located at the interstitial site of the lattice",
"A calcium vacancy with double positive effective charge",
"An excited state of calcium atom in the lattice",
"A calcium interstitial with double negative effective charge"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "For an alloy that has solidified with microscopic non-equilibrium segregation, which of the following measures can be taken to accelerate diffusion and homogenize the alloy?",
"choices": {
"text": [
"Heating to just below the solidus temperature followed by slow cooling",
"Applying high-frequency ultrasonic vibration during solidification",
"Introducing nano-sized precipitates to pin dislocations",
"Rapid quenching to room temperature to freeze the microstructure"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "To which group in the periodic table would an element with atomic number 112 belong?",
"choices": {
"text": [
"IIB",
"VIII",
"IIA",
"Transition metals (general classification)"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Why is the tensile strength of ceramics often much lower than the theoretical strength?",
"choices": {
"text": [
"Due to the splitting effect of pores and stress concentration during tension",
"Because ceramics lack dislocations that enable plastic deformation in metals",
"Due to their high elastic modulus limiting atomic bond stretching",
"Because grain boundaries act as preferential crack propagation paths"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "A force of 20,000 N will cause a 1 cm × 1 cm bar of magnesium to stretch from 10 cm to 10.045 cm. The modulus of elasticity is:",
"choices": {
"text": [
"6.44 × 10^6 psi",
"44.7 GPa",
"1.02 × 10^7 psi",
"25.5 GPa"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Gallium has an orthorhombic structure, with a0=0.45258 nm, b0=0.45186 nm, and c0=0.76570 nm. The atomic radius is 0.1218 nm. The density is 5.904 g/cm3 and the atomic weight is 69.72 g/mol. Determine the packing factor in the unit cell.",
"choices": {
"text": [
"0.387",
"0.412",
"0.354",
"0.426"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "A simple cubic crystal has a screw dislocation with $b=[001]$ on the (100) plane. An edge dislocation with $b=[010]$ on the (001) plane intersects with it. After the intersection, what forms on the two dislocations?",
"choices": {
"text": [
"A kink forms on the screw dislocation and a jog forms on the edge dislocation",
"A jog forms on both dislocations",
"A kink forms on both dislocations",
"No defects form as the Burgers vectors are perpendicular"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "According to the relationship between grain diameter and annealing time d²=kt, given that the grain diameter is 23μm after annealing for 30 minutes, the value of the constant k is:",
"choices": {
"text": [
"17.6μm²/min",
"35.3μm²/min",
"8.8μm²/min",
"529μm²/min"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "According to solidification theory, what is the fundamental principle of grain refinement by vibration and stirring?",
"choices": {
"text": [
"Vibration increases nucleation rate by lowering the activation energy barrier for nucleation",
"Stirring primarily works by reducing the temperature gradient in the melt",
"Both methods increase undercooling by enhancing heat transfer at the mold wall",
"The main effect is breaking dendrite arms to create more nucleation sites"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Seawater, which covers the majority of the earth, is composed primarily of molecules of H_{2} \\mathrm{O} and equal numbers of \\mathrm{Na}^{+}ions and \\mathrm{Cl}^{-}ions. Suppose we have a thoroughly mixed solution (containing these species only) at 25^{\\circ} C. How many components and how many phases are in such a system?",
"choices": {
"text": [
"1 component, 1 phase",
"2 components, 1 phase",
"3 components, 1 phase",
"2 components, 2 phases"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]B[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Which of the following describes the first type of temper brittleness?",
"choices": {
"text": [
"The brittleness that occurs during tempering between 250~400°C is called low-temperature temper brittleness",
"The brittleness caused by phosphorus segregation at grain boundaries during slow cooling from 500~600°C",
"The embrittlement phenomenon resulting from carbide precipitation at 150~250°C",
"The loss of toughness due to retained austenite decomposition at 300~450°C"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "If a crystal has a high density of line defects (dislocations) or planar defects (grain boundaries, twin boundaries, etc.), its strength will significantly increase. What are these phenomena called?",
"choices": {
"text": [
"Strain hardening and grain boundary strengthening",
"Work hardening and Hall-Petch effect",
"Dislocation pinning and Zener drag",
"Peierls-Nabarro stress and Taylor hardening"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "What is the light transmission characteristic of porous electrical insulators?",
"choices": {
"text": [
"Opaque",
"Transparent",
"Translucent",
"Variable depending on pore size"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "The formula for samarium iron garnet (Sm3Fe5O12) may be written in the form Sm3^aFe2^cFe3^dO12, where the superscripts a, c, and d represent different sites on which the Sm^3+ and Fe^3+ ions are located. The spin magnetic moments for the Sm^3+ and Fe^3+ ions positioned in a and c sites are oriented parallel to one another and antiparallel to the Fe^3+ ions in d sites. Compute the number of Bohr magnetons associated with each Sm^3+ ion, given the following information: (1) each unit cell consists of eight formula (Sm3Fe5O12) units; (2) the unit cell is cubic with an edge length of 1.2529 nm; (3) the saturation magnetization for this material is 1.35 × 10^5 A/m; and (4) there are 5 Bohr magnetons associated with each Fe^3+ ion.",
"choices": {
"text": [
"2.86 bm",
"1.43 bm",
"5.72 bm",
"0.71 bm"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "When a spherical embryo with radius r appears in an undercooled liquid, what is the critical nucleus radius?",
"choices": {
"text": [
"-2σ/ΔGv",
"2σ/ΔGv",
"σ/ΔGv",
"-σ/ΔGv"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "What material is suitable for steam turbine blades?",
"choices": {
"text": [
"Ti-6Al-4V (Grade 5 titanium alloy)",
"2Cr13",
"Inconel 718 with 15% ceramic particulate reinforcement",
"Single-crystal CMSX-4 superalloy"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]B[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Which of the following are the main factors affecting the recrystallization temperature of metals after cold deformation?",
"choices": {
"text": [
"Degree of deformation and initial grain size",
"Crystal structure type and elastic modulus",
"Melting point and thermal expansion coefficient",
"Dislocation density and stacking fault energy"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "What is the approximate temperature at which it is desirable to heat a 0.85 wt% C iron-carbon alloy during a full anneal heat treatment?",
"choices": {
"text": [
"About 777°C (1430°F)",
"About 727°C (1340°F)",
"About 850°C (1562°F)",
"About 912°C (1674°F)"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "A piece of corroded steel plate was found in a submerged ocean vessel. It was estimated that the original area of the plate was 10 in^{2} and that approximately 2.6kg had corroded away during the submersion. Assuming a corrosion penetration rate of 200 mpy for this alloy in seawater, estimate the time of submersion in years. The density of steel is 7.9g/cm^{3}. The time of submersion is:",
"choices": {
"text": [
"10 yr",
"5.2 yr",
"15.8 yr",
"20.4 yr"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "A steel with BCC crystal structure containing 0.001% N is nitrided at 550°C for 5h. If the nitrogen content at the steel surface is 0.08%, the nitrogen content at 0.25mm from the surface is:",
"choices": {
"text": [
"0.049% N",
"0.062% N",
"0.033% N",
"0.071% N"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "What does the long-range structure (secondary structure) of polymer chain structures include?",
"choices": {
"text": [
"Relative molecular mass and its distribution, chain flexibility and conformation",
"Crystallinity degree and spherulite size",
"Tacticity and head-to-head configuration",
"Crosslinking density and branching frequency"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "What point defects are possible for Al2O3 as an impurity in MgO?",
"choices": {
"text": [
"Mg2+ vacancies and O2- interstitials",
"Al3+ substitutions and Mg2+ interstitials",
"Al3+ substitutions and O2- vacancies",
"Mg2+ vacancies and Al3+ interstitials"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "For an n-type semiconductor:",
"choices": {
"text": [
"The electron mobility decreases with increasing temperature due to enhanced phonon scattering",
"The electron mobility increases with temperature as more carriers are excited to the conduction band",
"The electron mobility remains constant with temperature as donor ionization compensates for phonon scattering",
"The electron mobility first increases then decreases with temperature due to competing impurity and phonon scattering effects"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Among the following types of crystals, the order of forming interstitial solid solutions is",
"choices": {
"text": [
"The critical resolved shear stress decreases with increasing temperature in BCC metals",
"The Peierls-Nabarro stress is higher in FCC than BCC crystals at room temperature",
"Dislocation cross-slip occurs more easily in HCP than FCC crystals",
"Twinning is the dominant deformation mechanism in pure aluminum at 300K"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "In the non-stoichiometric compound ZrO2-x, the lattice defect present is",
"choices": {
"text": [
"Dislocation climb is the dominant mechanism",
"Grain boundary sliding becomes negligible",
"Nabarro-Herring creep is suppressed",
"Coble creep dominates at all temperatures"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Which method can be used to distinguish between 45 steel and HT150 metals?",
"choices": {
"text": [
"Metallographic examination showing pearlite-ferrite microstructure",
"X-ray diffraction analysis of crystal structure",
"Measuring electrical conductivity at 20°C",
"Comparing Brinell hardness values at 500kgf load"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Which of the following is the third main mechanism of alloy strengthening?",
"choices": {
"text": [
"Grain boundary strengthening",
"Dislocation pinning by interstitial atoms",
"Strain hardening through cold working",
"Solid solution strengthening by substitutional atoms"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Analyze the type of solid solution formed by H in α-Fe and γ-Fe, their locations, and the solubility (mole fraction). The atomic radii of the elements are as follows: H: 0.046 nm, α-Fe: 0.124 nm, γ-Fe: 0.126 nm. Which of the following statements is correct?",
"choices": {
"text": [
"H forms interstitial solid solutions in both α-Fe and γ-Fe, with higher solubility in γ-Fe due to its larger octahedral interstitial sites",
"H forms substitutional solid solutions in α-Fe but interstitial in γ-Fe, due to the smaller size mismatch in γ-Fe",
"H forms interstitial solid solutions in both phases, but with higher solubility in α-Fe due to its BCC structure providing more interstitial sites",
"H forms substitutional solid solutions in both phases, with solubility limited by the large size mismatch between H and Fe atoms"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "After quenching, 1.2% steel obtains a martensite and a small amount of retained austenite structure. What changes will occur when heated to 680°C and held for 2 hours?",
"choices": {
"text": [
"Formation of spheroidized cementite in ferrite matrix",
"Complete transformation to austenite with grain growth",
"Retention of martensitic structure with reduced hardness",
"Precipitation of fine carbides creating tempered martensite"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Which of the following statements correctly compares the mechanical properties of upper bainite and lower bainite?",
"choices": {
"text": [
"Lower bainite exhibits higher hardness but lower toughness compared to upper bainite due to its finer carbide distribution",
"Upper bainite shows superior strength-toughness balance because its ferrite laths are more uniformly spaced",
"Both types show similar yield strength but lower bainite has better fatigue resistance due to its dislocation substructure",
"Upper bainite has higher ductility because its cementite particles are coarser and more widely spaced"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]B[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "When most of the latent heat of crystallization can be dissipated through the liquid phase during ingot solidification, the main solid microstructure is",
"choices": {
"text": [
"Dendritic growth with tertiary arms",
"Perfectly aligned single-crystal structure",
"Equiaxed grains with random orientation",
"Directionally solidified lamellar structure"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "Compute repeat unit molecular weight for polycarbonate:",
"choices": {
"text": [
"254.27 g/mol",
"228.29 g/mol",
"286.33 g/mol",
"242.23 g/mol"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "How is tool steel further classified?",
"choices": {
"text": [
"Cutting tool steel, die steel, and measuring tool steel",
"Carbon tool steel, alloy tool steel, and high-speed steel",
"Cold work die steel, hot work die steel, and plastic mold steel",
"Low-alloy tool steel, medium-alloy tool steel, and high-alloy tool steel"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "A completely amorphous and nonporous polymer will be",
"choices": {
"text": [
"The polymer will exhibit a sharp melting point transition",
"The polymer's glass transition temperature will decrease with increasing molecular weight",
"The polymer will show anisotropic mechanical properties",
"The polymer's crystallinity can be increased by rapid quenching"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "For a batch of approximately 10,000 pieces of 45 steel gears subjected to relatively low contact stress but requiring good wear resistance on the teeth and minimal heat treatment deformation, which surface treatment should be selected?",
"choices": {
"text": [
"High-frequency surface quenching",
"Carburizing followed by oil quenching",
"Nitriding at 500°C for 24 hours",
"Induction hardening with subsequent tempering"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "What is the approximate temperature at which it is desirable to heat a 1.10 wt% C iron-carbon alloy during a full anneal heat treatment?",
"choices": {
"text": [
"About 777°C (1430°F)",
"About 727°C (1340°F)",
"About 912°C (1674°F)",
"About 1148°C (2098°F)"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "What is the driving force for recrystallization?",
"choices": {
"text": [
"The difference in internal energy between the strained and unstrained material",
"The reduction in dislocation density during annealing",
"The stored elastic strain energy in the deformed material",
"The difference in Gibbs free energy between the deformed and recrystallized states"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
},
{
"question": "What is the interdiffusion coefficient D for the A-B binary system at 550K with molar fractions x_A=0.6 and x_B=0.4, given D_B^AB=9×10^-12 cm²/s, D_A^AB=2×10^-12 cm²/s, and d²G/dx_B²=-95.325?",
"choices": {
"text": [
"-2.4×10^-18 m²/s",
"3.6×10^-12 cm²/s",
"5.4×10^-12 m²/s",
"-9.5×10^-12 cm²/s"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"answer": "[ANSWER]A[/ANSWER]",
"prompt": "You are an expert in materials science. Please answer the following materials science question by selecting the correct option. You MUST include the letter of the correct answer at the end of your response within the following tags: [ANSWER] and [/ANSWER]. For example: [ANSWER]A[/ANSWER]."
}
]
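
The records above all share one shape: a `question`, parallel `choices.text` / `choices.label` lists, and an `answer` wrapped in `[ANSWER]...[/ANSWER]` tags. A minimal sketch of a well-formedness check for that shape follows. The helper names `extract_answer_letter` and `is_well_formed` are hypothetical and are not part of this commit; the repository's own `validate_converted_questions` may check different things.

```python
import re
from typing import Any, Dict, Optional

# Matches a single capital letter inside the answer tags used by the records.
ANSWER_TAG = re.compile(r"\[ANSWER\]\s*([A-Z])\s*\[/ANSWER\]")


def extract_answer_letter(text: str) -> Optional[str]:
    """Return the letter in the last [ANSWER]...[/ANSWER] tag, or None."""
    matches = ANSWER_TAG.findall(text)
    return matches[-1] if matches else None


def is_well_formed(record: Dict[str, Any]) -> bool:
    """Check that a record has a question, aligned choices, and a valid answer."""
    choices = record.get("choices", {})
    texts = choices.get("text", [])
    labels = choices.get("label", [])
    letter = extract_answer_letter(record.get("answer", ""))
    return (
        bool(record.get("question"))
        and len(texts) == len(labels) > 0
        and letter in labels
    )
```

Taking the last tag rather than the first is a design choice for model responses, where the prompt's own example tag may be echoed before the final answer.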

View File

@@ -1,5 +1,6 @@
import json
from typing import Dict, Any, List, Optional
from typing import Dict, Any, List, Optional, Tuple
import random
def convert_to_target_format(source_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
"""
@@ -54,23 +55,166 @@ def convert_to_target_format(source_data: Dict[str, Any]) -> Optional[Dict[str,
return target_data
def batch_convert_questions(input_file: str, output_file: str) -> None:
def classify_questions_by_difficulty(questions: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
"""
Batch-convert question formats
Classify questions by difficulty
Args:
questions: list of questions
Returns:
dict of questions grouped by difficulty
"""
difficulty_groups = {
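"hard_early_stop": [], # hard questions (early stop after a wrong answer)
"easy_all_correct": [], # easy questions (every sample answered correctly)
"mixed": [], # mixed questions (some samples right, some wrong)
"unknown": [] # unknown difficulty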
}
for question in questions:
generated_options = question.get("generated_options", {})
sampling_summary = generated_options.get("sampling_summary", {})
difficulty_label = sampling_summary.get("difficulty_label", "unknown")
if difficulty_label in difficulty_groups:
difficulty_groups[difficulty_label].append(question)
else:
difficulty_groups["unknown"].append(question)
return difficulty_groups
def select_questions_by_ratio(difficulty_groups: Dict[str, List[Dict[str, Any]]],
selection_ratios: Dict[str, float],
random_seed: Optional[int] = None) -> Tuple[List[Dict[str, Any]], Dict[str, Dict[str, Any]]]:
"""
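Select questions by ratio
Args:
difficulty_groups: questions grouped by difficulty
selection_ratios: selection ratio for each difficulty level (0.0-1.0)
random_seed: random seed
Returns:
the list of selected questions and the selection statistics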
"""
if random_seed is not None:
random.seed(random_seed)
selected_questions = []
selection_stats = {}
for difficulty, questions in difficulty_groups.items():
total_count = len(questions)
ratio = selection_ratios.get(difficulty, 0.0)
# Compute how many questions to select (int() truncates toward zero)
if ratio <= 0:
selected_count = 0
elif ratio >= 1:
selected_count = total_count
else:
selected_count = int(total_count * ratio)
# Randomly sample the questions
if selected_count > 0 and total_count > 0:
if selected_count >= total_count:
selected = questions
else:
selected = random.sample(questions, selected_count)
selected_questions.extend(selected)
else:
selected = []
# Record statistics
selection_stats[difficulty] = {
"total": total_count,
"selected": len(selected),
"ratio_target": ratio,
"ratio_actual": len(selected) / total_count if total_count > 0 else 0
}
# Shuffle the final question order
random.shuffle(selected_questions)
return selected_questions, selection_stats
def batch_convert_questions_with_difficulty_filter(input_file: str,
output_file: str,
selection_ratios: Dict[str, float],
random_seed: Optional[int] = None) -> None:
"""
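Batch-convert question formats, with optional filtering by difficulty
Args:
input_file: input file path
output_file: output file path
selection_ratios: selection ratio for each difficulty level
random_seed: random seed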
"""
print("Loading data...")
# Detect the input file format
with open(input_file, 'r', encoding='utf-8') as f:
source_questions = json.load(f)
data = json.load(f)
# Handle the two possible input formats
if isinstance(data, dict) and "questions" in data:
# Format: {"questions": [...], "other fields": ...}
source_questions = data["questions"]
print("Detected full-format data with additional metadata")
elif isinstance(data, list):
# Format: [{question 1}, {question 2}, ...]
source_questions = data
print("Detected a plain list of questions")
else:
raise ValueError("Unsupported input file format")
print(f"Loaded {len(source_questions)} questions")
# Classify questions by difficulty
print("Classifying questions by difficulty...")
difficulty_groups = classify_questions_by_difficulty(source_questions)
print("Difficulty distribution:")
total_multiple_choice = 0
for difficulty, questions in difficulty_groups.items():
# Count the single-answer multiple-choice questions at this difficulty
mc_count = sum(1 for q in questions
if q.get("generated_options", {}).get("question_type") == "multiple_choice")
total_multiple_choice += mc_count
print(f" {difficulty}: {len(questions)} questions in total, {mc_count} multiple-choice")
print(f"Total convertible multiple-choice questions: {total_multiple_choice}")
# Select questions by ratio
print("\nSelecting questions by ratio...")
print("Selection ratios:")
for difficulty, ratio in selection_ratios.items():
if difficulty in difficulty_groups:
print(f" {difficulty}: {ratio*100:.1f}%")
selected_questions, selection_stats = select_questions_by_ratio(
difficulty_groups, selection_ratios, random_seed
)
print("\nSelection results:")
total_selected = 0
for difficulty, stats in selection_stats.items():
print(f" {difficulty}:")
print(f" total: {stats['total']}")
print(f" selected: {stats['selected']}")
print(f" target ratio: {stats['ratio_target']*100:.1f}%")
print(f" actual ratio: {stats['ratio_actual']*100:.1f}%")
total_selected += stats['selected']
print(f"Total selected: {total_selected} questions")
# Convert the selected questions
print("\nConverting question formats...")
converted_questions = []
conversion_stats = {
"total": len(source_questions),
"selected": total_selected,
"multiple_choice": 0,
"true_false": 0,
"other": 0,
@@ -78,7 +222,7 @@ def batch_convert_questions(input_file: str, output_file: str) -> None:
"failed": 0
}
for i, question in enumerate(source_questions):
for i, question in enumerate(selected_questions):
try:
# Count question types
generated_options = question.get("generated_options", {})
@@ -105,18 +249,29 @@ def batch_convert_questions(input_file: str, output_file: str) -> None:
# Save the results
print("Saving conversion results...")
output_data = {
"questions": converted_questions,
"metadata": {
"total_original_questions": len(source_questions),
"selection_ratios": selection_ratios,
"selection_stats": selection_stats,
"conversion_stats": conversion_stats,
"random_seed": random_seed
}
}
with open(output_file, 'w', encoding='utf-8') as f:
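# Note: only the converted question list is written; output_data (with metadata) is built above but not serialized
json.dump(converted_questions, f, ensure_ascii=False, indent=2)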
# Print statistics
# Print final statistics
print("\nConversion complete!")
print(f"Total questions: {conversion_stats['total']}")
print(f"Selected questions: {conversion_stats['selected']}")
print(f"Multiple choice: {conversion_stats['multiple_choice']}")
print(f"True/false: {conversion_stats['true_false']}")
print(f"Other types: {conversion_stats['other']}")
print(f"Converted successfully: {conversion_stats['converted']}")
print(f"Failed to convert: {conversion_stats['failed']}")
print(f"Conversion rate: {conversion_stats['converted']/conversion_stats['total']*100:.1f}%")
print(f"Final conversion rate: {conversion_stats['converted']/max(conversion_stats['selected'], 1)*100:.1f}%")
print(f"Results saved to: {output_file}")
def validate_converted_questions(questions: List[Dict[str, Any]]) -> Dict[str, int]:
@@ -172,24 +327,80 @@ def validate_converted_questions(questions: List[Dict[str, Any]]) -> Dict[str, i
return stats
def create_difficulty_config_template():
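"""Create a template for the difficulty selection ratios"""
template = {
"hard_early_stop": 1.0, # select 100% of hard questions
"easy_all_correct": 0.1, # select 10% of easy questions
"mixed": 0.5, # select 50% of mixed questions
"unknown": 0.0 # select 0% of unknown-difficulty questions
}
print("Difficulty selection ratio template:")
print(json.dumps(template, indent=2))
print("\nNotes:")
print("- 1.0 = 100% (select all)")
print("- 0.5 = 50% (select half)")
print("- 0.1 = 10% (select 10%)")
print("- 0.0 = 0% (select none)")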
return template
def main():
"""Main entry point"""
# File path configuration
INPUT_FILE = "/home/ubuntu/50T/LYT/MatBench/layer2/PGEE/code/stepy_complete_choice_questions.json"
OUTPUT_FILE = "/home/ubuntu/50T/LYT/MatBench/layer2/PGEE/code/stepz_final_choice_questions.json"
INPUT_FILE = "/home/ubuntu/50T/LYT/MatBench/layer2/PGEE/code/stepy_complete_choice_questions_with_sampling.json"
OUTPUT_FILE = "/home/ubuntu/50T/LYT/MatBench/layer2/PGEE/code/stepz_final_choice_questions_filtered.json"
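# Difficulty selection ratios
# Adjust these ratios as needed
SELECTION_RATIOS = {
"hard_early_stop": 1.0, # select 100% of hard questions (all)
"easy_all_correct": 0.0, # select no easy questions
"mixed": 0.0, # select no mixed questions
"unknown": 0.0 # select no unknown-difficulty questions
}
# Random seed, so results are reproducible
RANDOM_SEED = 42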
try:
# Batch conversion
batch_convert_questions(INPUT_FILE, OUTPUT_FILE)
# Show the configuration
print("=== Difficulty filter configuration ===")
print("Selection ratios:")
for difficulty, ratio in SELECTION_RATIOS.items():
print(f" {difficulty}: {ratio*100:.1f}%")
print(f"Random seed: {RANDOM_SEED}")
print()
# Batch conversion (with difficulty filtering)
batch_convert_questions_with_difficulty_filter(
INPUT_FILE,
OUTPUT_FILE,
SELECTION_RATIOS,
RANDOM_SEED
)
# Validate the conversion results
print("\nValidating conversion results...")
with open(OUTPUT_FILE, 'r', encoding='utf-8') as f:
converted_questions = json.load(f)
result_data = json.load(f)
# Check the output file format
if isinstance(result_data, dict) and "questions" in result_data:
converted_questions = result_data["questions"]
metadata = result_data.get("metadata", {})
print("\n=== Metadata ===")
if metadata:
print(f"Total original questions: {metadata.get('total_original_questions', 'N/A')}")
print(f"Random seed: {metadata.get('random_seed', 'N/A')}")
else:
converted_questions = result_data
validation_stats = validate_converted_questions(converted_questions)
print("\nValidation results:")
print("\n=== Validation results ===")
print(f"Total questions: {validation_stats['total']}")
print(f"Well-formed: {validation_stats['valid']}")
print(f"Malformed: {validation_stats['invalid']}")
@@ -203,42 +414,83 @@ def main():
except Exception as e:
print(f"Program failed: {e}")
import traceback
traceback.print_exc()
def test_single_conversion():
"""Test converting a single question"""
# Test data
test_data = {
"idx": 3154,
"question": "In stable ZrO2 material, cations form an fcc structure, and anions occupy tetrahedral interstitial sites. If 20 mol% CaO is added, calculate the percentage of occupied tetrahedral interstitial sites.",
"answer": "Zr4+ and Ca2+ cations occupy the face-centered cubic lattice sites. 100 cations can form 25 unit cells, with a total of 25×8=200 tetrahedral interstitial sites. Therefore, the percentage of occupied tetrahedral interstitial sites is 180÷200=90%.",
"question_type": "calculation",
"correct_option": "90%",
"choice_question": "In stable ZrO2 material, cations form an fcc structure, and anions occupy tetrahedral interstitial sites. If 20 mol% CaO is added, what is the percentage of occupied tetrahedral interstitial sites?",
"generated_options": {
"question_type": "multiple_choice",
"options": {
"A": "80%",
"B": "90%",
"C": "50%",
"D": "75%"
},
"correct_answer": "B",
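"explanation": "The correct answer, 90%, rests on: 1) in fcc the number of tetrahedral interstitial sites is twice the number of cations; 2) doping with 20 mol% CaO creates 20% oxygen vacancies; 3) occupied fraction = (originally occupied sites - vacancies) / total interstitial sites."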
},
"generation_status": "success"
def interactive_config():
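"""Interactively configure the selection ratios"""
print("=== Interactive difficulty selection ===")
difficulties = ["hard_early_stop", "easy_all_correct", "mixed", "unknown"]
difficulty_names = {
"hard_early_stop": "hard questions (early stop after a wrong answer)",
"easy_all_correct": "easy questions (all samples correct)",
"mixed": "mixed questions (some right, some wrong)",
"unknown": "unknown-difficulty questions"
}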
# Test the conversion
result = convert_to_target_format(test_data)
if result:
print("Conversion succeeded!")
print(json.dumps(result, ensure_ascii=False, indent=2))
ratios = {}
for diff in difficulties:
while True:
try:
ratio_input = input(f"Enter the selection ratio for {difficulty_names.get(diff, diff)} (0-100%): ").strip()
if ratio_input.endswith('%'):
ratio_input = ratio_input[:-1]
ratio_percent = float(ratio_input)
if 0 <= ratio_percent <= 100:
ratios[diff] = ratio_percent / 100.0
break
else:
print("Please enter a value between 0 and 100")
except ValueError:
print("Please enter a valid number")
print("\nConfiguration:")
for diff, ratio in ratios.items():
print(f" {difficulty_names.get(diff, diff)}: {ratio*100:.1f}%")
return ratios
def test_difficulty_distribution(input_file: str):
"""Inspect the difficulty distribution in a file"""
print(f"Analyzing difficulty distribution in: {input_file}")
with open(input_file, 'r', encoding='utf-8') as f:
data = json.load(f)
# Handle the two possible input formats
if isinstance(data, dict) and "questions" in data:
questions = data["questions"]
elif isinstance(data, list):
questions = data
else:
print("Conversion failed!")
print("Unsupported file format")
return
difficulty_groups = classify_questions_by_difficulty(questions)
print("\nDifficulty distribution:")
print(f"Total questions: {len(questions)}")
for difficulty, question_list in difficulty_groups.items():
mc_count = sum(1 for q in question_list
if q.get("generated_options", {}).get("question_type") == "multiple_choice")
print(f" {difficulty}:")
print(f" total: {len(question_list)}")
print(f" multiple choice: {mc_count}")
print(f" share: {len(question_list)/max(len(questions), 1)*100:.1f}%")
if __name__ == "__main__":
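# Optionally run the test first
# test_single_conversion()
# Optionally check the difficulty distribution first
# test_difficulty_distribution("/path/to/your/input/file.json")
# Optionally use the interactive configuration
# ratios = interactive_config()
# Run the main program
main()
# Show the configuration template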
# create_difficulty_config_template()
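
The ratio-based selection in this file can be condensed into a short standalone sketch. The function `sample_by_ratio` below is hypothetical and is not the committed `select_questions_by_ratio`; it assumes the same grouping keys and the same truncating `int()` count, but draws from a local `random.Random` instance instead of calling `random.seed` on the module-level generator.

```python
import random
from typing import Any, Dict, List, Optional


def sample_by_ratio(groups: Dict[str, List[Dict[str, Any]]],
                    ratios: Dict[str, float],
                    seed: Optional[int] = None) -> List[Dict[str, Any]]:
    """Pick a fraction of each difficulty group and shuffle the result."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    picked: List[Dict[str, Any]] = []
    for label, items in groups.items():
        # Groups without a configured ratio are skipped; ratios are clamped to [0, 1].
        ratio = min(max(ratios.get(label, 0.0), 0.0), 1.0)
        count = int(len(items) * ratio)  # truncates, as in the script above
        picked.extend(rng.sample(items, count))
    rng.shuffle(picked)
    return picked
```

Keeping the generator local means that other code using the `random` module is not reseeded as a side effect, which may matter if this step is imported into a larger pipeline.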