[
{
"id": 1,
"chunk": "# nature computational science",
"category": " References"
},
{
"id": 2,
"chunk": "# Harnessing large language models for data-scarce learning of polymer properties \n\nIn the format provided by the authors and unedited \n\nSupplementary Algorithm 1 The key steps of the two-phase training strategy.",
"category": " Materials and methods"
},
{
"id": 3,
"chunk": "# 1: LLM encoder pretraining: \n\n2: Use a large dataset of unlabeled SMILES representations of polymers to pretrain an LLM encoder $\\tilde{\\mathcal{M}}_{\\mathrm{encode}}$. \n3: Phase-1 supervised pretraining: \n4: Use physics-based hypothetical polymer generation methods, such as group contribution (GC), to generate a large dataset of physically meaningful synthetic polymer structures $\\{X_{i}\\}_{i=1}^{S_{GC}}$ with the correlation of fundamental thermophysical properties; \n5: Build a physics-based model $\\mathcal{M}_{\\mathrm{physics}}$ of the real-world physical process by leveraging the fundamental properties calculated from the physically meaningful synthetic polymers; \n6: Construct a physically meaningful synthetic dataset $\\mathcal{D}_{GC}:=\\{(X_{i},\\mathcal{M}_{\\mathrm{physics}}(X_{i}))\\}_{i=1}^{S_{GC}}$; \n7: Apply supervised pretraining to the LLM decoder/predictor using the synthetic dataset $\\mathcal{D}_{GC}$, to obtain an LLM decoder $\\tilde{\\mathcal{M}}_{\\mathrm{decode}}$ with a physically consistent initial state; \n8: Phase-2 finetuning: \n9: Collect a (usually small) set of high-fidelity measurements from experiments, denoted as $\\mathcal{D}_{HF}:=\\{(X_{i}^{HF},Y_{i}^{HF})\\}_{i=1}^{S_{HF}}$; \n10: Split the high-fidelity experimental dataset $\\mathcal{D}_{HF}$ into a training set $\\mathcal{D}_{HF}^{\\mathrm{train}}$ and a test set $\\mathcal{D}_{HF}^{\\mathrm{test}}$, and finetune the phase-1 LLM $\\tilde{\\mathcal{M}}_{\\mathrm{decode}}$ using $\\mathcal{D}_{HF}^{\\mathrm{train}}$; \n11: Obtain the final physics-guided LLM, and report the prediction accuracy on the test dataset $\\mathcal{D}_{HF}^{\\mathrm{test}}$.",
"category": " Materials and methods"
}
]