wl-hydrophilic-polymer/task1/task1-chunks/先进聚合物材料的机器学习辅助设计.json

[
    {
        "id": 1,
        "chunk": "# Machine Learning-Assisted Design of Advanced Polymeric Materials  \n\nLiang Gao, Jiaping Lin,\\* Liquan Wang, and Lei Du",
        "category": " Introduction"
    },
    {
        "id": 2,
        "chunk": "# Cite This: Acc. Mater. Res. 2024, 5, 571−584  \n\nCONSPECTUS: Polymeric material research is encountering a new paradigm driven by machine learning (ML) and big data. ML-assisted design has proven to be a successful approach for designing novel high-performance polymeric materials. This goal is mainly achieved through the following procedure: structure representation and database construction, establishment of an ML-based property prediction model, and virtual design and high-throughput screening. The key to this approach lies in training ML models that delineate structure−property relationships based on available polymer data (e.g., structure, component, and property data), enabling the screening of promising polymers that satisfy the targeted property requirements. However, the relative scarcity of high-quality polymer data and the complex multiscale structure−property relationships of polymers pose data and modeling challenges for this ML-assisted design method.  \n\nIn this Account, we summarize the state-of-the-art advancements concerning the ML-assisted design of polymeric materials. For structure representation and database construction, the digital representation methods established in cheminformatics predominate, along with some newly developed methods that integrate the multiscale structural characteristics of polymers. When establishing an ML-based property prediction model, the key is choosing and optimizing ML models to attain high-precision predictions across a vast chemical structure space. Advanced ML algorithms, such as transfer learning and multitask learning, have been utilized to address the data and modeling challenges. 
During the ML-assisted screening process, virtual polymer candidates are generated by defining and combining polymer genes; their properties are then predicted with ML property prediction models and screened in a high-throughput manner. Finally, the promising polymers identified through this approach are verified by computer simulations and experiments.  \n\nWe provide an overview of our recent efforts toward developing ML-assisted design approaches for discovering advanced polymeric materials and emphasize the intricate nature of polymer structural design. To better describe the multiscale structures of polymers, new structure representation methods, such as polymer fingerprints and cross-linking descriptors, were developed. Moreover, a multifidelity learning method was proposed to leverage multisource heterogeneous polymer data from experiments and simulations. Additionally, graph neural networks and Bayesian optimization methods have been developed and applied for predicting polymer properties as well as designing polymer structures and compositions.  \n\nFinally, we identify the current challenges and point out the development directions in this emerging field. It is highly desirable to establish new structure representation and advanced ML modeling methods for polymeric materials, particularly for constructing large polymer models based on chemical language. Through this Account, we seek to stimulate further interest and foster active collaborations for developing ML-assisted design approaches and realizing the innovation of advanced polymeric materials.",
        "category": " Abstract"
    },
    {
        "id": 3,
        "chunk": "# 1. INTRODUCTION  \n\nAdvanced high-performance materials are among the cornerstones of the development of human civilization. The rapid progress of technology is closely tied to material innovation. The discovery of advanced high-performance materials has primarily relied on labor-intensive trial-and-error methodologies. So far, numerous groundbreaking material innovations have emerged through countless iterations driven by the valuable experience and knowledge of materials scientists.1,2 However, this experience-driven methodology is time consuming, particularly when continuous improvements in material performance are needed. A new methodology is therefore essential to achieve breakthroughs in material design and fabrication.  \n\n![](images/54d32517d9f96e8e40ccd20b2e0813912e344c77c2878bbbde706b54571f7500.jpg)  \nFigure 1. Development of research paradigms: (A) experimental and empirical, (B) model-based theoretical, (C) computer simulation, and (D) artificial intelligence (AI)-driven paradigms.  \n\n![](images/5ef5c3780d1439226d83a7d45e7b815206ca8a65adc6338e2f2f90455e5a4f02.jpg)  \nFigure 2. ML-assisted design procedure to obtain the desired polymeric materials: (A) structure representation and database construction, (B) establishment of ML-based property prediction model, and (C) virtual design and high-throughput screening.  \n\nThe paradigms of materials research have undergone four stages, known as the four paradigms,3−7 as illustrated in Figure 1. In the early stages, material innovations mainly relied on experiments and empiricism. Researchers intuitively described their experimental observations and summarized valuable experience and knowledge. This is the experimental and empirical paradigm, i.e., the first paradigm. Starting in the 17th century, theoretical models were built to derive more accurate scientific rules, thereby facilitating material development. 
This paradigm is named the model-based theoretical paradigm, i.e., the second paradigm. However, theoretical models can usually be solved analytically only under simplified conditions. Owing to the temporal and spatial factors inherent in realistic material systems, some theoretical models have become so complex that analytical solutions are no longer feasible. Since the 1950s, advances in computer technology have enabled numerical solutions to some complex problems. Consequently, computer simulations of materials, such as first-principles calculations, molecular dynamics, and Monte Carlo simulations, emerged; they constitute the third paradigm.8,9 Since the beginning of the 21st century, with the development of informatics and artificial intelligence (AI), a new research paradigm centered on data-intensive science has emerged, the fourth paradigm.10−12 Materials science has accordingly entered the fourth paradigm by leveraging accumulated experimental and computational data. Big data analytics and machine learning (ML) have become indispensable tools within this new paradigm for analyzing massive data and constructing prediction models. So far, various data-driven tools and modeling methods have been employed in medical science, life science, and chemistry, facilitating the development of interdisciplinary fields such as medical informatics, bioinformatics, and cheminformatics.12−15 For example, by applying the fourth paradigm, various novel drug molecules can be virtually designed and screened to satisfy disease treatment requirements.16,17 However, in the materials field, especially in polymer material design, limited work based on the fourth paradigm has been reported. This Account focuses on the ML-assisted design of polymeric materials and summarizes the advancements and challenges in the emerging field of polymer informatics, which arises from the intersection of polymer science and informatics.  
\n\nOver the past century, polymeric materials have been widely utilized in daily life and advanced manufacturing industries. The production scale of polymeric materials has surpassed that of the steel industry.18 Polymer properties can be regulated through their chemical structures and morphologies, offering an almost infinite design space. Significant efforts have been devoted to improving their performance; however, the existing approaches are time consuming and costly, and an efficient new methodology is highly desired. In the emerging field of polymer informatics, the ML-assisted design strategy for polymeric materials can effectively address the challenge of improving performance across a vast chemical space by utilizing informatics tools, including big data and ML algorithms.19,20  \n\nTable 1. List of Established Polymer Databases  \n\n<html><body><table><tr><td>no.</td><td>database</td><td>URL</td><td>origin of data</td><td>description</td></tr><tr><td>1</td><td>Khazana</td><td>https://khazana.gatech.edu/dataset/</td><td>computational</td><td>thermoplastic; mechanical, thermal, electrical properties</td></tr><tr><td>2</td><td>PolyInfo</td><td>https://polymer.nims.go.jp/</td><td>empirical</td><td>thermoplastic; mechanical, optical, thermal, rheological properties</td></tr><tr><td>3</td><td>Polymer property predictor and database</td><td>https://pppdb.uchicago.edu/</td><td>empirical</td><td>Flory-Huggins parameter, glass transition temperature, binary polymer solution cloud point</td></tr><tr><td>4</td><td>Material properties database</td><td>https://www.makeitfrom.com/</td><td>empirical/computational</td><td>thermoplastic, thermoset, rubber; mechanical, thermal, electrical properties</td></tr><tr><td>5</td><td>CROW polymer properties database</td><td>https://polymerdatabase.com/</td><td>empirical/computational</td><td>thermoplastic, rubber, fiber; physical, thermal properties</td></tr><tr><td>6</td><td>PI1M</td><td>https://github.com/RUIMINMA1996/PI1M</td><td>computational</td><td>virtual polymers; physical, thermal, electrical properties</td></tr><tr><td>7</td><td>Dortmund database</td><td>https://ddbst.com/</td><td>computational</td><td>physical properties, phase equilibrium data</td></tr><tr><td>8</td><td>AI plus Polymers</td><td>https://polymergenome.ecust.edu.cn/</td><td>empirical/computational</td><td>thermoset, thermoplastic; physical, mechanical, thermal, electrical properties</td></tr></table></body></html>  
\n\nThe ML-assisted design of polymeric materials can be achieved through the following process: structure representation and database construction, establishment of an ML-based property prediction model, and virtual design and high-throughput screening (Figure 2). The key lies in training ML prediction models that provide a quantitative structure−property relationship (QSPR) based on available polymer data (e.g., structure, component, and property data), enabling the screening of promising polymeric materials that satisfy the specific targeted property requirements. However, compared with metals and inorganic materials, polymers possess unique multiscale structural characteristics. Polymer performance is determined by both chain structures (e.g., molecular weight and its distribution) and aggregation structures (e.g., cross-linking, crystallization, orientation, and microphase structures). This poses challenges for data analysis and ML prediction modeling of polymeric materials.  
\n\nIn this Account, we summarize the advancements achieved regarding the ML-assisted design of polymeric materials and emphasize the intricate nature of structure representation, property prediction, and structural design arising from the complex polymeric multiscale structures. Moreover, we discuss the challenges of the ML-assisted materials design approach and the development directions in the polymer informatics field.",
        "category": " Introduction"
    },
    {
        "id": 4,
        "chunk": "# 2. STRUCTURE REPRESENTATION AND DATABASE CONSTRUCTION  \n\nThe digitization of polymer structures and the construction of a polymer database are the first steps to support the establishment of ML prediction models. After undergoing feature engineering, the digital representations of structures can serve as the inputs for training. Note that simultaneously improving the quality and quantity of the data is essential during the database construction process to ensure adequate consistency and availability. This section discusses the polymer data sources, the digital representation methods of polymer structures, and the database construction methods.",
        "category": " Materials and methods"
    },
    {
        "id": 5,
        "chunk": "# 2.1. Polymer Data Sources  \n\nData sources include the literature, public databases, handbooks, theoretical calculations, and computer simulations.20−22 Over the past century, an extensive range of polymers has been synthesized and their properties examined, resulting in a large amount of structure−property data. To date, several databases have been established to compile the existing polymer data, and these databases can be utilized as data sources. Table 1 provides an overview of the databases that primarily focus on polymer structure−property data. For example, the PolyInfo database catalogs experimental data for thermoplastic polymers, while Ramprasad et al. utilized computer simulation data to construct the Khazana database. Note that the polymer databases listed in Table 1 mainly focus on thermoplastics. Recently, we established a platform named AI plus Polymers, which mainly focuses on thermosetting resins and encompasses about 150 000 property data points for over 34 000 polymer structures. Such a comprehensive database covers a broader space of polymer structures and properties and provides data for database construction, especially for thermosetting resins.",
        "category": " Materials and methods"
    },
    {
        "id": 6,
        "chunk": "# 2.2. Polymer Structure Representation  \n\nThe polymeric multiscale structures, which encompass repeating unit chemical structures, chain structures, and aggregation structures, necessitate rational and accurate digital representations to ensure the accuracy of the established ML property prediction models.11 Drawing inspiration from the digital representation methods employed for small molecules, various methods have been utilized to digitally represent the structures of polymers. These methods include the text-based simplified molecular-input line-entry system (SMILES), graph- and fingerprint-based approaches (e.g., Morgan, PubChem, and Daylight fingerprints), and 3D geometry-based methods.23−25 Polymer chain structures are generally represented by the chemical structures of their repeating units. After digital representation, descriptors that describe the topological and chemical information of structures are obtained and serve as inputs for training an ML prediction model. In addition, after data preprocessing, the polydispersity information of polymers can also serve as model input to improve the accuracy of the ML prediction model.  \n\n![](images/3c72bd7eace2143b60d4c1bd70655f00a1c0c5bf61d475ebdb19559c0c673ee8.jpg)  \nFigure 3. Structure representation methods for polymers: (A) text-based SMILES representation, (B) molecular fingerprint representation, (C) molecular graph representation for the repeating units or monomers of polymers, and (D) newly developed polymer fingerprint representation based on a combination of molecular fingerprints and periodic graphs. (B) Reproduced with permission from ref 23. Copyright 2022 American Chemical Society. (C) Reproduced with permission from ref 25. Copyright 2023 Royal Society of Chemistry.  
\n\nFor example, the text-based SMILES representation method encodes chemical structure information as text strings, i.e., SMILES strings (Figure 3A).24 ML models can be trained on these representations to learn the features of molecular structures.26 Molecular fingerprints are a representation method based on chemical structure fragments that converts molecular structure information into a binary vector.23 Each bit (0 or 1) of a fingerprint indicates whether a certain chemical feature, such as a specific atom or bond environment, is present in the given molecule. Recently, the molecular fingerprint method has been extended from 2D to 3D representations. For example, Axelrod et al. first constructed the molecular fingerprint elements $\\vec{h}$ and subsequently embedded them with the statistical weight $p$ to generate a new fingerprint $\\vec{q}$. Then, they trained deep learning (DL) models with an attention mechanism to yield the final comprehensive fingerprint $\\vec{Q}$ by combining the fingerprints $\\vec{q}$ (Figure 3B).23 This method yields accurate 3D fingerprint representations of molecular structures that are significantly better than those from 2D methods and holds promise for accurately predicting polymer properties.  \n\nThe molecular graph is a representation method that transforms the chemical structure into a graph, where atoms and bonds are represented as nodes and edges, respectively.23,25 The topological characteristics of a molecular structure are digitized through the connectivity of nodes and edges in the graph. A feature matrix is defined to capture atomic feature information, and an adjacency matrix is defined to reflect chemical bond information. 
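A minimal sketch of the feature-matrix and adjacency-matrix construction just described, written in Python with NumPy. Assumptions: the molecule (the heavy-atom graph of ethanol, SMILES CCO), the three-element one-hot vocabulary, and the use of bond order as the adjacency entry are illustrative choices, not the encoding used in the cited works. The last step shows one aggregation pass without learned weights, the basic operation that GNN layers build on.

```python
import numpy as np

# Illustrative vocabulary for one-hot atom features (real models use many more features).
ATOM_TYPES = ['C', 'N', 'O']

# Hypothetical example: heavy-atom graph of ethanol (SMILES CCO); hydrogens are implicit.
atoms = ['C', 'C', 'O']
bonds = [(0, 1, 1.0), (1, 2, 1.0)]  # (atom i, atom j, bond order)


def feature_matrix(symbols):
    # One row per atom (node), one-hot encoding of the element.
    x = np.zeros((len(symbols), len(ATOM_TYPES)))
    for i, sym in enumerate(symbols):
        x[i, ATOM_TYPES.index(sym)] = 1.0
    return x


def adjacency_matrix(n_atoms, bond_list):
    # Symmetric matrix; entry (i, j) holds the bond order of edge i-j, 0 if unbonded.
    a = np.zeros((n_atoms, n_atoms))
    for i, j, order in bond_list:
        a[i, j] = a[j, i] = order
    return a


X = feature_matrix(atoms)
A = adjacency_matrix(len(atoms), bonds)

# One aggregation step: each atom adds its own features to its neighbours' features,
# weighted by bond order (no learned weights or normalization here).
H = (A + np.eye(len(atoms))) @ X
print(A)
print(H)
```

In practice, a cheminformatics toolkit such as RDKit would parse the SMILES string and supply richer atom and bond features than this hand-built example.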
For example, we utilized the molecular graph method to represent the monomers of polycyanurate resins.25 After defining atom feature matrices and adjacency matrices, we utilized a graph-based self-supervised learning framework to learn low-dimensional representations of the monomers and found that the gated attention network performs best among the graph neural network (GNN) models (Figure 3C). Subsequently, we constructed ML prediction models for hygroscopicity, the coefficient of linear thermal expansion (CLTE), and tensile modulus and screened out high-performance thermosetting polycyanurates.  \n\nHowever, applying these digital representation methods to polymeric structures is subject to certain limitations.23,27−29 For example, SMILES cannot capture the 3D structure and chain information of polymers and may be ambiguous or nonunique when representing complex polymer structures. Molecular fingerprints cannot adequately capture certain physical information inherent in molecular structures and are sensitive to conformational variations, which makes the representations unstable. Therefore, developing new structure representation methods that account for the characteristics of polymer chains and aggregation structures has become a consensus in the field of polymer informatics. For example, BigSMILES, an extension of SMILES, offers a more rational representation of polymers, especially for long-chain and branched structures.28 BigSMILES introduces new symbols and rules, such as using the `$` symbol to represent connections or branch points, thereby partially incorporating chain structure information into the digital representations.  \n\nRecently, we proposed a representation method based on periodic fingerprints, namely, polymer fingerprints. 
As illustrated in Figure 3D, this method extracts features, such as chemical bonds, atoms, and molecular substructures, from repeating units and converts them into fixed-length fingerprint vectors. Through a combination and embedding process with an adjacency matrix, the fingerprint vectors are periodically augmented to obtain digital representations of the polymer chain structures. This polymer fingerprint method can be integrated with GNNs, demonstrating its extensibility to DL models. For example, by incorporating structural chemical information extracted from other representation methods (e.g., the text-based SMILES representation) into the polymer fingerprint vectors, more accurate digital representations at both the molecular scale and the chain scale can be achieved. Additionally, higher-level structural information of polymers is taken into account in our recent work. For example, we proposed a representation method for polymer cross-linking structures in which the cross-linking density is described based on classical gelation theory to capture the cross-linked features.26 Using this representation method, we obtained ML models that predict the moduli, strength, and elongation at break of cured epoxy thermosets and screened out high-performance epoxy resins.  \n\nFeature engineering, a crucial step following digital representation, aims to extract meaningful features from the primary representations for training ML prediction models. This process can reduce the feature dimensionality and simultaneously improve the efficiency and accuracy of ML modeling.30 A series of descriptors and structural features are associated with polymer properties. Descriptors are numerical features derived from atoms, molecules, or chain structures, such as molecular weight, charge distribution, and polarity, which can be computed with the RDKit Python toolkit. 
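As an illustration of what a numerical descriptor is, the sketch below computes a few simple repeating-unit descriptors from an element-count formula. RDKit computes such descriptors directly from a parsed structure (for example, `Descriptors.MolWt`); to keep the sketch dependency-free, the element counts are supplied by hand, and the particular descriptors chosen (and their names) are illustrative assumptions rather than the feature set used in the cited work.

```python
# Standard atomic weights (g/mol, IUPAC abridged values) for the elements used here.
ATOMIC_WEIGHTS = {'C': 12.011, 'H': 1.008, 'N': 14.007, 'O': 15.999, 'Si': 28.085}


def repeat_unit_descriptors(counts):
    # counts maps element symbol -> number of atoms in one repeating unit.
    mol_weight = sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items())
    heavy_atoms = sum(n for el, n in counts.items() if el != 'H')
    hetero_atoms = sum(n for el, n in counts.items() if el not in ('C', 'H'))
    return {
        'mol_weight': round(mol_weight, 3),
        'heavy_atoms': heavy_atoms,
        'heteroatom_fraction': hetero_atoms / heavy_atoms if heavy_atoms else 0.0,
    }


# Polystyrene repeating unit, C8H8.
print(repeat_unit_descriptors({'C': 8, 'H': 8}))
# Poly(methyl methacrylate) repeating unit, C5H8O2.
print(repeat_unit_descriptors({'C': 5, 'H': 8, 'O': 2}))
```

Each dictionary is one row of a descriptor table; stacking such rows for many polymers gives the input matrix that the feature selection methods discussed next operate on.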
The structural features encompass geometric or topological characteristics, including bond lengths, bond angles, torsion angles, and the number of rings. Various statistical methods can be utilized for feature selection and dimensionality reduction, such as principal component analysis (PCA),29 linear discriminant analysis (LDA),31 and the least absolute shrinkage and selection operator (LASSO).26 For example, we developed a gate- and attention-augmented graph convolutional network (GCN+a+g) to convert 1081 molecular descriptors of epoxy into an 8-dimensional vector.26 This dimensionality reduction not only significantly reduces the computational cost but also improves the accuracy of ML models for predicting epoxy thermoset properties.  \n\nIt is desirable to develop new representation methods that take into account the characteristics of polymeric multiscale structures. Multiscale descriptors need to be established to comprehensively describe the structures of repeating units, polymer chains, crystallization, cross-linking, and other aggregation structures. For example, the performance of organic photovoltaic (OPV) materials is determined by their multiscale structures, such as the crystallization or orientation of donors/acceptors and the morphology of crystalline domains. These complex multiscale structures should be rationally and accurately digitized to train ML models that can precisely predict the properties of OPV materials. Therefore, comprehensively considering polymeric multiscale structures during digital representation is critical to achieving accurate ML predictions for polymeric materials with complex structure−property relationships.",
        "category": " Materials and methods"
    },
    {
        "id": 7,
        "chunk": "# 2.3. Polymer Database Construction  \n\nConstructing high-quality data sets is an essential step before conducting data analysis, property prediction, and structural design. However, the structure−property data of polymeric materials are multisource and heterogeneous in terms of their sources (including experimental and computational data) and types (including numerical values, spectra, and images). These multisource heterogeneous data sets may contain significant deviations. Therefore, before establishing prediction models, rigorous data cleansing should be carried out to improve data availability.31,32 For example, methods such as text similarity testing and entity recognition can be used to standardize data, and $N$-fold cross-validation (i.e., leave-one-out cross-validation) can be utilized to identify and discard outliers.22,25 By processing polymer data with the above methods, the consistency and availability of the constructed polymer database can be ensured.  \n\nTo date, significant advancements have been made in terms of structure representation and database construction for polymeric materials; however, some long-standing issues still require attention. Owing to the vast structural diversity of polymers, the amount of currently available data is relatively limited.5,20 In addition, polymer property data are sometimes of low quality, and data for the targeted properties are scarce. These issues pose nonnegligible data challenges for the ML-assisted design of polymeric materials. They can be addressed by effectively utilizing computer simulation data or by generating large amounts of high-quality data through high-throughput experiments.",
        "category": " Materials and methods"
    },
    {
        "id": 8,
        "chunk": "# 3. ESTABLISHMENT OF ML PREDICTION MODELS FOR POLYMER PROPERTIES  \n\nThe establishment of a property prediction model is pivotal for polymer design tasks because it determines the accuracy of virtual design results. Various ML algorithms have been employed to establish property prediction models.33 The key is choosing appropriate algorithms and optimizing the model to achieve high-precision predictions across a vast chemical structure space. The mainstream modeling approaches include Gaussian process regression (GPR), neural networks, decision trees, and support vector machines. For example, GPR is a nonparametric Bayesian regression method that has been widely applied to construct ML models of material structure−property relationships. The random forest algorithm, an ensemble of decision trees, generalizes well and can be utilized for both classification and regression tasks. A neural network is a modeling method inspired by the human brain: information passes through weighted neuron connections from the input layer through the hidden layers to the output layer, which allows the network to be used for data analysis and prediction.  \n\n![](images/6759b80ee7329efa3686d6401318377379bd8a00d79b51200f6fab3f1937d6ca.jpg)  \nFigure 4. Establishment of the GPR prediction models of polycyanurate resins by a multifidelity learning method on experimental and simulation data. (A) Modeling framework containing the graph-based representation method and multifidelity learning. (B−D) Model performance of the polycyanurate resin properties: (B) hygroscopicity, (C) coefficient of linear thermal expansion (CLTE), and (D) tensile modulus. Reproduced with permission from ref 25. Copyright 2023 Royal Society of Chemistry.  \n\nNeural networks that contain multiple hidden layers and many hyperparameters, i.e., deep neural networks, enable deep learning (DL) modeling. 
For example, convolutional neural networks (CNNs) and graph neural networks (GNNs) are representative deep neural network models employed to establish DL prediction models of material properties. Recently, various advanced ML algorithms, especially reinforcement learning and transformer-based chemical language models, have been successfully applied in cheminformatics and bioinformatics. Combined with structure representation methods suited to polymers, e.g., polymer fingerprints and BigSMILES, these advanced modeling methods can be applied to establish prediction models for polymers.  \n\nMoreover, it is difficult to construct accurate ML prediction models from sparse and low-quality polymer data. To address this issue, advanced ML algorithms, such as transfer learning and multitask learning, have been employed. These methods leverage knowledge from related domains or tasks to improve model performance in a target domain, even with limited data. For example, transfer learning involves pretraining a model on a relatively large data set of well-studied polymers and then fine-tuning it on a smaller target data set of the desired properties, enabling the model to learn relevant patterns and relationships more efficiently.  \n\nRecently, we utilized GPR to establish prediction models for polycyanurate properties. Because experimental data were scarce, we used computer simulations to obtain polycyanurate properties and developed a multifidelity learning method to map low-quality simulation data to high-quality experimental data (Figure 4A).25 Within this framework, the surrogate function $Y_{\\mathrm{H}}(X)$ of the high-quality data depends on the surrogate function $Y_{\\mathrm{L}}(X)$ of the low-quality data and their difference $Y_{\\mathrm{D}}(X)$. 
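Assuming the additive form Y_H(X) = Y_L(X) + Y_D(X) (some multifidelity schemes also scale Y_L by a learned factor), the mapping from low- to high-fidelity data can be sketched with a small NumPy Gaussian process. Everything in the sketch is hypothetical: the 1D input stands in for a structure representation, sin(x) plays the role of simulation data, sin(x) + 0.3x plays the role of experiment, and the kernel hyperparameters are fixed by hand rather than optimized. It is an illustration of the idea, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, variance=1.0):
    # Squared-exponential covariance between two 1D input arrays.
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)


class SimpleGPR:
    # Zero-mean GP regressor with hand-fixed hyperparameters (no likelihood tuning).
    def __init__(self, length_scale, noise=1e-4):
        self.length_scale = length_scale
        self.noise = noise

    def fit(self, x, y):
        self.x = np.asarray(x, dtype=float)
        k = rbf_kernel(self.x, self.x, self.length_scale)
        chol = np.linalg.cholesky(k + self.noise * np.eye(len(self.x)))
        # Solve L z = y, then L^T alpha = z, so that alpha = K^-1 y.
        z = np.linalg.solve(chol, np.asarray(y, dtype=float))
        self.alpha = np.linalg.solve(chol.T, z)
        return self

    def predict(self, x_new):
        k_star = rbf_kernel(np.asarray(x_new, dtype=float), self.x, self.length_scale)
        return k_star @ self.alpha


def simulation(x):
    # Hypothetical low-fidelity source: cheap, plentiful, systematically biased.
    return np.sin(x)


def experiment(x):
    # Hypothetical high-fidelity source: the same trend plus a drift the simulation misses.
    return np.sin(x) + 0.3 * x


x_low = np.linspace(0.0, 6.0, 20)        # 20 simulated points
x_high = np.array([0.0, 2.0, 4.0, 6.0])  # 4 experimental points
x_test = np.linspace(0.0, 6.0, 61)

# Y_L(X): GP trained on simulation data.
gp_low = SimpleGPR(length_scale=1.0).fit(x_low, simulation(x_low))
# Y_D(X): GP trained on experiment minus the Y_L prediction at the experimental inputs.
gap = experiment(x_high) - gp_low.predict(x_high)
gp_diff = SimpleGPR(length_scale=3.0).fit(x_high, gap)
# Y_H(X) = Y_L(X) + Y_D(X).
y_multi = gp_low.predict(x_test) + gp_diff.predict(x_test)

# Baseline: a GP trained on the four experimental points alone.
gp_direct = SimpleGPR(length_scale=1.0).fit(x_high, experiment(x_high))
y_direct = gp_direct.predict(x_test)


def rmse(pred):
    return float(np.sqrt(np.mean((pred - experiment(x_test)) ** 2)))


print('multifidelity RMSE:', round(rmse(y_multi), 3))
print('direct RMSE:', round(rmse(y_direct), 3))
```

The difference model uses a longer length scale on the assumption that the simulation-experiment gap varies more smoothly than the property itself; that assumption is what allows four experimental points to correct a model built from twenty simulated ones.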
For thermosetting polycyanurate resins, some property data, e.g., hygroscopicity, thermal expansivity, and modulus, are limited because of the difficulty of chemical synthesis. We calculated the properties of cross-linked polycyanurate structures by simulations and used these calculated (low-quality) data to learn the underlying structure−property relationships via GPR. As shown in Figure 4, with graph-based representation and GPR modeling, the trained models of $Y_{\\mathrm{L}}(X)$ demonstrate excellent performance and generalizability. Then, we established GPR models for the deviations between simulation and experimental data, i.e., $Y_{\\mathrm{D}}(X)$. By integrating the well-trained models of $Y_{\\mathrm{L}}(X)$ and $Y_{\\mathrm{D}}(X)$, we obtained high-precision prediction models of $Y_{\\mathrm{H}}(X)$. This data mapping strategy simultaneously improves the quantity and quality of the available data. Compared with direct ML modeling, the GPR-based multifidelity learning method significantly improves prediction accuracy and provides a reliable solution for establishing high-precision ML models from limited polymer data.  \n\nUsing deep neural networks, Ramprasad et al. developed a framework combining multitask learning and meta learning and established prediction models for copolymer properties.34 Based on a copolymer data set of glass transition $(T_{\\mathrm{g}})$, melting $(T_{\\mathrm{m}})$, and degradation $(T_{\\mathrm{d}})$ temperatures, they generated fingerprint representations of the copolymers (Figure 5A). These fingerprints served as inputs for training five cross-validation multitask neural networks. The trained multitask models were then used to construct a meta learner as the final prediction model (Figure 5B). 
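The arrangement of five cross-validation base models feeding a meta learner, together with the R² and RMSE metrics used to assess it, can be sketched as a stacking ensemble. Assumptions: closed-form ridge regressions stand in for the multitask neural networks, the data are synthetic 8-dimensional vectors standing in for fingerprints, and the meta learner is fitted on a separate data split; the cited work may have trained its meta learner differently.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_ridge(x, y, lam=1e-3):
    # Closed-form ridge regression with a bias column (stand-in for a neural network).
    xb = np.hstack([x, np.ones((len(x), 1))])
    return np.linalg.solve(xb.T @ xb + lam * np.eye(xb.shape[1]), xb.T @ y)


def predict_ridge(w, x):
    xb = np.hstack([x, np.ones((len(x), 1))])
    return xb @ w


def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data: 300 samples, 8-dimensional fingerprints, noisy linear target.
x = rng.normal(size=(300, 8))
y = x @ rng.normal(size=8) + 0.1 * rng.normal(size=300)
x_base, y_base = x[:200], y[:200]        # for the five base models
x_meta, y_meta = x[200:250], y[200:250]  # for the meta learner
x_test, y_test = x[250:], y[250:]        # held out for evaluation

# Five base models, each trained on four of five folds of the base split.
folds = np.array_split(np.arange(len(x_base)), 5)
base_models = []
for k in range(5):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    base_models.append(fit_ridge(x_base[train_idx], y_base[train_idx]))


def stack_features(x_in):
    # One column per base model: its prediction for each sample.
    return np.column_stack([predict_ridge(w, x_in) for w in base_models])


# Meta learner: a linear model over the base-model predictions.
meta_w = fit_ridge(stack_features(x_meta), y_meta)
y_pred = predict_ridge(meta_w, stack_features(x_test))

print('R2 =', round(r2_score(y_test, y_pred), 3))
print('RMSE =', round(rmse(y_test, y_pred), 3))
```

Fitting the meta learner on data the base models never saw keeps it from simply rewarding whichever base model memorized its training folds.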
The model performance was assessed by metrics such as the coefficient of determination $(R^{2})$ and the root-mean-square error (RMSE). This modeling approach resulted in highly accurate models, with an $R^{2}$ value of 0.94, and excellent transferability for both homopolymers and copolymers (Figure 5C−E).  \n\n![](images/f084e485c56084d9aaea788d828a082d66059cab3e8061fddb61a34771290f1e.jpg)  \nFigure 5. Multitask deep neural networks for establishing copolymer property prediction models. (A) Concatenation-based conditioned multitask neural networks. (B) Meta learner composed of the five cross-validation models (blue nodes) and neural networks. (C−E) Model performance of the meta learner for the (C) glass transition $(T_{\\mathrm{g}})$, (D) melting $(T_{\\mathrm{m}})$, and (E) degradation $(T_{\\mathrm{d}})$ temperatures. The frequencies of the homopolymer and copolymer data are indicated in black and red, respectively, in the margins of C−E. Reproduced with permission from ref 34. Copyright 2021 American Chemical Society.  \n\nWith advancements in ML algorithms, more intricate prediction modeling tasks can be accomplished. However, the opacity of these black-box models in their prediction and decision-making processes makes the models difficult to understand and to explain physically. Hence, interpretability is important for ML prediction models. Various model interpretability approaches have been employed to unveil the underlying rules behind black-box models and to enhance the transparency and credibility of their decision-making processes.35 Additionally, existing ML algorithms can be combined with physical mechanisms, as in physics-informed neural networks (PINNs), to improve model interpretability. During PINN training, physical constraints are imposed on the supervised learning task, typically as additional terms in the loss function. 
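To make the idea of a physics-constrained loss concrete, the sketch below swaps the neural network for a polynomial so that the problem stays linear and can be solved by least squares; it is therefore a PINN-style loss, not a PINN. The assumed physics is the decay law du/dt = -k u with known k, the three data points are synthetic, and the weight lam between the data term and the physics term is a hypothetical choice. The physics term is the ODE residual evaluated at collocation points where no data exist.

```python
import numpy as np

K_DECAY = 1.0  # known constant in the assumed physics du/dt = -k u
DEGREE = 8     # polynomial degree (stand-in for a neural network)


def basis(t):
    # Columns t**0 ... t**DEGREE.
    return np.vander(t, DEGREE + 1, increasing=True)


def basis_derivative(t):
    # Columns d/dt t**j = j * t**(j - 1); the constant term has zero derivative.
    cols = [np.zeros_like(t)] + [j * t ** (j - 1) for j in range(1, DEGREE + 1)]
    return np.column_stack(cols)


# Sparse synthetic observations on [0, 0.5] and collocation points on [0, 2].
t_data = np.array([0.0, 0.25, 0.5])
u_data = np.exp(-K_DECAY * t_data)
t_col = np.linspace(0.0, 2.0, 41)

V = basis(t_data)
# ODE residual du/dt + k u at the collocation points; the target value is zero.
R = basis_derivative(t_col) + K_DECAY * basis(t_col)


def fit_physics_informed(lam=1.0):
    # Minimise ||V c - u||^2 + lam * ||R c||^2 by stacking both blocks into one lstsq.
    a = np.vstack([V, np.sqrt(lam) * R])
    b = np.concatenate([u_data, np.zeros(len(t_col))])
    coef, *_ = np.linalg.lstsq(a, b, rcond=None)
    return coef


coef_data_only, *_ = np.linalg.lstsq(V, u_data, rcond=None)  # minimum-norm data fit
coef_physics = fit_physics_informed()

t_eval = np.linspace(0.0, 2.0, 21)
u_true = np.exp(-K_DECAY * t_eval)
err_data = float(np.max(np.abs(basis(t_eval) @ coef_data_only - u_true)))
err_phys = float(np.max(np.abs(basis(t_eval) @ coef_physics - u_true)))
print('max error, data only:', round(err_data, 4))
print('max error, data + physics:', round(err_phys, 4))
```

With only three data points in the first quarter of the interval, the data-only fit is underdetermined and extrapolates poorly, whereas the residual term ties the curve to the assumed physics over the whole interval.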
Therefore, PINNs not only learn the underlying rules needed for prediction but also learn physical mechanisms from the introduced physical constraints, which endows the neural networks with interpretability.36  \n\nIn addition, a desirable ML prediction model should have robust extrapolation capability for property prediction.35 The ability to extrapolate accurately into high-performance regions is decisive because it determines whether novel advanced polymeric materials can be discovered efficiently. Promising strategies for endowing models with sufficient extrapolation capability include reasonable data preprocessing, integrating domain-specific insights into feature engineering, and employing advanced modeling methods (e.g., active learning, generative models, and interactive linear regression models).37 Therefore, future ML prediction models for polymeric materials should offer enhanced accuracy, interpretability, and generalizability, providing a robust foundation for rational design.",
        "category": " Results and discussion"
    },
    {
        "id": 9,
        "chunk": "# 4. VIRTUAL DESIGN AND HIGH-THROUGHPUT SCREENING OF POLYMERIC MATERIALS  \n\nML-assisted design usually involves three steps, namely, gene definition and combination, ML-assisted screening, and verification of the screened polymers.5,6 The procedure is illustrated in Figure 6A. First, various elements related to material properties, such as structures, components, reaction conditions, and processing parameters, are defined as genes. For polymeric materials, the chemical structures of repeating units or segments, the length, distribution, and sequence of polymer chains, the morphological features, etc., can be considered genes. Polymer genes can be combined according to polymer synthesis routes to generate virtual polymer candidates. Then, using ML models, high-throughput property predictions are conducted for these candidates, and promising polymers that satisfy the targeted properties are identified through screening. Finally, computer simulations and experiments are employed to validate the screened polymers.  \n\nRecently, we conducted an ML-assisted design of thermosetting polymer resins and obtained a novel poly(silane arylacetylene) (PSA) resin with enhanced heat resistance and processability.38 Heat-resistant PSA resins, which are organic−inorganic hybrid materials, have been developed; however, their processability still needs to be improved. First, we defined diynes and silanes as the genes of PSA resins according to the synthesis route, collected various genes from the database, and combined them to generate 368 PSA candidates. Then, we established ML models for predicting the processability and heat resistance of these candidates. By predicting the candidate properties with these models, we screened out promising PSA resins with optimal processability and high heat resistance (see Figure 6B). 
The black rectangles in the heatmap mark the 10 candidates with the highest weighted-average values, i.e., comprehensive scores of heat resistance and processability. Subsequently, one screened resin containing naphthalene and vinyl groups (PSNP-MV) was synthesized to verify the predictions (Figure 6C). The $T_{\\mathrm{d}5}$ and $T_{\\mathrm{g}}$ values of the PSNP-MV resin are 689 and 608 $^{\\circ}\\mathrm{C}$, respectively, and its processing window ranges from 110 to 152 $^{\\circ}\\mathrm{C}$. Compared with existing PSA resins (e.g., the PSA resin containing 2,7-diethynylnaphthalene, named PSNP-M), the screened PSNP-MV resin shows simultaneously improved heat resistance and processability, exhibiting enhanced comprehensive performance.  \n\n![](images/14351a855d78104c910b410fc2963acba971336690d675a2d18956d613d255b5.jpg)  \nFigure 6. Virtual design and high-throughput screening of polymeric materials. (A) ML-assisted design procedure including gene definition and combination, ML-assisted screening, and verification of screened polymers. (B) Heatmap of the thermal decomposition temperature and viscosity obtained by ML property prediction for 368 poly(silane arylacetylene) (PSA) resin candidates. (C) Experimental verification for the novel PSNP-MV resin with enhanced comprehensive properties obtained by an ML-assisted design approach. (D) Scatter plot of the modulus, strength, and elongation at break obtained by ML property prediction for 243 942 epoxy resin candidates. (E) Comparison between the experimental values of the screened resin (red star) and existing epoxy resins regarding the modulus, strength, and elongation at break. (B and C) Reproduced with permission from ref 38. Copyright 2022 Elsevier. (D and E) Reproduced with permission from ref 26. Copyright 2022 American Chemical Society.  
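\n\nThe PSA screening described above (combine genes into candidates, predict their properties, rank them by a weighted-average comprehensive score, and keep the top entries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene names, the two toy predictor functions standing in for the trained ML models, the equal weights, and the min-max normalization are all hypothetical choices. Following the Figure 6B heatmap, heat resistance is represented by a decomposition temperature and processability by viscosity, with lower viscosity assumed to be better.\n\n
```python
from itertools import product

# Hypothetical gene sets; in the Account, diynes and silanes were collected from a database.
DIYNES = ['diyne_A', 'diyne_B', 'diyne_C', 'diyne_D']
SILANES = ['silane_1', 'silane_2', 'silane_3']


# Toy stand-ins for trained ML property models (hypothetical formulas, illustration only).
def predict_td5(candidate):
    diyne, silane = candidate
    return 500.0 + 40.0 * DIYNES.index(diyne) + 10.0 * SILANES.index(silane)


def predict_viscosity(candidate):
    diyne, silane = candidate
    return 2.0 + 1.5 * DIYNES.index(diyne) - 0.5 * SILANES.index(silane)


def min_max(values):
    # Rescale values to [0, 1] so that different properties can be averaged.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def screen(weights=(0.5, 0.5), top_k=10):
    # Gene combination: pair every diyne with every silane.
    candidates = list(product(DIYNES, SILANES))
    heat = min_max([predict_td5(c) for c in candidates])
    # Lower viscosity is assumed to mean better processability, so invert the scale.
    proc = [1.0 - v for v in min_max([predict_viscosity(c) for c in candidates])]
    # Weights sum to 1, so the weighted sum is a weighted average.
    scores = [weights[0] * h + weights[1] * p for h, p in zip(heat, proc)]
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return len(candidates), ranked[:top_k]


n_candidates, top = screen(top_k=3)
print(n_candidates, top[0][0])
```
\n\nWith real gene sets and trained models, the same loop would run over all 368 combinations; the weights encode the relative importance assigned to each property.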
\n\nEffective screening and design rely on the rational definition and combination of genes to ensure broad coverage of the target polymer structure space. Genes can be defined according to the compositions of polymeric materials, and gene sets containing diverse structures can be constructed to generate vast numbers of candidates. For example, since epoxy resins generally consist of epoxides and amine curing agents, we defined and collected 746 epoxy genes and 327 amine genes and combined them to generate 243 942 epoxy resin candidates.26 By establishing ML prediction models for the mechanical properties of epoxy resins and predicting the properties of these candidates, we screened out epoxy thermosets with high strength, high modulus, and high elongation at break from the 243 942 candidates. The predicted mechanical properties of these candidates are shown in Figure 6D. The predicted values were standardized and combined into a comprehensive score, and the promising resins that meet the multiobjective property targets were screened out according to this score. Finally, one promising high-performance epoxy resin was verified by experiments (see Figure 6E). Its average tensile strength, tensile modulus, and elongation at break are about $89.25~\\mathrm{MPa}$, $3.85~\\mathrm{GPa}$, and $6.71\\%$, respectively, which are close to the predicted values ($99.0~\\mathrm{MPa}$, $3.57~\\mathrm{GPa}$, and $6.08\\%$, respectively). Compared with other existing epoxy resins, this screened resin exhibits superior mechanical properties, including high modulus, strength, and toughness.  \n\nIn addition to the comprehensive score evaluation method, a multiobjective screening task can be transformed into several single-objective tasks. 
By screening stepwise, one objective at a time, polymer structures with superior comprehensive properties can be identified.39 Other screening strategies for multiobjective tasks are generally derived from the comprehensive score evaluation method. For instance, by defining an approximating function, one can evaluate how closely multiple properties approach their targets and thereby judge whether candidates meet the comprehensive property requirements. Moreover, identifying the Pareto frontier of polymer properties and exploring promising polymers beyond it is a feasible screening method.40,41 In a multiobjective optimization task, a solution is Pareto-optimal if no objective can be improved without degrading at least one other, and the set of such solutions is the Pareto frontier. For material properties, the best achievable trade-offs therefore lie on the boundary of the multiobjective property space, and these points constitute the Pareto frontier. For example, Li et al. compiled a data set of 2233 existing polyimides (PIs) from the PoLyInfo database to identify the Pareto frontier of the property space.41 The property space of the glass transition temperature ($T_{\\mathrm{g}}$), Young’s modulus ($E$), and tensile yield strength ($\\sigma_{y}$) of PIs is shown in Figure 7A, where the red squares are the existing high-performance PIs that define the Pareto frontier. This frontier is an envelope boundary for the three properties ($T_{\\mathrm{g}}$, $E$, and $\\sigma_{y}$). Through gene definition and combination, they obtained 8 million virtual PIs and then established ML models based on improved Morgan fingerprint representations and feed-forward neural networks to predict the $T_{\\mathrm{g}}$, $E$, and $\\sigma_{y}$ values of the candidate PIs.  
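\n\nThe Pareto-frontier screening described above can be illustrated with a minimal sketch, assuming all three properties are to be maximized. A polymer dominates another if it is at least as good in every property and strictly better in at least one, and the frontier is the set of non-dominated existing polymers. Treating a virtual candidate as beyond the frontier when no existing polymer dominates it (so that adding it would extend the frontier) is an assumption made here, and the property values are hypothetical, not data from ref 41.\n\n
```python
# Each polymer is a tuple of properties to maximize: (Tg in °C, E in GPa, sigma_y in MPa).
def dominates(a, b):
    # a dominates b if it is no worse in every property and strictly better in at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    # Non-dominated points; a point never dominates itself, so no self-check is needed.
    return [p for p in points if not any(dominates(q, p) for q in points)]


def beyond_front(existing, candidates):
    # Assumed criterion: no existing polymer dominates the candidate and it is not a duplicate.
    return [c for c in candidates
            if c not in existing and not any(dominates(e, c) for e in existing)]


# Hypothetical values for illustration only.
existing_pis = [(300.0, 3.0, 100.0), (350.0, 2.5, 90.0), (280.0, 3.5, 95.0), (250.0, 2.0, 80.0)]
virtual_pis = [(360.0, 3.2, 105.0), (290.0, 2.8, 85.0), (240.0, 3.6, 70.0)]

front = pareto_front(existing_pis)
print(front)
print(beyond_front(existing_pis, virtual_pis))
```
\n\nChecking candidates against all existing polymers is equivalent to checking against the frontier alone, because dominance is transitive; using only the frontier simply makes the check cheaper.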
\n\n![](images/ecf3ab43986ca78889854ee4bd0bdfa85622c936051c06dfb0f871e1f70f443e.jpg)  \nFigure 7. ML-assisted screening of high-performance polyimides (PIs). (A) Comparison of PI properties to identify the Pareto frontier and screen promising PIs. The red squares are existing high-performance PIs that define the boundary (Pareto frontier). The red stars are the discovered virtual PIs whose properties lie beyond the Pareto frontier. For clarity, the boundaries on each 2D plane are drawn as lines through the Pareto-frontier points. (B) ML-assisted screening combined with SAscore for identifying PIs with high $T_{\\mathrm{g}}$ and good synthetic accessibility (SAscore $<3.8$). Three typical PI structures are shown on the right of panel B. (A) Reproduced with permission from ref 41. Copyright 2023 Elsevier. (B) Reproduced with permission from ref 42. Copyright 2023 Royal Society of Chemistry.  \n\n![](images/927db9ca9b111a186a420eb9144e96fea99f649c5f3a746fea89da7b862d3271.jpg)  \nFigure 8. Exploration of compositional space for high-performance copolymers via a Bayesian optimization-guided approach. (A) Illustration of the optimization and exploration in compositional and property spaces. (B) ML-assisted compositional design of the cocured polycyanurates. The comprehensive score of hygroscopicity, $T_{\\mathrm{g}}$, and Young’s modulus for all measured samples is plotted as a function of iteration. The contour plots show the distribution of expected improvement (EI) values at the beginning, middle, and end of the iterations, with regions of higher EI values in red. Green circles and red solid squares represent the measured samples and the best sample at each iteration, respectively. Figures are reproduced with permission from ref 44. Copyright 2023 Royal Society of Chemistry.  
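\n\nThe expected improvement (EI) plotted in Figure 8B serves as the acquisition function of the BO workflow discussed in this section. A minimal sketch of the standard closed-form EI for maximizing a scalarized score follows. It assumes that the Gaussian-process surrogate returns a predictive mean mu and standard deviation sigma for each candidate formulation and that best is the highest score measured so far: EI = (mu − best − xi)·Φ(z) + sigma·φ(z), with z = (mu − best − xi)/sigma. The candidate compositions (fractions of MBCy, DOCy, and BADCy) and their surrogate predictions are hypothetical, and the exploration offset xi defaults to zero.\n\n
```python
import math


def norm_pdf(z):
    # Standard normal probability density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    # Standard normal cumulative distribution via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best, xi=0.0):
    # Closed-form EI for maximization under a Gaussian predictive distribution.
    improvement = mu - best - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)


def propose_next(candidates, best):
    # candidates: (composition, mu, sigma) triples from a GP surrogate.
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))[0]


# Compositions as (MBCy, DOCy, BADCy) fractions; mu and sigma are made-up surrogate outputs.
pool = [
    ((0.20, 0.30, 0.50), 0.70, 0.01),  # slightly better mean, almost no uncertainty
    ((0.50, 0.25, 0.25), 0.65, 0.20),  # lower mean, large uncertainty
]
print(propose_next(pool, best=0.68))
```
\n\nIn this toy pool, EI prefers the uncertain composition even though its predicted mean is lower, which is how BO balances exploration against exploitation.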
\n\nBased on these predicted properties, they screened out three novel PIs beyond the Pareto frontier (see the red stars in Figure 7A), which are expected to exhibit superior performance. Finally, they verified the promising PIs through all-atom molecular dynamics simulations and experiments.  \n\nIn addition, synthetic accessibility is another important metric to consider during the ML-assisted design of polymeric materials.20,42 For example, Sun et al. predicted the $T_{\\mathrm{g}}$ values of virtual PIs with a GNN model and screened out promising PIs with $T_{\\mathrm{g}}$ values exceeding $250~^{\\circ}\\mathrm{C}$. Then, they used the synthetic accessibility score (SAscore) to further screen for PIs that can be easily synthesized (e.g., SAscore $<3.8$).42 The SAscore rates synthetic accessibility on a scale from 1 (easy) to 10 (difficult) and is computed from molecular fragment contributions and molecular complexity. As shown in Figure 7B, 9191 PIs with high $T_{\\mathrm{g}}$ values and good synthetic accessibility were screened out. Although experimental verification was not conducted, this ML-assisted design strategy combined with synthetic accessibility is commendable and can help guide the experimental exploration of high-performance PIs. However, the SAscore evaluation method only considers the synthetic accessibility of small molecules, that is, monomers, and ignores the ability of these monomers to polymerize. Thus, developing a new method to evaluate the comprehensive synthetic accessibility of polymeric materials is an important direction.  \n\nIn the industrialization of polymeric materials, formulation and processing parameters play important roles in determining the final performance. 
For example, when epoxy resins are used as composite matrices, various multicomponent curing agents, thickeners, and fillers are added to balance mechanical properties, processability, industrial cost, and other requirements.43 The trial-and-error approach to formulation exploration is costly and time consuming. Thus, efficient methods for designing multicomponent resin formulations are urgently needed. Recently, we developed a Bayesian optimization (BO)-guided method for the compositional design of cocured polycyanurates (see Figure 8A).44 BO is a representative ML approach for solving optimization problems whose objective functions are costly to evaluate. Establishing appropriate optimization frameworks and conducting benchmarking studies are crucial for ensuring the accuracy and efficiency of BO-based models.  \n\nWe proposed a BO-guided copolymer compositional design workflow in which the compositional space consists of three monomers, i.e., 1,3-bis[2-(4-cyanatophenyl)-2-propyl]benzene (MBCy), 2,5-bis(4-cyanatophenyl)octahydro-1H-4,7-methanoindene (DOCy), and 2,2-bis(4-cyanatophenyl)propane (BADCy). The workflow comprises the following steps: converting the multiobjective optimization problem into a single-objective one using a scalarizing function, fitting the underlying function with a Gaussian process-based surrogate, and inferring promising formulations using an acquisition function. Benchmarking studies were conducted using knowledge gained from molecular simulations, and based on these studies, we developed an effective BO-guided design approach. With this optimization strategy, after only a few iterations, we discovered several novel cocured copolymers exhibiting low hygroscopicity along with superior $T_{\\mathrm{g}}$ and Young’s modulus (see Figure 8B). 
Simultaneously improving multiple properties of polymers is challenging because these properties often conflict; however, the developed optimization method effectively addresses this challenge. Therefore, data-driven methods such as Bayesian optimization45,46 can be employed for the efficient exploration and discovery of advanced polymeric materials, especially for designing their multicomponent formulations.  \n\nMoreover, the inverse design strategy for polymeric materials has attracted considerable attention because it can rapidly discover novel polymers with desirable properties. In contrast to the aforementioned design methods, generative algorithms enable the identification and mutation of preferred polymer genes, allowing high-performance polymers to be designed within a relatively limited structural space.11,47 By predicting the properties of the first-generation candidates obtained from gene combinations, the differences between candidate performance and target performance can be evaluated. Based on these differences, the combinable polymer genes can be optimized through genetic crossover and mutation. Then, a new generation of polymer genes and candidates can be generated by structural generative models trained with self-supervised learning, such as variational autoencoders or generative adversarial networks.  \n\nThe performance of each new generation of candidates gradually approaches the target performance. As a result, novel polymers that satisfy the multiobjective property requirements can be identified within a few iterations. Therefore, beyond forward prediction and screening, it is imperative to develop inverse design approaches that can rapidly identify new polymer structures meeting the imposed multiobjective performance requirements.",
        "category": " Results and discussion"
    },
    {
        "id": 10,
        "chunk": "# 5. SUMMARY AND OUTLOOK  \n\nIn summary, the polymeric material field is entering a new research paradigm driven by ML and big data, promoting advancements in polymer informatics. ML models are used to predict the quantitative structure−property relationships (QSPRs) of polymeric materials, enabling ML-assisted virtual design and screening that effectively accelerates the discovery of high-performance polymers. However, the relative scarcity of high-quality data and the complex multiscale structure−property relationships of polymeric materials pose formidable challenges for this ML-assisted design method, namely, data and modeling challenges, as shown in Figure 9.  \n\n![](images/1af13f657fa9bb95b2f2355c818b4d4254c2834c64a4898e36fcf54f4fc80289.jpg)  \nFigure 9. Schematic summarizing the challenges and development directions of ML-assisted design of polymeric materials.  \n\nTo address these challenges, it is imperative to establish new methods and develop advanced models, especially those inspired by cutting-edge algorithms in artificial intelligence and cheminformatics. To solve the small-data problem of polymeric materials, data from theoretical calculations and computer simulations can be utilized, and advanced ML algorithms can be employed. Directly using theoretical calculation or simulation data, or improving data quality with a multifidelity surrogate model, is effective.25 Additionally, it is essential to account for the multiscale structural characteristics inherent in polymeric materials. These development directions are outlined below.",
        "category": " Conclusions"
    },
    {
        "id": 11,
        "chunk": "# 5.1. Establishing New Representation Methods for the Polymeric Multiscale Structures  \n\nA series of multiscale descriptors should be established to accurately represent the chemical structures of repeating units, polymer chain structures, and aggregation structures, including crystallization, cross-linking, and phase structures.48 For example, it is advisable to establish a periodic fingerprint or graph representation method that captures the characteristics of polymer chains by modifying and embedding various digital representations.27 Moreover, the metastable multiscale structures of polymeric materials should also be digitally represented, and the generated descriptors can serve as inputs for ML training. Information on multiscale structures can be obtained from theoretical calculations, computer simulations, and experiments. This information can then be processed with image analysis algorithms to extract features that characterize the crystalline regions, degree of orientation, phase boundaries, etc.49 These features can be adopted directly or further modified to generate polymeric multiscale descriptors for training graph-based prediction models. Furthermore, processing parameters play a crucial role in determining the aggregation structures and ultimately the properties of polymeric materials. Processing parameters such as temperature, pressure, and shear rate significantly affect the crystalline, orientation, and phase structures of polymers during processing. These parameters can be used directly or modified to generate processing descriptors that serve as model inputs. When these effects are considered during modeling, the accuracy of ML models can be further improved.",
        "category": " Results and discussion"
    },
    {
        "id": 12,
        "chunk": "# 5.2. Developing Advanced ML Models To Describe Polymer Structure−Property Relationships  \n\nML modeling methods such as transfer learning and multitask learning can be adapted and optimized to establish high-precision prediction models, even when the amount of reliable data is limited.50 Moreover, interpretable ML methods can provide insight and guidance for designing polymers and advancing polymer theory. They give a transparent view of the decision-making process, enabling researchers to understand the relationships between input features and output predictions. On one hand, by using interpretable models or feature importance analysis methods (e.g., SHapley Additive exPlanations, SHAP),51 we can identify the most influential features and correlations that govern polymer properties. On the other hand, by combining interpretable ML models with genetic algorithms and symbolic regression,35,52 quantitative equations and even new theoretical frameworks for structure−property relationships can be obtained. For example, the sure independence screening and sparsifying operator (SISSO) is a data-driven symbolic regression method. Unlike most other ML predictive methods, SISSO is a white-box model whose result is an explicit formula, allowing researchers to analyze and explain it with domain knowledge. Thus, we believe that such interpretable ML models can promote advancements in polymer theory.",
        "category": " Results and discussion"
    },
    {
        "id": 13,
        "chunk": "# 5.3. Establishing Polymer Large Models Based on Chemical Language  \n\nDrawing inspiration from natural language processing, the polymer informatics field is leveraging chemical language to establish large models for polymer structure generation and property prediction.32,53−56 A large model is one with a very large number of parameters and an intricate architecture, such as the model behind the Chat Generative Pretrained Transformer (ChatGPT). Several established chemical language models can comprehend chemical rules and generate novel chemical substances. Recently, there have been advances in polymer chemical language models, such as TransPolymer,53 PolyBERT,54 and PolyNC,56 which aim to address the data and modeling challenges of polymers and predict their various properties across a vast structural space. Polymer large models trained through data parallelism can effectively integrate and utilize data from different sources and modalities. Moreover, they can also be established by integrating a series of well-trained prediction models through pipeline-parallel training. Thus, they can reveal commonalities in the structure−property relationships of different properties, inspiring researchers to identify important factors influencing polymer properties and to establish prediction models for specific properties. Therefore, upcoming polymer large models have immense potential to comprehensively accelerate the development of polymeric materials and revolutionize traditional design methodologies.  \n\nFurthermore, high-throughput synthesis and characterization systems for polymeric materials are highly desirable.57−59 Such high-throughput experimental systems can generate high-quality data encompassing diverse structures, components, and processing parameters. 
By combining these systems with ML-assisted design methods, the efficiency of the design and optimization processes for polymeric materials can be further improved. For example, Brabec and Li et al. established a high-throughput experimental platform named AMANDA Line One and realized the automatic fabrication and characterization of organic solar cells.58,59 This platform enables the screening of over 100 processing parameter variations in terms of efficiency and photostability within 70 h. Combined with ML methods, they predicted both the efficiency and the photostability of organic photovoltaic (OPV) materials based on the resulting high-quality experimental data set. However, for polymeric materials, the development of high-throughput experimental systems is still at a nascent stage. The existing equipment is typically adapted from parallel synthesizers and only partially satisfies the requirements for truly high-throughput synthesis and characterization of polymers. Therefore, high-throughput experimental platforms for polymers are eagerly anticipated by experimentalists.  \n\nThe integration of ML methods with computer simulations is emerging as an important direction in computational materials science, offering significant enhancements in the accuracy and efficiency of simulations. ML algorithms can analyze simulation outputs, identify hidden correlations, and extract meaningful insights. Moreover, by leveraging ML algorithms for feature selection, model optimization, and data preprocessing, researchers can streamline the simulation workflow and reduce computational costs.60 The integration of ML methods with computer simulations can also facilitate the development of surrogate models, or emulators, that mimic complex computer simulation codes at lower computational cost.  
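\n\nAs one illustration of such a surrogate, a Gaussian process regression emulator (the same model family used for the multifidelity polycyanurate models discussed earlier in this Account) can be trained on a few outputs of a costly simulation and then queried cheaply, with an uncertainty estimate attached to each prediction. The pure-Python sketch below assumes a one-dimensional input, a squared-exponential (RBF) kernel with a fixed length scale instead of fitted hyperparameters, and a hypothetical toy function costly_simulation in place of a real simulation code.\n\n
```python
import math


def rbf(a, b, length=1.0):
    # Squared-exponential kernel with unit signal variance.
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    # Lower-triangular L with K = L L^T (K must be symmetric positive definite).
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution for L x = b.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_lower_transpose(L, b):
    # Back substitution for L^T x = b.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPEmulator:
    def __init__(self, xs, ys, length=1.0, noise=1e-6):
        self.xs, self.length = list(xs), length
        K = [[rbf(a, b, length) + (noise if i == j else 0.0) for j, b in enumerate(self.xs)]
             for i, a in enumerate(self.xs)]
        self.L = cholesky(K)
        # alpha = K^-1 y, computed with two triangular solves.
        self.alpha = solve_lower_transpose(self.L, solve_lower(self.L, list(ys)))

    def predict(self, x):
        # Posterior mean and standard deviation at a new input x.
        k = [rbf(x, a, self.length) for a in self.xs]
        mean = sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.L, k)
        var = max(rbf(x, x, self.length) - sum(vi * vi for vi in v), 0.0)
        return mean, math.sqrt(var)


def costly_simulation(x):
    # Hypothetical stand-in for an expensive simulation code.
    return math.sin(x) + 0.1 * x


train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
emulator = GPEmulator(train_x, [costly_simulation(x) for x in train_x])
print(emulator.predict(2.5), costly_simulation(2.5))
```
\n\nIn practice the length scale and noise level would be fitted, for example by maximizing the marginal likelihood, and a library implementation would normally replace the hand-written linear algebra.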
\n\nOverall, the ML-assisted design strategy offers a new methodology for discovering the high-performance polymeric materials needed in advanced equipment for aerospace, energy storage, communication, and other fields. For example, organic semiconductors require polymeric materials with high electron mobility, high luminescence efficiency, high spin characteristics, and high conductivity. With the ML-assisted design strategy, novel polymeric semiconductor materials with good comprehensive properties may be discovered in a shorter time and at reduced cost. The emerging paradigm driven by ML and big data can revolutionize the traditional trial-and-error design approach for polymeric materials. As summarized in this Account, state-of-the-art research and achievements have demonstrated both the high efficiency and the advanced nature of this paradigm. Nevertheless, the field is still in its infancy. In future endeavors, developing new methods for data processing, structure representation, ML model establishment, and virtual screening is highly desirable for realizing the intelligent design of advanced polymeric materials.",
        "category": " Results and discussion"
    },
    {
        "id": 14,
        "chunk": "# AUTHOR INFORMATION",
        "category": " References"
    },
    {
        "id": 15,
        "chunk": "# Corresponding Author  \n\nJiaping Lin − Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, Frontiers Science Center for Materiobiology and Dynamic Chemistry, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; orcid.org/0000-0001-9633-4483; Email: jlin@ecust.edu.cn",
        "category": " References"
    },
    {
        "id": 16,
        "chunk": "# Authors  \n\nLiang Gao − Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, Frontiers Science Center for Materiobiology and Dynamic Chemistry, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; orcid.org/0000-0001-6852-8301   \nLiquan Wang − Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, Frontiers Science Center for Materiobiology and Dynamic Chemistry, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; orcid.org/0000-0002-5141-8584   \nLei Du − Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, Frontiers Science Center for Materiobiology and Dynamic Chemistry, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China  \n\nComplete contact information is available at: https://pubs.acs.org/10.1021/accountsmr.3c00288",
        "category": " References"
    },
    {
        "id": 17,
        "chunk": "# Notes  \n\nThe authors declare no competing financial interest.",
        "category": " Conclusions"
    },
    {
        "id": 18,
        "chunk": "# Biographies  \n\nLiang Gao received his Bachelor’s (2013) and Ph.D. (2019) degrees in Materials Science and Engineering from East China University of Science and Technology (ECUST) under the supervision of Jiaping Lin. He is now an associate professor at ECUST. His research focuses on theoretical simulation of polymers and the polymer material genome approach.  \n\nJiaping Lin received his Bachelor’s and Master’s degrees from Shanghai Jiao Tong University and his Ph.D. degree from ECUST (1993). He then obtained a postdoctoral fellowship from the Japan Society for the Promotion of Science (JSPS) and a Lise Meitner fellowship from the Fonds zur Förderung der Wissenschaftlichen Forschung (FWF), and worked as a postdoctoral researcher at the Tokyo Institute of Technology and the University of Linz in Austria. He has been at ECUST since returning to China in 1997 and became a full professor in 1999. His current research interests include polymer theory and simulation, the polymer material genome approach, and polymer self-assembly.  \n\nLiquan Wang received his Ph.D. degree in Materials Science and Engineering from ECUST in 2011 under the supervision of Jiaping Lin. He now works in the School of Materials Science and Engineering of ECUST. From 2016 to 2017, he was a visiting scholar with Zhen-Gang Wang at the California Institute of Technology. His research interests center on theoretical simulation of complex polymer systems and the polymer material genome approach.  \n\nLei Du is a professor at the School of Materials Science and Engineering of ECUST. He received his Bachelor’s degree from Nanjing University (1965). From 1985 to 1987, he was a visiting scholar in the Polymer Department of Pennsylvania State University. His current research interests include high-performance resins and composites and the polymer material genome approach.",
        "category": " References"
    },
    {
        "id": 19,
        "chunk": "# ACKNOWLEDGMENTS  \n\nThis work was supported by the National Natural Science Foundation of China (22103025, 52394271, 51833003, and 21975073) and the Shanghai Chenguang Program (22CGA31).",
        "category": " References"
    },
    {
        "id": 20,
        "chunk": "# REFERENCES  \n\n(1) Wang, S.; Hu, Y.; Kouznetsova, T. B.; Sapir, L.; Chen, D.; Herzog-Arbeitman, A.; Johnson, J. A.; Rubinstein, M.; Craig, S. L. Facile mechanochemical cycloreversion of polymer cross-linkers enhances tear resistance. Science 2023, 380, 1248−1252.   \n(2) Zhu, L.; Zhang, M.; Xu, J.; Li, C.; Yan, J.; Zhou, G.; Zhong, W.; Hao, T.; Song, J.; Xue, X.; Zhou, Z.; Zeng, R.; Zhu, H.; Chen, C.-C.; MacKenzie, R. C. I.; Zou, Y.; Nelson, J.; Zhang, Y.; Sun, Y.; Liu, F. Single-junction organic solar cells with over $19\\%$ efficiency enabled by a refined double-fibril network morphology. Nat. Mater. 2022, 21, 656−663.   \n(3) Jain, A.; Ong, S. P.; Hautier, G.; Chen, W.; Richards, W. D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G.; Persson, K. A. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 2013, 1, No. 011002.   \n(4) Agrawal, A.; Choudhary, A. Perspective: Materials informatics and big data: Realization of the “fourth paradigm” of science in materials science. APL Mater. 2016, 4, No. 053208.   \n(5) Gao, L.; Wang, L.; Lin, J.; Du, L. An intelligent manufacturing platform of polymers: Polymeric material genome engineering. Engineering 2023, 27, 31−36.   \n(6) Du, S.; Zhang, S.; Wang, L.; Lin, J. Polymer genome approach: A new method for research and development of polymers. Acta Polym. Sin. 2022, 53, 592−607.   \n(7) Yuan, W.-L.; He, L.; Tao, G.-H.; Shreeve, J. M. Materials-genome approach to energetic materials. Acc. Mater. Res. 2021, 2, 692−696.   \n(8) Luo, J.; Wang, X.; Li, S.; Liu, J.; Guo, Y.; Niu, G.; Yao, L.; Fu, Y.; Gao, L.; Dong, Q.; Zhao, C.; Leng, M.; Ma, F.; Liang, W.; Wang, L.; Jin, S.; Han, J.; Zhang, L.; Etheridge, J.; Wang, J.; Yan, Y.; Sargent, E. H.; Tang, J. Efficient and stable emission of warm-white light from lead-free halide double perovskites. Nature 2018, 563, 541−545.   \n(9) Xu, J.; Ma, Y.; Hu, W.; Rehahn, M.; Reiter, G. 
Cloning polymer single crystals through self-seeding. Nat. Mater. 2009, 8, 348−353. (10) Raccuglia, P.; Elbert, K. C.; Adler, P. D. F.; Falk, C.; Wenny, M. B.; Mollo, A.; Zeller, M.; Friedler, S. A.; Schrier, J.; Norquist, A. J. Machine-learning-assisted materials discovery using failed experiments. Nature 2016, 533, 73−76.   \n(11) Sanchez-Lengeling, B.; Aspuru-Guzik, A. Inverse molecular design using machine learning: Generative models for matter engineering. Science 2018, 361, 360−365.   \n(12) Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Zidek, A.; Potapenko, A.; Bridgland, A.; Meyer, C.; Kohl, S. A. A.; Ballard, A. J.; Cowie, A.; Romera-Paredes, B.; Nikolov, S.; Jain, R.; Adler, J.; Back, T.; Petersen, S.; Reiman, D.; Clancy, E.; Zielinski, M.; Steinegger, M.; Pacholska, M.; Berghammer, T.; Bodenstein, S.; Silver, D.; Vinyals, O.; Senior, A. W.; Kavukcuoglu, K.; Kohli, P.; Hassabis, D. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583−589.   \n(13) Xu, Y.; Liu, X.; Cao, X.; Huang, C.; Liu, E.; Qian, S.; Liu, X.; Wu, Y.; Dong, F.; Qiu, C.-W.; Qiu, J.; Hua, K.; Su, W.; Wu, J.; Xu, H.; Han, Y.; Fu, C.; Yin, Z.; Liu, M.; Roepman, R.; Dietmann, S.; Virta, M.; Kengara, F.; Zhang, Z.; Zhang, L.; Zhao, T.; Dai, J.; Yang, J.; Lan, L.; Luo, M.; Liu, Z.; An, T.; Zhang, B.; He, X.; Cong, S.; Liu, X.; Zhang, W.; Lewis, J. P.; Tiedje, J. M.; Wang, $\\mathrm{Q.;}$ An, Z.; Wang, F.; Zhang, L.; Huang, T.; Lu, C.; Cai, Z.; Wang, F.; Zhang, J. Artificial intelligence: A powerful paradigm for scientific research. Innovation 2021, 2, No. 100179.   \n(14) Gao, H.; Struble, T. J.; Coley, C. W.; Wang, Y.; Green, W. H.; Jensen, K. F. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 2018, 4, 1465−1476.   \n(15) Mannodi-Kanakkithodi, A.; Chandrasekaran, A.; Kim, C.; Huan, T. D.; Pilania, G.; Botu, V.; Ramprasad, R. 
Scoping the polymer genome: A roadmap for rational polymer dielectrics design and beyond. Mater. Today 2018, 21, 785−796.   \n(16) Wainberg, M.; Merico, D.; Delong, A.; Frey, B. J. Deep learning in biomedicine. Nat. Biotechnol. 2018, 36, 829−838.   \n(17) Warmuth, M. K.; Liao, J.; Rätsch, G.; Mathieson, M.; Putta, S.; Lemmen, C. Active learning with support vector machines in the drug discovery process. J. Chem. Inf. Comput. Sci. 2003, 43, 667−673. (18) Geyer, R.; Jambeck, J. R.; Law, K. L. Production, use, and fate of all plastics ever made. Sci. Adv. 2017, 3, No. e1700782.   \n(19) Audus, D. J.; de Pablo, J. J. Polymer informatics: Opportunities and challenges. ACS Macro Lett. 2017, 6, 1078−1082.   \n(20) Chen, L.; Pilania, G.; Batra, R.; Huan, T. D.; Kim, C.; Kuenneth, C.; Ramprasad, R. Polymer informatics: Current status and critical next steps. Mater. Sci. Eng. R 2021, 144, No. 100595.   \n(21) Ding, F.; Liu, L.-Y.; Liu, T.-L.; Li, Y.-Q.; Li, J.-P.; Sun, Z.-Y. Predicting the mechanical properties of polyurethane elastomers using machine learning. Chin. J. Polym. Sci. 2023, 41, 422−431.   \n(22) Xu, P.; Chen, H.; Li, M.; Lu, W. New opportunity: Machine learning for polymer materials design and discovery. Adv. Theory Simul. 2022, 5, No. 2100565.   \n(23) Axelrod, S.; Schwalbe-Koda, D.; Mohapatra, S.; Damewood, J.; Greenman, K. P.; Gómez-Bombarelli, R. Learning matter: Materials design with machine learning and atomistic simulations. Acc. Mater. Res. 2022, 3, 343−357.   \n(24) Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31−36.   \n(25) Xu, X.; Zhao, W.; Hu, Y.; Wang, L.; Lin, J.; Qi, H.; Du, L. Discovery of thermosetting polymers with low hygroscopicity, low thermal expansivity, and high modulus by machine learning. J. Mater. Chem. A 2023, 11, 12918−12927.   \n(26) Hu, Y.; Zhao, W.; Wang, L.; Lin, J.; Du, L. 
Machine-learningassisted design of highly tough thermosetting polymers. ACS Appl. Mater. Interfaces 2022, 14, 55004−55016.   \n(27) Antoniuk, E. R.; Li, P.; Kailkhura, B.; Hiszpanski, A. M. Representing polymers as periodic graphs with learned descriptors for accurate polymer property predictions. J. Chem. Inf. Model. 2022, 62, 5435−5445.   \n(28) Lin, T. S.; Coley, C. W.; Mochigase, H.; Beech, H. K.; Wang, W.; Wang, Z.; Woods, E.; Craig, S. L.; Johnson, J. A.; Kalow, J. A.; Jensen, K. F.; Olsen, B. D. BigSMILES: A structurally-based line notation for describing macromolecules. ACS Cent. Sci. 2019, 5, 1523−1531.   \n(29) Chen, L.; Kim, C.; Batra, R.; Lightstone, J. P.; Wu, C.; Li, Z.; Deshmukh, A. A.; Wang, Y.; Tran, H. D.; Vashishta, P.; Sotzing, G. A.; Cao, Y.; Ramprasad, R. Frequency-dependent dielectric constant prediction of polymers using machine learning. npj Comput. Mater. 2020, 6, 61 (30) Li, Z.; Ma, X.; Xin, H. Feature engineering of machine-learning chemisorption models for catalyst design. Catal. Today 2017, 280, 232−238.   \n(31) Zhou, T.; Song, Z.; Sundmacher, K. Big data creates new opportunities for materials research: A review on methods and applications of machine learning for materials design. Engineering 2019, 5, 1017−1026.   \n(32) Shetty, P.; Ramprasad, R. Machine-guided polymer knowledge extraction using natural language processing: The example of named entity normalization. J. Chem. Inf. Model. 2021, 61, 5377−5385. (33) Dou, B.; Zhu, Z.; Merkurjev, E.; Ke, L.; Chen, L.; Jiang, J.; Zhu, Y.; Liu, J.; Zhang, B.; Wei, G.-W. Machine learning methods for small data challenges in molecular science. Chem. Rev. 2023, 123, 8736− 8780.   \n(34) Kuenneth, C.; Schertzer, W.; Ramprasad, R. Copolymer informatics with multitask deep neural networks. Macromolecules 2021, 54, 5957−5961.   \n(35) Oviedo, F.; Ferres, J. L.; Buonassisi, T.; Butler, K. T. Interpretable and explainable machine learning for materials science and chemistry. Acc. Mater. Res. 
2022, 3, 597−607.   \n(36) Raissi, M.; Perdikaris, P.; Karniadakis, G. E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686−707.   \n(37) Yong, W.; Zhang, H.; Fu, H.; Zhu, Y.; He, J.; Xie, J. Improving prediction accuracy of high-performance materials via modified machine learning strategy. Comput. Mater. Sci. 2022, 204, No. 111181. (38) Zhang, S.; Du, S.; Wang, L.; Lin, J.; Du, L.; Xu, X.; Gao, L. Design of silicon-containing arylacetylene resins aided by machine learning enhanced materials genome approach. Chem. Eng. J. 2022, 448, No. 137643.   \n(39) Zhu, J.; Chu, M.; Chen, Z.; Wang, L.; Lin, J.; Du, L. Rational design of heat-resistant polymers with low curing energies by a materials genome approach. Chem. Mater. 2020, 32, 4527−4535. (40) Mannodi-Kanakkithodi, A.; Pilania, G.; Ramprasad, R.; Lookman, T.; Gubernatis, J. E. Multi-objective optimization techniques to design the Pareto front of organic dielectric polymers. Comput. Mater. Sci. 2016, 125, 92−99.   \n(41) Tao, L.; He, J.; Munyaneza, N. E.; Varshney, V.; Chen, W.; Liu, G.; Li, Y. Discovery of multi-functional polyimides through highthroughput screening using explainable machine learning. Chem. Eng. J. 2023, 465, No. 142949.   \n(42) Qiu, H.; Qiu, X.; Dai, X.; Sun, Z.-Y. Design of polyimides with targeted glass transition temperature using a graph neural network. J. Mater. Chem. C 2023, 11, 2930−2940.   \n(43) Dean, J. M.; Verghese, N. E.; Pham, H. $\\mathrm{Q.;}$ Bates, F. S. Nanostructure toughened epoxy resins. Macromolecules 2003, 36, 9267−9270.   \n(44) Xu, X.; Zhao, W.; Wang, L.; Lin, J.; Du, L. Efficient exploration of compositional space for high-performance copolymers via Bayesian optimization. Chem. Sci. 2023, 14, 10203−10211.   \n(45) van de Schoot, R.; Depaoli, S.; King, R.; Kramer, B.; Märtens, K.; Tadesse, M. 
G.; Vannucci, M.; Gelman, A.; Veen, D.; Willemsen, J.; Yau, C. Bayesian statistics and modelling. Nat. Rev. Methods Primers 2021, 1, 1.   \n(46) Zhao, S.; Cai, T.; Zhang, L.; Li, W.; Lin, J. Autonomous construction of phase diagrams of block copolymers by theoryassisted active machine learning. ACS Macro Lett. 2021, 10, 598−602. (47) Gurnani, R.; Kamal, D.; Tran, H.; Sahu, H.; Scharm, K.; Ashraf, U.; Ramprasad, R. polyG2G: A novel machine learning algorithm applied to the generative design of polymer dielectrics. Chem. Mater. 2021, 33, 7008−7016.   \n(48) Buehler, M. J. Multiscale modeling at the interface of molecular mechanics and natural language through attention neural networks. Acc. Chem. Res. 2022, 55, 3387−3403.   \n(49) Vargo, E.; Dahl, J. C.; Evans, K. M.; Khan, T.; Alivisatos, P.; Xu, T. Using machine learning to predict and understand complex selfassembly behaviors of a multicomponent nanocomposite. Adv. Mater.  \n\n(50) Gurnani, R.; Kuenneth, C.; Toland, A.; Ramprasad, R. Polymer informatics at scale with multitask graph neural networks. Chem. Mater. 2023, 35, 1560−1567. (51) Lundberg, S. M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 110. (52) Xu, Y.; Qian, $\\mathsf{Q}.$ i-SISSO: Mutual information-based improved sure independent screening and sparsifying operator algorithm. Eng. Appl. Artif. Intell. 2022, 116, No. 105442. (53) Xu, C.; Wang, Y.; Barati Farimani, A. TransPolymer: a transformer-based language model for polymer property predictions. npj Comput. Mater. 2023, 9, 64. (54) Kuenneth, C.; Ramprasad, R. polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat. Commun. 2023, 14, 4099. (55) Wu, S.; Kondo, Y.; Kakimoto, M.-a.; Yang, B.; Yamada, H.; Kuwajima, I.; Lambard, G.; Hongo, K.; Xu, Y.; Shiomi, J.; Schick, C.; Morikawa, J.; Yoshida, R. 
Machine-learning-assisted discovery of polymers with high thermal conductivity using a molecular design algorithm. npj Comput. Mater. 2019, 5, 5. (56) Qiu, H.; Liu, L.; Qiu, X.; Dai, X.; Ji, X.; Sun, Z.-Y. PolyNC: A natural and chemical language model for the prediction of unified polymer properties. Chem. Sci. 2024, 15, 534−544. (57) Liu, Y.; Hu, Z.; Suo, Z.; Hu, L.; Feng, L.; Gong, X.; Liu, Y.; Zhang, J. High-throughput experiments facilitate materials innovation: A review. Sci. China Technol. Sci. 2019, 62, 521−545. (58) Du, X.; Lüer, L.; Heumueller, T.; Wagner, J.; Berger, C.; Osterrieder, T.; Wortmann, J.; Langner, S.; Vongsaysy, U.; Bertrand, M.; Li, N.; Stubhan, T.; Hauch, J.; Brabec, C. J. Elucidating the full potential of OPV materials utilizing a high-throughput robot-based platform and machine learning. Joule 2021, 5, 495−506. (59) Wang, R.; Lüer, L.; Langner, S.; Heumueller, T.; Forberich, K.; Zhang, H.; Hauch, J.; Li, N.; Brabec, C. J. Understanding the microstructure formation of polymer films by spontaneous solution spreading coating with a high-throughput engineering platform. ChemSusChem 2021, 14, 3590−3598. (60) Zhao, G.; Xu, T.; Fu, X.; Zhao, W.; Wang, L.; Lin, J.; Hu, Y.; Du, L. Machine-learning-assisted multiscale modeling strategy for predicting mechanical properties of carbon fiber reinforced polymers. Compos. Sci. Technol. 2024, 248, No. 110455.",
        "category": " References"
    }
]