wl-hydrophilic-polymer/task2/task2-chunks/s41529-024-00427-z.json

[
    {
        "id": 1,
        "chunk": "# Machine learning assisted discovery of high-efficiency self-healing epoxy coating for corrosion protection  \n\nTong Liu1,2,3, Zhuoyao Chen1,2, Jingzhi Yang1,2, Lingwei Ma1,2,4, Arjan Mol5 and Dawei Zhang1,2,4  \n\nMachine learning is a powerful means for the rapid development of high-performance functional materials. In this study, we presented a machine learning workflow for predicting the corrosion resistance of a self-healing epoxy coating containing ZIF-8@Ca microfillers. The orthogonal Latin square method was used to investigate the effects of the molecular weight of the polyetheramine curing agent, molar ratio of polyetheramine to epoxy, molar content of the hydrogen bond unit (UPy-D400), and mass content of the solid microfillers (ZIF-8@Ca microfillers) on the low-frequency impedance modulus $(\\lg|Z|_{0.01\\,\\mathsf{Hz}})$ values of the scratched coatings, generating an initial dataset of 32 samples. The machine learning workflow was divided into two stages. In stage I, five models were compared and the random forest (RF) model was selected for active learning. After 5 cycles of active learning, the RF model achieved good prediction accuracy: coefficient of determination $(R^{2})=0.709$, mean absolute percentage error $(\\mathsf{MAPE})=0.081$, root mean square error $(\\mathsf{RMSE})=0.685\\ (\\lg(\\Omega\\cdot\\mathsf{cm}^{2}))$. In stage II, the best coating formulation was identified by Bayesian optimization. Finally, the electrochemical impedance spectroscopy (EIS) results showed that compared with the intact coating $((4.63\\pm2.08)\\times10^{11}\\ \\Omega{\\cdot}\\mathsf{cm}^{2})$, the $|Z|_{0.01\\,\\mathsf{Hz}}$ value of the repaired coating was as high as $(4.40\\pm2.04)\\times10^{11}\\ \\Omega{\\cdot}\\mathsf{cm}^{2}$. 
Besides, the repaired coating showed minimal corrosion and $3.3\\%$ of adhesion loss after 60 days of neutral salt spray testing.  \n\nnpj Materials Degradation  (2024) 8:11 ; https://doi.org/10.1038/s41529-024-00427-z",
        "category": " Results and discussion"
    },
    {
        "id": 2,
        "chunk": "# INTRODUCTION  \n\nEpoxy (EP) resin is widely used in the field of corrosion protection because of its strong adhesion, high corrosion resistance, excellent mechanical properties and low cost. However, cracks may arise inside or at the surface of the EP matrix during long-term service and reduce its corrosion protection performance with time, thus increasing potential safety hazards during its service life1. The application of self-healing coatings will be the most common and cost-effective method of improving the corrosion protection and thus the durability of metallic structures. A wide range of engineering structures, from vehicles to aircraft and from factories to household equipment, can be effectively protected by self-healing coating systems. Recent efforts have focused on improving the durability of EP coatings in the presence of damage by granting them self-healing functions, which can be realized through intrinsic repair of the material matrix by reversible covalent bonds2 and noncovalent bonds3, or via extrinsic strategies depending on the release of healing agents4 and corrosion inhibitors5 into coating defects. In contrast to these extrinsic self-healing mechanisms, the intrinsic one endows the coating with the ability to mimic natural systems and to be repaired repeatedly. Such mechanisms are typically based on reversible covalent bonds via disulfide bonds6, Diels–Alder reactions7, and hydrazone bonds8, or non-covalent interactions via metal-ligand9 and hydrogen bonding10–12. Among these mechanisms, the most promising one is based on dynamic hydrogen bonds because of their high reversibility and mild repair conditions, in combination with their directional and tunable self-association properties13. 
As an indication of the self-healing ability of the coating, the low-frequency impedance modulus, such as that obtained from electrochemical impedance spectroscopy (EIS) data measured at 0.01 Hz $(|Z|_{0.01\\,\\mathsf{Hz}})$, has been extensively used to estimate the overall corrosion resistance of the test area14,15. A higher $|Z|_{0.01\\,\\mathsf{Hz}}$ value represents a higher barrier ability of the coating. Based on previous studies16, we consider that an ideal self-healing corrosion-protective coating should meet the following main criteria: (1) the $|Z|_{0.01\\,\\mathsf{Hz}}$ value of the self-healed coating is close to that of the intact coating; (2) excellent barrier ability, with a $|Z|_{0.01\\,\\mathsf{Hz}}$ value of more than $10^{10}\\ \\Omega{\\cdot}\\mathsf{cm}^{2}$; (3) long-term stability in corrosive environments both before and after repair. For example, in a previous work by our group11, an intrinsic self-healing EP coating was developed by grafting 2-ureido-4[1H]-pyrimidinone (UPy) as a quadruple hydrogen bonding unit onto the backbones of an EP matrix. The UPy/EP coating demonstrated highly efficient self-healing within 5 min in $3.5\\ \\mathsf{wt}.\\%$ NaCl solution. The self-healed coating still had a high $|Z|_{0.01\\,\\mathsf{Hz}}$ value of $4.8\\times10^{10}\\ \\Omega{\\cdot}\\mathsf{cm}^{2}$ even after 60 days of immersion in NaCl solution.  \n\nOften, achieving the target self-healing performance requires synergy between multiple components of the EP coating formulation, including different resins, curing agents, liquid/solid additives, etc. The conventional trial-and-error design strategy for coating formulation is time-consuming and labor-intensive. Recently, machine learning methods have been shown to be a promising option for materials design and optimization, especially for systems with complex properties or compositions17–21. 
For example, Haik et al.22 developed a machine learning model to predict the stress relaxation properties of EP matrix composites, based on a three-layer neural network model using initial stress, test temperature and operating time as input variables and stress relaxation behavior as output. The final model was obtained by training on 9000 experimental data samples. This model can efficiently predict the time-dependent mechanical behavior of a viscoelastic or a viscoplastic material. Kan et al.23 constructed a molecular recognition model for predicting 2000 molecular descriptors from chemical structures using a gated graph neural network, and extracted 32-dimensional vectors representing the 2000 molecular descriptors through the molecular recognition model to complete the dimension reduction. This 32-dimensional vector was used as the input for the subsequent Gaussian regression, and the machine learning model for predicting electrical conductivity was finally built by training on a large amount of data. Typically, the establishment of an accurate machine learning model requires vast training data, which are difficult to obtain for polymer resin formation considering the heavy experimental workload in the synthesis and characterization24,25. Therefore, machine learning methods that work with small sample datasets have major implications for polymer design.  \n\n![](images/b30c9e8b421045d32d2843116104e013c2e0c650008ac7f46fc662d4cbd6c31c.jpg)  \nFig. 1 A machine learning workflow for performance optimization in self-healing EP composite coating. Four steps are involved in the machine learning workflow: a data acquisition, b active learning, c Bayesian optimization, and d experimental verification.  \n\nThe problem of machine learning under small sample data conditions (<1000 samples) has received much attention in recent years26,27. 
For the processing of small sample data, the most common methods are neural-network-based methods28, hierarchical machine learning29, active-learning-based methods30 and so on. For instance, Li et al.31 proposed a model combining nearest neighbor interpolation (NNI), synthetic minority oversampling technique (SMOTE) and extreme gradient boosting (XGBoost) to predict the abrasion of rubber composites with small samples. NNI and SMOTE are two classical models in image processing that aim at increasing the sample size and solving the problem of sample unevenness. Combining these two models, the original dataset was expanded from 23 to 710 samples. Finally, the abrasion was predicted by the XGBoost model to yield a better prediction accuracy $(\\mathsf{MSE}=0.001)$. Similarly, active learning has been applied to EP adhesive strength30, polymer molecular dynamics32, and high-$T_{g}$ polymers33,34, among others, starting from small initial datasets.  \n\nHerein, we employed a machine learning framework to develop self-healing composite coatings for corrosion protection applications. A flowchart of the machine learning workflow is shown in Fig. 1. In the machine learning framework, active learning and Bayesian optimization were used to model and maximize the common logarithm of the low-frequency impedance modulus $(\\lg|Z|_{0.01\\,\\mathsf{Hz}})$ obtained from EIS measurements for various scratched self-healing EP composite coatings, in order to improve their self-healing property. This coating formulation consists of an EP resin, polyetheramines, amino-terminated urea-pyrimidinone monomers (UPy-D400) and ZIF-8@Ca microfillers. 
The EP resin mixed with polyetheramine can react to form an EP-based polymer, and the UPy-D400 acts as a quadruple hydrogen bonding unit that can be grafted into the EP network to provide a self-healing function for the EP polymer via the self-association process; the ZIF-8@Ca microfiller, which is an empty $\\mathsf{CaCO}_{3}$ carbonate microcontainer with ZIF-8 nanoparticles assembled on the surface, is incorporated as a model filler that can not only enhance the barrier property of the EP coating, but also present a pH-sensitive response to release a loaded substance (e.g., inhibitors) to achieve useful functions. For the machine learning process, four variables, namely the molecular weight of polyetheramine, the molar ratio of polyetheramine to EP, the UPy-D400 content, and the ZIF-8@Ca content, were used as input, and the $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ value of the scratched coatings was used as output; an initial dataset of 32 samples was obtained from the preliminary experiment. Among five common models, the one with the best accuracy was selected and further trained by active learning. Subsequently, the Bayesian optimization method was used to search for the scratched self-healing EP composite coating with an extremely high $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ value. Finally, the self-healing and corrosion protective properties of the optimal coating were verified by EIS and salt spray testing.  \n\nTable 1. Summary of variable parameters for coating formulation used at the active learning stage.   
\n\n\n<html><body><table><tr><td rowspan=\"2\">Serial number</td><td colspan=\"4\">Variable parameter</td></tr><tr><td>MWc (g·mol⁻¹)</td><td>r</td><td>UPy-D400 content (mol%)</td><td>ZIF-8@Ca content (wt.%)</td></tr><tr><td>1</td><td>230</td><td>0.55</td><td>5</td><td>5.5</td></tr><tr><td>2</td><td>400</td><td>0.70</td><td>10</td><td>7.0</td></tr><tr><td>3</td><td>2000</td><td>0.85</td><td>15</td><td>8.5</td></tr><tr><td>4</td><td>4000</td><td>1.00</td><td>20</td><td>10.0</td></tr><tr><td colspan=\"5\">Variable parameters include the molecular weight of the polyetheramine curing agent (MWc), molar ratio of polyetheramine to EP (r), molar content of UPy-D400 and mass content of ZIF-8@Ca microfillers.</td></tr></table></body></html>",
        "category": " Introduction"
    },
    {
        "id": 3,
        "chunk": "# RESULTS AND DISCUSSION",
        "category": " Results and discussion"
    },
    {
        "id": 4,
        "chunk": "# Experimental results from the initial dataset  \n\nAs seen in Table 1, four parameters with four levels each were set (total experimental conditions $=4^{4}=256$ sets).  \n\nTable 2. Experimental results of $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values of scratched coatings prepared under various conditions (32 initial samples); the $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values represent the average ± standard deviation.   \n\n\n<html><body><table><tr><td rowspan=\"2\">Serial number</td><td colspan=\"4\">Variable parameter</td><td rowspan=\"2\">Measured lg(|Z|/Ω·cm²)</td></tr><tr><td>MWc (g·mol⁻¹)</td><td>r</td><td>UPy-D400 content (mol%)</td><td>ZIF-8@Ca content (wt.%)</td></tr><tr><td>1</td><td>230</td><td>0.55</td><td>5</td><td>5.5</td><td>4.89 ± 0.72</td></tr><tr><td>2</td><td>230</td><td>0.70</td><td>10</td><td>8.5</td><td>5.12 ± 0.69</td></tr><tr><td>3</td><td>230</td><td>0.85</td><td>15</td><td>10.0</td><td>6.06 ± 0.22</td></tr><tr><td>4</td><td>230</td><td>1.00</td><td>20</td><td>7.0</td><td>8.91 ± 0.76</td></tr><tr><td>5</td><td>400</td><td>0.55</td><td>10</td><td>7.0</td><td>4.75 ± 0.83</td></tr><tr><td>6</td><td>400</td><td>0.70</td><td>5</td><td>10.0</td><td>5.39 ± 0.45</td></tr><tr><td>7</td><td>400</td><td>0.85</td><td>20</td><td>8.5</td><td>10.08 ± 0.72</td></tr><tr><td>8</td><td>400</td><td>1.00</td><td>15</td><td>5.5</td><td>10.55 ± 0.52</td></tr><tr><td>9</td><td>2000</td><td>0.55</td><td>15</td><td>8.5</td><td>8.35 ± 0.41</td></tr><tr><td>10</td><td>2000</td><td>0.70</td><td>20</td><td>5.5</td><td>10.05 ± 0.76</td></tr><tr><td>11</td><td>2000</td><td>0.85</td><td>5</td><td>7.0</td><td>9.12 ± 0.69</td></tr><tr><td>12</td><td>2000</td><td>1.00</td><td>10</td><td>10.0</td><td>7.23 ± 0.82</td></tr><tr><td>13</td><td>4000</td><td>0.55</td><td>20</td><td>10.0</td><td>8.94 ± 0.70</td></tr><tr><td>14</td><td>4000</td><td>0.70</td><td>15</td><td>7.0</td><td>8.04 ± 0.59</td></tr><tr><td>15</td><td>4000</td><td>0.85</td><td>10</td><td>5.5</td><td>8.43 ± 0.28</td></tr><tr><td>16</td><td>4000</td><td>1.00</td><td>5</td><td>8.5</td><td>6.44 ± 0.65</td></tr><tr><td>17</td><td>230</td><td>0.55</td><td>20</td><td>7.0</td><td>4.88 ± 0.70</td></tr><tr><td>18</td><td>230</td><td>0.70</td><td>5</td><td>5.5</td><td>4.93 ± 0.63</td></tr><tr><td>19</td><td>230</td><td>0.85</td><td>10</td><td>8.5</td><td>5.59 ± 0.69</td></tr><tr><td>20</td><td>230</td><td>1.00</td><td>15</td><td>10.0</td><td>7.97 ± 0.70</td></tr><tr><td>21</td><td>400</td><td>0.55</td><td>15</td><td>5.5</td><td>5.01 ± 0.81</td></tr><tr><td>22</td><td>400</td><td>0.70</td><td>10</td><td>7.0</td><td>7.31 ± 0.42</td></tr><tr><td>23</td><td>400</td><td>0.85</td><td>5</td><td>10.0</td><td>8.12 ± 0.62</td></tr><tr><td>24</td><td>400</td><td>1.00</td><td>20</td><td>8.5</td><td>10.87 ± 0.80</td></tr><tr><td>25</td><td>2000</td><td>0.55</td><td>10</td><td>10.0</td><td>6.14 ± 0.75</td></tr><tr><td>26</td><td>2000</td><td>0.70</td><td>15</td><td>8.5</td><td>9.29 ± 0.62</td></tr><tr><td>27</td><td>2000</td><td>0.85</td><td>20</td><td>5.5</td><td>8.98 ± 0.74</td></tr><tr><td>28</td><td>2000</td><td>1.00</td><td>5</td><td>7.0</td><td>6.92 ± 0.70</td></tr><tr><td>29</td><td>4000</td><td>0.55</td><td>5</td><td>8.5</td><td>6.93 ± 0.62</td></tr><tr><td>30</td><td>4000</td><td>0.70</td><td>20</td><td>10.0</td><td>8.35 ± 0.52</td></tr><tr><td>31</td><td>4000</td><td>0.85</td><td>15</td><td>7.0</td><td>9.15 ± 0.66</td></tr><tr><td>32</td><td>4000</td><td>1.00</td><td>10</td><td>5.5</td><td>6.95 ± 0.79</td></tr></table></body></html>  \n\nThe four variables were the molecular weight of polyetheramine, the molar ratio of polyetheramine to EP, the molar content of UPy-D400, and the mass content of the ZIF-8@Ca microfillers. An initial 32 sets of experimental conditions were extracted from the 256 sets by the orthogonal Latin square design method35. 
This is a method based on mathematical statistics and the orthogonality principle, which can achieve results equivalent to those of a large number of comprehensive tests with a minimum number of tests. It selects a subset of points that represent the whole experiment according to the orthogonality of the experiments, and these selected points are uniformly distributed over the whole space36,37. Then, coatings were prepared according to these 32 conditions for EIS measurements, and the corresponding low-frequency impedance modulus ($\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ value) of each scratched coating was obtained. The $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ value was selected as the output instead of the $|Z|_{0.01\\,\\mathsf{Hz}}$ value because the raw modulus spans several orders of magnitude; taking the logarithm eliminates the undesirable effects of a dataset with such high variability.  \n\n![](images/dd6acf985492f5cdd9ce8b98401dd7d9683c552ef2ac68a73ab35ac1f3275469.jpg)  \nFig. 2 Distribution of $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ experimental values from the 32 initial samples. This task aims to confirm the distribution of target property values under initial experimental conditions.  \n\nMeasurements of $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ experimental values of scratched coatings that comprise our initial dataset are reported in Table 2. Figure 2 shows the distribution of $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ experimental values. As shown in Fig. 2, the average $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ experimental values were widely distributed in the range of 4.75–10.87 $(\\lg(\\Omega\\cdot\\mathsf{cm}^{2}))$. According to a previous experimental study11, this distribution covers scratched coatings with different self-healing abilities, indicating that the selection of the initial preparation conditions using the orthogonal Latin square method is reasonable.",
        "category": " Results and discussion"
    },
    {
        "id": 5,
        "chunk": "# Assessment and selection of a $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ prediction model  \n\nNext, the experimental conditions and the corresponding $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values of the scratched coatings were used as the input and output of the machine learning process, respectively, and five common machine learning models were trained using the 32 initial samples. A comparison of the predicted and measured $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values for each model is shown in Fig. 3a–e. A black dashed straight line indicates equal measured and predicted values. A comparison of the accuracy of each model is shown in Fig. 3f. Compared with the other models, the RF model yielded the best accuracy in terms of a higher coefficient of determination $(R^{2})$ value, and lower mean absolute percentage error (MAPE) and root mean square error (RMSE) values. This may be due to the ensemble structure of the RF model, which averages many decision trees and handles data with high variability well38,39. Hence, the RF model was chosen to predict the $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values in subsequent steps.",
        "category": " Results and discussion"
    },
    {
        "id": 6,
        "chunk": "# Active learning and machine learning model performance  \n\nFor the active learning process, the RF model trained on the 32 initial samples first predicted the $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values of all remaining possible experimental conditions ($256-32=224$ sets). The predicted $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values were ranked in descending order. The five top-ranked experimental conditions from the 224 sets were selected as proposals for subsequent measurements to be performed in the laboratory. These five measurements were added to the initial 32 samples. Then, the machine learning model for the prediction of the $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values was trained again on this enlarged $(32+5)$ dataset. The new measurements were re-used in the RF model to improve the accuracy, as this can enhance the prediction accuracy for high-target performance samples in a targeted manner and improve the active learning efficiency. This process, from the prediction phase to the reuse phase, represents one cycle of active learning (see Table 3). This active learning process is repeated until the preliminary goal of the best accuracy of the machine learning model is achieved. In this study, the active learning cycle was stopped when all the evaluation indices (MAPE, RMSE and $R^{2}$) stopped improving.  \n\n![](images/6737f76d82306573523c5d04f21d9340754bf45640eb689abcf5d0b4071b2ad5.jpg)  \nFig. 3 The selection of the best machine learning model. Distribution of predicted versus measured $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values from successive test sets used in the 10-fold cross-validation using different machine learning models; a–e correspond to the artificial neural network (ANN), linear regression (LR), support vector regression (SVR), decision tree (DT) and random forest (RF) models, respectively. f A comparison of the accuracy for each model, including $R^{2}$, MAPE, and RMSE values.  
\n\nFigures 4a–g present scatter plots of the predicted versus measured $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values from the initial dataset to the last cycle. The blue and red dots indicate existing and new measurements, respectively. The evolution of the corresponding $R^{2}$, MAPE and RMSE values for each cycle is summarized in Fig. 4h, i. As shown in Figs. 4a–g, the predicted and measured values gradually approached the black dashed straight line from the initial dataset to the last cycle, indicating that an increase in the dataset size resulted in predicted $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values that are closer to the measured $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values. As the dataset size increased, $R^{2}$ clearly increased, and the MAPE and RMSE decreased gradually. After five active learning cycles, the $R^{2}$, MAPE and RMSE values plateaued, at which point the active learning process was terminated. For the dataset of 62 samples, the RF model achieved $R^{2}$, MAPE and RMSE values of 0.709, 0.081 and 0.685 $(\\lg(\\Omega\\cdot\\mathsf{cm}^{2}))$, respectively. Compared to the accuracy on the initial dataset, improvements of $246\\%$, $51\\%$ and $47\\%$ were achieved for $R^{2}$, MAPE, and RMSE, respectively. In this case, $R^{2}$ was greater than 0.7 and both MAPE and RMSE were stabilized at a low level, indicating that the RF model reached acceptable accuracy. Therefore, the active learning procedure was stopped at this stage and the RF model was fixed based on the existing dataset.  
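
The propose, measure and retrain loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the surrogate here is a simple nearest-neighbour average standing in for the RF model, the function names (knn_predict, propose, active_learning, fake_measure) are hypothetical, the loop runs a fixed six proposal rounds (the initial step plus cycles 1 to 5 of Table 3) instead of monitoring the accuracy metrics, and the measurement function is synthetic.

```python
import itertools
import random

# Factor levels from Table 1: MWc, r, UPy-D400 (mol%), ZIF-8@Ca (wt.%).
LEVELS = [
    (230, 400, 2000, 4000),
    (0.55, 0.70, 0.85, 1.00),
    (5, 10, 15, 20),
    (5.5, 7.0, 8.5, 10.0),
]
GRID = list(itertools.product(*LEVELS))  # 4**4 = 256 candidate formulations


def knn_predict(train, x, k=3):
    '''Stand-in surrogate (NOT the RF model): mean of the k nearest measured points.'''
    def dist(a, b):
        # Compare level indices so that all four factors share one scale.
        return sum((LEVELS[i].index(a[i]) - LEVELS[i].index(b[i])) ** 2
                   for i in range(4))
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def propose(train, batch=5, predict=knn_predict):
    '''Rank untested formulations by predicted value and return the top batch.'''
    tested = {x for x, _ in train}
    pool = [x for x in GRID if x not in tested]
    pool.sort(key=lambda x: predict(train, x), reverse=True)
    return pool[:batch]


def active_learning(train, measure, rounds=6, batch=5):
    '''Repeat propose -> measure -> retrain and return the enlarged dataset.'''
    train = list(train)
    for _ in range(rounds):
        for x in propose(train, batch):
            train.append((x, measure(x)))  # new measurement joins the training set
    return train


def fake_measure(x):
    '''Synthetic stand-in for a laboratory EIS measurement (hypothetical).'''
    return 5.0 + 5.0 * x[1] + 0.1 * x[2]


random.seed(0)
initial = [(x, fake_measure(x)) for x in random.sample(GRID, 32)]
final = active_learning(initial, fake_measure)
print(len(GRID), len(final))  # 256 62
```

Swapping knn_predict for a trained random forest, and replacing the fixed round count with a check on the cross-validated metrics, would bring the sketch closer to the procedure in the text.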
Several measured $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values in Table 3 were greater than 11.00 $(\\lg(\\Omega\\cdot\\mathsf{cm}^{2}))$, exceeding the highest value in the initial dataset, which showed that the RF model allowed us to predict the experimental conditions of coatings with a potentially high self-healing ability. These additional data on high-performance self-healing coatings are beneficial for further maximization using Bayesian optimization. In addition, the proposed experiments required polyetheramine with molecular weights of 400 and 2000 g·mol⁻¹, an $r$ value of at least 0.85, 10–20 mol% of UPy-D400, and ZIF-8@Ca microfiller content over the full range. This provided the main guidance for refining the test conditions in the subsequent step.",
        "category": " Results and discussion"
    },
    {
        "id": 7,
        "chunk": "# Bayesian optimization for screening optimal candidate  \n\nIn this step, three experimental conditions were refined: the $r$ value, molar content of UPy-D400, and microfiller content were varied from 0.85 to 1.00, 10 to 20 mol%, and 5.5 to 10.0 wt.%, in increments of 0.01, 1 mol%, and 0.1 wt.%, respectively. The molecular weights of the polyetheramine curing agents were fixed at 400 and 2000 g·mol⁻¹. This search space for the coating formulation is vast, and the machine learning model has limited utility if it does not incorporate uncertainty and the expected improvement process. Since a machine learning model is built using a limited amount of training data, the selection of candidates using that model may be limited to a local search. Therefore, we speculate that Bayesian optimization may give better results because this optimization technique considers the uncertainty of the prediction and the balance between local and global search40.  \n\nBayesian optimization works on a surrogate model and evaluates a utility function41. The utility function uses the mean and standard deviation of the candidates estimated by the surrogate model. The utility function encodes a trade-off between exploitation (candidate searching at points with high mean) and exploration (candidate searching at points with high uncertainty). Herein, we have used RF as the surrogate model and expected improvement (EI) as the utility function. The EI is defined by the following Eqs. (1)–(2)42:  \n\nTable 3. Experimental conditions proposed during active learning, with the corresponding predicted and measured $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ values.   \n\n\n<html><body><table><tr><td rowspan=\"2\">Cycle</td><td rowspan=\"2\">Rank</td><td colspan=\"4\">Variable parameter</td><td rowspan=\"2\">Predicted lg(|Z|/Ω·cm²)</td><td rowspan=\"2\">Measured lg(|Z|/Ω·cm²)</td></tr><tr><td>MWc (g·mol⁻¹)</td><td>r</td><td>UPy-D400 content (mol%)</td><td>ZIF-8@Ca content (wt.%)</td></tr><tr><td rowspan=\"5\">Initial</td><td>1</td><td>400</td><td>1.00</td><td>20</td><td>5.5</td><td>10.49 ± 0.32</td><td>10.15 ± 0.30</td></tr><tr><td>2</td><td>400</td><td>1.00</td><td>20</td><td>7.0</td><td>9.96 ± 0.24</td><td>10.88 ± 0.44</td></tr><tr><td>3</td><td>400</td><td>0.85</td><td>10</td><td>5.5</td><td>9.71 ± 0.14</td><td>10.3 ± 0.41</td></tr><tr><td>4</td><td>2000</td><td>0.85</td><td>15</td><td>5.5</td><td>8.93 ± 0.29</td><td>8.35 ± 0.62</td></tr><tr><td>5</td><td>2000</td><td>0.85</td><td>20</td><td>8.5</td><td>9.33 ± 0.20</td><td>9.05 ± 0.75</td></tr><tr><td rowspan=\"5\">Cycle 1</td><td>1</td><td>400</td><td>0.85</td><td>20</td><td>7.0</td><td>10.14 ± 0.19</td><td>10.11 ± 0.44</td></tr><tr><td>2</td><td>400</td><td>1.00</td><td>20</td><td>10.0</td><td>9.88 ± 0.17</td><td>10.24 ± 0.13</td></tr><tr><td>3</td><td>400</td><td>1.00</td><td>15</td><td>7.0</td><td>9.52 ± 0.20</td><td>10.52 ± 0.46</td></tr><tr><td>4</td><td>2000</td><td>1.00</td><td>10</td><td>7.0</td><td>9.57 ± 0.18</td><td>8.23 ± 0.29</td></tr><tr><td>5</td><td>400</td><td>0.85</td><td>20</td><td>10.0</td><td>9.65 ± 0.17</td><td>9.52 ± 0.51</td></tr><tr><td rowspan=\"5\">Cycle 2</td><td>1</td><td>400</td><td>0.85</td><td>15</td><td>5.5</td><td>10.27 ± 0.21</td><td>10.08 ± 0.30</td></tr><tr><td>2</td><td>400</td><td>1.00</td><td>15</td><td>8.5</td><td>10.23 ± 0.25</td><td>11.03 ± 0.38</td></tr><tr><td>3</td><td>400</td><td>0.85</td><td>15</td><td>7.0</td><td>10.03 ± 0.15</td><td>10.76 ± 0.46</td></tr><tr><td>4</td><td>400</td><td>1.00</td><td>15</td><td>10.0</td><td>9.90 ± 0.20</td><td>9.63 ± 0.64</td></tr><tr><td>5</td><td>400</td><td>0.85</td><td>15</td><td>8.5</td><td>10.31 ± 0.12</td><td>10.26 ± 0.71</td></tr><tr><td rowspan=\"5\">Cycle 3</td><td>1</td><td>400</td><td>0.85</td><td>15</td><td>10.0</td><td>9.44 ± 0.24</td><td>10.25 ± 0.75</td></tr><tr><td>2</td><td>2000</td><td>0.85</td><td>15</td><td>7.0</td><td>9.12 ± 0.35</td><td>9.62 ± 0.54</td></tr><tr><td>3</td><td>2000</td><td>0.85</td><td>15</td><td>8.5</td><td>9.37 ± 0.36</td><td>9.94 ± 0.48</td></tr><tr><td>4</td><td>2000</td><td>1.00</td><td>15</td><td>5.5</td><td>8.91 ± 0.44</td><td>9.41 ± 0.38</td></tr><tr><td>5</td><td>2000</td><td>1.00</td><td>15</td><td>8.5</td><td>9.43 ± 0.18</td><td>9.22 ± 0.15</td></tr><tr><td rowspan=\"5\">Cycle 4</td><td>1</td><td>2000</td><td>0.85</td><td>15</td><td>10.0</td><td>9.40 ± 0.30</td><td>9.68 ± 0.80</td></tr><tr><td>2</td><td>2000</td><td>0.85</td><td>20</td><td>8.5</td><td>9.30 ± 0.25</td><td>9.03 ± 0.68</td></tr><tr><td>3</td><td>2000</td><td>1.00</td><td>15</td><td>10.0</td><td>9.30 ± 0.20</td><td>9.84 ± 0.51</td></tr><tr><td>4</td><td>2000</td><td>1.00</td><td>15</td><td>7.0</td><td>9.24 ± 0.22</td><td>9.10 ± 0.74</td></tr><tr><td>5</td><td>2000</td><td>1.00</td><td>20</td><td>8.5</td><td>9.00 ± 0.18</td><td>9.62 ± 0.48</td></tr><tr><td rowspan=\"5\">Cycle 5</td><td>1</td><td>2000</td><td>0.85</td><td>20</td><td>7.0</td><td>9.04 ± 0.10</td><td>9.18 ± 0.84</td></tr><tr><td>2</td><td>2000</td><td>0.85</td><td>20</td><td>10.0</td><td>9.28 ± 0.08</td><td>9.45 ± 0.54</td></tr><tr><td>3</td><td>2000</td><td>1.00</td><td>20</td><td>10.0</td><td>9.06 ± 0.15</td><td>9.21 ± 0.69</td></tr><tr><td>4</td><td>2000</td><td>1.00</td><td>20</td><td>5.5</td><td>8.99 ± 0.18</td><td>9.30 ± 0.50</td></tr><tr><td>5</td><td>2000</td><td>0.85</td><td>20</td><td>7.0</td><td>9.16 ± 0.20</td><td>9.09 ± 0.25</td></tr></table></body></html>\n\nInitial step: the top-five proposed experiments were obtained from the remaining 224 untested experiments by a model trained on the initial 32 samples; Cycle 1: from the remaining 219 untested experiments, another top-five proposed experiments were obtained by a model trained on 37 samples. Cycles 2–5 used the same method to obtain new proposed experiments and retrain the model.  
\n\n$$\n\\mathrm{EI}(\\mathbf{x})=\\sigma(\\mathbf{x})\\,[z\\,\\Phi(z)+\\phi(z)] \\tag{1}\n$$  \n\n$$\nz=[\\mu(\\mathbf{x})-f(\\mathbf{x}^{+})-\\varepsilon]/\\sigma(\\mathbf{x}) \\tag{2}\n$$  \n\nwhere $\\mathrm{EI}(\\mathbf{x})$ represents the expected improvement value for each coating formulation candidate, $\\mu(\\mathbf{x})$ and $\\sigma(\\mathbf{x})$ are the predicted output and standard deviation of the candidate obtained from the surrogate model, and $f(\\mathbf{x}^{+})$ is the maximum value of the target material property observed in the training dataset. $\\Phi$ represents the cumulative distribution function and $\\phi$ the probability density function of the standard normal distribution, assuming that the target property values follow a normal distribution. The term $\\varepsilon$ regulates the amount of exploration: the higher the value of $\\varepsilon$, the more exploration. In this method, the largest EI value represents the most promising coating formulation candidate. Here, we used 1000 iterations for the BO run, as this was sufficient to predict the optimal experimental conditions with high accuracy (see Data Availability section for where to access this code), and a series of experiments were conducted starting from rank 1 (Table 4). A new highest $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ value of $11.58\\pm0.28$ $(\\lg(\\Omega\\cdot\\mathsf{cm}^{2}))$ was observed, that is, $(4.40\\pm2.04)\\times10^{11}\\ \\Omega{\\cdot}\\mathsf{cm}^{2}$. This impedance modulus value was considerably high compared with those reported in previous studies on EP-based self-healing coatings11,43–46, which reported a typical $\\lg|Z|_{0.01\\,\\mathsf{Hz}}$ value range of 7.48–10.68 $(\\lg(\\Omega\\cdot\\mathsf{cm}^{2}))$. 
The experimental conditions suggested by Bayesian optimization showed that a relatively low molecular weight of polyetheramine and a high molar ratio of polyetheramine to EP were promising conditions for achieving a high $\\lg|Z|_{0.01\\,\\mathrm{Hz}}$ value, whereas the molar content of UPy-D400 and the microfiller content should be in the middle of their defined ranges. According to previous studies47,48, excessive amine addition improves the shape recovery rate of EP materials. The intrinsic self-repair process in this study is realized by the self-association of a self-healing unit (hydrogen bond), on the premise that the damage can be physically closed. A high shape recovery rate is beneficial for the physical closure of scratched material surfaces11. However, excess amine (an excessive $r$ value) leads to higher flexibility but lower mechanical strength of EP materials47, so an optimum combination of high strength and good flexibility can be achieved by adjusting the $r$ value precisely through Bayesian optimization. The introduction of self-healing units and microfillers may also affect various performance indicators of the coatings; Bayesian optimization can balance the amount of each addition simultaneously to achieve a reasonable design for the target property.  \n\n![](images/acdf10c6345288dfc1e83c6b642a845d2e17b535f644a8ef0fedb9062f37b5a9.jpg)  \nFig. 4 Active learning process. a–g Correlation scatter plots of predicted and measured $\\lg|Z|_{0.01\\,\\mathrm{Hz}}$ values using different datasets, including the initial dataset and the cycle 1–6 datasets. h, i Comparison of the accuracy ($R^{2}$, RMSE and MAPE values) of the RF model for different datasets.  \n\n<html><body><table><tr><td colspan=\"7\">Table 4. Proposed preparations of a composite coating at the Bayesian optimization stage with the related experimental lg|Z|0.01Hz values of scratched coatings.</td></tr><tr><td>Rank</td><td colspan=\"4\">Variable parameter</td><td colspan=\"2\"></td></tr><tr><td></td><td>MWc (g·mol⁻¹)</td><td>r</td><td>UPy content (mol%)</td><td>ZIF-8@Ca content (wt.%)</td><td>Predicted lg(|Z|/Ω·cm²)</td><td>Measured lg(|Z|/Ω·cm²)</td></tr><tr><td>1</td><td>400</td><td>0.94</td><td>14</td><td>7.8</td><td>11.01</td><td>11.58 ± 0.28</td></tr><tr><td>2</td><td>400</td><td>0.97</td><td>17</td><td>8.0</td><td>10.92</td><td>11.15 ± 0.65</td></tr><tr><td>3</td><td>400</td><td>1.00</td><td>16</td><td>8.0</td><td>10.92</td><td>10.98 ± 0.40</td></tr><tr><td>4</td><td>400</td><td>0.95</td><td>20</td><td>8.8</td><td>10.88</td><td>10.85 ± 0.74</td></tr><tr><td>5</td><td>400</td><td>1.00</td><td>16</td><td>7.4</td><td>10.88</td><td>10.90 ± 0.68</td></tr></table></body></html>  \n\nFigure 5 shows the distribution of $\\lg|Z|_{0.01\\,\\mathrm{Hz}}$ values of scratched coatings from the initial dataset, after the five active learning cycles, and after the Bayesian optimization process. The $\\lg|Z|_{0.01\\,\\mathrm{Hz}}$ values from the initial dataset were scattered between 4.75 and 10.87 $(\\lg(\\Omega\\cdot\\mathrm{cm}^{2}))$. By comparison, all samples prepared in the active learning cycles exhibited a high $\\lg|Z|_{0.01\\,\\mathrm{Hz}}$ value (>8.23 $(\\lg(\\Omega\\cdot\\mathrm{cm}^{2}))$), and one sample from the Bayesian optimization dataset showed an exceptionally high $\\lg|Z|_{0.01\\,\\mathrm{Hz}}$ value. These results demonstrate the potential of our machine learning framework for the design and optimization of high-performance functional materials under small-sample conditions.  
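\n\nAs a rough worked illustration of the expected improvement criterion used at this stage, take the rank 1 prediction in Table 4, $\\mu=11.01$, and assume purely for illustration that the best training value is $f(x^{+})=10.87$ (the highest value of the initial dataset), that the surrogate standard deviation is $\\sigma=0.50$ and that $\\varepsilon=0$; these three inputs are illustrative assumptions and are not taken from the fitted model. With $\\Phi$ and $\\phi$ denoting the standard normal cumulative distribution and density functions:  \n\n$$\nz=\\frac{11.01-10.87-0}{0.50}=0.28,\\qquad\\Phi(0.28)\\approx0.610,\\qquad\\phi(0.28)\\approx0.384\n$$  \n\n$$\n\\mathrm{EI}=0.50\\times\\left[0.28\\times0.610+0.384\\right]\\approx0.28\n$$  \n\nUnder the same assumptions, halving $\\sigma$ to 0.25 would lower EI to about 0.18, which shows how the criterion favours candidates whose prediction is both high and uncertain.  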
\n\nInterpretation of machine learning model for coating design  \n\nEIS measurements were conducted on the scratched pure commercial EP and ZIF-8@Ca/EP coatings and their corresponding intact coatings to study the self-healing and corrosion resistance properties. The ZIF-8@Ca/EP coating was prepared based on the best formulation selected by Bayesian optimization. Nyquist and Bode plots of the intact coatings were obtained by EIS after 30 min of immersion in 3.5 wt.% NaCl solution (Fig. 6a–c). Figure 6d–i shows the Nyquist and Bode plots of the steels with scratched coatings after immersion for 1, 15, 30 and 60 d. The pure EP coating used here was prepared by mixing E51 with the D400 polyetheramine curing agent at a molar ratio of 5:3. For the pure EP sample, the intact coating initially showed a high barrier property, with a large capacitive arc in the Nyquist plot (Fig. 6a) and a high $|Z|_{0.01\\,\\mathrm{Hz}}$ value ($3.98\\times10^{10}\\ \\Omega\\cdot\\mathrm{cm}^{2}$) in the Bode plot (Fig. 6b). The phase angles at high frequency ($10^{5}$ Hz) were close to $-90^{\\circ}$, which indicates the capacitive character of the coatings. Compared with the intact pure EP coating, the intact ZIF-8@Ca/EP coating exhibited a slightly larger capacitive arc in the Nyquist plot, and its $|Z|_{0.01\\,\\mathrm{Hz}}$ value rose to $3.82\\times10^{11}\\ \\Omega\\cdot\\mathrm{cm}^{2}$, indicating a substantial improvement in the barrier property of the coating after the machine learning adjustment. The average and standard deviation of the $|Z|_{0.01\\,\\mathrm{Hz}}$ value for the intact coating, calculated from six parallel samples, were $(4.63\\pm2.08)\\times10^{11}\\ \\Omega\\cdot\\mathrm{cm}^{2}$.  \n\n![](images/e7c1bdabee074993eb6e101b9c0857235babbe2dbe22a74feb09ed1a3090a0be.jpg)  \nFig. 
5 Comparison of the measured target performance for each machine learning stage. Distribution of measured $\\lg|Z|_{0.01\\,\\mathrm{Hz}}$ values from the initial dataset (blue), after the active learning process (dark blue) and after Bayesian optimization (red).  \n\nFor the scratched coatings, the capacitive arcs of the pure EP coating shrank and the $|Z|_{0.01\\,\\mathrm{Hz}}$ values declined gradually over the entire immersion time, demonstrating the continuous deterioration of the barrier property (Fig. 6d, e). In the phase diagrams in Fig. 6f, the scratched pure EP showed two time constants: one related to the charge transfer process at the coating/substrate interface ($10^{-2}$–$10^{0}$ Hz) and the other related to the resistance increase caused by corrosion product formation in the artificial defect ($10^{1}$–$10^{5}$ Hz)49. In contrast to the Bode plots of the pure EP coating, those of the scratched ZIF-8@Ca/EP coating showed approximately $-45^{\\circ}$ straight lines with $|Z|_{0.01\\,\\mathrm{Hz}}$ values in excess of $3.80\\times10^{11}\\ \\Omega\\cdot\\mathrm{cm}^{2}$ at the beginning of immersion. The corresponding phase angles were $-90^{\\circ}$ over the frequency range of $10^{-1}$–$10^{5}$ Hz. This implies that, during the immersion, no conductive pathway formed through the coating, which largely exhibited a capacitive behavior similar to that of an intact coating50. During the 60 d of immersion, the $|Z|_{0.01\\,\\mathrm{Hz}}$ value of the ZIF-8@Ca/EP coating only slightly decreased from $3.80\\times10^{11}\\ \\Omega\\cdot\\mathrm{cm}^{2}$ to $1.23\\times10^{11}\\ \\Omega\\cdot\\mathrm{cm}^{2}$, confirming that the scratched ZIF-8@Ca/EP coating had been well repaired and possessed a satisfactory corrosion resistance.  
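\n\nBecause the model target is the base-10 logarithm of the impedance modulus, the change over 60 d is easier to judge on that scale. A quick check using only the two values quoted above for the scratched ZIF-8@Ca/EP coating:  \n\n$$\n\\lg(3.80\\times10^{11})\\approx11.58,\\qquad\\lg(1.23\\times10^{11})\\approx11.09,\\qquad\\Delta\\lg|Z|_{0.01\\,\\mathrm{Hz}}\\approx0.49\n$$  \n\nThat is, the modulus fell by about half a decade (to roughly 32% of its initial value), and the 60 d value of about 11.09 still lies above the 7.48–10.68 range quoted for previously reported EP-based self-healing coatings.  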
\n\nAfter scratching, the pure EP and ZIF-8@Ca/EP coatings were subjected to salt spray tests following the ASTM B117/D1654 standards. Figure 7a, b shows the optical images of the coatings after exposure in the salt spray chamber for different periods. According to the visual assessment in Fig. 7a, green corrosion products were observed at the scratches of the pure EP coating within 1 d of the salt spray test. After 60 d, large-scale coating delamination and corrosion products appeared in the scratched region, indicating that the scratched location of the pure EP coating was highly vulnerable to attack by corrosive species. In contrast, only slight scratch traces were observed at the scratched positions of the ZIF-8@Ca/EP coating, which did not show any signs of degradation (delamination, corrosion, or blistering) after 30 d (Fig. 7b). Furthermore, when the salt spray exposure time increased to 60 d, only one slight corrosion spot was observed at the scratched site, indicating that corrosion of the scratched ZIF-8@Ca/EP coating could be controlled in a salt spray environment for a long time.  \n\nThe adhesion strength, an important indicator of coating properties, can be measured using a pull-off test. Figure 7d shows the adhesion strength/loss values of the intact pure EP and ZIF-8@Ca/EP coatings before and after the 60 d salt spray test. The optical images of the remaining coatings following the pull-off test are presented in Fig. 7c. As shown in Fig. 7c, none of the samples exhibits cohesive failure. As shown in Fig. 7d, the dry adhesion strength of the ZIF-8@Ca/EP coating (9.82 MPa) is higher than that of pure EP (4.70 MPa). This is because the introduction of branched-chain amines and UPy units enhanced the hydrogen bonding between the coating and the metal surface51. 
After salt spraying, the pure EP coating exhibited a considerable adhesion loss of 79.4%, retaining only 0.97 MPa. In contrast, the ZIF-8@Ca/EP coating demonstrated not only the highest wet adhesion strength (9.50 MPa) but also minimal adhesion loss (3.3%) after the 60 d salt spray test.  \n\nIn summary, design-of-experiments techniques combined with active learning and Bayesian optimization were proposed to predict and optimize the $\\lg|Z|_{0.01\\,\\mathrm{Hz}}$ values of scratched EP self-healing coatings composed of polyetheramine curing agents of different molecular weights, different molar ratios of polyetheramine to E51 EP resin, molar contents of UPy-D400 and mass contents of ZIF-8@Ca microfillers. The active learning process yielded the preferred experimental conditions to build a predictive RF model of $\\lg|Z|_{0.01\\,\\mathrm{Hz}}$ values with satisfactory accuracy ($R^{2}=0.709$, MAPE = 0.081, RMSE = 0.685 $(\\lg(\\Omega\\cdot\\mathrm{cm}^{2}))$) after five cycles of active learning. Then, an extremely high $\\lg|Z|_{0.01\\,\\mathrm{Hz}}$ value of 11.58 ($|Z|_{0.01\\,\\mathrm{Hz}}=3.80\\times10^{11}\\ \\Omega\\cdot\\mathrm{cm}^{2}$) was achieved using the experimental conditions that were refined by Bayesian optimization. As confirmed by EIS, the ZIF-8@Ca/EP coating exhibited a strong healing effect in barrier property (intact sample: $3.82\\times10^{11}\\ \\Omega\\cdot\\mathrm{cm}^{2}$; repaired sample: $3.80\\times10^{11}\\ \\Omega\\cdot\\mathrm{cm}^{2}$). In addition, in terms of the corrosion resistance after repair, the ZIF-8@Ca/EP coating exhibited only slight corrosion after 60 d of the salt spray test, and the adhesion loss of the composite coating after the salt spray test was 3.3%, which was considerably lower than that of the pure EP coating (79.4%).",
        "category": " Results and discussion"
    },
    {
        "id": 8,
        "chunk": "# METHODS",
        "category": " Materials and methods"
    },
    {
        "id": 9,
        "chunk": "# Materials  \n\nPolyetheramine curing agents with four different molecular weights (230, 400, 2000 and $4000\\mathsf{g}\\mathsf{m o l}^{-1}$ ) were sourced from the Aladdin Industrial Corporation. The E51 EP resin was sourced from Jiangsu Heli Resin Co., ltd. The ZIF- $8@C a$ microfillers and the UPy-D400 monomers were obtained using previously published methods11,51. The Q235 mild steel was used as the substrate.  \n\n![](images/d218b5017e8776b8d3b07db88a31a622234ce25382dea667c5df62f5f3e52805.jpg)  \nFig. 6 EIS characterizations of the different intact/scratched coatings. a Nyquist plots and b, c Bode plots of the intact pure EP and intact ZIF-8@Ca/EP coatings after 30 min of immersion in $3.5\\mathrm{\\:wt.}\\%$ NaCl solution. Nyquist plots and Bode plots of different d–f scratched pure EP and g–i scratched ZIF-8@Ca/EP coating during immersion in 3.5 wt. $\\%$ NaCl solution for $60~\\mathsf{d}$",
        "category": " Materials and methods"
    },
    {
        "id": 10,
        "chunk": "# Preparation of coatings and EIS test  \n\nBased on the selected 32 experimental conditions, the preparation process of the self-healing EP coating containing $Z|F-8@C{\\mathsf{a}}$ microfillers $(\\boldsymbol{Z}|\\mathsf{F}{-}8@\\mathsf{C a}/\\mathsf{E P})$ is shown in Fig. 8. In each case, the ZIF-8@Ca microfillers were first mixed with the E51 EP resin under magnetic stirring. The polyetheramine curing agent and UPy-D400 were then added to the mixture using a mechanical agitator at 500 rpm for 10 min. Prior to the coating preparation, the steel specimens were wet-polished sequentially with 150-, 240- and 400-grit sandpapers, washed with ethanol and blow-dried in an ${\\sf N}_{2}$ atmosphere. The resulting mixture was applied to a steel piece using a bar coater. The coated samples were obtained by drying at room temperature for $48\\mathsf{h}$ . The final thickness of each of the dry films was approximately $85~{\\upmu\\mathrm{m}}$ .  \n\nEIS tests were performed to measure the low-frequency impedance $(|Z|_{0.011\\forall})$ values of the coated steel with/without an artificial scratch. Herein, all scratches of the EIS tests are made by a scalpel, and they are reproducible. The EIS results were obtained using a $3.5\\ \\mathsf{w t}.\\%$ NaCl solution and a CHI-660E electrochemical workstation with a three-electrode cell system comprising a coated steel substrate as a working electrode, a platinum plate electrode as a counter electrode and a saturated calomel electrode (SCE) as a reference electrode. The test parameters were set in the $10^{-2}{-}10^{5}\\mathsf{H z}$ range with a $0.02\\mathsf{V}$ root mean square amplitude. Prior to EIS measurements, artificial through-coating scratches (approximately $3\\mathsf{m m}$ in length and approximately $60\\upmu\\mathrm{m}$ in width) were made on the different coated steels using a scalpel. 
The measurements were repeated on the coated steels at least five times to ensure the reproducibility of the EIS results. In EIS results, the $|Z|_{0.01\\,\\mathrm{Hz}}$ value in the Bode plot usually serves as the main performance index for the corrosion resistance of a coating; that is, a higher $|Z|_{0.01\\,\\mathrm{Hz}}$ value reflects a better barrier property52. Therefore, this index was used to characterize the repair effect of the barrier properties of the coating after scratching.  \n\nTo further verify the self-healing and long-term anticorrosion ability of the scratched composite coating obtained from the machine learning process, a salt spray test was performed by exposing the samples to salt spray for 60 d in accordance with ASTM D1654.",
        "category": " Materials and methods"
    },
    {
        "id": 11,
        "chunk": "# Data pre-processing, data splitting and machine learning models  \n\nData pre-processing and data splitting were performed and different machine learning models were simulated using the Python package scikit-learn (version 1.1.1). The four variable parameters (Table 4) in this study were standardized following a standard Gaussian distribution of a mean of 0 and a variance of $1^{53}$ . The purpose of normalization is to make the preprocessed data be limited to a certain range (e.g., [0,1] or [–1,1]), thus eliminating the undesirable effects caused by sample dataset with high variability. The validity and accuracy of all employed machine learning models were evaluated using k-fold cross-validation. In this step, the data were randomly arranged and divided into 10 groups. Nine groups were allocated for training purposes, and the remaining group was assigned to validate of the model. The average value was obtained by repeating the same process 10 times. To obtain the performance level of the model, the MAPE,  \n\n![](images/42c589ed5e004a3971035c311d43d7da737d0af26f34606ec4ab9c9cf2aa0780.jpg)  \nFig. 7 Salt spray analysis of the different intact/scratched coatings. a, b Optical images of the pure EP and ZIF- ${\\cdot}8@C\\mathsf{a}/\\mathsf{E P}$ coating. c Optical images of the pure EP and ZIF- ${\\pmb{8}}\\textcircled{\\circ}{\\mathsf{C a/E P}}$ coating after pull-off test at the end of salt spray test. d The adhesion strength values of the pure EP and ZIF-8@Ca/EP coating before and after 60 d of salt spray exposure, the adhesion strength values represent the average $\\pm$ standard deviations.  \n\n![](images/d0553c201e5d3aa235222ba35cdd22b5381b023f68b217f244eee7ab75931fd3.jpg)  \nFig. 8 Schematic illustration of the preparation process for self-healing EP composite coating. The coating formulation consists of the EP resin, polyetheramines, hydrogen bond unit (UPy-D400) and ZIF-8@Ca microfillers.  
\n\nRMSE and $R^{2}$ were introduced to evaluate the k-fold crossvalidation, using the following Eqs. (3)-(5):54–56  \n\n$$\nM A P E=\\frac{1}{n}\\sum_{\\mathrm{i}=1}^{n}\\frac{|\\mathsf{y}_{\\mathrm{i}}-\\hat{\\mathsf{y}}_{\\mathrm{i}}|}{|\\mathsf{y}_{\\mathrm{i}}|}\n$$  \n\n$$\n{\\mathsf{R M S E}}={\\sqrt{{\\frac{1}{\\mathsf{n}}}\\sum_{\\mathrm{i=1}}^{\\mathsf{n}}{(\\mathsf{y}_{\\mathrm{i}}-{\\hat{\\mathsf{y}}}_{\\mathrm{i}})}^{2}}}\n$$  \n\n$$\n\\mathsf{R}^{2}=1-\\frac{\\sum_{\\mathrm{i=1}}^{\\mathrm{n}}(\\mathsf{y}_{\\mathrm{i}}-\\hat{\\mathsf{y}}_{\\mathrm{i}})^{2}}{\\sum_{\\mathrm{i=1}}^{\\mathrm{n}}(\\mathsf{y}_{\\mathrm{i}}-\\bar{\\mathsf{y}})^{2}}\n$$  \n\nwhere $\\mathsf{n}$ is the number of samples, and $y_{i}$ and $\\hat{y}_{i}$ are the experimental and predicted values of the ith sample, respectively.  \n\nThe accuracy of the machine learning model was accessed using its MAPE (MAPE value is in between 0 and 1, a value closer to 0 indicates greater accuracy57) and RMSE (a lower value of each indicates greater accuracy30) and $R^{2}$ (a value closer to 1 indicates greater accuracy; when the $R^{2}$ coefficient is greater than 0.7, the model represents acceptable accuracy58.)  \n\nFive machine learning models were applied as regression tools to the dataset: LR, ANN, SVR, DT and RF models. The machine learning methods are described in detail in the related reference59. The interested reader should refer to the Data Availability section for where to access our code used to run these algorithms.",
        "category": " Materials and methods"
    },
    {
        "id": 12,
        "chunk": "# Bayesian optimization  \n\nBayesian optimization40 was used to determine the highest $\\mathsf{I g}\\vert\\boldsymbol{Z}\\vert_{0.01\\mathsf{H z}}$ values by refining the variable conditions from Table 1. Bayesian optimization was performed using the Python package GPyOpt.",
        "category": " Materials and methods"
    },
    {
        "id": 13,
        "chunk": "# DATA AVAILABILITY  \n\nSource codes for this article are publicly available at https://github.com/ lt1037870521/manuscript-code-EP-Lt.  \n\nReceived: 11 June 2023; Accepted: 4 January 2024; Published online: 19 January 2024",
        "category": " References"
    },
    {
        "id": 14,
        "chunk": "# REFERENCES  \n\n1. He, Y. et al. Micro-crack behavior of carbon fiber reinforced $\\mathsf{F e}_{3}\\mathsf{O}_{4}/$ graphene oxide modified epoxy composites for cryogenic application. Compos. Part A Appl. Sci. Manuf. 108, 12–22 (2018).   \n2. Huang, S. et al. An overview of dynamic covalent bonds in polymer material and their applications. Eur. Polym. J. 141, 110094 (2020).   \n3. Utrera-Barrios, S., Verdejo, R. & López-Manchado, M. A. & Hernández Santana, M. Evolution of self-healing elastomers, from extrinsic to combined intrinsic mechanisms: a review. Mater. Horiz. 7, 2882–2902 (2020).   \n4. Samadzadeh, M., Boura, S. H., Peikari, M., Kasiriha, S. M. & Ashrafi, A. A review on selfhealing coatings based on micro/nanocapsules. Prog. Org. Coat. 68, 159–164 (2010).   \n5. Shchukin, D. G. Container-based multifunctional self-healing polymer coatings. Polym. Chem. 4, 4871–4877 (2013).   \n6. Canadell, J., Goossens, H. & Klumperman, B. Self-healing materials based on disulfide links. Macromolecules 44, 2536–2541 (2011).   \n7. Kuang, X. et al. Facile fabrication of fast recyclable and multiple self-healing epoxy materials through diels-alder adduct cross-linker. J. Polym. Sci. Pol. Chem. 53, 2094–2103 (2015).   \n8. Wen, N. et al. Recent advancements in self-healing materials: Mechanicals, performances and features. React. Funct. Polym. 168, 105041 (2021).   \n9. Han, Y., Wu, X., Zhang, X. & Lu, C. Self-healing, highly sensitive electronic sensors enabled by metal–ligand coordination and hierarchical structure design. ACS Appl. Mater. Inter. 9, 20106–20114 (2017).   \n10. Nardeli, J. V., Fugivara, C. S., Taryba, M., Montemor, M. F. & Benedetti, A. V. Selfhealing ability based on hydrogen bonds in organic coatings for corrosion protection of AA1200. Corros. Sci. 177, 108984 (2020).   \n11. Liu, T. et al. Ultrafast and high-efficient self-healing epoxy coatings with active multiple hydrogen bonds for corrosion protection. Corros. Sci. 
187, 109485 (2021).   \n12. Kim, G., Caglayan, C. & Yun, G. J. Epoxy-based catalyst-free self-healing elastomers at room temperature employing aromatic disulfide and hydrogen bonds. ACS Omega 7, 44750–44761 (2022).   \n13. Bosman, A., Brunsveld, L., Folmer, B., Sijbesma, R. & Meijer, E. Macromol. Symp. 201, 143–154 (2003).   \n14. Rosero-Navarro, N. C., Pellice, S. A., Durán, A. & Aparicio, M. Effects of Ce-containing sol–gel coatings reinforced with SiO$_{2}$ nanoparticles on the protection of AA2024. Corros. Sci. 50, 1283–1291 (2008).   \n15. Wang, J. et al. Two birds with one stone: Nanocontainers with synergetic inhibition and corrosion sensing abilities towards intelligent self-healing and self-reporting coating. Chem. Eng. J. 433, 134515 (2022).   \n16. Fan, Z. et al. Self-healing mechanisms in smart protective coatings: a review. Corros. Sci. 144, 74–88 (2018).   \n17. Tao, Q., Xu, P., Li, M. & Lu, W. Machine learning for perovskite materials design and discovery. npj Comput. Mater. 7, 23 (2021).   \n18. Li, Z. et al. Machine learning in concrete science: applications, challenges, and best practices. npj Comput. Mater. 8, 127 (2022).   \n19. Zhong, X. et al. Explainable machine learning in materials science. npj Comput. Mater. 8, 204 (2022).   \n20. Taylor, C. D. & Tossey, B. M. High temperature oxidation of corrosion resistant alloys from machine learning. npj Mater. Degrad 5, 38 (2021).   \n21. Li, Q. et al. Long-term corrosion monitoring of carbon steels and environmental correlation analysis via the random forest method. npj Mater. Degrad 6, 1 (2022).   \n22. Al-Haik, M. S., Hussaini, M. Y. & Garmestani, H. Prediction of nonlinear viscoelastic behavior of polymeric composites using an artificial neural network. Int. J. Plast. 22, 1367–1392 (2006).   \n23. Hatakeyama-Sato, K., Tezuka, T., Umeki, M. & Oyaizu, K. AI-assisted exploration of superionic glass-type Li(+) conductors with aromatic structures. J. Am. Chem. Soc. 
142, 3301–3305 (2020).   \n24. Askland, K. D. et al. Prediction of remission in obsessive compulsive disorder using a novel machine learning strategy. Int. J. Methods Psychiatr. Res. 24, 156–169 (2015).   \n25. Shao, M., Zhu, X.-J., Cao, H.-F. & Shen, H.-F. An artificial neural network ensemble method for fault diagnosis of proton exchange membrane fuel cell system. Energy 67, 268–275 (2014).   \n26. Xu, P., Ji, X., Li, M. & Lu, W. Small data machine learning in materials science. npj Comput. Mater. 9, 42 (2023).   \n27. Sutojo, T. et al. A machine learning approach for corrosion small datasets. npj Mater. Degrad 7, 18 (2023).   \n28. Xiang, K.-L., Xiang, P.-Y. & Wu, Y.-P. Prediction of the fatigue life of natural rubber composites by artificial neural network approaches. Mater. Des. 57, 180–185 (2014).   \n29. Menon, A., Thompson-Colón, J. A. & Washburn, N. R. Hierarchical machine learning model for mechanical property predictions of polyurethane elastomers from small datasets. Front. Mater. 6, 87 (2019).   \n30. Pruksawan, S., Lambard, G., Samitsu, S., Sodeyama, K. & Naito, M. Prediction and optimization of epoxy adhesive strength from a small dataset through active learning. Sci. Technol. Adv. Mater. 20, 1010–1021 (2019).   \n31. Li, D., Liu, J. & Liu, J. NNI-SMOTE-XGBoost: A novel small sample analysis method for properties prediction of polymer materials. Macromol. Theory Simul. 30, 2100010 (2021).   \n32. Novikov, I. S., Shapeev, A. V. & Suleimanov, Y. V. Ring polymer molecular dynamics and active learning of moment tensor potential for gas-phase barrierless reactions: Application to ${\\mathsf{S}}+{\\mathsf{H}}_{2}$ . J. Chem. Phys. 151, 224105 (2019).   \n33. Kim, C., Chandrasekaran, A., Jha, A. & Ramprasad, R. Active-learning and materials design: the example of high glass transition temperature polymers. MRS Commun. 9, 860–866 (2019).   \n34. Jha, A., Chandrasekaran, A., Kim, C. & Ramprasad, R. 
Impact of dataset uncertainties on machine learning model predictions: the example of polymer glass transition temperatures. Model. Simul. Mater. Sci. Eng. 27, 024002 (2019).   \n35. Mandl, R. Orthogonal Latin squares: an application of experiment design to compiler testing. Commun. ACM 28, 1054–1058 (1985).   \n36. Balak, Z. & Zakeri, M. Application of Taguchi L$_{32}$ orthogonal design to optimize flexural strength of ZrB$_{2}$-based composites prepared by spark plasma sintering. Int. J. Refract. Met. H. 55, 58–67 (2016).   \n37. Wu, C. J. & Hamada, M. S. Experiments: planning, analysis, and optimization. (John Wiley & Sons, 2011).   \n38. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).   \n39. Ji, Y. et al. Random forest incorporating ab-initio calculations for corrosion rate prediction with small sample Al alloys data. npj Mater. Degrad 6, 83 (2022).   \n40. Packwood, D. Bayesian Optimization for Materials Science. 11–28 (Springer, 2017).   \n41. Wagner, T., Emmerich, M., Deutz, A. & Ponweiser, W. in Parallel Problem Solving from Nature, PPSN XI: 11th International Conference, Kraków, Poland, September 11–15, 2010, Proceedings, Part I 11. 718–727 (Springer).   \n42. Mohanty, T., Chandran, K. & Sparks, T. D. Machine learning guided optimal composition selection of niobium alloys for high temperature applications. APL Mach. Learn. 1, 036102 (2023).   \n43. Cui, G. et al. Research progress on self-healing polymer/graphene anticorrosion coatings. Prog. Org. Coat. 155, 106231 (2021).   \n44. Nawaz, M., Habib, S., Khan, A., Shakoor, R. A. & Kahraman, R. Cellulose microfibers (CMFs) as a smart carrier for autonomous self-healing in epoxy coatings. N. J. Chem. 44, 5702–5710 (2020).   \n45. Zhang, C., Wang, H. & Zhou, Q. Preparation and characterization of microcapsules based self-healing coatings containing epoxy ester as healing agent. Prog. Org. Coat. 125, 403–410 (2018).   \n46. Wang, T. et al. 
Photothermal nanofiller-based polydimethylsiloxane anticorrosion coating with multiple cyclic self-healing and long-term self-healing performance. Chem. Eng. J. 446, 137077 (2022).   \n47. Zheng, N., Fang, G., Cao, Z., Zhao, Q. & Xie, T. High strain epoxy shape memory polymer. Polym. Chem. 6, 3046–3053 (2015).   \n48. Li, J., Rodgers, W. R. & Xie, T. Semi-crystalline two-way shape memory elastomer. Polymer 52, 5320–5325 (2011).   \n49. Oliveira, C. & Ferreira, M. Ranking high-quality paint systems using EIS. Part I: intact coatings. Corros. Sci. 45, 123–138 (2003).   \n50. Hao, Y., Sani, L. A., Ge, T. & Fang, Q. Phytic acid doped polyaniline containing epoxy coatings for corrosion protection of Q235 carbon steel. Appl. Surf. Sci. 419, 826–837 (2017).   \n51. Liu, T. et al. Self-healing and corrosion-sensing coatings based on pH-sensitive MOF-capped microcontainers for intelligent corrosion control. Chem. Eng. J. 454, 140335 (2023).   \n52. Tavandashti, N. P. et al. Inhibitor-loaded conducting polymer capsules for active corrosion protection of coating defects. Corros. Sci. 112, 138–149 (2016).   \n53. Zheng, X., Zheng, P. & Zhang, R.-Z. Machine learning material properties from the periodic table using convolutional neural networks. Chem. Sci. 9, 8426–8432 (2018).   \n54. De Myttenaere, A., Golden, B., Le Grand, B. & Rossi, F. Mean absolute percentage error for regression models. Neurocomputing 192, 38–48 (2016).   \n55. Fukutani, T., Miyazawa, K., Iwata, S. & Satoh, H. G-RMSD: Root mean square deviation based method for three-dimensional molecular similarity determination. Bull. Chem. Soc. Jpn. 94, 655–665 (2021).   \n56. Uyanık, T., Karatuğ, Ç. & Arslanoğlu, Y. Machine learning approach to ship fuel consumption: A case of container vessel. Transp. Res. D.-T. E. 84, 102389 (2020).   \n57. Chen, S., Cao, H., Ouyang, Q., Wu, X. & Qian, Q. ALDS: An active learning method for multi-source materials data screening and materials design. Mater. Des. 
223, 111092 (2022).   \n58. Faraji Niri, M., Reynolds, C., Román Ramírez, L. A. A., Kendrick, E. & Marco, J. Systematic analysis of the impact of slurry coating on manufacture of Li-ion battery electrodes via explainable machine learning. Energy Storage Mater. 51, 223–238 (2022).   \n59. Bishop, C. M. & Nasrabadi, N. M. Pattern Recognition and Machine Learning Vol. 4 (Springer, 2006).",
        "category": " References"
    },
    {
        "id": 15,
        "chunk": "# ACKNOWLEDGEMENTS  \n\nThis work is supported by National Key R&D Program of China (2022YFB3808803).",
        "category": " References"
    },
    {
        "id": 16,
        "chunk": "# AUTHOR CONTRIBUTIONS  \n\nT.L.: investigation, methodology, and writing—original draft. Z.C: investigation and methodology. J.Y.: investigation. L.M.: investigation. A.M.: writing—review and editing. D.Z.: supervision, conceptualization, methodology, and writing—review and editing.",
        "category": " References"
    },
    {
        "id": 17,
        "chunk": "# COMPETING INTERESTS  \n\nThe authors declare no competing interests.",
        "category": " Conclusions"
    },
    {
        "id": 18,
        "chunk": "# ADDITIONAL INFORMATION  \n\nCorrespondence and requests for materials should be addressed to Dawei Zhang.  \n\nReprints and permission information is available at http://www.nature.com/ reprints  \n\nPublisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.  \n\nOpen Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http:// creativecommons.org/licenses/by/4.0/.  \n\n$\\circledcirc$ The Author(s) 2024",
        "category": " References"
    }
]