Small Data Machine Learning: Classification and Prediction of Poly(ethylene terephthalate) Stabilizers Using Molecular Descriptors
ACS Applied Polymer Materials (IF 4.4), Pub Date: 2020-11-23, DOI: 10.1021/acsapm.0c00921
Aaron L. Liu, Rahul Venkatesh, Michael McBride, Elsa Reichmanis, J. Carson Meredith, Martha A. Grover

Experimental data from a patent were analyzed to learn about the small molecule additives that were most effective in mitigating the degradation of polyethylene terephthalate. Two sets of molecular descriptors were calculated for a dataset of 39 additive candidates; unsupervised and supervised analyses were performed to determine the most influential structural features that led to reduced degradation. A clustering approach revealed evidence that performance differences had some structural pattern dependence on the molecular descriptors that were employed. To pinpoint the features responsible for those physical differences, a reduced design region approach was applied to analyze descriptors both individually and in multiple dimensions to determine the effectiveness in a binary classification of high and low performances. For each molecular descriptor type, two or three influential descriptors were identified and justified with respect to the additive performance and physicochemical ability to mitigate degradation. Random forest models were constructed with relatively high predictability for both MACCS-166 (AUC = 0.86) and alvaDesc molecular descriptors (AUC = 0.93). We compare molecular descriptor methods for their ability to construct classifiers and to prioritize experimental work toward building a rich dataset. We find that, in small materials datasets, understanding the underlying physicochemical behavior is indispensable for validating the effectiveness of machine learning models.
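To make the reported AUC values easier to read, here is a minimal sketch of how an area under the ROC curve can be computed for a binary high/low performance classifier. It uses the rank interpretation of AUC (the probability that a randomly chosen positive example is scored above a randomly chosen negative one). The function name `roc_auc` and the labels and scores are hypothetical illustrations, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = high-performing additive, 0 = low-performing.
# These values are made up for illustration only.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

With only 39 candidates, as in the dataset described above, each misranked pair shifts the AUC noticeably, which is one reason a physicochemical sanity check on the selected descriptors is useful alongside the score.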

Updated: 2020-12-11