SAR and QSAR in Environmental Research (IF 2.3), Pub Date: 2021-04-07, DOI: 10.1080/1062936x.2021.1902387
S. Wang, M. Cheng, L. Zhou, Y. Dai, Y. Dang, X. Ji
ABSTRACT
Linear and nonlinear quantitative structure–property relationship (QSPR) models were developed based on a dataset of 65 polymer-solvent combinations. Seven quantum chemical descriptors (dipole moment, hardness, chemical potential, electrophilicity index, total energy, and the HOMO and LUMO orbital energies) were calculated for the polymers and solvents with density functional theory at the B3LYP/6-31G(d) level. Because intrinsic viscosity correlates strongly with the weight, size, shape and topological structure of polymers and solvents, topological descriptors were also used in this work. The most appropriate polymer structure representation was investigated by considering 1–5 monomeric repeating units. The molecular descriptors were first screened with genetic algorithm-multiple linear regression (GA-MLR), which gave coefficients of determination (R²) of 0.78 and 0.83 for the training set and the prediction set, respectively. The support vector machine (SVM) model based on the selected descriptor subset showed an R² value of 0.95 for the training set and 0.93 for the prediction set. All statistical results suggest that the established QSPR models have good predictability. Furthermore, a new test set obtained from the literature was used for further validation. The R² values were 0.81 for the MLR model and 0.90 for the SVM model.
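As an illustration of the quantities named in the abstract, the sketch below derives chemical potential, hardness and electrophilicity index from HOMO and LUMO energies using the common frontier-orbital (Koopmans-type) approximations, and computes a coefficient of determination. The abstract does not state which definitions the authors used (some works define hardness without the factor 1/2), so the formulas, the function names `conceptual_dft` and `r_squared`, and the example energies are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConceptualDFT:
    chemical_potential: float  # mu, in the same energy unit as the inputs
    hardness: float            # eta
    electrophilicity: float    # omega


def conceptual_dft(e_homo: float, e_lumo: float) -> ConceptualDFT:
    """Frontier-orbital approximations (assumed, not taken from the paper):
    mu = (E_HOMO + E_LUMO) / 2, eta = (E_LUMO - E_HOMO) / 2,
    omega = mu**2 / (2 * eta).
    """
    if e_lumo <= e_homo:
        raise ValueError("expected E_LUMO > E_HOMO")
    mu = (e_homo + e_lumo) / 2.0
    eta = (e_lumo - e_homo) / 2.0
    omega = mu ** 2 / (2.0 * eta)
    return ConceptualDFT(mu, eta, omega)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0.0:
        raise ValueError("observed values have zero variance")
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical orbital energies in eV, not values from the paper.
    print(conceptual_dft(e_homo=-6.0, e_lumo=-1.0))
```

Descriptors computed this way for each polymer repeat-unit model and each solvent would form part of the feature table that a GA-MLR or SVM regression is then fitted to; `r_squared` is the statistic reported for the training, prediction and external test sets.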
Title (translated from the Chinese version): QSPR modelling of the intrinsic viscosity of polymer-solvent combinations based on density functional theory