A Machine Learning-Based QSAR Model for Benzimidazole Derivatives as Corrosion Inhibitors by Incorporating Comprehensive Feature Selection.
Interdisciplinary Sciences: Computational Life Sciences ( IF 3.9 ) Pub Date : 2019-09-06 , DOI: 10.1007/s12539-019-00346-7
Youquan Liu 1 , Yanzhi Guo 2 , Wengang Wu 1 , Ying Xiong 1 , Chuan Sun 1 , Li Yuan 1 , Menglong Li 2

BACKGROUND: Computational prediction of the inhibition efficiency (IE) of inhibitor molecules is an important complementary approach to designing novel molecules that efficiently inhibit corrosion of metallic surfaces. PURPOSE: This work develops a new machine learning-based predictor for the IE of benzimidazole derivatives. METHODS: First, the inhibitor molecules were given a comprehensive numerical representation covering energy, electronic, topological, physicochemical and spatial properties, based on their 3-D structures, which yielded 150 valid structural descriptors. These descriptors were then investigated thoroughly. A multicollinearity-based clustering analysis was performed to remove linearly correlated feature variables, producing 47 feature clusters. The Gini importance from a random forest (RF) was then used to measure the contribution of the descriptors within each cluster, and the descriptor with the highest Gini importance score in each cluster was selected, giving 47 descriptors that are not linearly correlated with one another. Further, because the number of available inhibitors is limited, different feature subsets were constructed according to the Gini importance ranking of these 47 descriptors. RESULTS: Finally, support vector machine (SVM) models based on the different feature subsets were tested by leave-one-out cross-validation. In this comparison, the optimal SVM model used the top 11 descriptors with a polynomial (Poly) kernel. This model achieves a correlation coefficient (R) of 0.9589 and a root-mean-square error (RMSE) of 4.45, indicating that, among the models compared, the proposed method performs best on the current data. CONCLUSION: Based on this model, six new benzimidazole molecules were designed, and their predicted IE values indicate that two of them have high potential as outstanding corrosion inhibitors.
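The selection and evaluation steps described in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the greedy grouping rule, the 0.9 correlation threshold, the descriptor names, the importance scores and the toy data are all hypothetical, and a one-variable least-squares line stands in for the SVM so that the example needs no external packages. In practice the importance scores would come from a random forest and the model would be an SVM with a polynomial kernel.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cluster_collinear(columns, threshold=0.9):
    """Greedy grouping (an assumed rule): a descriptor joins the first
    cluster whose seed it correlates with at |r| >= threshold."""
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            if abs(pearson(values, columns[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def pick_representatives(clusters, importance):
    """Keep the descriptor with the highest importance score in each cluster."""
    return [max(cluster, key=importance.get) for cluster in clusters]

def loo_predictions(x, y):
    """Leave-one-out predictions; a least-squares line is a stand-in for the SVM."""
    preds = []
    for i in range(len(x)):
        xt, yt = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        mx, my = sum(xt) / len(xt), sum(yt) / len(yt)
        slope = (sum((a - mx) * (b - my) for a, b in zip(xt, yt))
                 / sum((a - mx) ** 2 for a in xt))
        preds.append(my + slope * (x[i] - mx))
    return preds

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Toy data, made up for illustration only (not from the paper).
columns = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly collinear with d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
importance = {"d1": 0.2, "d2": 0.5, "d3": 0.3}  # hypothetical Gini scores
ie = [62.0, 71.0, 77.0, 90.0, 95.0]             # hypothetical IE values (%)

clusters = cluster_collinear(columns)
selected = pick_representatives(clusters, importance)
preds = loo_predictions(columns[selected[0]], ie)
print(clusters, selected)
print("R =", pearson(ie, preds), "RMSE =", rmse(ie, preds))
```

Leave-one-out is a natural choice when, as the abstract notes, only a small number of inhibitors is available: every molecule is predicted once by a model that never saw it, and R and RMSE are then computed over those held-out predictions.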

Updated: 2019-11-01