Rivality index neighbourhood algorithm with density and distances weighted schemes for the building of robust QSAR classification models with high reliable applicability domain.
SAR and QSAR in Environmental Research ( IF 3 ) Pub Date : 2019-08-30 , DOI: 10.1080/1062936x.2019.1644666
I Luque Ruiz 1 , M Á Gómez-Nieto 1

The rivality index (RI) is a normalized distance measurement between a molecule and its first nearest neighbours that provides a robust prediction of the molecule's activity from the known activity of those neighbours. Negative values of the RI describe molecules that would be correctly classified by a statistical algorithm, whereas positive values describe molecules that classification algorithms would detect as outliers. In this paper, we describe a classification algorithm based on the RI and propose four weighting schemes (kernels) for its calculation, each based on measuring different characteristics of the neighbourhood of every molecule in the dataset at established values of the neighbour threshold. The results demonstrate that the proposed RI-based classification algorithm generates more reliable and robust classification models than many of the most widely used machine learning algorithms. These results were validated on 20 balanced and unbalanced benchmark datasets of different sizes and modelability. The resulting classification models provide valuable information about the molecules of the dataset, the applicability domain of the models and the reliability of the predictions.
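
The abstract does not state the RI formula, so the following is only a minimal sketch of the sign convention it describes. It assumes a commonly used form, RI = (d_same - d_other) / (d_same + d_other), where d_same and d_other are the distances to the nearest neighbour of the same and of a different class; this form, the Euclidean distance, the function name `rivality_index` and the toy data are all assumptions for illustration, and the four weighted kernels proposed in the paper are not reproduced here.

```python
import math


def rivality_index(points, labels, i):
    """Illustrative (assumed) rivality index for item i.

    Returns (d_same - d_other) / (d_same + d_other), a value in [-1, 1].
    Negative: the nearest same-class neighbour is closer (item looks
    correctly classifiable). Positive: a different-class neighbour is closer.
    """
    d_same = math.inf   # distance to nearest neighbour with the same label
    d_other = math.inf  # distance to nearest neighbour with a different label
    for j, (point, label) in enumerate(zip(points, labels)):
        if j == i:
            continue
        d = math.dist(points[i], point)  # Euclidean distance (assumption)
        if label == labels[i]:
            d_same = min(d_same, d)
        else:
            d_other = min(d_other, d)
    if math.isinf(d_same) or math.isinf(d_other):
        raise ValueError("each class needs at least one other member")
    total = d_same + d_other
    return 0.0 if total == 0 else (d_same - d_other) / total


# Hypothetical one-descriptor toy dataset: two classes on a line.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (1.5,)]
labels = [0, 0, 1, 1, 1]

ri_inside = rivality_index(points, labels, 0)   # sits among its own class
ri_outlier = rivality_index(points, labels, 4)  # class 1, but sits next to class 0
print(ri_inside, ri_outlier)
```

In this toy data the first molecule gets a negative value and the last one, which lies closer to the opposite class, gets a positive value, matching the interpretation of the sign given in the abstract.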




Update date: 2019-08-30