Building Highly Reliable Quantitative Structure-Activity Relationship Classification Models Using the Rivality Index Neighborhood Algorithm with Feature Selection.
Journal of Chemical Information and Modeling ( IF 5.6 ) Pub Date : 2020-01-15 , DOI: 10.1021/acs.jcim.9b00706
Irene Luque Ruiz 1 , Miguel Ángel Gómez-Nieto 1

Reducing the dimensionality of the data set representation used to build quantitative structure-activity relationship classification models is an important research subject, both for the interpretability of the models and for the computational cost of the classification algorithms. Feature selection techniques are appropriate here because only a small number of relevant features should be used in the classification process; irrelevant, redundant, and noninterpretable features should be discarded. In this paper, we propose an embedded feature selection technique for the construction of classification models using the rivality index neighborhood (RINH) algorithm. The technique applies a filter selection in the preprocessing stage, with the selectivity of the features as the selection criterion, and a wrapper technique in the processing stage, based on the improvement in accuracy and reliability of the models generated using the RINH algorithm with LTN and GTN functions. The results obtained using the RINH algorithm with and without feature selection, compared with those obtained using 14 machine learning algorithms, demonstrate that the proposed feature selection technique builds clearly more accurate and reliable models, reduces the data dimensionality by around 90%, and generates highly robust and interpretable models.
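
The two-stage idea described in the abstract (a filter in preprocessing, then a wrapper driven by model accuracy) can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not define the selectivity criterion or the RINH algorithm, so the sketch assumes a simple class-separation score as a stand-in filter and a leave-one-out 1-nearest-neighbor classifier as a stand-in model. All function names, the `keep` parameter, and the toy data are hypothetical.

```python
import math
import random

def filter_scores(X, y):
    """Stand-in filter score per feature (NOT the paper's selectivity):
    |difference of class means| / pooled standard deviation, binary labels."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
        var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
        pooled = math.sqrt((var_a + var_b) / 2) or 1e-12  # avoid division by zero
        scores.append(abs(mean_a - mean_b) / pooled)
    return scores

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier (stand-in for RINH)
    restricted to the given feature indices."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue
            dist = sum((X[i][j] - X[k][j]) ** 2 for j in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)

def select_features(X, y, keep=5):
    """Filter stage: keep the `keep` best-scoring features.
    Wrapper stage: greedy forward selection, adding a feature only
    while it strictly improves the stand-in model's accuracy."""
    scores = filter_scores(X, y)
    candidates = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:keep]
    selected, best_acc = [], 0.0
    improved = True
    while improved:
        improved = False
        best_j = None
        for j in candidates:
            if j in selected:
                continue
            acc = loo_1nn_accuracy(X, y, selected + [j])
            if acc > best_acc:
                best_acc, best_j, improved = acc, j, True
        if improved:
            selected.append(best_j)
    return selected, best_acc

# Toy data (hypothetical): feature 0 separates the classes, features 1-9 are noise.
random.seed(0)
y = [i % 2 for i in range(40)]
X = [[label * 5.0 + random.gauss(0, 0.5)] + [random.gauss(0, 1) for _ in range(9)]
     for label in y]
selected, acc = select_features(X, y, keep=3)
print("selected features:", selected, "LOO accuracy:", acc)
```

The filter stage is cheap and model-independent, so it prunes most of the features before the more expensive wrapper stage evaluates subsets against the classifier. Replacing `loo_1nn_accuracy` with another model's validation score would change only the wrapper criterion.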

Updated: 2020-01-16