GMDH-Based Outlier Detection Model in Classification Problems, Journal of Systems Science and Complexity


GMDH-Based Outlier Detection Model in Classification Problems
Journal of Systems Science and Complexity (IF 2.6), Pub Date: 2020-08-04, DOI: 10.1007/s11424-020-9002-6
Ling Xie, Yanlin Jia, Jin Xiao, Xin Gu, Jing Huang

In many practical classification problems, datasets contain a portion of outliers, which can greatly degrade the performance of the constructed models. To address this issue, we apply the group method of data handling (GMDH) neural network to outlier detection and build a GMDH-based outlier detection (GOD) model. The model first performs feature selection on the training set L using a GMDH neural network. A new training set L′ is then obtained by mapping L onto the selected key feature subset. Next, a linear regression model is constructed on L′ by ordinary least squares estimation. The model then removes one sample from L′ at random each time and rebuilds the linear regression model. Finally, outlier detection is realized by calculating Cook’s distance for each sample. Experiments are conducted on four customer classification datasets. The results show that the GOD model can effectively eliminate outliers and that it generally performs significantly better than five existing outlier detection models. This indicates that eliminating outliers can effectively enhance the classification accuracy of the trained classification model.
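The regression and Cook's distance steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the GMDH feature selection has already produced the reduced feature matrix (called `X` here), that class labels are coded numerically in `y`, and that samples are flagged with the common rule of thumb D > 4/n, a threshold the abstract does not specify. The function names `cooks_distance` and `flag_outliers` are hypothetical.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance per sample, computed by explicit leave-one-out OLS refits.

    X: (n, k) matrix of already-selected features (assumed output of the
       feature selection step, which is not implemented here).
    y: (n,) numeric targets, e.g. class labels coded as 0/1 (assumption).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = A.shape[1]

    # Full-sample ordinary least squares fit.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    resid = y - fitted
    s2 = resid @ resid / (n - p)  # residual variance estimate

    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Refit without sample i, then compare fitted values on all samples.
        beta_i, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        diff = fitted - A @ beta_i
        d[i] = diff @ diff / (p * s2)
    return d


def flag_outliers(X, y, threshold=None):
    """Return indices whose Cook's distance exceeds the threshold.

    The default 4/n cutoff is a rule of thumb, not taken from the paper.
    """
    d = cooks_distance(X, y)
    if threshold is None:
        threshold = 4.0 / len(d)
    return np.flatnonzero(d > threshold)
```

The loop visits every sample once, so the order of removal does not affect the result. For large training sets, the same distances can be obtained without refitting from the residuals and the leverage values of the full fit.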


Updated: 2020-08-04
