Fuzzy regression functions with a noise cluster and the impact of outliers on mainstream machine learning methods in the regression setting
Applied Soft Computing (IF 8.7), Pub Date: 2020-07-11, DOI: 10.1016/j.asoc.2020.106535
Srinivas Chakravarty, Haydar Demirhan, Furkan Baser

The presence of outliers in the dependent and/or independent features distorts predictions made with machine learning techniques and may lead to erroneous conclusions. It is therefore important both to implement methods that are robust against outliers, so that predictions remain reliable, and to know how accurate existing methods are when data are contaminated with outliers. This study proposes a robust fuzzy regression functions (FRFN) approach and evaluates the performance of the proposed approach and several mainstream machine learning approaches on the regression problem in the presence of outliers. The proposed FRFN approach is based on fuzzy k-means clustering with a noise cluster. We compare the accuracy of Artificial Neural Networks (ANN), Support Vector Machines (SVM), and the proposed FRFN approach, with different training algorithms and kernel functions, on simulated and real benchmark datasets. In total, the accuracies of 36 ANN, SVM, and FRFN implementations, with various training algorithms and kernel and loss functions, are evaluated and compared with each other on samples containing outliers in a Monte Carlo simulation setting. In both the Monte Carlo simulations and the applications to benchmark datasets, FRFN with ANN trained with the Bayes regularization algorithm and FRFN with SVM with a Gaussian kernel outperform the classical implementations of ANN and SVM in the presence of outliers. The proposed noise cluster implementation considerably increases the robustness of fuzzy regression functions against outliers.
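The abstract does not spell out how the noise cluster enters the clustering step. As a rough illustration only, the sketch below follows one common noise-clustering formulation, in which an extra "noise" cluster sits at a constant distance from every point, so points far from all real centers give most of their membership to noise and barely move the centers. The function names, the `delta` noise distance, the fuzzifier `m`, and the `init_centers` argument are all assumptions made for this example and are not taken from the paper.

```python
def memberships(points, centers, delta, m=2.0):
    """Return (cluster memberships, noise memberships), one entry per point.

    Assumption: the noise cluster lies at the same distance `delta`
    from every point, so it competes with the real clusters for membership.
    """
    power = 1.0 / (m - 1.0)
    inv_noise = (1.0 / delta ** 2) ** power
    u, noise = [], []
    for x in points:
        inv = []
        for c in centers:
            d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            # Guard against division by zero when a point sits on a center.
            inv.append((1.0 / max(d2, 1e-12)) ** power)
        total = sum(inv) + inv_noise
        u.append([v / total for v in inv])
        noise.append(inv_noise / total)
    return u, noise


def fuzzy_cmeans_noise(points, init_centers, delta, m=2.0, n_iter=50):
    """Fuzzy clustering with one extra noise cluster (illustrative sketch)."""
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(n_iter):
        u, _unused = memberships(points, centers, delta, m)
        for k in range(len(centers)):
            # Each center is the mean of the points weighted by membership**m,
            # so points absorbed by the noise cluster carry almost no weight.
            w = [row[k] ** m for row in u]
            total = sum(w)
            centers[k] = [
                sum(wi * x[j] for wi, x in zip(w, points)) / total
                for j in range(dim)
            ]
    # Recompute memberships so they match the final centers.
    u, noise = memberships(points, centers, delta, m)
    return centers, u, noise
```

In this sketch the memberships of each point, including the noise membership, sum to one. A smaller `delta` sends more points to the noise cluster, so its value would have to be tuned to the scale of the data.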




Updated: 2020-07-11