A Novel Class Noise Detection Method for High-Dimensional Data in Industrial Informatics
IEEE Transactions on Industrial Informatics (IF 12.3), Pub Date: 2020-07-29, DOI: 10.1109/tii.2020.3012658
Donghai Guan, Kai Chen, Guangjie Han, Shuqiang Huang, Weiwei Yuan, Mohsen Guizani, Lei Shu

Data in industrial informatics may be both high-dimensional and mislabeled. Irrelevant or noisy features make mislabeling detection in high-dimensional data considerably harder. Traditional methods usually adopt a two-step solution: first find the relevant subspace, then use it for mislabeling detection. Because this separates feature selection from label error detection, the two-step approach struggles to achieve optimal detection performance. To solve this problem, in this article, we integrate the two steps and propose a sequential ensemble noise filter (SENF). In the SENF, relevant features are selected and used to generate a noise score for each instance. These noise scores in turn guide feature selection through regression learning. Thus, the SENF falls within the scope of sequential ensemble learning. We evaluate our approach on several benchmark datasets with high dimensionality and substantial label noise. The results show that the SENF performs significantly better than other existing label noise detection methods.
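
The alternating loop described in the abstract (score instances on the current features, then let the scores drive the next round of feature selection) can be illustrated with a minimal sketch. This is not the authors' SENF implementation, since the abstract does not specify the scoring function, the regression model, or the stopping rule. The choices below are assumptions made only for illustration: a k-nearest-neighbour label-disagreement score, a closed-form ridge regression whose coefficient magnitudes rank the features, a fixed `keep_ratio`, and averaging of the per-round scores. All function and parameter names are hypothetical.

```python
import numpy as np


def knn_noise_scores(X, y, k=5):
    """Assumed scorer: fraction of the k nearest neighbours with a different label."""
    # Pairwise squared Euclidean distances on the current feature subset.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]
    return np.mean(y[nn] != y[:, None], axis=1)


def ridge_feature_ranking(X, scores, alpha=1.0):
    """Assumed selector: order features by |coefficient| of a ridge fit to the scores."""
    Xc = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize columns
    sc = scores - scores.mean()
    d = Xc.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ sc)
    return np.argsort(-np.abs(coef))  # most influential feature first


def sequential_noise_filter(X, y, n_rounds=3, keep_ratio=0.5, k=5):
    """Alternate noise scoring and score-guided feature selection for n_rounds."""
    features = np.arange(X.shape[1])  # start from the full feature set
    round_scores = []
    for _ in range(n_rounds):
        # Step 1: score every instance using only the current features.
        scores = knn_noise_scores(X[:, features], y, k=k)
        round_scores.append(scores)
        # Step 2: regress the scores on the features and keep the top-ranked ones.
        n_keep = max(1, int(np.ceil(keep_ratio * features.size)))
        order = ridge_feature_ranking(X[:, features], scores)
        features = features[order[:n_keep]]
    # Assumed ensemble rule: average the scores produced in each round.
    return np.mean(round_scores, axis=0), features
```

In this sketch, a higher averaged score marks an instance as more likely to be mislabeled, and a threshold or top-fraction cut would be needed to turn the scores into a filter. That cut-off is left open here because the abstract does not say how it is chosen.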

Updated: 2020-07-29