Outlier Detection Ensemble with Embedded Feature Selection
arXiv - CS - Machine Learning. Pub Date: 2020-01-15. DOI: arxiv-2001.05492. Authors: Li Cheng, Yijie Wang, Xinwang Liu, Bin Li
Feature selection plays an important role in improving the performance of
outlier detection, especially on noisy data. Existing methods usually perform
feature selection and outlier scoring separately, so the selected feature
subsets may not be optimal for outlier detection, leading to
unsatisfactory performance. In this paper, we propose an outlier detection
ensemble framework with embedded feature selection (ODEFS) to address this
issue. Specifically, for each learning component built on random sub-sampling,
ODEFS unifies feature selection and outlier detection into a pairwise ranking
formulation to learn feature subsets that are tailored to the outlier
detection method. Moreover, we adopt thresholded self-paced learning to
simultaneously optimize feature selection and example selection, which
helps improve the reliability of the training set. We then design
an alternating algorithm with proven convergence to solve the resulting
optimization problem. In addition, we analyze the generalization error bound of
the proposed framework, which provides a theoretical guarantee for the method and
insightful practical guidance. Comprehensive experimental results on 12
real-world datasets from diverse domains validate the superiority of the
proposed ODEFS.
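The abstract gives no formulas, so the following is only a rough sketch of the general idea, not the authors' ODEFS algorithm. It assumes pseudo-labels taken from a simple initial detector (mean robust deviation), a linear scorer with non-negative weights, a pairwise hinge ranking loss, an L1 penalty standing in for embedded feature selection, and a hard loss threshold standing in for self-paced example selection. All function names, parameters, and default values (`fit_component`, `ensemble_scores`, `lam`, `age`, `growth`) are hypothetical.

```python
import numpy as np

def fit_component(X, rng, n_pairs=256, lam=0.05, lr=0.1, epochs=100,
                  age=1.0, growth=1.02):
    """One hypothetical learning component: sparse, non-negative feature
    weights learned from (pseudo-outlier, pseudo-inlier) ranking pairs."""
    # Robust per-feature deviations (median / MAD); an assumption of this sketch.
    center = np.median(X, axis=0)
    scale = np.median(np.abs(X - center), axis=0) + 1e-8
    P = np.abs(X - center) / scale

    # Pseudo-labels from a simple initial detector: top 5% by mean deviation.
    order = np.argsort(P.mean(axis=1))
    k = max(1, len(X) // 20)
    out_idx = rng.choice(order[-k:], size=n_pairs)
    in_idx = rng.choice(order[:-k], size=n_pairs)
    diff = P[out_idx] - P[in_idx]          # want diff @ w >= 1 for each pair

    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        loss = np.maximum(0.0, 1.0 - diff @ w)   # pairwise hinge loss
        v = loss <= age                          # self-paced: keep "easy" pairs
        active = v & (loss > 0)                  # pairs that violate the margin
        grad = -diff[active].sum(axis=0) / max(v.sum(), 1)
        # Gradient step, then L1 shrinkage projected onto w >= 0.
        w = np.maximum(w - lr * grad - lr * lam, 0.0)
        age *= growth                            # admit harder pairs over time
    return w, center, scale

def ensemble_scores(X, n_components=10, subsample=0.5, seed=0):
    """Average the scores of components trained on random sub-samples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = max(2, int(subsample * n))
    total = np.zeros(n)
    for _ in range(n_components):
        idx = rng.choice(n, size=m, replace=False)
        w, center, scale = fit_component(X[idx], rng)
        total += (np.abs(X - center) / scale) @ w
    return total / n_components

# Synthetic usage: 20 planted outliers that differ only on the first two
# of six features; the remaining four features are pure noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(220, 6))
X[:20, :2] += 6.0
scores = ensemble_scores(X)
top20 = np.argsort(scores)[-20:]
```

In this toy setup the L1 shrinkage tends to drive the weights of the noise features toward zero, which is the sense in which feature selection is "embedded" in the scoring objective rather than run as a separate step. The actual ODEFS objective, regularizers, and optimization steps should be taken from the paper itself.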
Updated: 2020-01-17