Adaptive robust local online density estimation for streaming data
International Journal of Machine Learning and Cybernetics ( IF 3.1 ) Pub Date : 2021-02-03 , DOI: 10.1007/s13042-021-01275-y
Zhong Chen 1 , Zhide Fang 2 , Victor Sheng 3 , Jiabin Zhao 4 , Wei Fan 5 , Andrea Edwards 1 , Kun Zhang 1

Accurate online density estimation is crucial to numerous applications in which streaming data are prevalent. Existing online approaches to density estimation somewhat lack prompt adaptability and robustness when facing concept-drifting and noisy streaming data, resulting in delayed or even deteriorated approximations. To alleviate this issue, we first propose an adaptive local online kernel density estimator (ALoKDE) for real-time density estimation on data streams. ALoKDE consists of two tightly integrated strategies: (1) a statistical test for concept drift detection and (2) an adaptive weighted local online density estimation when a drift does occur. Specifically, using a weighted form, ALoKDE seeks to provide an unbiased estimation by factoring in the statistical hallmarks of the latest learned distribution and any potential distributional changes that could be introduced by each incoming instance. A robust variant of ALoKDE, R-ALoKDE, is further developed to effectively handle data streams with varied types and levels of noise. Moreover, we analyze the asymptotic properties of ALoKDE and R-ALoKDE and derive their theoretical error bounds regarding bias, variance, mean squared error (MSE), and mean integrated squared error (MISE). Extensive comparative studies on various artificial and real-world (noisy) streaming data demonstrate the efficacy of ALoKDE and R-ALoKDE in online density estimation and real-time classification (with noise).
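
To make the two strategies named in the abstract more concrete, the following is a minimal one-dimensional sketch in Python. It is not the ALoKDE or R-ALoKDE algorithm from the paper, whose statistical test, weighting scheme, and robust variant are not specified in this abstract. The class name `ToyOnlineKDE`, all of its parameters, the two-sample z-style mean comparison used as the drift check, and the exponential down-weighting of older points are assumptions chosen purely for illustration.

```python
import math
import random
from collections import deque


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


class ToyOnlineKDE:
    """Hypothetical 1-D online KDE with a crude drift check.

    Illustration only: the drift test and the weighting below are
    assumptions, not the method described in the paper.
    """

    def __init__(self, bandwidth=0.5, window=200, recent=30,
                 decay=0.99, z_threshold=4.0):
        if recent < 2:
            raise ValueError("recent must be at least 2")
        self.bandwidth = bandwidth
        self.window = window
        self.recent = recent
        self.decay = decay
        self.z_threshold = z_threshold
        self.values = deque(maxlen=window)
        self.weights = deque(maxlen=window)

    def _drift_detected(self):
        """Compare the mean of the newest block with the older points."""
        if len(self.values) < 2 * self.recent:
            return False
        vals = list(self.values)
        old, new = vals[:-self.recent], vals[-self.recent:]
        mean_old = sum(old) / len(old)
        mean_new = sum(new) / len(new)
        var_old = sum((v - mean_old) ** 2 for v in old) / (len(old) - 1)
        var_new = sum((v - mean_new) ** 2 for v in new) / (len(new) - 1)
        std_err = math.sqrt(var_old / len(old) + var_new / len(new))
        if std_err == 0.0:
            return mean_old != mean_new
        return abs(mean_new - mean_old) / std_err > self.z_threshold

    def update(self, x):
        """Add one instance; return True if a drift was flagged."""
        # Down-weight everything seen so far, then add the new point.
        self.weights = deque((w * self.decay for w in self.weights),
                             maxlen=self.window)
        self.values.append(x)
        self.weights.append(1.0)
        drift = self._drift_detected()
        if drift:
            # Forget the older points and keep only the newest block.
            self.values = deque(list(self.values)[-self.recent:],
                                maxlen=self.window)
            self.weights = deque(list(self.weights)[-self.recent:],
                                 maxlen=self.window)
        return drift

    def density(self, x):
        """Weighted kernel density estimate at x."""
        total = sum(self.weights)
        if total == 0:
            return 0.0
        acc = sum(w * gaussian_kernel((x - v) / self.bandwidth)
                  for v, w in zip(self.values, self.weights))
        return acc / (total * self.bandwidth)


if __name__ == "__main__":
    rng = random.Random(0)
    kde = ToyOnlineKDE()
    # Synthetic stream: the mean jumps from 0 to 10 halfway through.
    stream = ([rng.gauss(0.0, 1.0) for _ in range(100)]
              + [rng.gauss(10.0, 1.0) for _ in range(100)])
    for t, x in enumerate(stream):
        if kde.update(x):
            print("drift flagged at step", t)
    print("density near 0:", kde.density(0.0))
    print("density near 10:", kde.density(10.0))
```

In this sketch the estimate follows the stream in two ways: the decay factor gradually shifts weight toward recent instances, and a flagged drift discards the older points outright so the estimate can re-centre on the new distribution.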




Updated: 2021-02-03