Novel kernel density estimator based on ensemble unbiased cross-validation
Information Sciences ( IF 8.1 ) Pub Date : 2021-09-17 , DOI: 10.1016/j.ins.2021.09.045
Yu-Lin He 1, 2 , Xuan Ye 1 , De-Fa Huang 1 , Joshua Zhexue Huang 1, 2 , Jun-Hai Zhai 3

Unbiased cross-validation (UCV) is a commonly used method for calculating the optimal bandwidth of the kernel density estimator (KDE), which estimates the underlying probability density function (PDF) of a given data set. Since UCV was proposed, few studies have pointed out its instability when determining the KDE bandwidth. Following the principle of stability improvement, this paper presents a novel ensemble UCV-based KDE (EUCV-KDE), which determines the expectation of an estimated PDF using an ensemble of data-block-based UCVs rather than a single data-point-based UCV. To derive the optimal bandwidth, a novel objective function is designed for EUCV-KDE that considers the empirical and structural risks of the KDE together. We validate the rationality and effectiveness of EUCV-KDE on 10 probability distributions. The experimental results show that EUCV-KDE converges as the number of data-block-based UCVs increases and obtains more stable and better prediction performance than the classical UCV-KDE and the revisited cross-validation (RCV) based KDE (RCV-KDE). In addition, a real-world application based on UK climate data is provided to further validate the effectiveness of EUCV-KDE by determining the optimal bandwidth for the Nadaraya-Watson kernel regression estimator.
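
For readers unfamiliar with the baseline, the classical single-sample UCV criterion mentioned in the abstract can be sketched as follows. This is a minimal illustration that assumes a one-dimensional sample, a Gaussian kernel, and a grid search over candidate bandwidths. The function names (`normal_pdf`, `ucv_score`, `ucv_bandwidth`) are hypothetical, and the sketch does not reproduce the EUCV-KDE objective, whose form is not given in the abstract.

```python
import numpy as np


def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u (helper name is an assumption)."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def ucv_score(x, h):
    """Classical UCV criterion for a 1-D Gaussian-kernel KDE.

    UCV(h) = integral of fhat^2  -  (2/n) * sum_i fhat_{-i}(x_i),
    where fhat_{-i} is the leave-one-out density estimate.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]  # all pairwise differences x_i - x_j

    # For Gaussian kernels the integral of fhat^2 has a closed form:
    # the convolution of two N(0, h^2) kernels is N(0, 2 h^2).
    integral_term = normal_pdf(d, np.sqrt(2.0) * h).sum() / n**2

    # Leave-one-out term: sum kernel values over pairs with i != j.
    k = normal_pdf(d, h)
    np.fill_diagonal(k, 0.0)
    loo_term = 2.0 * k.sum() / (n * (n - 1))

    return integral_term - loo_term


def ucv_bandwidth(x, grid):
    """Return the candidate bandwidth in `grid` with the lowest UCV score."""
    grid = np.asarray(grid, dtype=float)
    scores = np.array([ucv_score(x, h) for h in grid])
    return float(grid[int(np.argmin(scores))])
```

Calling `ucv_bandwidth(x, np.linspace(0.05, 1.5, 30))` on a sample `x` returns the grid value with the lowest score. Repeating the call on different samples drawn from the same distribution is one simple way to see how much the selected bandwidth can vary from sample to sample.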




Updated: 2021-09-17