On bandwidth choice for spatial data density estimation
The Journal of the Royal Statistical Society, Series B (Statistical Methodology) (IF 3.1) Pub Date: 2020-04-21, DOI: 10.1111/rssb.12367
Zhenyu Jiang 1, Nengxiang Ling 2, Zudi Lu 1, Dag Tjøstheim 3, Qiang Zhang 4

Bandwidth choice is crucial in spatial kernel estimation in exploring non‐Gaussian complex spatial data. The paper investigates the choice of adaptive and non‐adaptive bandwidths for density estimation given data on a spatial lattice. An adaptive bandwidth depends on local data and hence adaptively conforms with local features of the spatial data. We propose a spatial cross‐validation (SCV) choice of a global bandwidth. This is done first with a pilot density involved in the expression for the adaptive bandwidth. The optimality of the procedure is established, and it is shown that a non‐adaptive bandwidth choice comes out as a special case. Although the cross‐validation idea has been popular for choosing a non‐adaptive bandwidth in data‐driven smoothing of independent and time series data, its theory and application have not been much investigated for spatial data. For the adaptive case, there is little theory even for independent data. Conditions that ensure asymptotic optimality of the SCV‐selected bandwidth are derived, actually, also extending time series and independent data optimality results. Further, for the adaptive bandwidth with an estimated pilot density, oracle properties of the resultant density estimator are obtained asymptotically as if the true pilot were known. Numerical simulations show that finite sample performance of the SCV adaptive bandwidth choice works quite well. It outperforms the existing R routines such as the ‘rule of thumb’ and the so‐called ‘second‐generation’ Sheather–Jones bandwidths for moderate and big data sets. An empirical application to a set of spatial soil data is further implemented with non‐Gaussian features significantly identified.
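
As a rough illustration of the idea summarized above, the following sketch selects a global bandwidth by least-squares cross-validation for a one-dimensional adaptive kernel density estimate. It is not the authors' SCV procedure: it assumes a Gaussian kernel, a rule-of-thumb pilot density, an Abramson-type local factor with exponent `alpha`, and ordinary leave-one-out that ignores the lattice structure and spatial dependence. All function names (`gauss`, `local_bandwidths`, `cv_score`, `select_bandwidth`) are hypothetical. Setting `alpha=0` reduces it to the non-adaptive case.

```python
import numpy as np


def gauss(u, s):
    """Normal density with mean 0 and standard deviation s, evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def local_bandwidths(x, h, alpha=0.5):
    """Per-observation bandwidths h_i = h * (pilot(x_i) / g) ** (-alpha).

    Assumption: the pilot is a fixed-bandwidth Gaussian KDE with a
    rule-of-thumb bandwidth, and g is the geometric mean of the pilot values.
    """
    n = x.size
    h0 = 1.06 * x.std(ddof=1) * n ** (-0.2)  # rule-of-thumb pilot bandwidth
    diff = x[:, None] - x[None, :]
    pilot = gauss(diff, h0).mean(axis=1)     # pilot density at each x_i
    g = np.exp(np.log(pilot).mean())         # geometric mean of pilot values
    return h * (pilot / g) ** (-alpha)


def cv_score(x, h, alpha=0.5):
    """Least-squares CV criterion: int f_hat^2 - (2/n) * sum_i f_hat_{-i}(x_i)."""
    n = x.size
    hi = local_bandwidths(x, h, alpha)
    diff = x[:, None] - x[None, :]
    # Closed form for the integral of the squared estimate: the convolution
    # of two Gaussian kernels with bandwidths h_i and h_j is a Gaussian
    # with standard deviation sqrt(h_i^2 + h_j^2).
    s2 = np.sqrt(hi[:, None] ** 2 + hi[None, :] ** 2)
    int_f2 = gauss(diff, s2).sum() / n ** 2
    # Leave-one-out estimate at x_i: kernels centred at x_j (j != i),
    # each with its own bandwidth h_j.
    k = gauss(diff, hi[None, :])
    np.fill_diagonal(k, 0.0)
    loo = k.sum(axis=1) / (n - 1)
    return int_f2 - 2.0 * loo.mean()


def select_bandwidth(x, grid, alpha=0.5):
    """Return the global bandwidth in `grid` that minimizes the CV criterion."""
    scores = np.array([cv_score(x, h, alpha) for h in grid])
    return grid[int(np.argmin(scores))]
```

A call such as `select_bandwidth(x, np.linspace(0.05, 1.5, 30))` returns the grid value with the smallest criterion. A grid search is used here only for transparency; for dependent lattice data, the leave-one-out step would need to follow the paper's spatial construction instead.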

Updated: 2020-04-21