Bump hunting through density curvature features
arXiv - MATH - Statistics Theory | Pub Date: 2022-07-30 | DOI: arxiv-2208.00174
José E. Chacón, Javier Fernández Serrano

Bump hunting deals with finding meaningful data subsets, known as bumps, in sample spaces. These have traditionally been conceived as modal or concave regions in the graph of the underlying density function. We define an abstract bump construct based on curvature functionals of the probability density. Then, we explore several alternative characterizations involving derivatives up to second order. In particular, a suitable implementation of Good and Gaskins' original concave bumps is proposed in the multivariate case. Moreover, we bring to exploratory data analysis concepts like the mean curvature and the Laplacian, which have produced good results in applied domains. Our methodology approximates the curvature functional with a plug-in kernel density estimator. We provide theoretical results that ensure the asymptotic consistency of bump boundaries in the Hausdorff distance with affordable convergence rates. We also present asymptotically valid and consistent confidence regions bounding curvature bumps. The theory is illustrated through several use cases in sports analytics with datasets from the NBA, MLB and NFL. We conclude that the different curvature instances effectively combine to generate insightful visualizations.
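To make the plug-in idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian kernel with a single scalar bandwidth `h`, and it assumes that a Laplacian bump is the region where the Laplacian of the density is negative. The function name `kde_laplacian`, the bandwidth value, and the simulated sample are all hypothetical choices made for illustration.

```python
import numpy as np

def kde_laplacian(points, data, h):
    """Plug-in estimate of the Laplacian of a density at `points`.

    Assumes a Gaussian kernel with scalar bandwidth h (an illustrative
    choice, not necessarily the one used in the paper).
    points: (m, d) evaluation locations; data: (n, d) sample.
    """
    points = np.atleast_2d(points)
    data = np.atleast_2d(data)
    d = data.shape[1]
    # Pairwise differences between evaluation points and sample points.
    diff = points[:, None, :] - data[None, :, :]      # shape (m, n, d)
    sq = np.sum(diff ** 2, axis=2)                    # squared distances, (m, n)
    # Gaussian kernel K_h(u) = (2*pi*h^2)^(-d/2) * exp(-|u|^2 / (2*h^2)).
    kern = np.exp(-sq / (2 * h ** 2)) / (2 * np.pi * h ** 2) ** (d / 2)
    # Laplacian of the Gaussian kernel: K_h(u) * (|u|^2 / h^4 - d / h^2).
    lap = kern * (sq / h ** 4 - d / h ** 2)
    # Averaging over the sample gives the Laplacian of the KDE itself.
    return lap.mean(axis=1)

# Illustrative use on simulated 2-D data (hypothetical sample and bandwidth).
rng = np.random.default_rng(0)
sample = rng.normal(size=(500, 2))
xs = np.linspace(-3, 3, 41)
grid = np.array([(x, y) for x in xs for y in xs])
lap_hat = kde_laplacian(grid, sample, h=0.5)
# Estimated bump under the assumed definition: negative Laplacian.
bump_mask = lap_hat < 0
```

Plugging the kernel estimate into the functional means differentiating the kernel itself, so no finite differences are needed. Other curvature functionals mentioned in the abstract, such as the mean curvature, would need the full estimated gradient and Hessian rather than only the Hessian trace used here.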

Updated: 2022-08-02