From Distance Correlation to Multiscale Graph Correlation
Journal of the American Statistical Association (IF 3.0), Pub Date: 2019-04-11, DOI: 10.1080/01621459.2018.1543125
Cencheng Shen, Carey E. Priebe, Joshua T. Vogelstein

Abstract: Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age. In this paper, we establish a new framework that generalizes distance correlation (Dcorr), a correlation measure that was recently proposed and shown to be universally consistent for dependence testing against all joint distributions of finite moments, to the multiscale graph correlation (MGC). By using the characteristic functions and incorporating the nearest neighbor machinery, we formalize the population version of local distance correlations, define the optimal scale in a given dependency, and name the optimal local correlation as MGC. The new theoretical framework motivates a theoretically sound sample MGC and allows a number of desirable properties to be proved, including the universal consistency, convergence, and almost unbiasedness of the sample version. The advantages of MGC are illustrated via a comprehensive set of simulations with linear, nonlinear, univariate, multivariate, and noisy dependencies, where it loses almost no power in monotone dependencies while achieving better performance in general dependencies, compared to Dcorr and other popular methods. Supplementary materials for this article are available online.
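
To make the starting point of the abstract concrete, the following is a minimal sketch of the sample distance correlation (Dcorr) that MGC generalizes. It is not the MGC procedure and not the authors' code: it assumes the classical biased estimator built from double-centered pairwise distance matrices, restricted to univariate samples, and the function names `centered_distances` and `distance_correlation` are hypothetical names chosen for illustration.

```python
import math


def centered_distances(values):
    """Double-center the pairwise absolute-distance matrix of a 1-D sample."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Biased sample distance correlation of two equal-length 1-D samples.

    Assumption: this is the classical biased estimator, used here only as an
    illustration; it is not the local (nearest-neighbor) version behind MGC.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    a = centered_distances(x)
    b = centered_distances(y)
    # Squared sample distance covariance and the two squared distance variances.
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar2_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar2_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0  # at least one sample is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


# A linear relationship and a symmetric parabola (Pearson correlation is zero
# for the latter, yet the dependence is deterministic).
x_lin = [1.0, 2.0, 3.0, 4.0, 5.0]
y_lin = [2.0 * v + 1.0 for v in x_lin]
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_par = [v * v for v in x_sym]

dcorr_linear = distance_correlation(x_lin, y_lin)
dcorr_parabola = distance_correlation(x_sym, y_par)
```

For the exactly linear pair the statistic equals 1 up to floating-point rounding, and for the parabola it stays clearly positive even though the Pearson correlation of that sample is zero. MGC, as described in the abstract, goes further by restricting the distance matrices to nearest-neighbor scales and selecting the optimal scale, which this sketch does not attempt.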

Updated: 2019-04-11