The Chi-Square Test of Distance Correlation
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2021-07-19, DOI: 10.1080/10618600.2021.1938585
Cencheng Shen 1 , Sambit Panda 2 , Joshua T Vogelstein 2, 3

Abstract

Distance correlation has gained much recent attention in the data science community: the sample statistic is straightforward to compute and asymptotically equals zero if and only if the variables are independent, making it an ideal choice for discovering any type of dependency structure given a sufficient sample size. One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and the choice of metric, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amounts of data. To overcome this difficulty, in this article we propose a chi-squared test for distance correlation. Method-wise, the chi-squared test is nonparametric, extremely fast, and applicable to bias-corrected distance correlation using any strong negative type metric or characteristic kernel. The test exhibits testing power similar to that of the standard permutation test, and can be used for K-sample and partial testing. Theory-wise, we show that the underlying chi-squared distribution approximates well and dominates the limiting null distribution in the upper tail, prove that the chi-squared test can be valid and universally consistent for testing independence, and establish a testing power inequality with respect to the permutation test. Supplementary files for this article are available online.
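
To make the contrast with a permutation test concrete, here is a minimal sketch of how such a test could be computed without any resampling. It rests on assumptions that the abstract does not state and that should be checked against the paper: the inputs are univariate, the metric is the absolute difference, the bias-corrected statistic is the U-centered estimator, and the p-value is taken as the upper-tail probability of a chi-squared variable with one degree of freedom evaluated at n times the statistic plus one. The function names are hypothetical and are not taken from the authors' code.

```python
import math
import random


def u_centered_distances(values):
    """U-centered matrix of pairwise |a - b| distances (diagonal left at zero)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_sums = [sum(row) for row in dist]  # equal to column sums by symmetry
    total = sum(row_sums)
    centered = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                centered[i][j] = (
                    dist[i][j]
                    - row_sums[i] / (n - 2)
                    - row_sums[j] / (n - 2)
                    + total / ((n - 1) * (n - 2))
                )
    return centered


def u_inner(a, b):
    """Inner product of two U-centered matrices, scaled by 1 / (n (n - 3))."""
    n = len(a)
    s = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return s / (n * (n - 3))


def bias_corrected_dcor(x, y):
    """Bias-corrected distance correlation for two equal-length 1-D samples."""
    if len(x) != len(y) or len(x) < 4:
        raise ValueError("x and y need the same length, at least 4")
    a, b = u_centered_distances(x), u_centered_distances(y)
    denom = math.sqrt(u_inner(a, a) * u_inner(b, b))
    return u_inner(a, b) / denom if denom > 0 else 0.0


def chi2_test(x, y):
    """Return (statistic, p-value) using the assumed chi-squared(1) form."""
    stat = bias_corrected_dcor(x, y)
    t = len(x) * stat + 1.0
    # Upper tail of chi-squared with 1 degree of freedom: P(Z^2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t / 2.0)) if t > 0 else 1.0
    return stat, p_value


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y_dependent = [v * v for v in x]                      # nonlinear dependence
y_independent = [rng.gauss(0, 1) for _ in range(200)]  # drawn separately from x
print("dependent:  ", chi2_test(x, y_dependent))
print("independent:", chi2_test(x, y_independent))
```

The only cost beyond the statistic itself is one evaluation of a closed-form tail probability, whereas a permutation test would recompute the statistic once per permutation. The pure-Python loops here are quadratic in the sample size and are meant for illustration only.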




Updated: 2021-07-19