Testing Independence Under Biased Sampling
Journal of the American Statistical Association (IF 3.7), Pub Date: 2021-06-14, DOI: 10.1080/01621459.2021.1912758
Yaniv Tenzer, Micha Mandel, Or Zuk

Abstract

Testing for dependence between pairs of random variables is a fundamental problem in statistics. In some applications, data are subject to selection bias that can create spurious dependence. An important example is truncation models, in which observed pairs are restricted to a specific subset of the X-Y plane. Standard tests for independence are not suitable in such cases, and alternative tests that take the selection bias into account are required. Here, we generalize the notion of quasi-independence with respect to the sampling mechanism, and study the problem of detecting any deviations from it. We develop two test statistics motivated by the classic Hoeffding’s statistic, and use two approaches to compute their distribution under the null: (i) a bootstrap-based approach, and (ii) a permutation test with nonuniform probability of permutations. We also handle an important application to the case of censoring with truncation, by estimating the biased sampling mechanism from the data. We prove the validity of the tests, and show, using simulations, that they improve power compared to competing methods for important special cases. The tests are applied to four datasets: two that are subject to truncation, with and without censoring, and two that are subject to bias mechanisms related to length bias.
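
The claim that truncation can create spurious dependence can be illustrated with a small simulation. The sketch below is not the authors' method or code; it is a minimal illustration under assumed settings (independent Uniform(0, 1) variables, truncation to the region X <= Y, and hypothetical helper names `pearson` and `simulate`). In the full sample the correlation is near zero, while in the truncated sample it is close to the theoretical value of 0.5, even though X and Y were generated independently.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, seed=0):
    """Draw independent uniform pairs, then keep only those with x <= y.

    The truncation region {x <= y} is an assumption chosen for
    illustration; it is one example of a truncation model.
    """
    rng = random.Random(seed)
    pairs = [(rng.random(), rng.random()) for _ in range(n)]
    kept = [(x, y) for x, y in pairs if x <= y]  # observed (truncated) sample
    r_full = pearson(*zip(*pairs))
    r_trunc = pearson(*zip(*kept))
    return r_full, r_trunc, len(kept)


r_full, r_trunc, n_kept = simulate()
print(f"full sample:      r = {r_full:.3f} (n = 20000)")
print(f"truncated sample: r = {r_trunc:.3f} (n = {n_kept})")
```

A standard independence test applied to the truncated sample would reject independence here, which is why the null hypothesis of interest under truncation is quasi-independence rather than ordinary independence.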




Updated: 2021-06-14