Test of significance for high-dimensional longitudinal data, Annals of Statistics

Annals of Statistics (IF 4.5), Pub Date: 2020-10-01, DOI: 10.1214/19-aos1900
Ethan X Fang₁, Yang Ning₂, Runze Li₁

This paper concerns statistical inference for longitudinal data with ultrahigh-dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low-dimensional parameter of interest. The major challenge is how to construct a powerful test statistic in the presence of high-dimensional nuisance parameters and complex within-subject correlation in longitudinal data. To address this challenge, we propose a new quadratic decorrelated inference function approach, which simultaneously removes the impact of the nuisance parameters and incorporates the within-subject correlation to improve the efficiency of the estimation procedure. When the parameter of interest is of fixed dimension, we prove that the proposed estimator is asymptotically normal and attains the semiparametric information bound; based on this result, we construct an optimal Wald test statistic. We further extend this result and establish the limiting distribution of the estimator in the setting where the dimension of the parameter of interest grows with the sample size at a polynomial rate. Finally, we study how to control the false discovery rate (FDR) when a high-dimensional vector of regression parameters is of interest. We prove that applying the Storey (J. R. Stat. Soc. Ser. B. Stat. Methodol. 64 (2002) 479–498) procedure to the proposed test statistic for each regression parameter controls the FDR asymptotically in longitudinal data. We conduct simulation studies to assess the finite-sample performance of the proposed procedures. Our simulation results suggest that the proposed procedures can control both the Type I error when testing a low-dimensional parameter of interest and the FDR in the multiple testing problem. We also apply the proposed procedures to a real data example.
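As a rough illustration of how the last two inferential steps fit together (Wald inference per coefficient, then FDR control across coefficients), the sketch below assumes that each regression coefficient already has an estimate and a standard error from some asymptotically normal procedure. It does not implement the authors' quadratic decorrelated inference function; the coefficient values are hypothetical, and the helper names (wald_z, two_sided_p, wald_ci, storey_pi0, storey_reject) are illustrative only. The FDR step uses the pi0-adjusted step-up rule, one common way of operationalizing Storey's (2002) approach; the tuning value lam = 0.5 and the level q = 0.1 are chosen here purely for illustration.

```python
from statistics import NormalDist

# Standard normal reference distribution for the asymptotic Wald statistics.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def wald_z(estimate, std_error, null_value=0.0):
    """Wald z-statistic for H0: beta_j = null_value."""
    return (estimate - null_value) / std_error


def two_sided_p(z):
    """Two-sided p-value, assuming z is asymptotically N(0, 1) under H0."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def wald_ci(estimate, std_error, level=0.95):
    """Asymptotic Wald interval: estimate +/- z_{(1+level)/2} * std_error."""
    z_crit = STD_NORMAL.inv_cdf((1.0 + level) / 2.0)
    return estimate - z_crit * std_error, estimate + z_crit * std_error


def storey_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true nulls, capped at 1:
    pi0 = #{p_j > lam} / (m * (1 - lam))."""
    m = len(p_values)
    n_above = sum(1 for p in p_values if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def storey_reject(p_values, q=0.1, lam=0.5):
    """pi0-adjusted step-up rule: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= k * q / (m * pi0). Returns sorted indices."""
    m = len(p_values)
    # Ad hoc guard (not part of Storey's estimator): avoid dividing by pi0 = 0.
    pi0 = max(storey_pi0(p_values, lam), 1.0 / m)
    order = sorted(range(m), key=lambda j: p_values[j])
    k_max = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank * q / (m * pi0):
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Hypothetical (estimate, standard error) pairs for eight coefficients,
    # standing in for per-coefficient estimates from some fitted model.
    fits = [(0.90, 0.20), (0.75, 0.25), (0.05, 0.20), (-0.10, 0.30),
            (0.60, 0.15), (0.02, 0.10), (-0.08, 0.25), (0.12, 0.30)]
    p_vals = [two_sided_p(wald_z(b, se)) for b, se in fits]
    print("95% Wald CI for coefficient 0:", wald_ci(*fits[0]))
    print("pi0 estimate:", storey_pi0(p_vals))
    print("rejected coefficients:", storey_reject(p_vals, q=0.1))
```

With these hypothetical inputs, five of the eight p-values exceed 0.5, so the raw pi0 estimate (1.25) is capped at 1 and the rule reduces to the Benjamini–Hochberg step-up; coefficients 0, 1 and 4 are rejected at q = 0.1. When many coefficients are truly nonzero, pi0 falls below 1 and the thresholds loosen accordingly, which is the point of the Storey adjustment.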

Updated: 2020-10-01