Online Debiasing for Adaptively Collected High-Dimensional Data With Applications to Time Series Analysis, Journal of the American Statistical Association



Journal of the American Statistical Association (IF 3.7), Pub Date: 2021-11-17, DOI: 10.1080/01621459.2021.1979011
Yash Deshpande ₁ , Adel Javanmard ₂ , Mohammad Mehrabi ₂


Abstract

Adaptive collection of data is commonplace in applications throughout science and engineering. From the point of view of statistical inference, however, adaptive data collection induces memory and correlation in the samples and poses significant challenges. We consider high-dimensional linear regression in which the samples are collected adaptively and the sample size n can be smaller than p, the number of covariates. In this setting, there are two distinct sources of bias: the first due to the regularization imposed for consistent estimation (for example, using the LASSO), and the second due to adaptivity in collecting the samples. We propose “online debiasing,” a general procedure for estimators such as the LASSO that addresses both sources of bias. In two concrete contexts, (i) time series analysis and (ii) batched data collection, we demonstrate that online debiasing optimally debiases the LASSO estimate when the underlying parameter θ₀ has sparsity of order $o(\sqrt{n}/\log p)$. In this regime, the debiased estimator can be used to compute p-values and confidence intervals of optimal size.
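The abstract describes online debiasing only in words. The sketch below illustrates the general shape of an episode-wise debiasing correction under stated assumptions; it is not the authors' procedure. Assumptions (all hypothetical): a sparse autoregressive model whose covariates are lagged responses, so each new sample depends on earlier noise; a LASSO fitted by plain proximal gradient (ISTA); samples split into consecutive episodes, where the decorrelating matrix applied to each episode is built only from covariates in earlier episodes; and a ridge-regularized inverse of the past sample covariance as a stand-in for whatever construction the paper actually uses. The names `simulate_ar`, `lasso_ista`, `online_debias`, `episode_ends`, and `ridge` are illustrative only.

```python
import numpy as np


def soft_threshold(z, tau):
    """Elementwise soft-thresholding, the proximal map of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def simulate_ar(theta0, n, burn_in=200, seed=0):
    """Hypothetical AR(p) data: y_t = sum_j theta0[j] * y_{t-1-j} + N(0, 1) noise.

    Returns the lagged design X (row t = (y_{t-1}, ..., y_{t-p})) and response y.
    """
    rng = np.random.default_rng(seed)
    p = len(theta0)
    total = n + p + burn_in
    series = np.zeros(total)
    for t in range(p, total):
        series[t] = theta0 @ series[t - p:t][::-1] + rng.standard_normal()
    X = np.array([series[t - p:t][::-1] for t in range(total - n, total)])
    y = series[total - n:]
    return X, y


def lasso_ista(X, y, lam, n_iter=1000):
    """LASSO via ISTA on (1/2n)||y - X theta||^2 + lam * ||theta||_1."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta


def online_debias(X, y, theta_lasso, episode_ends, ridge=0.1):
    """Hypothetical episode-wise debiasing sketch (simplified assumption, not the paper's M).

    episode_ends holds cumulative end indices, e.g. [50, 100, 150, 200].
    The first episode only seeds the covariance; every later episode is
    corrected with a matrix M that depends on earlier covariates alone.
    """
    p = X.shape[1]
    correction = np.zeros(p)
    n_used = 0
    start = episode_ends[0]
    for end in episode_ends[1:]:
        X_past = X[:start]
        sigma_past = X_past.T @ X_past / start
        M = np.linalg.inv(sigma_past + ridge * np.eye(p))  # ridge stand-in for M
        resid = y[start:end] - X[start:end] @ theta_lasso
        correction += M @ (X[start:end].T @ resid)
        n_used += end - start
        start = end
    # Normalize by the number of corrected samples (a simplifying choice here).
    return theta_lasso + correction / max(n_used, 1)


if __name__ == "__main__":
    p, n = 50, 200
    theta0 = np.zeros(p)
    theta0[0], theta0[3] = 0.5, -0.3  # sparse; |0.5| + |-0.3| < 1 keeps it stationary
    X, y = simulate_ar(theta0, n)
    lam = np.sqrt(2 * np.log(p) / n)
    theta_l = lasso_ista(X, y, lam)
    theta_on = online_debias(X, y, theta_l, episode_ends=[50, 100, 150, 200])
    print("LASSO    (lags 1, 4):", theta_l[[0, 3]])
    print("debiased (lags 1, 4):", theta_on[[0, 3]])
```

The design choice this sketch is meant to highlight is that the matrix multiplying each episode's residuals is fixed before that episode's noise is observed, so the noise part of the correction accumulates like a martingale sum even though later covariates depend on earlier noise. Computing p-values or confidence intervals would additionally require a variance estimate, which this sketch does not attempt.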


Updated: 2021-11-17
