Semisupervised inference for explained variance in high dimensional linear regression and its applications, The Journal of the Royal Statistical Society, Series B (Statistical Methodology)

The Journal of the Royal Statistical Society, Series B (Statistical Methodology) (IF 3.1), Pub Date: 2020-01-20, DOI: 10.1111/rssb.12357
T. Tony Cai, Zijian Guo

The paper considers statistical inference for the explained variance $\beta^{\top}\Sigma\beta$ under the high dimensional linear model Y = Xβ + ε in the semisupervised setting, where β is the regression vector and Σ is the design covariance matrix. A calibrated estimator, which efficiently integrates both labelled and unlabelled data, is proposed. It is shown that the estimator achieves the minimax optimal rate of convergence in the general semisupervised framework. The optimality result characterizes how the unlabelled data contribute to the estimation accuracy. Moreover, the limiting distribution for the proposed estimator is established and the unlabelled data have also proved useful in reducing the length of the confidence interval for the explained variance. The method proposed is extended to semisupervised inference for the unweighted quadratic functional $\|\beta\|_2^2$. The inference results obtained are then applied to a range of high dimensional statistical problems, including signal detection and global testing, prediction accuracy evaluation and confidence ball construction. The numerical improvement of incorporating the unlabelled data is demonstrated through simulation studies and an analysis of estimating heritability for a yeast segregant data set with multiple traits.
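
To make the role of unlabelled covariates concrete, here is a minimal simulation sketch, taking the explained variance to be $\beta^{\top}\Sigma\beta$. It is not the calibrated estimator proposed in the paper. It assumes β is known (an oracle simplification) and that the covariates have mean zero, so that only the estimation of Σ matters, and it compares a plug-in estimate built from the labelled covariates alone with one built from labelled and unlabelled covariates pooled. The function names, sample sizes, sparsity pattern and AR(1)-type covariance are all illustrative assumptions, not taken from the source.

```python
import numpy as np


def quad_form_plugin(X, beta):
    """Plug-in estimate of beta' Sigma beta for mean-zero rows of X and known beta.

    Equals beta' (X'X / m) beta, i.e. the sample mean of (x_i' beta)^2.
    """
    return float(np.mean((X @ beta) ** 2))


def simulate(n=100, N=2000, p=50, reps=200, seed=0):
    """Compare labelled-only and pooled plug-in estimates over repeated draws.

    n = labelled sample size, N = unlabelled sample size (hypothetical values).
    Returns (true value, MSE using labelled X only, MSE using pooled X).
    """
    rng = np.random.default_rng(seed)

    # Illustrative design covariance: Sigma[j, k] = 0.5 ** |j - k|.
    idx = np.arange(p)
    Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
    L = np.linalg.cholesky(Sigma)

    # Illustrative sparse regression vector with five unit entries.
    beta = np.zeros(p)
    beta[:5] = 1.0
    truth = float(beta @ Sigma @ beta)

    err_lab, err_pool = [], []
    for _ in range(reps):
        # Rows are N(0, Sigma) draws: z @ L.T has covariance L L' = Sigma.
        X_lab = rng.standard_normal((n, p)) @ L.T
        X_unlab = rng.standard_normal((N, p)) @ L.T
        X_pool = np.vstack([X_lab, X_unlab])
        err_lab.append(quad_form_plugin(X_lab, beta) - truth)
        err_pool.append(quad_form_plugin(X_pool, beta) - truth)

    mse_lab = float(np.mean(np.square(err_lab)))
    mse_pool = float(np.mean(np.square(err_pool)))
    return truth, mse_lab, mse_pool


if __name__ == "__main__":
    truth, mse_lab, mse_pool = simulate()
    print(f"true explained variance: {truth:.3f}")
    print(f"MSE, labelled covariates only: {mse_lab:.4f}")
    print(f"MSE, labelled + unlabelled:    {mse_pool:.4f}")
```

Under these assumptions the pooled estimate should show a markedly smaller mean squared error, because the error from estimating Σ shrinks with the total number of covariate observations n + N rather than n alone. In the setting of the paper β is unknown and must be estimated from the labelled data, which is why a calibrated estimator is needed there.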

Updated: 2020-01-20
