Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) (IF 3.1). Pub Date: 2016-04-14. DOI: 10.1111/rssb.12166
Jianqing Fan, Quefeng Li, Yuyan Wang
Data subject to heavy-tailed errors are commonly encountered in various scientific fields. To address this problem, procedures based on quantile regression and least absolute deviation (LAD) regression have been developed in recent years. These methods essentially estimate the conditional median (or quantile) function, which can differ substantially from the conditional mean function, especially when the error distribution is asymmetric and heteroscedastic. How can we efficiently estimate the mean regression function in the ultra-high-dimensional setting when only the second moment of the error exists? To solve this problem, we propose a penalized Huber loss with a diverging robustification parameter, which reduces the bias incurred by the traditional Huber loss. The resulting penalized robust approximate quadratic (RA-quadratic) loss is called RA-Lasso. In the ultra-high-dimensional setting, where the dimensionality can grow exponentially with the sample size, our results show that the RA-Lasso estimator is consistent and converges at the same optimal rate as is attained under light-tailed errors. We further study the computational convergence of RA-Lasso and show that the composite gradient descent algorithm produces, after sufficiently many iterations, a solution that admits the same optimal statistical rate. As a by-product, we also establish a concentration inequality for estimating a population mean when only the second moment exists. We compare RA-Lasso with other regularized robust estimators based on quantile regression and LAD regression. Extensive simulation studies demonstrate the satisfactory finite-sample performance of RA-Lasso.
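The composite gradient descent scheme mentioned in the abstract alternates a gradient step on the smooth Huber part with a soft-thresholding step for the L1 penalty. The sketch below is illustrative only, assuming the standard Huber loss with threshold `tau` and plain ISTA-style updates; the function names, step-size rule, and tuning values are not the authors' implementation.

```python
import numpy as np

def huber_grad(r, tau):
    # Derivative of the Huber loss: identity on [-tau, tau], clipped outside.
    # A diverging tau (e.g. tau ~ sqrt(n / log p)) shrinks the bias of the
    # Huber loss relative to the squared loss, as the abstract describes.
    return np.clip(r, -tau, tau)

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ra_lasso(X, y, lam, tau, step=None, n_iter=500):
    """Minimise (1/n) sum_i huber_tau(y_i - x_i' beta) + lam * ||beta||_1
    by composite (proximal) gradient descent. Illustrative sketch."""
    n, p = X.shape
    if step is None:
        # The smooth part has gradient Lipschitz constant <= sigma_max(X)^2 / n,
        # since the Huber loss has second derivative at most 1.
        step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        residual = y - X @ beta
        grad = -X.T @ huber_grad(residual, tau) / n   # gradient of smooth part
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Toy example: sparse linear model with heavy-tailed t(2.5) errors,
# which have a finite second moment but infinite third moment.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.standard_t(2.5, n)
tau = np.sqrt(n / np.log(p))          # diverging threshold, per the abstract
beta_hat = ra_lasso(X, y, lam=0.2, tau=tau)
```

With this choice, `tau` grows with the sample size, so the estimator targets the conditional mean rather than a Winsorized surrogate, while still damping the influence of extreme residuals in finite samples.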

Updated: 2019-11-01