Optimal subsampling for quantile regression in big data
Biometrika (IF 2.7), Pub Date: 2020-07-21, DOI: 10.1093/biomet/asaa043
Haiying Wang, Yanyuan Ma

We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the asymptotic variance-covariance matrix for a linearly transformed parameter estimator and the other minimizes that of the original parameter estimator. The former does not depend on the densities of the responses given covariates and is easy to implement. Algorithms based on optimal subsampling probabilities are proposed and asymptotic distributions and asymptotic optimality of the resulting estimators are established. Furthermore, we propose an iterative subsampling procedure based on the optimal subsampling probabilities in the linearly transformed parameter estimation which has great scalability to utilize available computational resources. In addition, this procedure yields standard errors for parameter estimators without estimating the densities of the responses given the covariates. We provide numerical examples based on both simulated and real data to illustrate the proposed method.
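The abstract outlines a pipeline: draw a subsample with data-dependent probabilities, fit a weighted quantile regression on it, and repeat the procedure to obtain both an averaged estimate and standard errors. The following Python sketch illustrates that pipeline under stated assumptions and is not the authors' implementation. The abstract does not give the probability formula, so the density-free form used here (proportional to |tau - 1{residual < 0}| times the covariate norm) is an assumption, as are the uniform pilot step, the helper names (`weighted_quantreg`, `subsample_probs`, `two_step_estimate`, `iterative_estimate`), and the use of the plain empirical spread across repetitions as the standard error.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_quantreg(X, y, tau, w):
    """Minimize sum_i w_i * rho_tau(y_i - x_i' beta) via a linear program."""
    n, p = X.shape
    # Variables: beta (free), u >= 0 (positive residual part), v >= 0 (negative part),
    # tied together by the constraint X beta + u - v = y.
    c = np.concatenate([np.zeros(p), tau * w, (1.0 - tau) * w])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def subsample_probs(X, y, tau, beta_pilot):
    """ASSUMED density-free probabilities (not stated in the abstract):
    pi_i proportional to |tau - 1{resid_i < 0}| * ||x_i||."""
    resid = y - X @ beta_pilot
    score = np.abs(tau - (resid < 0).astype(float)) * np.linalg.norm(X, axis=1)
    return score / score.sum()


def two_step_estimate(X, y, tau, r0, r, rng):
    """Pilot fit on a uniform subsample, then a weighted fit on a targeted one."""
    n = X.shape[0]
    # Step 1 (assumed): uniform pilot subsample of size r0.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta_pilot = weighted_quantreg(X[idx0], y[idx0], tau, np.ones(r0))
    # Step 2: sample r points with replacement using the pilot-based probabilities,
    # and weight each draw by 1 / (n * pi_i) to offset the unequal sampling.
    pi = subsample_probs(X, y, tau, beta_pilot)
    idx = rng.choice(n, size=r, replace=True, p=pi)
    return weighted_quantreg(X[idx], y[idx], tau, 1.0 / (n * pi[idx]))


def iterative_estimate(X, y, tau, r0, r, K, seed=0):
    """Repeat the two-step procedure K times and average the estimates."""
    rng = np.random.default_rng(seed)
    betas = np.array([two_step_estimate(X, y, tau, r0, r, rng) for _ in range(K)])
    beta_bar = betas.mean(axis=0)
    # Simplified standard error: spread across repetitions, no density estimate needed.
    se = betas.std(axis=0, ddof=1) / np.sqrt(K)
    return beta_bar, se
```

A call such as `iterative_estimate(X, y, tau=0.5, r0=200, r=300, K=5)` returns the averaged coefficient vector and its per-coefficient standard errors. Because each repetition only solves a linear program on `r` points, the number of repetitions `K` can be chosen to match the available computing budget, which is the scalability property the abstract refers to.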

Updated: 2020-07-21