Optimal subsampling for composite quantile regression in big data
Statistical Papers (IF 1.3), Pub Date: 2022-02-08, DOI: 10.1007/s00362-022-01292-1
Xiaohui Yuan 1,2, Yong Li 1, Xiaogang Dong 2, Tianqing Liu 3

Composite quantile regression (CQR) is an efficient and robust alternative to least squares for estimating regression coefficients in a linear model. We investigate optimal subsampling for CQR with massive datasets. By establishing the consistency and asymptotic normality of the CQR estimator from a general subsampling algorithm, we derive the optimal subsampling probabilities under the L- and A-optimality criteria. The L-optimality criterion minimizes the trace of the asymptotic variance–covariance matrix of the estimator for linearly transformed regression parameters, and the A-optimality criterion minimizes that of the estimator for the regression parameters themselves. The L-optimal subsampling probabilities are easy to implement because they do not depend on the densities of the responses given the covariates. Based on the L-optimal subsampling probabilities, we propose algorithms for computing the resulting estimators and establish their asymptotic distributions and asymptotic optimality. To obtain standard errors for CQR estimators without estimating the densities of the responses given the covariates, we propose an iterative subsampling procedure based on the L-optimal subsampling probabilities. The proposed methods are illustrated through numerical experiments on simulated and real datasets.
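
As a rough illustration of the setting the abstract describes, the sketch below evaluates a CQR objective on the full data and on an inverse-probability-weighted subsample drawn with replacement. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names (`check_loss`, `cqr_objective`, `draw_subsample`) are hypothetical, the quantile levels follow the common convention τ_k = k/(K+1), the data are simulated, and the sampling probabilities are a uniform placeholder rather than the L-optimal or A-optimal probabilities derived in the paper.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * np.where(u < 0, tau - 1.0, tau)

def cqr_objective(beta, intercepts, X, y, taus, weights=None):
    """Weighted CQR objective: sum_i w_i * sum_k rho_{tau_k}(y_i - b_k - x_i' beta)."""
    resid = y - X @ beta
    if weights is None:
        weights = np.ones_like(resid)
    total = 0.0
    for b_k, tau_k in zip(intercepts, taus):
        total += np.sum(weights * check_loss(resid - b_k, tau_k))
    return total

def draw_subsample(probs, r, rng):
    """Draw r indices with replacement; return them with weights 1 / (r * pi_i)."""
    idx = rng.choice(len(probs), size=r, replace=True, p=probs)
    return idx, 1.0 / (r * probs[idx])

# Simulated data (assumption: heavy-tailed t(3) errors, three covariates).
rng = np.random.default_rng(0)
n, K, r = 10_000, 5, 500
X = rng.normal(size=(n, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_t(df=3, size=n)

taus = np.arange(1, K + 1) / (K + 1)                # tau_k = k / (K + 1)
intercepts = np.quantile(y - X @ beta_true, taus)   # error quantiles b_k

# Placeholder probabilities: uniform, NOT the paper's optimal probabilities.
probs = np.full(n, 1.0 / n)
idx, w = draw_subsample(probs, r, rng)

full_obj = cqr_objective(beta_true, intercepts, X, y, taus)
sub_obj = cqr_objective(beta_true, intercepts, X[idx], y[idx], taus, w)
```

With weights 1/(r π_i), the subsample objective is an unbiased estimate of the full-data objective for any valid probabilities, so `sub_obj` and `full_obj` should be of similar size; choosing the probabilities well is what reduces the variance of the resulting estimator.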




Updated: 2022-02-09