Annals of the Institute of Statistical Mathematics (IF 0.8), Pub Date: 2022-01-10, DOI: 10.1007/s10463-021-00816-0, Fengrui Di, Lei Wang
Statistical analysis of large-scale datasets is challenging due to limited memory and computational resources, and it calls for efficient distributed methods. In this paper, we mainly study distributed estimation and inference for composite quantile regression (CQR). For computational and statistical efficiency, we propose to apply a smoothing idea to the CQR loss function for distributed data and then successively refine the estimator via multiple rounds of aggregation. Based on the Bahadur representation, we derive the asymptotic normality of the proposed multi-round smoothed CQR estimator and show that it achieves the same efficiency as the ideal CQR estimator obtained by analyzing the entire dataset at once. Moreover, to further improve the efficiency of the CQR, we propose a multi-round smoothed weighted CQR estimator. Extensive numerical experiments on both simulated and real data validate the superior performance of the proposed estimators.
Title: Multi-round smoothed composite quantile regression for distributed data
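The abstract's idea can be illustrated with a small sketch: smooth the non-differentiable CQR check loss with a kernel, fit a pilot estimate on one machine, then refine it over multiple rounds by aggregating local gradients across machines and minimizing a gradient-shifted surrogate loss on the master. The Gaussian kernel, the bandwidth, the surrogate-loss refinement, and all function names below are illustrative assumptions, not the authors' exact scheme or weighting.

```python
# Illustrative sketch only: Gaussian-kernel smoothing and a gradient-shifted
# surrogate refinement are assumed here; the paper's kernel, bandwidth rule,
# and aggregation scheme may differ.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def smoothed_cqr_loss_grad(params, X, y, taus, h):
    """Convolution-smoothed CQR loss (Gaussian kernel) and its gradient.

    params = [b_1, ..., b_K, beta]: one intercept per quantile level,
    plus a shared slope vector beta.
    """
    K = len(taus)
    b, beta = params[:K], params[K:]
    n = X.shape[0]
    loss = 0.0
    grad = np.zeros_like(params)
    for k, tau in enumerate(taus):
        u = y - b[k] - X @ beta
        # Closed form of the check loss convolved with a Gaussian kernel K_h;
        # it is smooth and convex, with derivative tau - Phi(-u/h).
        loss += np.mean(h * norm.pdf(u / h) + u * (tau - norm.cdf(-u / h)))
        w = tau - norm.cdf(-u / h)           # smoothed subgradient of rho_tau
        grad[k] = -np.mean(w)                # d/db_k
        grad[K:] += -(X.T @ w) / n           # d/dbeta, accumulated over k
    return loss / K, grad / K


def multi_round_cqr(shards, taus, h, rounds=5):
    """Multi-round refinement: shards is a list of (X, y) per machine."""
    K, p = len(taus), shards[0][0].shape[1]
    X0, y0 = shards[0]
    # Round 0: pilot estimate using only the first machine's data.
    theta = minimize(lambda t: smoothed_cqr_loss_grad(t, X0, y0, taus, h),
                     np.zeros(K + p), jac=True, method="L-BFGS-B").x
    for _ in range(rounds):
        # Each machine evaluates its local gradient at the current estimate;
        # only these p+K dimensional vectors are communicated.
        g_bar = np.mean([smoothed_cqr_loss_grad(theta, X, y, taus, h)[1]
                         for X, y in shards], axis=0)
        g1 = smoothed_cqr_loss_grad(theta, X0, y0, taus, h)[1]
        shift = g_bar - g1

        def surrogate(t):
            # Local loss corrected by the global-minus-local gradient gap,
            # in the spirit of communication-efficient surrogate losses.
            f, gr = smoothed_cqr_loss_grad(t, X0, y0, taus, h)
            return f + shift @ t, gr + shift

        theta = minimize(surrogate, theta, jac=True, method="L-BFGS-B").x
    return theta[:K], theta[K:]
```

Each round communicates only gradient vectors (one per machine), so the per-round cost is linear in the dimension while the estimator is successively refined toward the full-data solution.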