Robust communication-efficient distributed composite quantile regression and variable selection for massive data
Computational Statistics & Data Analysis (IF 1.8), Pub Date: 2021-04-22, DOI: 10.1016/j.csda.2021.107262
Kangning Wang, Shaomin Li, Benle Zhang

Statistical analysis of massive data is becoming increasingly common. This paper proposes distributed composite quantile regression (CQR) for massive data. Specifically, the global CQR loss function is approximated by a surrogate loss on the first machine that depends on the local datasets only through their gradients; the estimator is then obtained on the first machine by minimizing this surrogate loss. Because the local gradients can be communicated cheaply, the communication cost is reduced substantially. To ease the computational burden, the induced smoothing method is applied. Theoretically, the resulting estimator is proved to be statistically as efficient as the global CQR estimator. Moreover, as a direct application, smooth-threshold distributed CQR estimating equations for variable selection are proposed. The new methods inherit the robustness and efficiency advantages of CQR, and their promising performance is supported by extensive numerical examples and real data analysis.
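The abstract describes three ingredients: a gradient-based surrogate loss built on the first machine, induced smoothing of the non-differentiable check loss, and a smooth-threshold extension for variable selection. Purely as an illustration of the first two ingredients, the Python sketch below shows one plausible way such a communication-efficient surrogate CQR loss could be set up; it is not the authors' code, and the function names, the quantile grid, the bandwidth h, and the use of a generic BFGS solver are assumptions made here for readability.

```python
# Minimal sketch (not the paper's implementation) of distributed CQR via a
# gradient-corrected surrogate loss on the first machine, with the check loss
# smoothed by the induced-smoothing device (normal CDF/PDF). All tuning
# choices below (K, h, number of rounds) are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

K = 5                                  # number of composite quantile levels
taus = np.arange(1, K + 1) / (K + 1)   # tau_k = k / (K + 1)

def smoothed_cqr_loss(theta, X, y, h):
    """Induced-smoothing CQR loss on one machine.

    theta = (b_1, ..., b_K, beta); rho_tau(u) is replaced by its smoothed
    counterpart E[rho_tau(u + h*Z)], Z ~ N(0, 1).
    """
    b, beta = theta[:K], theta[K:]
    r = y[:, None] - (X @ beta)[:, None] - b[None, :]      # n x K residuals
    # smoothed check loss: u*(tau - Phi(-u/h)) + h*phi(u/h)
    loss = r * (taus[None, :] - norm.cdf(-r / h)) + h * norm.pdf(r / h)
    return loss.sum() / len(y)

def smoothed_cqr_grad(theta, X, y, h):
    """Gradient of the smoothed CQR loss; this is all a worker communicates."""
    b, beta = theta[:K], theta[K:]
    r = y[:, None] - (X @ beta)[:, None] - b[None, :]
    psi = taus[None, :] - norm.cdf(-r / h)                 # smoothed subgradient
    g_b = -psi.mean(axis=0)                                # d/db_k
    g_beta = -(X.T @ psi).sum(axis=1) / len(y)             # d/dbeta, summed over k
    return np.concatenate([g_b, g_beta])

def distributed_cqr(local_data, h=0.05, n_rounds=2):
    """Few-round surrogate-loss scheme: only gradient vectors leave the workers."""
    X1, y1 = local_data[0]                                 # data on the first machine
    p = X1.shape[1]
    theta = np.zeros(K + p)                                # crude initial value
    for _ in range(n_rounds):
        # communication step: each machine sends its local gradient at theta
        grads = [smoothed_cqr_grad(theta, X, y, h) for X, y in local_data]
        global_grad = np.mean(grads, axis=0)
        local_grad = grads[0]

        # surrogate loss on machine 1: local loss plus a linear correction term
        def surrogate(t):
            return smoothed_cqr_loss(t, X1, y1, h) + (global_grad - local_grad) @ t

        theta = minimize(surrogate, theta, method="BFGS").x
    return theta[:K], theta[K:]                            # intercepts, beta
```

In this sketch each worker only ever ships a (K + p)-dimensional gradient vector per round, which is the communication saving the abstract refers to; the smooth-threshold estimating equations for variable selection are not illustrated here.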


