Robust distributed estimation and variable selection for massive datasets via rank regression
Annals of the Institute of Statistical Mathematics ( IF 0.8 ) Pub Date : 2021-06-20 , DOI: 10.1007/s10463-021-00803-5
Jiaming Luan , Hongwei Wang , Kangning Wang , Benle Zhang

Rank regression is a robust modeling tool, but memory constraints make it challenging to implement for distributed massive data. In practice, massive data may be distributed heterogeneously from machine to machine; how to accommodate this heterogeneity is also an interesting issue. This paper proposes a distributed rank regression (\(\mathrm {DR}^{2}\)) that can be implemented on the master machine by solving a weighted least-squares problem and that adapts to heterogeneous data. Theoretically, we prove that the resulting estimator is statistically as efficient as the global rank regression estimator. Furthermore, based on the adaptive LASSO and a newly defined distributed BIC-type tuning-parameter selector, we propose a distributed regularized rank regression (\(\mathrm {DR}^{3}\)), which achieves consistent variable selection and can also be implemented easily by running the LARS algorithm on the master machine. Simulation results and real-data analysis are included to validate our method.
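To make the distributed setting concrete, the sketch below illustrates the general idea of rank regression on sharded data with Wilcoxon scores: each worker computes a rank-based gradient of Jaeckel's dispersion on its local shard, and the master averages those gradients and updates the coefficients. This is a simple gradient-descent surrogate for illustration only; the paper's \(\mathrm {DR}^{2}\) estimator instead solves a weighted least-squares problem on the master, and all function names and tuning values here are assumptions, not the authors' implementation.

```python
import numpy as np

def wilcoxon_gradient(X, y, beta):
    """Local ascent direction for Jaeckel's rank dispersion with
    Wilcoxon scores, computed on one worker's data shard."""
    n = len(y)
    resid = y - X @ beta
    ranks = np.argsort(np.argsort(resid)) + 1            # ranks of residuals
    scores = np.sqrt(12.0) * (ranks / (n + 1.0) - 0.5)   # Wilcoxon scores
    return X.T @ scores / n                              # descent step direction

def distributed_rank_regression(shards, n_iter=200, lr=0.5):
    """Master loop: average the workers' rank-based gradients and take
    a gradient step (a surrogate for the paper's weighted-least-squares
    update). `shards` is a list of (X, y) pairs, one per machine."""
    p = shards[0][0].shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        grads = [wilcoxon_gradient(X, y, beta) for X, y in shards]
        beta = beta + lr * np.mean(grads, axis=0)
    return beta

# Hypothetical usage: 5 machines, heavy-tailed (Student-t) errors,
# where rank regression is robust while least squares degrades.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
shards = []
for _ in range(5):
    X = rng.standard_normal((200, 3))
    y = X @ beta_true + rng.standard_t(df=3, size=200)
    shards.append((X, y))
beta_hat = distributed_rank_regression(shards)
```

Note that only the averaged gradient (a length-\(p\) vector per machine) travels to the master each round, which is what makes the approach feasible when no single machine can hold the full dataset in memory.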




Updated: 2021-06-20