Abstract
Rank regression is a robust modeling tool, but it is challenging to implement for distributed massive data owing to memory constraints. In practice, massive data may be distributed heterogeneously across machines; how to incorporate this heterogeneity is also an interesting issue. This paper proposes a distributed rank regression (\(\mathrm {DR}^{2}\)) that can be implemented on the master machine by solving a weighted least-squares problem and that adapts to heterogeneous data. Theoretically, we prove that the resulting estimator is statistically as efficient as the global rank regression estimator. Furthermore, based on the adaptive LASSO and a newly defined distributed BIC-type tuning parameter selector, we propose a distributed regularized rank regression (\(\mathrm {DR}^{3}\)) that achieves consistent variable selection and can also be easily implemented with the LARS algorithm on the master machine. Simulation results and a real data analysis validate our method.
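The paper's actual DR² construction (the weighted least-squares surrogate solved on the master machine, and its heterogeneity weights) is given in the full text. As a rough, simplified illustration of the underlying divide-and-conquer idea only — not the authors' algorithm — the sketch below fits a rank (Wilcoxon-score) regression on each local chunk by minimizing Jaeckel's dispersion and then averages the local estimates on the master; all function names are our own.

```python
import numpy as np
from scipy.optimize import minimize

def jaeckel_dispersion(beta, X, y):
    """Jaeckel's rank dispersion with Wilcoxon scores:
    D(beta) = sum_i a(R_i) e_i, where e = y - X beta, R_i is the
    rank of e_i, and a(k) = sqrt(12) * (k/(n+1) - 1/2)."""
    e = y - X @ beta
    n = e.size
    ranks = np.argsort(np.argsort(e)) + 1          # ranks 1..n
    scores = np.sqrt(12.0) * (ranks / (n + 1) - 0.5)
    return np.sum(scores * e)

def local_rank_fit(X, y):
    """Rank regression on one chunk: slopes minimize the (convex,
    piecewise-linear) dispersion; the intercept, to which the
    dispersion is invariant, is the median of the residuals."""
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares warm start
    res = minimize(jaeckel_dispersion, beta0, args=(X, y),
                   method="Nelder-Mead")
    beta = res.x
    intercept = np.median(y - X @ beta)
    return intercept, beta

def one_shot_average(chunks):
    """Naive divide-and-conquer: average the local estimates."""
    fits = [local_rank_fit(X, y) for X, y in chunks]
    a = np.mean([f[0] for f in fits])
    b = np.mean([f[1] for f in fits], axis=0)
    return a, b

# Toy example: 5 machines, heavy-tailed t(3) noise, true model
# y = 1 + 2*x1 - 1*x2 + eps.
rng = np.random.default_rng(0)
chunks = []
for _ in range(5):
    X = rng.normal(size=(200, 2))
    y = 1.0 + X @ np.array([2.0, -1.0]) + rng.standard_t(df=3, size=200)
    chunks.append((X, y))
a, b = one_shot_average(chunks)
```

The heavy-tailed noise is where the rank criterion pays off over least squares; the paper's DR² refines this naive averaging into a weighted least-squares step on the master that recovers the efficiency of the global rank estimator.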
K. Wang: The authors are listed in alphabetical order. The authors would like to thank Dr. Shaomin Li for his valuable suggestions, and the editor, an associate editor, and two anonymous reviewers for constructive comments that led to a major improvement of this article. The research was supported by NNSF projects of China (11901356, 11901149) and the wealth management project (2019ZBKY047) of Shandong Technology and Business University.
Supplementary Information
Electronic supplementary material accompanies the online version of this article.
Luan, J., Wang, H., Wang, K. et al. Robust distributed estimation and variable selection for massive datasets via rank regression. Ann Inst Stat Math 74, 435–450 (2022). https://doi.org/10.1007/s10463-021-00803-5