Computational Statistics & Data Analysis (IF 1.5), Pub Date: 2021-04-16, DOI: 10.1016/j.csda.2021.107251
Jianwei Shi, Guoyou Qin, Huichen Zhu, Zhongyi Zhu
In the big data era, practical applications often involve incomplete data. Existing distributed methods that ignore missingness can yield inconsistent estimates. Motivated by this, a distributed algorithm is developed for M-estimation with missing data. The proposed algorithm is communication-efficient: only gradient information is transferred to the central machine. The parameters of interest and the nuisance parameters are updated simultaneously. Theoretically, it is shown that the proposed algorithm matches the performance of the full-sample estimator after a moderate number of iterations. The influence of the nuisance parameters on distributed M-estimation is also investigated. Simulations on synthetic data illustrate the effectiveness of the algorithm. Finally, the algorithm is applied to a real data set.
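The abstract does not give the update rule, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes a toy problem: estimating a mean E[Y] when Y is missing at random given a covariate X, with a logistic missingness model as the nuisance component and inverse probability weighting for the parameter of interest. The swapped-in technique is plain gradient descent on averaged gradients. All names (make_shards, local_gradients, distributed_fit), the data-generating model, the step size, and the equal shard sizes are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_shards(n_machines=4, n_per_machine=1500, seed=0):
    """Synthetic data (an assumption): E[Y] = 1, Y observed more often when X is large."""
    rng = random.Random(seed)
    shards = []
    for _ in range(n_machines):
        shard = []
        for _ in range(n_per_machine):
            x = rng.gauss(0.0, 1.0)
            y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
            r = 1 if rng.random() < sigmoid(0.5 + x) else 0  # 1 = Y observed
            shard.append((x, y, r))
        shards.append(shard)
    return shards


def local_gradients(shard, theta, alpha):
    """Runs on one worker; only the averaged gradients leave the machine."""
    g_theta = 0.0
    g_alpha = [0.0, 0.0]
    for x, y, r in shard:
        pi = sigmoid(alpha[0] + alpha[1] * x)  # modelled P(Y observed | x)
        if r:
            # Gradient of the inverse-probability-weighted squared loss.
            g_theta += -2.0 * (y - theta) / pi
        # Gradient of the negative logistic log-likelihood (nuisance model).
        g_alpha[0] += pi - r
        g_alpha[1] += (pi - r) * x
    n = len(shard)
    return g_theta / n, [g / n for g in g_alpha]


def distributed_fit(shards, steps=400, lr=0.2):
    """Central machine: average worker gradients, update theta and alpha together."""
    theta, alpha = 0.0, [0.0, 0.0]
    m = len(shards)
    for _ in range(steps):
        grads = [local_gradients(s, theta, alpha) for s in shards]
        g_theta = sum(g[0] for g in grads) / m
        g_alpha = [sum(g[1][j] for g in grads) / m for j in range(2)]
        theta -= lr * g_theta
        alpha = [a - lr * g for a, g in zip(alpha, g_alpha)]
    return theta, alpha


shards = make_shards()
theta_hat, alpha_hat = distributed_fit(shards)

# Naive estimate that ignores missingness: mean of the observed Y values only.
observed = [y for shard in shards for (_, y, r) in shard if r]
complete_case_mean = sum(observed) / len(observed)
print("weighted estimate:", theta_hat)
print("complete-case mean:", complete_case_mean)
```

In this toy setup each worker sends three numbers per round rather than its raw data, and the complete-case mean serves as the baseline that ignores missingness. Averaging worker gradients with equal weights is appropriate here only because the shards have equal size.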
Title (translated from the Chinese listing): Communication-efficient distributed M-estimation with missing data