Distributed one-step upgraded estimation for non-uniformly and non-randomly distributed data
Computational Statistics & Data Analysis (IF 1.5), Pub Date: 2021-04-29, DOI: 10.1016/j.csda.2021.107265
Feifei Wang, Yingqiu Zhu, Danyang Huang, Haobo Qi, Hansheng Wang

One-shot-type (or divide-and-conquer) estimators have been widely used for distributed statistical analysis. However, their outstanding statistical efficiency hinges on two critical conditions. The first is the uniformity condition, which requires that the sample sizes allocated to different Workers be as comparable as possible. The second is the randomness condition, which requires that the data be distributed across Workers as randomly as possible. Both conditions are often violated in practice. Violating either condition can seriously degrade the statistical efficiency of one-shot estimators, or even make them inconsistent. To fix this problem, a novel one-step upgraded pilot (OSUP) method is proposed. In the first step of the algorithm, a pilot estimate is computed based on randomly selected samples from different Workers. In the second step, one-step updating is conducted based on the pilot estimate by summarizing the derivative information on each Worker. The resulting OSUP estimator is theoretically proved to be as statistically efficient as the whole-sample maximum likelihood estimator, without any restrictive assumption about distribution uniformity and randomness. Extensive numerical studies are presented to demonstrate the finite-sample performance of the OSUP estimator. Finally, by way of illustration, an American Airlines dataset is analyzed on a Spark cluster.
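
The two steps described in the abstract can be illustrated with a minimal sketch, given here for logistic regression on simulated data. This is not the authors' implementation: the model, the proportional pilot sampling rate (`pilot_frac`), the Worker sizes, and all function names are assumptions chosen for illustration. The data are sorted by the response and split into unequal blocks, so both the randomness and the uniformity conditions are deliberately violated.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score_and_hessian(X, y, theta):
    """Summed gradient and Hessian of the logistic log-likelihood at theta."""
    p = sigmoid(X @ theta)
    grad = X.T @ (y - p)
    hess = -(X * (p * (1.0 - p))[:, None]).T @ X
    return grad, hess

def newton_mle(X, y, n_iter=25):
    """Plain Newton-Raphson maximizer of the logistic log-likelihood."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g, H = score_and_hessian(X, y, theta)
        theta = theta - np.linalg.solve(H, g)
    return theta

def osup(workers, pilot_frac, rng):
    """Hypothetical OSUP sketch: pilot estimate, then one Newton-type update."""
    # Step 1: each Worker contributes a random subsample (assumed here to be
    # a fixed fraction of its local data); the pooled pilot sample is fitted
    # on the master.
    Xp, yp = [], []
    for X, y in workers:
        k = max(1, int(round(pilot_frac * len(y))))
        idx = rng.choice(len(y), size=k, replace=False)
        Xp.append(X[idx])
        yp.append(y[idx])
    theta_pilot = newton_mle(np.vstack(Xp), np.concatenate(yp))

    # Step 2: each Worker evaluates its local derivatives at the pilot
    # estimate; the master sums them and takes a single update step.
    d = len(theta_pilot)
    grad, hess = np.zeros(d), np.zeros((d, d))
    for X, y in workers:
        g, H = score_and_hessian(X, y, theta_pilot)
        grad += g
        hess += H
    return theta_pilot, theta_pilot - np.linalg.solve(hess, grad)

# Simulated data (assumed setup): intercept plus three Gaussian covariates.
rng = np.random.default_rng(0)
N = 20000
theta_true = np.array([0.2, 0.5, -1.0, 1.0])
X = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
y = rng.binomial(1, sigmoid(X @ theta_true)).astype(float)

# Non-random, non-uniform allocation: sort by response, cut unequal blocks.
order = np.argsort(y, kind="stable")
workers = [(X[i], y[i]) for i in np.split(order, [1000, 5000])]

theta_full = newton_mle(X, y)  # whole-sample MLE, for reference only
theta_pilot, theta_osup = osup(workers, pilot_frac=0.1, rng=rng)

print("pilot error vs. full MLE:", np.linalg.norm(theta_pilot - theta_full))
print("OSUP  error vs. full MLE:", np.linalg.norm(theta_osup - theta_full))
```

In this sketch only the pilot subsamples and one gradient and one Hessian per Worker are communicated to the master, which is why the unequal and response-sorted allocation does not enter the final update.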




Updated: 2021-05-10