Massive parallelization boosts big Bayesian multidimensional scaling
Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2020-06-08, DOI: 10.1080/10618600.2020.1754226
Andrew J Holbrook, Philippe Lemey, Guy Baele, Simon Dellicour, Dirk Brockmann, Andrew Rambaut, Marc A Suchard

Big Bayes is the computationally intensive co-application of big data and large, expressive Bayesian models for the analysis of complex phenomena in scientific inference and statistical learning. Standing as an example, Bayesian multidimensional scaling (MDS) can help scientists learn viral trajectories through space-time, but its computational burden prevents its wider use. Crucial MDS model calculations scale quadratically in the number of observations. We partially mitigate this limitation through massive parallelization using multi-core central processing units, instruction-level vectorization and graphics processing units (GPUs). Fitting the MDS model using Hamiltonian Monte Carlo, GPUs can deliver more than 100-fold speedups over serial calculations and thus extend Bayesian MDS to a big data setting. To illustrate, we employ Bayesian MDS to infer the rate at which different seasonal influenza virus subtypes use worldwide air traffic to spread around the globe. We examine 5392 viral sequences and their associated 14 million pairwise distances arising from the number of commercial airline seats per year between viral sampling locations. To adjust for shared evolutionary history of the viruses, we implement a phylogenetic extension to the MDS model and learn that subtype H3N2 spreads most effectively, consistent with its epidemic success relative to other seasonal influenza subtypes. Finally, we provide MassiveMDS, an open-source, stand-alone C++ library and rudimentary R package, and discuss program design and high-level implementation with an emphasis on important aspects of computing architecture that become relevant at scale.
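As a rough illustration of the quadratic scaling described above, the sketch below counts the unordered pairs among n observations and walks through them in a serial double loop. The function names are hypothetical, and the sum of squared residuals is only a stand-in for a per-pair computation; it is not the MDS likelihood used in the paper and has nothing to do with the MassiveMDS API.

```python
import math


def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def sum_squared_residuals(coords, observed):
    """Serial O(n^2) pass over all pairs i < j.

    coords:   list of equal-length tuples (hypothetical latent locations).
    observed: dict mapping (i, j) with i < j to an observed distance.

    The squared residual is a placeholder for whatever per-pair term a
    model needs; each pair is independent of the others, which is why
    this kind of loop is a natural candidate for parallelization.
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            latent = math.dist(coords[i], coords[j])  # Euclidean distance
            total += (observed[(i, j)] - latent) ** 2
    return total


if __name__ == "__main__":
    # 5392 sequences give 5392 * 5391 / 2 pairs, i.e. about 14 million.
    print(pair_count(5392))  # 14534136
```

Doubling the number of observations roughly quadruples the number of pairs, so the cost of every likelihood or gradient evaluation grows accordingly.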

Updated: 2020-06-08