Multi-granularity Parallel Computing in a Genome-Scale Molecular Evolution Application.,The Journal of Supercomputing

The Journal of Supercomputing (IF 3.3), Pub Date: 2009-01-01, DOI: 10.1007/978-3-642-03275-2_6
Jesse D Walters, Thomas B Bair, Terry A Braun, Todd E Scheetz, John P Robinson, Thomas L Casavant

Previously [1], we reported a coarse-grained parallel computational approach to identifying rare molecular evolutionary events, often referred to as horizontal gene transfers. Very high degrees of parallelism (up to 65x speedup on 4,096 processors) were reported, yet the overall execution time for a realistic problem size was still on the order of 12 days. Given the availability of large numbers of compute clusters, as well as genomic sequence from more than 2,000 species (each containing as many as 35,000 genes, with trillions of nucleotides in all), we demonstrated the computational feasibility of a method that examines "clusters" of genes using phylogenetic tree similarity as a distance metric. A full serial solution to this problem requires years of CPU time, yet it makes only modest inter-process communication (IPC) and memory demands; it is therefore an ideal candidate for a grid computing approach built on low-cost compute nodes. This paper describes a multiple-granularity parallel solution that exploits multi-core shared-memory nodes to address fine-grained aspects of the tree-clustering phase of our previous deployment, XenoCluster 1.0. In addition to benchmarking results that show up to 80% speedup efficiency on 8 CPU cores, we report on the biological accuracy and relevance of our results compared with a previously reported set of known xenologs in yeast.
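The two levels of parallelism described in the abstract can be illustrated with a small sketch. It is not the XenoCluster code, whose internals the abstract does not describe, and the names `node_share`, `pairwise_tree_distances`, `rf_distance`, and `clade` are hypothetical. Coarse-grained work units (independent groups of gene trees) are dealt out across cluster nodes, and within one shared-memory node the all-pairs tree-distance computation is spread over a pool of worker threads. As an assumption, trees are treated as rooted and encoded as sets of clades, and the rooted Robinson-Foulds distance stands in for whatever tree-similarity metric the paper actually uses. For scale, speedup efficiency is speedup divided by core count, so 80% efficiency on 8 cores corresponds to a speedup of about 0.8 × 8 = 6.4.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Assumption: a rooted gene tree is encoded as a frozenset of clades,
# each clade being the frozenset of taxon names below one internal node.

def rf_distance(a: frozenset, b: frozenset) -> int:
    """Rooted Robinson-Foulds distance: clades present in exactly one tree.
    A stand-in metric; the paper's tree-similarity measure may differ."""
    return len(a ^ b)

def node_share(work_units: list, n_nodes: int, rank: int) -> list:
    """Coarse grain (hypothetical): round-robin split of independent work units across nodes."""
    return work_units[rank::n_nodes]

def pairwise_tree_distances(trees: list, n_threads: int = 8) -> dict:
    """Fine grain (hypothetical): spread every tree pair of one work unit over the
    cores of a shared-memory node; all threads read the same `trees` list without copying."""
    pairs = list(combinations(range(len(trees)), 2))

    def job(pair):
        i, j = pair
        return pair, rf_distance(trees[i], trees[j])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() preserves input order, so the dict keys follow `pairs`
        return dict(pool.map(job, pairs))

def clade(*taxa: str) -> frozenset:
    """Build one clade from taxon names."""
    return frozenset(taxa)

# Toy data: ((A,B),(C,D)), (((A,B),C),D) and ((A,C),(B,D))
t1 = frozenset({clade("A", "B"), clade("C", "D")})
t2 = frozenset({clade("A", "B"), clade("A", "B", "C")})
t3 = frozenset({clade("A", "C"), clade("B", "D")})
units = [[t1, t2, t3], [t2, t3]]

# Simulate the node with rank 0 out of 2 nodes
for unit in node_share(units, n_nodes=2, rank=0):
    print(pairwise_tree_distances(unit, n_threads=4))
    # prints {(0, 1): 2, (0, 2): 4, (1, 2): 4}
```

In CPython, the global interpreter lock means a thread pool shows the structure but will not accelerate pure-Python, CPU-bound distance calculations; swapping in `ProcessPoolExecutor`, or a compiled implementation, would be needed for real multi-core speedup. The round-robin `node_share` split is only a placeholder for whatever scheduler distributes work units between nodes.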

Updated: 2019-11-01