High Performance Multiple Sequence Alignment System for Pyrosequencing Reads from Multiple Reference Genomes. Journal of Parallel and Distributed Computing


Journal of Parallel and Distributed Computing (IF 3.4), Pub Date: 2011-09-16, DOI: 10.1016/j.jpdc.2011.08.001
Fahad Saeed, Alan Perez-Rathke, Jaroslaw Gwarnicki, Tanya Berger-Wolf, Ashfaq Khokhar


Genome resequencing with short reads generated by pyrosequencing generally relies on mapping the reads against a single reference genome. However, reads that originate from multiple reference genomes cannot be mapped with a pairwise mapping algorithm. Existing multiple sequence alignment (MSA) methods cannot be used to align the reads with respect to each other and to the reference genomes, because they do not take into account the position of the short reads within the genome and are highly inefficient for large numbers of sequences. In this paper, we develop a highly scalable parallel algorithm based on domain decomposition, referred to as P-Pyro-Align, to align such large numbers of reads from single or multiple reference genomes. The proposed algorithm accurately aligns the erroneous reads and has been implemented on a cluster of workstations using the MPI library. Experimental results for different problem sizes are analyzed in terms of execution time, alignment quality, and the ability of the algorithm to handle reads from multiple haplotypes. We report high-quality multiple alignments of up to 0.5 million reads. The algorithm is shown to be highly scalable and exhibits super-linear speedups as the number of processors increases.
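
The abstract names two ideas without spelling them out: domain decomposition of the reads and super-linear speedup. The sketch below is only an illustration of those general concepts and is not the P-Pyro-Align algorithm itself. It assumes, hypothetically, that each read already carries an approximate start position on the genome; the names `decompose_by_position`, `speedup`, and `is_super_linear` are invented for this example. Speedup is taken in the usual sense, serial time divided by parallel time, and is called super-linear when it exceeds the number of processors.

```python
from typing import List, Tuple

# Assumption: a read is (approximate start position on the genome, sequence).
Read = Tuple[int, str]


def decompose_by_position(reads: List[Read], num_workers: int) -> List[List[Read]]:
    """Split reads into contiguous, near-equal chunks ordered by start position.

    Each chunk could then be handed to one worker (for example one MPI rank),
    so that reads from the same genome region are aligned together.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    ordered = sorted(reads, key=lambda read: read[0])
    base, extra = divmod(len(ordered), num_workers)
    chunks: List[List[Read]] = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional read each.
        size = base + (1 if worker < extra else 0)
        chunks.append(ordered[start:start + size])
        start += size
    return chunks


def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def is_super_linear(t_serial: float, t_parallel: float, num_workers: int) -> bool:
    """True when the speedup exceeds the number of workers."""
    return speedup(t_serial, t_parallel) > num_workers


# Toy usage with made-up reads and timings (not data from the paper).
toy_reads = [(30, "ACGT"), (5, "TTGA"), (12, "GGCA"), (40, "CATG"), (7, "AACC")]
toy_chunks = decompose_by_position(toy_reads, 2)
```

Partitioning by position keeps overlapping reads on the same worker, which is one plausible reason a position-aware decomposition suits this problem; the paper's actual partitioning and merging steps may differ.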


Updated: 2019-11-01
