ComHapDet: a spatial community detection algorithm for haplotype assembly.
BMC Genomics (IF 4.4), Pub Date: 2020-09-09, DOI: 10.1186/s12864-020-06935-x
Abishek Sankararaman 1 , Haris Vikalo 1 , François Baccelli 1, 2

Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual’s susceptibility to hereditary and complex diseases and affect how our bodies respond to therapeutic drugs. Reconstructing haplotypes of an individual from short sequencing reads is an NP-hard problem that becomes even more challenging in the case of polyploids. While increasing the lengths of sequencing reads and insert sizes helps improve the accuracy of reconstruction, it also exacerbates the computational complexity of the haplotype assembly task. This has motivated the pursuit of algorithmic frameworks capable of accurate yet efficient assembly of haplotypes from high-throughput sequencing data. We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph. To this end, we construct a graph where each read is a node with an unknown community label associating the read with the haplotype it samples. Haplotype reconstruction can then be thought of as a two-step procedure: first, one recovers the community labels on the nodes (i.e., the reads), and then one uses the estimated labels to assemble the haplotypes. Based on this observation, we propose ComHapDet, a novel assembly algorithm for diploid and polyploid haplotypes which allows both biallelic and multi-allelic variants. Performance of the proposed algorithm is benchmarked on simulated as well as experimental data obtained by sequencing Chromosome 5 of tetraploid biallelic Solanum tuberosum (potato). The results demonstrate the efficacy of the proposed method and show that it compares favorably with existing techniques.
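The two-step view described in the abstract (label the reads, then assemble from the labels) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: reads are modeled as dictionaries mapping a SNP position to an observed allele, `similarity` is a simple matches-minus-mismatches score, and `greedy_labels` is a naive stand-in for step one, not ComHapDet's community detection procedure. Only the overall structure (reads as nodes, one label per read, haplotypes built from labeled reads) follows the abstract.

```python
from collections import Counter

def similarity(r1, r2):
    """Hypothetical edge weight: matches minus mismatches over shared SNP positions."""
    return sum(1 if r1[p] == r2[p] else -1 for p in r1.keys() & r2.keys())

def greedy_labels(reads, k):
    """Toy stand-in for step 1 (NOT the paper's detection algorithm).

    Each read joins the community whose already-labeled reads it agrees
    with most; ties go to the lowest community index.
    """
    labels = []
    for i, read in enumerate(reads):
        score = [0] * k
        for j in range(i):
            score[labels[j]] += similarity(read, reads[j])
        labels.append(max(range(k), key=lambda c: score[c]))
    return labels

def assemble_haplotypes(reads, labels, k):
    """Step 2: within each community, majority-vote the allele at every covered position."""
    votes = [dict() for _ in range(k)]
    for read, label in zip(reads, labels):
        for pos, allele in read.items():
            votes[label].setdefault(pos, Counter())[allele] += 1
    return [
        {pos: counts.most_common(1)[0][0] for pos, counts in sorted(v.items())}
        for v in votes
    ]

# Six error-free reads sampled from two made-up haplotypes, 0000 and 1111.
reads = [
    {0: 0, 1: 0}, {1: 0, 2: 0}, {0: 1, 1: 1},
    {1: 1, 2: 1}, {2: 0, 3: 0}, {2: 1, 3: 1},
]
labels = greedy_labels(reads, k=2)            # -> [0, 0, 1, 1, 0, 1]
haplotypes = assemble_haplotypes(reads, labels, k=2)
print(labels, haplotypes)
```

Because alleles are stored as plain values rather than bits, the same voting step would accept multi-allelic sites, and `k` plays the role of the ploidy. The greedy pass is order-dependent and would likely degrade on noisy reads, which is where a proper community detection method would be needed.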

Updated: 2020-09-08