PolyCluster: Minimum Fragment Disagreement Clustering for Polyploid Phasing.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2018-07-23, DOI: 10.1109/tcbb.2018.2858803
Sepideh Mazrouee , Wei Wang

Phasing is an emerging area in computational biology with important applications in clinical decision making and the biomedical sciences. While machine learning techniques have shown tremendous potential in many biomedical applications, their utility in phasing is not yet fully understood. In this paper, we investigate the development of clustering-based techniques for phasing in polyploid organisms, where more than two copies of each chromosome exist in the cells of the organism under study. We develop a novel framework, called PolyCluster, based on the concept of correlation clustering followed by an effective cluster merging mechanism that minimizes the amount of disagreement among the short reads residing in each cluster. We first introduce a graph model to quantify the similarity between each pair of DNA reads. We then present a combination of linear programming, rounding, region-growing, and cluster merging to group similar reads and reconstruct haplotypes. Our extensive analysis demonstrates the effectiveness of PolyCluster in accurate and scalable phasing. In particular, we show that PolyCluster reduces the switching error of H-PoP, HapColor, and HapTree by 44.4, 51.2, and 48.3 percent, respectively. In addition, the running time of PolyCluster is several orders of magnitude lower than that of HapTree, while remaining comparable to that of the other algorithms.
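
The graph model and the disagreement objective mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the edge-weight definition, so the scoring below (matches minus mismatches over shared SNP positions) is an assumption, and the names `read_similarity`, `build_graph`, and `disagreement` are hypothetical. The sketch only evaluates the correlation-clustering cost of a given grouping; it does not implement the linear programming, rounding, region-growing, or cluster merging steps of PolyCluster.

```python
from itertools import combinations

def read_similarity(a, b):
    """Assumed signed similarity: matches minus mismatches on shared SNP sites.

    Each read is a dict mapping SNP position -> observed allele.
    Returns None when the two reads share no position (no edge).
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    matches = sum(1 for p in shared if a[p] == b[p])
    return matches - (len(shared) - matches)

def build_graph(reads):
    """Build a signed graph: positive edges join similar reads, negative edges dissimilar ones."""
    edges = {}
    for i, j in combinations(range(len(reads)), 2):
        w = read_similarity(reads[i], reads[j])
        if w:  # skip missing (None) and zero-weight edges
            edges[(i, j)] = w
    return edges

def disagreement(edges, labels):
    """Correlation-clustering cost of a labeling.

    A positive edge cut between clusters, or a negative edge kept inside
    one cluster, contributes its absolute weight.
    """
    cost = 0
    for (i, j), w in edges.items():
        same = labels[i] == labels[j]
        if (w > 0 and not same) or (w < 0 and same):
            cost += abs(w)
    return cost

# Toy input (made up for illustration): four reads over five SNP positions.
reads = [
    {0: 0, 1: 0, 2: 1},
    {1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0},
    {2: 0, 3: 0, 4: 1},
]
graph = build_graph(reads)
good = disagreement(graph, [0, 0, 1, 1])  # reads 0,1 together; reads 2,3 together
bad = disagreement(graph, [0, 1, 0, 1])   # conflicting reads grouped together
```

In this toy case the first labeling has no disagreement while the second does, which is the quantity a minimum-disagreement clustering tries to drive down. For a polyploid sample the labels would range over more than two clusters, one per haplotype.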

Updated: 2020-03-07