Haplotype-Resolved Assembly for Synthetic Long Reads Using a Trio-Binning Strategy
bioRxiv - Bioinformatics, Pub Date: 2020-06-02, DOI: 10.1101/2020.06.01.126995
Mengyang Xu, Lidong Guo, Xiao Du, Lei Li, Li Deng, Ou Wang, Ming Ni, Huanming Yang, Xun Xu, Xin Liu, Jie Huang, Guangyi Fan

The accuracy and completeness of genome haplotyping are crucial for characterizing the relationship between human disease susceptibility and genetic variation, especially heterozygous variants. However, most currently available variants are unphased genotypes, and constructing long-range haplotypes remains challenging. Here we introduce HAST, a de novo haplotype-resolved assembly tool that uses trio binning to output both haplotypes of a diploid organism from synthetic long reads. HAST builds parent-specific k-mer libraries, partitions the offspring's reads according to these unique markers, and assembles each partition separately, thereby resolving the two haplotypes. Applying HAST to stLFR co-barcoding data from an Asian individual, together with massively parallel sequencing data from his parents, we recovered both haplotypes with a scaffold N50 of >11 Mb and an assembly accuracy of 99.99995% (Q63). Fully exploiting the long-range haplotype information yielded sub-chromosome-level phase blocks (N50 ~13 Mb) with an average precision of 99.6% and an average recall of 94.1%. We suggest that this accurate and efficient trio-binning approach reconstructs haplotype-resolved chromosomes and can thus facilitate the determination of haplotype phase and the study of heterosis in crossbreeding and of the formation of autopolyploids and allopolyploids.
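To make the trio-binning idea concrete, the sketch below shows one way to derive parent-specific k-mer markers and partition a child's reads with them. It is a minimal illustration under stated assumptions, not HAST's implementation: the function names, the choice of canonical k-mers, the `min_count` threshold, grouping child reads by co-barcode, and the plain majority-vote rule are all hypothetical choices made for this example. The per-haplotype assembly step that follows partitioning is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

_COMP = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement (strand-independent key)."""
    return min(kmer, kmer.translate(_COMP)[::-1])


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield canonical k-mers of seq, skipping windows that contain non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= _ACGT:
            yield canonical(window)


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers across all reads of one parent."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def parent_specific_kmers(paternal_reads: Iterable[str], maternal_reads: Iterable[str],
                          k: int, min_count: int = 1) -> Tuple[Set[str], Set[str]]:
    """Hypothetical marker rule: seen >= min_count times in one parent, never in the other."""
    pat = count_kmers(paternal_reads, k)
    mat = count_kmers(maternal_reads, k)
    pat_only = {km for km, c in pat.items() if c >= min_count and km not in mat}
    mat_only = {km for km, c in mat.items() if c >= min_count and km not in pat}
    return pat_only, mat_only


def partition_reads(child_reads: Iterable[Tuple[str, str]], pat_only: Set[str],
                    mat_only: Set[str], k: int) -> Dict[str, List[str]]:
    """Bin child reads (barcode, sequence) by majority vote of marker hits per barcode.

    Assumption: all reads sharing a co-barcode are assigned together; ties
    (including groups with no marker hits) go to 'unassigned'.
    """
    hits: Dict[str, List[int]] = {}
    groups: Dict[str, List[str]] = {}
    for barcode, seq in child_reads:
        groups.setdefault(barcode, []).append(seq)
        counts = hits.setdefault(barcode, [0, 0])  # [paternal hits, maternal hits]
        for km in kmers(seq, k):
            if km in pat_only:
                counts[0] += 1
            elif km in mat_only:
                counts[1] += 1
    bins: Dict[str, List[str]] = {"paternal": [], "maternal": [], "unassigned": []}
    for barcode, (p, m) in hits.items():
        label = "paternal" if p > m else "maternal" if m > p else "unassigned"
        bins[label].extend(groups[barcode])
    return bins


if __name__ == "__main__":
    pat, mat = parent_specific_kmers(["AAAAACCCCC"], ["AAAAAGGGGG"], k=5)
    child = [("bc1", "AAAACCCC"), ("bc2", "AAAAGGGG"),
             ("bc3", "TTTTTTTT"), ("bc4", "GGGGTTTT")]
    print(partition_reads(child, pat, mat, k=5))
```

In the toy run, `bc1` and its reverse complement `bc4` land in the paternal bin, `bc2` in the maternal bin, and `bc3` (which only contains k-mers shared by both parents) stays unassigned. Real tools typically also normalize hit counts by the size of each parent's marker set and filter k-mers by coverage to drop sequencing errors; the plain majority vote here is a simplification. For reference, the Q63 figure in the abstract follows from the Phred scale: 99.99995% accuracy is an error rate of 5 × 10⁻⁷, and −10 · log10(5 × 10⁻⁷) ≈ 63.0.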

Updated: 2020-06-02