TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data
GigaScience (IF 9.2), DOI: 10.1093/gigascience/giaa101
Davide Bolognini 1, 2 , Alberto Magi 3 , Vladimir Benes 2 , Jan O Korbel 4 , Tobias Rausch 2, 4

Abstract
Background
Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution.
Results
We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations, and it resolves repeat multiplicity and period size in a haplotype-specific manner. The tool also includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees.
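To make "period size" and "repeat multiplicity" concrete, the following is a minimal sketch, not TRiCoLOR's algorithm: it scans an error-free sequence for exact tandem repeats and reports each repeat's start position, motif (whose length is the period size), and copy number (the multiplicity). The function name `find_tandem_repeats` and its parameters `max_period` and `min_copies` are hypothetical and chosen only for illustration. Exact string matching of this kind would not cope with the sequencing errors in real long reads.

```python
def find_tandem_repeats(seq, max_period=6, min_copies=3):
    """Return (start, motif, copies) tuples for exact tandem repeats in seq.

    Illustrative only: assumes an error-free sequence and exact motif matches.
    """
    results = []
    i = 0
    n = len(seq)
    while i < n:
        best = None
        for period in range(1, max_period + 1):
            motif = seq[i:i + period]
            if len(motif) < period:
                break  # not enough sequence left for a motif of this size
            # Count how many consecutive copies of the motif start at i.
            copies = 1
            while seq[i + copies * period:i + (copies + 1) * period] == motif:
                copies += 1
            if copies >= min_copies:
                span = copies * period
                # Keep the longest span; on ties the shorter period wins
                # because shorter periods are tried first.
                if best is None or span > best[2] * len(best[1]):
                    best = (i, motif, copies)
        if best is not None:
            results.append(best)
            i += best[2] * len(best[1])  # skip past the reported repeat
        else:
            i += 1
    return results


if __name__ == "__main__":
    # A CAG motif (period size 3) repeated 5 times, flanked by unique sequence.
    example = "ACGT" + "CAG" * 5 + "TTGA"
    for start, motif, copies in find_tandem_repeats(example):
        print(start, motif, len(motif), copies)
```

In this sketch the period size is simply `len(motif)`. A haplotype-specific analysis would run a comparable step separately on the sequence derived for each haplotype.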
Conclusions
TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.


Updated: 2020-10-08