TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data
GigaScience (IF 11.8), DOI: 10.1093/gigascience/giaa101
Davide Bolognini 1, 2 , Alberto Magi 3 , Vladimir Benes 2 , Jan O Korbel 4 , Tobias Rausch 2, 4

Abstract
Background
Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution.
Results
We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees.
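To make the terms "period size" and "repeat multiplicity" concrete, the following is a minimal sketch of motif-free tandem repeat detection on a single error-free sequence. It is not TRiCoLOR's algorithm, which has to cope with error-prone reads and haplotypes; the function name `find_tandem_repeats` and the parameters `max_period` and `min_copies` are hypothetical and chosen only for illustration.

```python
def find_tandem_repeats(seq, max_period=6, min_copies=3):
    """Scan left to right and report exact tandem repeats.

    Returns a list of (start, motif, copies) tuples, where len(motif) is the
    period size and copies is the repeat multiplicity. No motif is supplied
    in advance: every period from 1 to max_period is tried at each position.
    """
    hits = []
    i = 0
    n = len(seq)
    while i < n:
        best = None  # (start, motif, copies, span) of the longest run found at i
        for period in range(1, max_period + 1):
            motif = seq[i:i + period]
            if len(motif) < period:
                break  # not enough sequence left for this period
            copies = 1
            # Count consecutive exact copies of the candidate motif.
            while seq[i + copies * period:i + (copies + 1) * period] == motif:
                copies += 1
            span = copies * period
            # Strict '>' keeps the smallest period when spans tie,
            # so "AAAAAA" is reported as 6 x "A" rather than 3 x "AA".
            if copies >= min_copies and (best is None or span > best[3]):
                best = (i, motif, copies, span)
        if best is not None:
            hits.append(best[:3])
            i += best[3]  # jump past the reported repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    example = "ACGT" + "CAG" * 5 + "TTGA"
    for start, motif, copies in find_tandem_repeats(example):
        print(f"start={start} motif={motif} period={len(motif)} copies={copies}")
```

Because it relies on exact string matching, a single sequencing error would split a repeat into separate hits; that sensitivity to errors is the difficulty with long-read data that the Background describes.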
Conclusions
TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.
Updated: 2020-10-08