Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements.
GigaScience (IF 11.8), Pub Date: 2020-05-01, DOI: 10.1093/gigascience/giaa048
Morteza Hosseini 1 , Diogo Pratas 1, 2 , Burkhard Morgenstern 3, 4 , Armando J Pinho 1

BACKGROUND: The development of high-throughput sequencing technologies, and the resulting production of huge volumes of genomic data, has accelerated biological and medical research and discovery. The study of genomic rearrangements is crucial owing to their role in chromosomal evolution, genetic disorders, and cancer.

RESULTS: We present Smash++, an alignment-free and memory-efficient tool to find and visualize small- and large-scale genomic rearrangements between 2 DNA sequences. This computational solution extracts the information content of the 2 sequences, exploiting a data compression technique to find rearrangements. We also present the Smash++ visualizer, a tool that generates an SVG (Scalable Vector Graphics) image showing the detected rearrangements along with their self- and relative complexity.

CONCLUSIONS: Tested on several synthetic and real DNA sequences from bacteria, fungi, Aves, and Mammalia, the proposed tool was able to accurately find genomic rearrangements. The detected regions were in accordance with previous studies, which took alignment-based approaches or performed FISH (fluorescence in situ hybridization) analysis. The maximum peak memory usage among all experiments was ∼1 GB, which makes it feasible to run Smash++ on present-day standard computers.
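To make the idea of compression-based, alignment-free comparison more concrete, here is a minimal Python sketch. It is not the Smash++ implementation: it assumes a single order-k context model with additive smoothing, and the function names (`build_model`, `information_profile`, `low_information_regions`) and all parameter values are hypothetical choices for illustration. The model is trained on a reference sequence and then used to estimate how many bits each base of a target sequence would cost; stretches that are cheap to encode are candidates for regions shared with the reference.

```python
import math
from collections import defaultdict


def build_model(reference, k):
    """Count which base follows each length-k context in the reference."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(reference) - k):
        counts[reference[i:i + k]][reference[i + k]] += 1
    return counts


def information_profile(target, counts, k, alpha=1 / 16):
    """Estimated bits per target base under the reference model.

    profile[j] refers to target position j + k. alpha is an
    additive-smoothing constant (an assumption of this sketch).
    """
    profile = []
    for i in range(k, len(target)):
        ctx = counts.get(target[i - k:i], {})
        total = sum(ctx.values())
        p = (ctx.get(target[i], 0) + alpha) / (total + 4 * alpha)
        profile.append(-math.log2(p))
    return profile


def low_information_regions(profile, k, threshold=1.0, window=16):
    """Return (start, end) target intervals whose smoothed cost is below threshold."""
    regions, start = [], None
    for j in range(len(profile) - window + 1):
        mean_bits = sum(profile[j:j + window]) / window
        if mean_bits < threshold and start is None:
            start = j + k
        elif mean_bits >= threshold and start is not None:
            regions.append((start, j + k + window - 1))
            start = None
    if start is not None:
        regions.append((start, len(profile) + k))
    return regions


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    rand_seq = lambda n: "".join(rng.choice("ACGT") for _ in range(n))
    reference = rand_seq(2000)
    # The target carries a block of the reference at a different position.
    target = rand_seq(300) + reference[500:800] + rand_seq(300)
    model = build_model(reference, k=8)
    profile = information_profile(target, model, k=8)
    print(low_information_regions(profile, k=8))
```

This sketch only flags target regions that the reference model predicts well; it does not handle inverted repeats, map regions back to reference coordinates, or produce any visualization, all of which a full rearrangement finder would need.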

Updated: 2020-05-20