SNPs detection by eBWT positional clustering.
Algorithms for Molecular Biology (IF 1.5), Pub Date: 2019-02-06, DOI: 10.1186/s13015-019-0137-8
Nicola Prezza 1, Nadia Pisanti 1,2, Marinella Sciortino 3, Giovanna Rosone 1

BACKGROUND: Sequencing technologies keep getting cheaper and faster, which increases the pressure for data structures that store raw data efficiently and, where possible, support analysis directly on it. Accordingly, there is growing interest in alignment-free and reference-free variant calling methods that use only (suitably indexed) raw read data.

RESULTS: We develop the positional clustering theory, which (i) describes how the extended Burrows-Wheeler Transform (eBWT) of a collection of reads tends to cluster together bases that cover the same genome position, (ii) predicts the size of such clusters, and (iii) provides a precise LCP-array-based procedure for locating such clusters in the eBWT. Based on this theory, we designed and implemented an alignment-free and reference-free SNP calling method, and we built a SNP calling pipeline around it. Experiments on both synthetic and real data show that SNPs can be detected with a simple scan of the eBWT and LCP arrays because, in accordance with our theoretical framework, they lie within clusters in the eBWT of the reads. Finally, our tool intrinsically performs a reference-free evaluation of its accuracy by returning the coverage of each SNP.

CONCLUSIONS: Based on the results of the experiments on synthetic and real data, we conclude that the positional clustering framework can be used effectively to identify SNPs, and it appears to be a promising approach for calling other types of variants directly on raw sequencing data.

AVAILABILITY: The software ebwt2snp is freely available for academic use at https://github.com/nicolaprezza/ebwt2snp.
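
The clustering idea can be illustrated on a toy read set. The following is a minimal sketch, not the ebwt2snp implementation: it builds the eBWT and LCP array naively by sorting all suffixes, and it locates clusters with a single fixed LCP threshold, which is a simplification assumed here rather than the procedure derived in the paper. The function names, the parameters k and min_cov, and the two toy sequences are all hypothetical.

```python
from collections import Counter


def ebwt_and_lcp(reads):
    """Naive eBWT and LCP construction: sort every suffix of every read."""
    rows = []
    for idx, read in enumerate(reads):
        s = read + "$"  # end marker; ties between markers are broken by read index
        for j in range(len(s)):
            # s[j - 1] is the base preceding the suffix (wraps to '$' when j == 0)
            rows.append((s[j:], idx, s[j - 1]))
    rows.sort(key=lambda r: (r[0], r[1]))
    ebwt = [r[2] for r in rows]
    lcp = [0] * len(rows)
    for i in range(1, len(rows)):
        a, b = rows[i - 1][0], rows[i][0]
        n = 0
        # End markers are treated as distinct, so a match never extends over '$'.
        while n < min(len(a), len(b)) and a[n] == b[n] and a[n] != "$":
            n += 1
        lcp[i] = n
    return ebwt, lcp


def clusters(lcp, k):
    """Half-open ranges of adjacent suffixes that share a prefix of length >= k."""
    out, start = [], 0
    for i in range(1, len(lcp) + 1):
        if i == len(lcp) or lcp[i] < k:
            if i - start > 1:
                out.append((start, i))
            start = i
    return out


def candidate_snps(reads, k, min_cov):
    """Report clusters holding exactly two bases, each seen at least min_cov times."""
    ebwt, lcp = ebwt_and_lcp(reads)
    calls = []
    for start, end in clusters(lcp, k):
        counts = Counter(c for c in ebwt[start:end] if c != "$")
        frequent = sorted(b for b, n in counts.items() if n >= min_cov)
        if len(frequent) == 2:
            calls.append({b: counts[b] for b in frequent})  # per-allele coverage
    return calls


def windows(seq, length):
    """All substrings of the given length, used here as error-free toy reads."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


# Two toy haplotypes that differ at a single position (A vs G).
reads = windows("GATTACAGGCT", 8) + windows("GATTGCAGGCT", 8)
print(candidate_snps(reads, k=4, min_cov=2))  # [{'A': 3, 'G': 3}]
```

In this toy case the only cluster containing two well-supported bases is the one whose suffixes start with CAGG, the context to the right of the variant, and the counts returned with it play the role of the per-SNP coverage mentioned above. Real data would need error handling, reverse complements, and a scalable eBWT/LCP construction, none of which this sketch attempts.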

Updated: 2019-11-01