An optimized FM-index library for nucleotide and amino acid search
bioRxiv - Bioinformatics, Pub Date: 2021-10-08, DOI: 10.1101/2021.01.12.426474
Tim Anderson, Travis J Wheeler

Pattern matching is a key step in a variety of biological sequence analysis pipelines. The FM-index is a compressed data structure for pattern matching, with search run time that is independent of the length of the database text. We present AvxWindowedFMindex (AWFM-index), an open-source, thread-parallel FM-index library written in C that is optimized for indexing nucleotide and amino acid sequences. AWFM-index is easy to incorporate into bioinformatics software and is able to perform exact match count and locate queries ~2-4x faster than SeqAn3’s FM-index implementation for nucleotide search, and ~2-6x faster for amino acid search in a single-threaded context. This performance is due to (i) a new approach to storing FM-index data in a strided bit-vector format that enables extremely efficient computation of the FM-index occurrence function via AVX2 bitwise instructions, and (ii) inclusion of a cache-efficient lookup table for partial k-mer searches. AWFM-index also trivially parallelizes to multiple threads with good scaling, and enables efficient on-disk storage of the memory-intensive suffix array. The open-source library is available for download at https://github.com/TravisWheelerLab/AvxWindowFmIndex.
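To make the occurrence function concrete, here is a minimal scalar sketch in C of an FM-index count query over DNA. In this sketch the BWT is stored as two strided bit-planes per 64-position block: bit j of one word holds the low bit of the 2-bit code at block position j, and a second word holds the high bits. Occ(c, i) then reduces to a few bitwise ANDs/NOTs, a mask, and a popcount, added to a per-block prefix count. This is an illustration under stated assumptions, not AWFM-index's API or memory layout. The names (DnaFm, fm_build, fm_occ, fm_count, count_occurrences), the 64-bit block size, the A/C/G/T = 0..3 encoding, and the naive qsort suffix sort are all hypothetical. The real library uses 256-bit AVX2 vectors, supports amino acids, and adds the k-mer lookup table and locate queries, none of which are shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: bit j of lo[b] / hi[b] is the low / high bit of the
 * 2-bit code of BWT row 64*b + j (AWFM-index itself uses AVX2 vectors). */
enum { BLOCK = 64 };

typedef struct {
    size_t n;            /* text length including the '$' sentinel */
    size_t sentinel_pos; /* BWT row holding '$' (stored with code 0) */
    uint64_t *lo, *hi;   /* one word of each bit-plane per block */
    size_t (*base)[4];   /* counts of each symbol before each block */
    size_t C[4];         /* number of text symbols (incl. '$') smaller than c */
} DnaFm;

static const char ALPHA[] = "ACGT"; /* assumed encoding: A=0 C=1 G=2 T=3 */
static int code_of(char ch) {
    const char *p = strchr(ALPHA, ch);
    assert(ch != '\0' && p != NULL && "only A, C, G, T are supported");
    return (int)(p - ALPHA);
}

static int popcount64(uint64_t x) {
    int c = 0;
    for (; x; x &= x - 1) c++; /* clear the lowest set bit each step */
    return c;
}

static const char *g_text; /* comparator context (not thread-safe) */
static int cmp_suffix(const void *a, const void *b) {
    return strcmp(g_text + *(const size_t *)a, g_text + *(const size_t *)b);
}

static void fm_build(DnaFm *fm, const char *dna) {
    size_t len = strlen(dna), n = len + 1;
    char *t = malloc(n + 1);
    memcpy(t, dna, len);
    t[len] = '$'; t[n] = '\0'; /* '$' sorts before A/C/G/T in ASCII */
    size_t *sa = malloc(n * sizeof *sa);
    for (size_t i = 0; i < n; i++) sa[i] = i;
    g_text = t;
    qsort(sa, n, sizeof *sa, cmp_suffix); /* naive suffix array */
    size_t nblocks = n / BLOCK + 1, counts[4] = {0, 0, 0, 0};
    fm->n = n;
    fm->lo = calloc(nblocks, sizeof *fm->lo);
    fm->hi = calloc(nblocks, sizeof *fm->hi);
    fm->base = calloc(nblocks, sizeof *fm->base);
    for (size_t i = 0; i <= n; i++) {
        if (i % BLOCK == 0) memcpy(fm->base[i / BLOCK], counts, sizeof counts);
        if (i == n) break;
        char ch = t[(sa[i] + n - 1) % n]; /* BWT[i] */
        if (ch == '$') { fm->sentinel_pos = i; continue; }
        int c = code_of(ch);
        uint64_t bit = (uint64_t)1 << (i % BLOCK);
        if (c & 1) fm->lo[i / BLOCK] |= bit;
        if (c & 2) fm->hi[i / BLOCK] |= bit;
        counts[c]++;
    }
    fm->C[0] = 1; /* the single '$' precedes every A */
    for (int c = 1; c < 4; c++) fm->C[c] = fm->C[c - 1] + counts[c - 1];
    free(sa);
    free(t);
}

/* Occ(c, i): occurrences of code c in BWT[0, i). */
static size_t fm_occ(const DnaFm *fm, int c, size_t i) {
    size_t b = i / BLOCK, off = i % BLOCK;
    /* Bitwise select every row in the block whose code equals c. */
    uint64_t match = ((c & 1) ? fm->lo[b] : ~fm->lo[b]) &
                     ((c & 2) ? fm->hi[b] : ~fm->hi[b]);
    uint64_t mask = off ? (~(uint64_t)0 >> (BLOCK - off)) : 0;
    size_t occ = fm->base[b][c] + (size_t)popcount64(match & mask);
    if (c == 0 && fm->sentinel_pos / BLOCK == b && fm->sentinel_pos < i)
        occ--; /* '$' shares code 0 with A but is not an A */
    return occ;
}

/* Backward search: number of occurrences of pattern in the text. */
static size_t fm_count(const DnaFm *fm, const char *pattern) {
    size_t lo = 0, hi = fm->n;
    for (size_t k = strlen(pattern); k-- > 0;) {
        int c = code_of(pattern[k]);
        lo = fm->C[c] + fm_occ(fm, c, lo);
        hi = fm->C[c] + fm_occ(fm, c, hi);
        if (lo >= hi) return 0;
    }
    return hi - lo;
}

/* Demo helper: build an index, run one count query, free the index. */
size_t count_occurrences(const char *dna, const char *pattern) {
    DnaFm fm;
    fm_build(&fm, dna);
    size_t r = fm_count(&fm, pattern);
    free(fm.lo); free(fm.hi); free(fm.base);
    return r;
}
```

For example, count_occurrences("ACGTACGTAC", "ACG") returns 2 (matches at offsets 0 and 4). Because each Occ call touches one block's words plus one prefix-count row, the cost of a count query depends on the pattern length, not on the text length, which is the property the abstract highlights. The global comparator pointer makes this fm_build non-reentrant, which is fine for a sketch but not for the multi-threaded use the library targets.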

Updated: 2021-10-11