Refining the r-index, Theoretical Computer Science

Refining the r-index
Theoretical Computer Science (IF 1.1), Pub Date: 2019-08-07, DOI: 10.1016/j.tcs.2019.08.005
Hideo Bannai, Travis Gagie, Tomohiro I

Gagie, Navarro and Prezza's r-index (SODA, 2018) promises to speed up DNA alignment and variation calling by allowing us to index entire genomic databases, provided certain obstacles can be overcome. In this paper we first strengthen and simplify Policriti and Prezza's Toehold Lemma (DCC '16; Algorithmica, 2017), which inspired the r-index and plays an important role in its implementation. We then show how to update the r-index efficiently after adding a new genome to the database, which is likely to be vital in practice. As a by-product of this result, we obtain an online version of Policriti and Prezza's algorithm for constructing the LZ77 parse from a run-length compressed Burrows-Wheeler Transform. Our experiments demonstrate the practicality of all three of these results. Finally, we show how to augment the r-index such that, given a new genome and fast random access to the database, we can quickly compute the matching statistics and maximal exact matches of the new genome with respect to the database.
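For readers unfamiliar with the terms in the abstract, the following is a minimal Python sketch of two of them: the run-length compressed Burrows-Wheeler Transform (the r in "r-index" is the number of runs in the BWT) and matching statistics. It is a naive illustration only, not the authors' algorithms or data structures; the function names, the `$` sentinel, and the toy inputs are assumptions made for this example.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorted rotations.

    Naive illustration (quadratic space); assumes `sentinel` does not occur
    in `text` and sorts before every character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse a string into (character, run length) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def matching_statistics(pattern, text):
    """ms[i] = length of the longest prefix of pattern[i:] that occurs in text.

    Naive substring search; an index such as the one described in the
    abstract is meant to avoid exactly this kind of scanning.
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    transformed = bwt("banana")
    runs = run_length_encode(transformed)
    print(transformed, runs, "r =", len(runs))
    print(matching_statistics("anab", "banana"))
```

On highly repetitive inputs, such as a collection of similar genomes, the number of runs tends to be much smaller than the text length, which is why an index whose size depends on r rather than on the text length is attractive for genomic databases.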

Updated: 2019-08-07
