Data structures to represent a set of k-long DNA sequences, arXiv - CS - Data Structures and Algorithms


Data structures to represent a set of k-long DNA sequences
arXiv - CS - Data Structures and Algorithms, Pub Date: 2019-03-29, DOI: arxiv-1903.12312
Rayan Chikhi, Jan Holub, and Paul Medvedev

The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying a k-mer set has emerged as a shared underlying component. A set of k-mers has unique features and applications that, over the last ten years, have resulted in many specialized approaches for its representation. In this survey, we give a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. We hope this survey will serve as a resource for researchers in the field as well as make the area more accessible to researchers outside the field.


Updated: 2020-06-15
