Data structures to represent a set of k-long DNA sequences
arXiv - CS - Data Structures and Algorithms. Pub Date: 2019-03-29, DOI: arxiv-1903.12312. Authors: Rayan Chikhi, Jan Holub, and Paul Medvedev
The analysis of biological sequencing data has been one of the biggest
applications of string algorithms. The approaches used in many such
applications are based on the analysis of k-mers, which are short fixed-length
strings present in a dataset. While these approaches are rather diverse,
storing and querying a k-mer set has emerged as a shared underlying component.
A set of k-mers has unique features and applications that, over the last ten
years, have resulted in many specialized approaches for its representation. In
this survey, we give a unified presentation and comparison of the data
structures that have been proposed to store and query a k-mer set. We hope this
survey will serve as a resource for researchers in the field as well as make
the area more accessible to researchers outside the field.
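
To make the notion of storing and querying a k-mer set concrete, here is a minimal sketch in Python. It is not one of the specialized structures the survey compares; it assumes a plain hash set as the simplest possible baseline, and the names kmers, build_kmer_set, and the toy reads are hypothetical, chosen only for illustration.

```python
def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_set(reads, k):
    """Collect the distinct k-mers of all reads into a plain hash set.

    Assumption: an uncompressed Python set stands in for the specialized
    representations discussed in the survey.
    """
    kmer_set = set()
    for read in reads:
        kmer_set.update(kmers(read, k))
    return kmer_set


# Toy input for illustration only, not real sequencing data.
reads = ["ACGTAC", "GTACGA"]
index = build_kmer_set(reads, 3)

# Membership query: is a given k-mer present in the set?
print("ACG" in index)  # True
print("TTT" in index)  # False
```

A hash set answers membership queries directly but stores every k-mer explicitly, so it ignores the overlap between consecutive k-mers that more specialized representations can exploit.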
Updated: 2020-06-15