CHTKC: a robust and efficient k-mer counting algorithm based on a lock-free chaining hash table


Briefings in Bioinformatics (IF 6.8), Pub Date: 2020-05-21, DOI: 10.1093/bib/bbaa063
Jianan Wang, Su Chen, Lili Dong, Guohua Wang

MOTIVATION: Calculating the frequency of occurrence of each substring of length k in DNA sequences is a common task in many bioinformatics applications, including genome assembly, error correction, and sequence alignment. Although the problem is simple to state, counting efficiently on datasets with high sequencing depth or large genome size is a challenge.

RESULTS: We propose a robust and efficient method, CHTKC, that solves the k-mer counting problem with a lock-free hash table that uses linked lists to resolve collisions. We also design new mechanisms to optimize memory usage and to handle situations where the available memory cannot hold all k-mers. CHTKC has been thoroughly tested on seven datasets under multiple memory usage scenarios and compared with Jellyfish2 and KMC3. Our work shows that a hash-table-based method remains a feasible and effective solution to the k-mer counting problem.
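To make the task concrete, here is a minimal Python sketch of k-mer counting with a chaining hash table, that is, a bucket array whose collisions are resolved with linked lists. It is not the CHTKC implementation: it is single-threaded, so it shows only the chaining and counting logic and not the lock-free part, and it has no mechanism for running out of memory. The names `ChainingKmerTable`, `Node`, `add`, `count_sequence` and `counts`, the 2-bit base encoding, and the choice to skip windows containing non-ACGT characters are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical 2-bit encoding of the four bases (an assumption of this sketch).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


class Node:
    """One linked-list entry in a bucket: an encoded k-mer and its count."""

    __slots__ = ("key", "count", "next")

    def __init__(self, key: int, next_node: Optional["Node"]) -> None:
        self.key = key
        self.count = 1
        self.next = next_node


class ChainingKmerTable:
    """Single-threaded chaining hash table mapping encoded k-mers to counts."""

    def __init__(self, k: int, n_buckets: int = 1024) -> None:
        self.k = k
        self.buckets: List[Optional[Node]] = [None] * n_buckets

    def add(self, key: int) -> None:
        idx = hash(key) % len(self.buckets)
        node = self.buckets[idx]
        while node is not None:  # walk the chain looking for the k-mer
            if node.key == key:
                node.count += 1
                return
            node = node.next
        # Not found: prepend a new node. A lock-free variant would publish the
        # node with a compare-and-swap on the bucket head and retry on failure.
        self.buckets[idx] = Node(key, self.buckets[idx])

    def count_sequence(self, seq: str) -> None:
        mask = (1 << (2 * self.k)) - 1
        key, valid = 0, 0
        for base in seq.upper():
            code = ENCODE.get(base)
            if code is None:  # e.g. 'N': restart the sliding window
                key, valid = 0, 0
                continue
            key = ((key << 2) | code) & mask  # slide the window by one base
            valid += 1
            if valid >= self.k:
                self.add(key)

    def counts(self) -> Dict[str, int]:
        result: Dict[str, int] = {}
        for head in self.buckets:
            node = head
            while node is not None:
                kmer = "".join(
                    DECODE[(node.key >> (2 * (self.k - 1 - i))) & 3]
                    for i in range(self.k)
                )
                result[kmer] = node.count
                node = node.next
        return result


table = ChainingKmerTable(k=3, n_buckets=4)
table.count_sequence("ACGTACGT")
print(table.counts())
```

With only four buckets, several of the 3-mers in `"ACGTACGT"` are likely to share a bucket, so the chains are exercised even on this tiny input. The sketch treats a k-mer and its reverse complement as distinct; whether to merge them is a separate design choice that it does not address.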


Updated: 2020-05-21
