Low-complexity and highly robust barcodes for error-rich single molecular sequencing
3 Biotech (IF 2.6), Pub Date: 2021-01-16, DOI: 10.1007/s13205-020-02607-5
Weigang Chen (1, 2), Panpan Wang (1), Lixia Wang (1), Dalu Zhang (3), Mingzhe Han (4), Mingyong Han (5, 6), Lifu Song (2, 4)

DNA barcodes are frequently corrupted by insertion, deletion, and substitution errors introduced during DNA synthesis, amplification, and sequencing, which results in index hopping. In this paper, we propose a new DNA barcode construction scheme that combines a cyclic block code bit by bit with a predetermined pseudo-random sequence to form bit pairs, and then converts each bit pair into a base to obtain the DNA barcode. We then present a barcode identification scheme for noisy sequencing reads: it combines cyclic shifting with traditional dynamic programming to mark insertion and deletion positions, and then performs erasure-and-error-correction decoding on the corrupted codewords. Furthermore, we evaluate the barcode identification error rate in the presence of multiple errors and assess the reliability of the barcodes in a DNA context. The method generalizes easily to longer barcodes, which may be used in scenarios with severe errors. Simulation results show that combining cyclic shifting with dynamic programming greatly reduces the bit error rate after insertion/deletion identification compared with dynamic programming alone, indicating that the proposed method effectively improves the accuracy of insertion/deletion error estimation. Moreover, the overall identification error rate of the proposed method is below \(10^{-5}\) when the per-base mutation probability is below 0.1, which is typical of third-generation sequencing.
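The abstract describes the construction only at a high level, so the Python sketch below illustrates the general idea under explicit assumptions. The [7,4] cyclic Hamming code with generator g(x) = 1 + x + x^3, the 3-stage LFSR used as the "predetermined pseudo-random sequence", and the (code bit, pseudo-random bit) to base table are all hypothetical choices, not the paper's parameters, and every function name is made up for illustration. The paper's cyclic-shift step for marking insertions/deletions is not reproduced: `identify_by_dp` is only a dynamic-programming-only baseline of the kind the abstract compares against, and `decode_with_erasures` is a brute-force stand-in for an erasure-and-error decoder that assumes indel positions have already been marked as `?`.

```python
from itertools import product

# Illustrative assumptions only (not the paper's parameters):
# [7,4] cyclic Hamming code, 3-stage LFSR sequence, and a hypothetical bit-pair-to-base table.
N, K = 7, 4
GEN = [1, 1, 0, 1]  # g(x) = 1 + x + x^3, lowest-degree coefficient first
BASES = {(0, 0): "A", (0, 1): "C", (1, 0): "G", (1, 1): "T"}  # (code bit, PRS bit) -> base
BITS = {base: pair for pair, base in BASES.items()}


def cyclic_encode(msg):
    """Non-systematic cyclic encoding c(x) = m(x) g(x) over GF(2)."""
    code = [0] * N
    for i, m in enumerate(msg):
        if m:
            for j, g in enumerate(GEN):
                code[i + j] ^= g
    return code


def lfsr_sequence(length, state=(1, 0, 0)):
    """Pseudo-random bits from the recurrence s[n+3] = s[n+1] XOR s[n] (x^3 + x + 1)."""
    s = list(state)
    while len(s) < length:
        s.append(s[-2] ^ s[-3])
    return s[:length]


def make_barcode(msg, prs):
    """Pair each codeword bit with a pseudo-random bit and map the pair to a base."""
    return "".join(BASES[(c, p)] for c, p in zip(cyclic_encode(msg), prs))


def code_bits(barcode):
    """Invert the mapping: the first bit of each base's pair is the codeword bit."""
    return [BITS[b][0] for b in barcode]


def decode_with_erasures(read, codebook):
    """Nearest barcode by Hamming distance, skipping positions marked '?' (erasures).
    Brute force over the codebook; a stand-in for an algebraic erasure-and-error decoder."""
    def dist(bc):
        return sum(r != b for r, b in zip(read, bc) if r != "?")
    return min(codebook, key=lambda m: dist(codebook[m]))


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def identify_by_dp(read, codebook):
    """DP-only baseline: message whose barcode has the smallest edit distance (ties -> first)."""
    return min(codebook, key=lambda m: edit_distance(read, codebook[m]))


PRS = lfsr_sequence(N)  # [1, 0, 0, 1, 0, 1, 1]
CODEBOOK = {msg: make_barcode(msg, PRS) for msg in product((0, 1), repeat=K)}

if __name__ == "__main__":
    bc = CODEBOOK[(1, 0, 0, 0)]
    print(bc)                                        # TGATACC
    print(decode_with_erasures("?GA?ACC", CODEBOOK))  # (1, 0, 0, 0), two erasures
    print(identify_by_dp(bc[:2] + bc[3:], CODEBOOK))  # (1, 0, 0, 0), one deletion
```

Because the pseudo-random bit at each position is the same for every barcode, two barcodes differ in exactly the positions where their codewords differ, so the base-level Hamming distance inherits the code's minimum distance of 3. That is why the brute-force decoder recovers a barcode from up to two erasures or one substitution (2e + f < 3). In this toy setting the pseudo-random sequence also keeps the all-zero message from becoming a homopolymer: it maps to `CAACACC`.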




Updated: 2021-01-18