Abstract
Nanopore sequencing is a powerful single-molecule DNA sequencing technology that offers high throughput and long reads. Nevertheless, its high native error rate limits the direct detection of point mutations in individual reads of amplicon libraries, as these mutations are difficult to distinguish from sequencing noise.
In this work, we developed SINGLe (SNPs In Nanopore reads of Gene Libraries), a computational method to reduce the noise in nanopore reads of amplicons containing point variations. Our approach exploits the fact that all reads are very similar to a wild-type sequence, for which we experimentally characterize the position-specific systematic sequencing error pattern. We then use this information to reweight the confidence given to nucleotides that do not match the wild type in individual variant reads. We tested this method on a set of KlenTaq variants in which the true mutation rate was well below the sequencing noise. Compared with the data returned by the basecaller Guppy, SINGLe improves the signal-to-noise ratio 4- to 9-fold. Downstream, this approach improves variant clustering and consensus calling.
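The reweighting idea can be illustrated with a minimal Python sketch. It is not SINGLe's actual formula, which this abstract does not state, but a hypothetical Bayesian-style scheme that assumes reads are already aligned to the wild type without indels and contain only A, C, G and T. The names `error_profile`, `phred_to_accuracy` and `reweight`, and the `prior` mutation rate, are illustrative. For each mismatch, the sketch compares two explanations: a true mutation called correctly (prior times the base-call accuracy implied by the Qscore) versus a systematic error at that position, estimated from wild-type reads.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def error_profile(wt_reads, wt, pseudocount=1.0):
    """Per-position frequency of each non-wild-type base in wild-type reads.

    Assumes every read is already aligned to `wt` with no indels.
    """
    n = len(wt_reads)
    profile = []
    for i, wt_base in enumerate(wt):
        counts = Counter(read[i] for read in wt_reads)
        # A pseudocount per nucleotide keeps unseen errors above zero.
        profile.append({
            b: (counts[b] + pseudocount) / (n + 4 * pseudocount)
            for b in NUCLEOTIDES if b != wt_base
        })
    return profile


def phred_to_accuracy(q):
    """Probability that a base call with Phred quality q is correct."""
    return 1.0 - 10 ** (-q / 10)


def reweight(read, quals, wt, profile, prior=1e-3):
    """Map each mismatch position to a confidence that it is a real mutation.

    Hypothetical scheme: 'true mutation, called correctly' versus
    'wild-type base, systematic error at this position'.
    """
    weights = {}
    for i, (base, q) in enumerate(zip(read, quals)):
        if base == wt[i]:
            continue  # matches the wild type: nothing to reweight
        mutation = prior * phred_to_accuracy(q)
        error = (1 - prior) * profile[i][base]
        weights[i] = mutation / (mutation + error)
    return weights


if __name__ == "__main__":
    wt = "ACGTACGT"
    # Toy wild-type reads: position 3 (0-based) is often miscalled T -> A.
    wt_reads = ["ACGTACGT"] * 980 + ["ACGAACGT"] * 20
    profile = error_profile(wt_reads, wt)
    # Variant read with equally confident mismatches at positions 3 and 5.
    weights = reweight("ACGAAAGT", [20] * 8, wt, profile)
    for pos, w in sorted(weights.items()):
        print(f"position {pos}: {w:.3f}")
```

In this toy data, the mismatch at the noisy position 3 receives a much lower weight than the equally confident mismatch at position 5, which is clean in the wild-type reads.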
SINGLe is simple to implement and requires only a few thousand reads of the wild-type sequence of interest, which can easily be obtained by multiplexing in a single MinION run. It requires no modification of the experimental protocol, does not entail a large loss of sequencing throughput, and can be incorporated downstream of standard basecalling.
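One rough way to see why a few thousand wild-type reads can suffice: under a simple binomial model (an assumption: independent reads and a fixed error rate e at each position), the standard error of e estimated from n reads is sqrt(e(1 - e)/n). The sketch below solves for the n that brings this standard error down to a chosen fraction of e. The helper names and the 1% and 20% values are hypothetical and purely illustrative.

```python
import math


def standard_error(rate, n_reads):
    """Binomial standard error of a per-position error rate from n_reads reads."""
    return math.sqrt(rate * (1 - rate) / n_reads)


def reads_needed(rate, rel_precision):
    """Reads needed so that the standard error is at most rel_precision * rate.

    Follows from sqrt(rate * (1 - rate) / n) <= rel_precision * rate.
    """
    return math.ceil((1 - rate) / (rate * rel_precision ** 2))


if __name__ == "__main__":
    # Illustrative values: a 1% systematic error rate, estimated to within ~20%.
    n = reads_needed(0.01, 0.2)
    print(n, standard_error(0.01, n) / 0.01)
```

For these values the estimate is roughly 2,500 reads; positions with lower error rates, or a tighter precision target, would need more.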
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
- Renamed R package.
- Updated the R package to make it simpler to use.
- Improved pipeline for downstream analysis.
- Improved benchmark.
List of abbreviations
- ePCR: error-prone PCR
- PCR: polymerase chain reaction
- SINGLe: SNPs In Nanopore reads of Gene Libraries
- SNP: single nucleotide polymorphism
- Qscore: quality score
- VCS: variant consensus sequence