Review article
Long-read sequencing to understand genome biology and cell function
Introduction
Massively parallel sequencing, also known as next-generation sequencing (NGS), arrived as a disruptive innovation in the life sciences. Within a couple of years, NGS led to a dramatic increase in knowledge of the genomes of different organisms, their architecture, function, and genetic variation down to the single-cell level (Shendure et al., 2017). Various methods based on semiconductors (Ion Torrent), pyrosequencing (454 Life Sciences, Roche), sequencing by ligation (Applied Biosystems), and sequencing by synthesis with reversible terminators (Solexa, Illumina) allowed fast and precise DNA and RNA sequencing (Metzker, 2010). However, short-read sequencing methods are limited in their ability to investigate complex genomes, repetitive elements, full-length transcripts, or native base modifications. Several of these limitations can be overcome by long-read technologies (third-generation sequencing technologies, TGS). In the following, we discuss the applications of long-read sequencing to understand genome function. The review focuses on the technical applications of long-read methods, which can be applied to the most diverse questions in cell biology.
Nanopore sequencing
The original idea of analyzing nucleotide sequences with nanopores was born in the 1980s, but it took more than 30 years for the technology to reach market maturity (Company: Oxford Nanopore Technologies, ONT) (Deamer et al., 2016; Kasianowicz and Bezrukov, 2016). In nanopore sequencing, a current is applied across a tiny pore to drive an ion flow. Each molecule entering the pore interferes with the ion flow and therefore induces a characteristic and measurable change in the current. ONT
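The principle that a molecule in the pore produces a measurable change in current can be illustrated with a toy example. The sketch below is only an assumption-laden illustration, not ONT's signal processing or basecalling: the function name `find_blockade_events`, the baseline of 100 pA, the 20 pA drop threshold, and the trace values are all hypothetical.

```python
from typing import List, Tuple


def find_blockade_events(
    current_pa: List[float], baseline_pa: float, min_drop_pa: float
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of stretches where the current
    falls more than `min_drop_pa` below the open-pore baseline.

    `end` is exclusive. All parameters are illustrative assumptions.
    """
    threshold = baseline_pa - min_drop_pa
    events: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(current_pa):
        if value < threshold:
            if start is None:
                start = i  # current just dropped: an event begins
        elif start is not None:
            events.append((start, i))  # current recovered: the event ends
            start = None
    if start is not None:
        # the trace ended while the pore was still blocked
        events.append((start, len(current_pa)))
    return events


# Hypothetical trace (picoamperes): two blockades of different depth.
trace = [100.0, 99.5, 62.0, 60.5, 61.0, 100.2, 99.8, 75.0, 74.0, 100.1]
events = find_blockade_events(trace, baseline_pa=100.0, min_drop_pa=20.0)
print(events)
```

Real traces are noisy and sampled at high frequency, so a fixed threshold like this would not be sufficient in practice; the sketch only shows how a drop in current can be turned into discrete events.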
Single molecule real-time (SMRT) sequencing
SMRT (single molecule real-time) sequencing from Pacific Biosciences (PacBio) also provides long reads of native DNA. The method relies on fluorescence-labeled nucleotides incorporated by a polymerase that is immobilized at the bottom of so-called ZMWs (zero-mode waveguides). These picoliter-sized wells are assembled on a flow cell and allow the detection of fluorescence signals from millions of molecules in parallel. In contrast to NGS methods the incorporation of nucleotides is detected in
Other long-read/cytogenetic technologies
Synthetic long-read technologies provide alternative methods to obtain information on long DNA fragments. Methods such as linked-read sequencing (10x Genomics) and stLFR (MGI) allow the in silico assembly of long sequences from short-read NGS data. Moreover, next-generation cytogenetics enables the analysis of single DNA strands at megabase scale. Optical mapping approaches (Bionano) and molecular combing techniques (Genomic Vision) are among these novel cytogenetic approaches. Bionano utilizes
Structural variations, complex haplotypes and chromosomal rearrangements
Structural variations (SVs) are a rich source of genome evolution and inter-individual variation, but acquired SVs can also drive pathological processes such as cancer development. SVs including copy number variants (deletions, amplifications) can be detected by comparative genomic hybridization approaches (SNP-arrays, CGH-arrays) and to a certain extent by short-read sequencing methods. However, complex structural rearrangements, inversions, balanced chromosomal translocations and other copy
Repeat architecture
The size and structure of many repetitive regions of genomes are hardly accessible with short-read sequencing technologies (Tørresen et al., 2019). However, an increasing number of repetitive elements have been linked to human diseases, which has led to a growing interest in the study of these regions (Hagerman et al., 2017; McColgan and Tabrizi, 2018; Paulson, 2018). Long-read sequencing enables their analysis in a single read and thus the exact determination of length, composition, and repeat
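Once a repeat is spanned by a single read, its length and composition can be read off directly from the sequence. The following is a minimal sketch under strong assumptions: the read is error-free, the repeat unit is known in advance, and the example sequence (a CGG tract with one AGG interruption) is invented for illustration. The helper name `repeat_tracts` is hypothetical and does not belong to any published tool.

```python
import re
from collections import Counter
from typing import List, Tuple


def repeat_tracts(read: str, unit: str) -> List[Tuple[int, int, int]]:
    """Return (start, end, copies) for every uninterrupted run of `unit`.

    Exact string matching is an assumption; real long reads contain
    sequencing errors and need error-tolerant methods.
    """
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(read)
    ]


# Invented read: flank, 9 x CGG, one AGG interruption, 10 x CGG, flank.
read = "ACGT" + "CGG" * 9 + "AGG" + "CGG" * 10 + "TTGA"
tracts = repeat_tracts(read, "CGG")

# Tile the whole repeat region in frame to summarize its composition.
region_start, region_end = tracts[0][0], tracts[-1][1]
composition = Counter(
    read[i:i + 3] for i in range(region_start, region_end, 3)
)

print(tracts)
print(composition)
```

The two uninterrupted runs and the single off-unit triplet between them are only recoverable because the whole region sits in one read; with short reads, the same information would have to be inferred from fragments that do not span the tract.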
Epigenetic regulation
Over 150 types of base modifications have been described so far (Xu and Seki, 2020). These modifications are crucial in many aspects of biology, including development, cellular maintenance, ageing, or cancer. However, available sequencing technologies allowed only limited insight into nucleic acid modifications. Because base modifications lead to characteristic changes in the current profiles when the respective bases are pulled through nanopores, the method detects various chemical
RNA sequencing, alternative splicing, and single cell sequencing
Alternative splicing of mRNAs is a mechanism to increase protein diversity and function. Nanopore and SMRT sequencing allow entire transcripts to be determined within single reads, which provides a comprehensive view of isoforms and splicing events (Soneson et al., 2019). The power of long-read sequencing in RNA analysis is underlined by the fact that over 50 % of the identified isoforms from Nanopore sequencing transcriptome analyses are not covered by short-read sequencing datasets (Workman et
De novo genome assembly
An important application of long-read sequencing is the de novo assembly of prokaryotic and eukaryotic genomes (van Dijk et al., 2018). Especially in polyploid organisms such as wheat or Xenopus species and in regions of low complexity, long reads facilitate correct genome assembly into large continuous contigs (Genova et al., 2019; Kapustová et al., 2019; Schmid et al., 2018; Schmidt et al., 2017; Shin et al., 2019; Wang et al., 2019). De novo assemblies are possible without laborious BAC or
Challenges of long-read sequencing
Preparing DNA for long-read sequencing has several pitfalls in terms of obtaining optimal sequencing libraries. Size selection can be an issue, since very large DNA molecules tend to block nanopores and very short molecules reduce the overall sequencing output. Moreover, libraries from freshly isolated DNA/RNA produce a higher output than long-term stored samples because they are less degraded and oxidized. Furthermore, sample purity is an issue due to the high input of DNA for long-read
Acknowledgements
The authors have no competing interests.
References (77)
- et al. Single-molecule sequencing: towards clinical applications. Trends Biotechnol. (2019)
- et al. A nanopore sequencing-based assay for rapid detection of gene fusions. J. Mol. Diagn. (2019)
- Repeat expansion diseases. Handb. Clin. Neurol. (2018)
- et al. The third revolution in sequencing technology. Trends Genet. (2018)
- et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. (2020)
- et al. Detecting AGG interruptions in male and female FMR1 premutation carriers by single-molecule sequencing. Hum. Mutat. (2017)
- et al. Long read sequencing of 1,817 Icelanders provides insight into the role of structural variants in human disease. bioRxiv (2019)
- et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. (2019)
- et al. Intronic ATTTC repeat expansions in STARD7 in familial adult myoclonic epilepsy linked to chromosome 2. Nat. Commun. (2019)
- et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nat. Commun. (2017)
- Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome. bioRxiv
- Three decades of nanopore sequencing. Nat. Biotechnol.
- A chromosome-scale assembly of the sorghum genome using nanopore sequencing and optical mapping. Nat. Commun.
- Sequencing smart: de novo sequencing and assembly approaches for a non-model mammal. Gigascience
- Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing. Acta Neuropathol.
- Unstable TTTTA/TTTCA expansions in MARCH6 are associated with familial adult myoclonic epilepsy type 3. Nat. Commun.
- Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods
- Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods
- WENGAN: efficient and high quality hybrid assembly of human genomes. bioRxiv
- Analysis of short tandem repeat expansions and their methylation state with nanopore sequencing. Nat. Biotechnol.
- Using long-read sequencing to detect imprinted DNA methylation. Nucleic Acids Res.
- Picky comprehensively detects high-resolution structural variants in nanopore long reads. Nat. Methods
- Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells. Nat. Biotechnol.
- Fragile X syndrome. Nat. Rev. Dis. Primers
- Mapping DNA replication with nanopore sequencing. bioRxiv
- Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy. Nat. Genet.
- Linear assembly of a human centromere on the Y chromosome. Nat. Biotechnol.
- The dark matter of large cereal genomes: long tandem repeats. Int. J. Mol. Sci.
- Enabling high-accuracy long-read amplicon sequences using unique molecular identifiers and nanopore sequencing. bioRxiv
- Enabling high-accuracy long-read amplicon sequences using unique molecular identifiers with nanopore or PacBio sequencing. bioRxiv
- On 'three decades of nanopore sequencing'. Nat. Biotechnol.
- Identification of DNA base modifications by means of Pacific Biosciences RS sequencing technology. Methods Mol. Biol.
- Novel familial distal imprinting centre 1 (11p15.5) deletion provides further insights in imprinting regulation. Clin. Epigenetics
- Alignment-free poly(A) length measurement for Oxford Nanopore RNA and DNA sequencing. RNA
- De novo Nanopore read quality improvement using deep learning. BMC Bioinform.
- High throughput, error corrected nanopore single cell transcriptome sequencing. bioRxiv
- Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. bioRxiv
- FLAM-seq: full-length mRNA sequencing reveals principles of poly(A) tail length control. Nat. Methods