Trends in Biotechnology
Volume 39, Issue 10, October 2021, Pages 990-1003

Review
Novel Modalities in DNA Data Storage

https://doi.org/10.1016/j.tibtech.2020.12.008

Highlights

  • Viable information storage in DNA is largely limited by cost and throughput.

  • Advances in synthesis and sequencing are key in driving adoption.

  • Novel methods of storing information beyond direct conversion of bits into nucleotides are being explored.

  • The key workflows are not yet established, leaving significant room for exploration.

  • The integration of molecular biology, engineering, and computing will drive further innovation.

The field of storing information in DNA has expanded exponentially. The most common modalities involve encoding information from bits into synthesized nucleotides, storage in liquid or dry media, and decoding via sequencing. However, limitations of this paradigm include the cost of DNA synthesis and sequencing, along with low throughput. Further unresolved questions include the appropriate storage media and the scalability of such approaches for commercial viability. In this review, we examine various storage modalities involving the use of DNA from a systems-level perspective. We compare novel methods, inspired by molecular biology techniques, that have been devised to overcome the difficulties posed by standard workflows, and we conceptualize potential applications that can arise from these advances.

Section snippets

DNA as a Novel Information Storage Medium

Alternative methods of storing data are a burgeoning area of research due to the immense amount of data being generated every year. Existing storage formats are insufficient to accommodate this growth. Among various potential storage formats, ranging from storage in quantum bits to the metabolome [1,2,3,4], DNA has emerged as a promising material due to its immense storage density, its longevity, and the wealth of tools developed for manipulating its properties.

The basic information

Encoding within Nucleotides

Most DNA storage methods store information in the form of nucleotide sequences (Figure 2A). The theoretical Shannon information capacity [12] of DNA is 2 bits/nt, although this limit is difficult to reach due to inherent errors in the DNA storage channel in the form of sequencing and synthesis error rates and biochemical constraints in sequence space [9,13]. As such, most work has been on encoding methods to optimize information redundancy. Other methods have explored ways such as using
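The 2 bits/nt figure follows from the four-letter alphabet: each nucleotide can take one of four values, so it carries at most log2(4) = 2 bits. The sketch below illustrates this with a naive mapping of bit pairs to nucleotides, together with two simple checks (homopolymer run length and GC fraction) of the kind often used to express biochemical constraints. The mapping order `ACGT`, the function names, and the choice of checks are assumptions made for illustration; they are not taken from any of the encoding schemes reviewed here.

```python
import math

# Assumed mapping: 2-bit values 0..3 map to A, C, G, T.
# Any fixed one-to-one mapping would serve equally well.
BASES = "ACGT"


def bits_per_nt(alphabet_size: int = 4) -> float:
    """Upper bound on information per nucleotide for a given alphabet size."""
    return math.log2(alphabet_size)


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(seq: str) -> bytes:
    """Invert encode(); expects a length that is a multiple of four."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for nt in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(nt)
        out.append(byte)
    return bytes(out)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    best = run = 0
    prev = None
    for nt in seq:
        run = run + 1 if nt == prev else 1
        prev = nt
        best = max(best, run)
    return best


def gc_fraction(seq: str) -> float:
    """Fraction of nucleotides that are G or C (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand, decode(strand), bits_per_nt())
    print(longest_homopolymer(strand), gc_fraction(strand))
```

A naive mapping like this reaches the 2 bits/nt bound only on paper: it places no limit on homopolymer runs or GC content (a zero byte becomes `AAAA`), and it carries no redundancy against synthesis and sequencing errors. Practical schemes give up part of the theoretical capacity to address both.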

Storage and Access

Storing information in DNA requires consideration of the environment in which the synthesized DNA is kept. The resilience of DNA is highly dependent on the storage medium in which it is kept and on its inherent redundancy. Key considerations for an appropriate storage medium include its ability to preserve the existing information, the additional post-processing costs and steps it requires, and its physical space requirements. Random-access capability is also an important consideration in developing the

Reading

Advances in sequencing, the predominant form of information retrieval or ‘reading,’ drive the viability of DNA data storage. Predominant methods of reading DNA include sequencing by synthesis (SBS), along with third-generation sequencing methods involving single-molecule sequencing via nanopores or enzymatic well-based reactions, which have been reviewed extensively [69,70]. This section thus covers specific considerations of sequencing in DNA data storage and highlights advances made that can

Novel Applications of DNA Data Storage

The advent of new modalities reviewed in this paper has also led to the development of novel applications for DNA information storage beyond information archiving (Figure 1). One application is in the field of item authenticity and verification, whereby DNA that contains information on the veracity of an item is tagged onto the item itself. Multiple groups have developed ways of implementing DNA-based authentication, with tags being used in barcoding of oils [55], and for use in environmental

Concluding Remarks and Future Prospects

DNA has been the store of biological information for millennia and has been the foundation of much work in modern biology. Now, its potential as a manipulatable material, coupled with the advent of technologies that can assist in this manipulation, has greatly expanded its applications. Data storage is but one of these applications. Novel advances in biotechnology have led to innovative approaches to storing data in DNA, and further innovations will only continue to increase as our

Acknowledgments

The authors thank Wen Xuan Er for her help in creation of the figures used in this paper. The authors are grateful for the support provided by the Synthetic Biology Initiative of the National University of Singapore (NUS) (DPRT/943/09/14), the Summit Research Program of the National University Health System (NUHSRO/2016/053/SRP/05), and an NUS startup grant (R-397-000-257-133).

Glossary

DNA computation
a field of research involving implementing computational processes on DNA strands.
DNA nanostructures
the concept of generating unique shapes and structures made solely from DNA by complementary base pairing of defined DNA strands.
Encoding and decoding
the algorithms used to convert binary information into another form, usually DNA nucleotide sequences, and back again.
Hachimoji DNA
a synthetic nucleic acid analog that comprises four novel nucleotides, in addition to the four natural ones, with unique base-pairing abilities.
Nanopores

References (111)

  • F.E. Kalff

    A kilobyte rewritable atomic memory

    Nat. Nanotechnol.

    (2016)
  • B.J. Cafferty

    Storage of information using small organic molecules

    ACS Cent. Sci.

    (2019)
  • C.E. Arcadia

    Multicomponent molecular memory

    Nat. Commun.

    (2020)
  • J.K. Rosenstein

    Principles of information storage in small-molecule mixtures

    IEEE Trans. Nanobiosci.

    (2020)
  • G.M. Church

    Next-generation digital information storage in DNA

    Science

    (2012)
  • N. Goldman

    Towards practical, high-capacity, low-maintenance information storage in synthesized DNA

    Nature

    (2013)
  • L. Ceze

    Molecular digital data storage using DNA

    Nat. Rev. Genet.

    (2019)
  • L.C. Meiser

    Reading and writing digital data in DNA

    Nat. Protoc.

    (2020)
  • R. Heckel

    A characterization of the DNA data storage channel

    Sci. Rep.

    (2019)
  • C.N. Takahashi

    Demonstration of end-to-end automation of DNA data storage

    Sci. Rep.

    (2019)
  • Y. Dong

    DNA storage: research landscape and future prospects

    Natl. Sci. Rev.

    (2020)
  • C.E. Shannon

    A mathematical theory of communication

    Bell Syst. Tech. J.

    (1948)
  • L. Organick

    Probing the physical limits of reliable DNA data retrieval [published correction appears in Nat. Commun. (2020) 11, 1080]

    Nat. Commun.

    (2020)
  • Y. Choi

    High information capacity DNA-based data storage with augmented encoding characters using degenerate bases

    Sci. Rep.

    (2019)
  • L. Anavy

    Data storage in DNA with fewer synthesis cycles using composite DNA letters

    Nat. Biotechnol.

    (2019)
  • D.A. Malyshev

    Efficient and sequence-independent replication of DNA containing a third base pair establishes a functional six-letter genetic alphabet

    Proc. Natl. Acad. Sci. U. S. A.

    (2012)
  • S. Hoshika

    Hachimoji DNA and RNA: a genetic system with eight building blocks

    Science

    (2019)
  • N. Roquet

    Catalog Technologies, Inc.

  • J. Bonnet

    Rewritable digital data storage in live cells via engineered control of recombination directionality

    Proc. Natl. Acad. Sci. U. S. A.

    (2012)
  • L. Yang

    Permanent genetic memory with >1-byte capacity

    Nat. Methods

    (2014)
  • M.G.T.A. Rutten

    Encoding information into polymers

    Nat. Rev. Chem.

    (2018)
  • S. Kosuri et al.

    Large-scale de novo DNA synthesis: technologies and applications

    Nat. Methods

    (2014)
  • V. Zhirnov

    Nucleic acid memory

    Nat. Mater.

    (2016)
  • R.A. Hughes et al.

    Synthetic DNA synthesis and assembly: putting the synthetic in synthetic biology

    Cold Spring Harb. Perspect. Biol.

    (2017)
  • P.L. Antkowiak

    Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction

    Nat. Commun.

    (2020)
  • E.M. LeProust

    Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process

    Nucleic Acids Res.

    (2010)
  • H. Lee

    A high-throughput optomechanical retrieval method for sequence-verified clonal DNA from the NGS platform

    Nat. Commun.

    (2015)
  • B. Hwang et al.

    Toward a new paradigm of DNA writing using a massively parallel sequencing platform and degenerate oligonucleotide

    Sci. Rep.

    (2016)
  • H. Lim

    Highly selective retrieval of accurate DNA utilizing a pool of in situ-replicated DNA from multiple next-generation sequencing platforms

    Nucleic Acids Res.

    (2018)
  • N.C. Seeman et al.

    DNA nanotechnology

    Nat. Rev. Mater.

    (2018)
  • J. Li

    Engineering nucleic acid structures for programmable molecular circuitry and intracellular biocomputation

    Nat. Chem.

    (2017)
  • P. Hunter

    Nucleic acid-based nanotechnology

    EMBO Rep.

    (2018)
  • K. Halvorsen et al.

    Binary DNA nanostructures for data encryption

    PLoS One

    (2012)
  • A.R. Chandrasekaran

    Addressable configurations of DNA nanostructures for rewritable memory

    Nucleic Acids Res.

    (2017)
  • K. Chen

    Digital data storage using DNA nanostructures and solid-state nanopores

    Nano Lett.

    (2019)
  • K. Chen

    Nanopore-based DNA hard drives for rewritable and secure data storage

    Nano Lett.

    (2020)
  • S.K. Tabatabaei

    DNA punch cards for storing data on native DNA sequences via enzymatic nicking

    Nat. Commun.

    (2020)
  • C. Mayer

    An epigenetics-inspired DNA-based data storage system

    Angew. Chem. Int. Ed.

    (2016)
  • T. Lindahl et al.

    Rate of depurination of native deoxyribonucleic acid

    Biochemistry

    (1972)
  • G.P. Pfeifer

    Mutations induced by ultraviolet light

    Mutat. Res. Mol. Mech. Mutagen.

    (2005)
  • S.M.H.T. Yazdi

    A rewritable, random-access DNA-based storage system

    Sci. Rep.

    (2015)
  • J. Bornholt

    A DNA-based archival storage system

  • S.M.H.T. Yazdi

    Portable and error-free DNA-based data storage

    Sci. Rep.

    (2017)
  • L. Organick

    Random access in large-scale DNA data storage

    Nat. Biotechnol.

    (2018)
  • R. Lopez

    DNA assembly for nanopore data storage readout

    Nat. Commun.

    (2019)
  • X. Song

    Multidimensional data organization and random access in large-scale DNA storage systems

    bioRxiv

    (2019)
  • K.J. Tomek

    Driving the scalability of DNA-based information storage systems

    ACS Synth. Biol.

    (2019)
  • P. Gill et al.

    Nucleic acid isothermal amplification technologies – a review

    Nucleosides Nucleotides Nucleic Acids

    (2008)
  • K.N. Lin

    Dynamic and scalable DNA-based information storage

    Nat. Commun.

    (2020)
  • E. Wan

    Green technologies for room temperature nucleic acid storage

    Curr. Issues Mol. Biol.

    (2010)

Cited by (22)

    • Encoding of non-biological information for its long-term storage in DNA

      2022, BioSystems
      Citation Excerpt :

      Long-term in vitro storage includes the following stages (Fig. 1): conversion of information into nucleotide sequences (encoding, stage 1), synthesis of oligos and formation of an oligotheca (stage 2), storage of an oligotheca under stable conditions (stage 3), obtaining of sufficient amount of informative DNA by amplification and their sequencing (stage 4), recovery of digital information from nucleotide sequences (decoding, stage 5). Some excellent reviews devoted to DNA data storage have recently been published (De Silva and Ganegoda, 2016; Akram et al., 2018; Panda et al., 2018; Ceze et al., 2019; Ping et al., 2019; Lim et al., 2021; Xu et al., 2021). However, the issue of encoding digital information by nucleotides has not been given proper attention.

    • A brief review on DNA storage, compression, and digitalization

      2022, Nano Communication Networks
      Citation Excerpt :

      This novel storage architecture was achieved by encapsulating DNA using silica beads and mixing the beads with the material used to shape the target object. This essentially allows the ability to store information as DNA in any physical object and potentially opens up new areas for DNA storage, molecular identification, and DNA steganography [46]. Further exciting applications of DNA data storage can be seen in the rise of DNA computation.
