Digital information and the technologies that manage it are an essential part of our lives. Ever-present connectivity, and the data storage demands that come with it, are reaching a fever pitch. The projected material supply for silicon-based memory technology cannot satisfy the demand of the coming decades; industry and government alike are therefore considering alternative materials (ref. 1). For the past 3.5 billion years on Earth, DNA has served as life’s hard drive. Its use for molecular data storage was first proposed by Mikhail Neiman in the 1960s (ref. 2) and realized by Joe Davis in the late 1980s (ref. 3). After being popularized by George Church and Nick Goldman in the early 2010s (refs. 4,5), and guided by the Semiconductor Synthetic Biology Roadmap from the mid-2010s (ref. 6), DNA data storage is now an established field of research and a promising alternative to conventional data storage systems. Efficient and reliable access to particular information is a crucial function of any storage technology, but one that is difficult to implement when data are encoded in molecular components such as DNA. In this issue of Nature Materials, James L. Banal and colleagues describe how random access to information stored on DNA can be achieved without the need for polymerase chain reaction (PCR) amplification (ref. 7).

From life to libraries, information must be well organized to be accessible. In physical libraries, organization is made possible by the Dewey Decimal Classification system, in which a call number is assigned to each book. Knowing the call number makes it possible to retrieve one book from the whole collection on the basis of subject, without sequentially reading every volume. In this way, libraries are a prime example of random access: the ability to access a single, arbitrary datum among a population of information with equal time and efficiency, regardless of the population size. Because both kinds of library can store vast amounts of information, random access is as important in a library of DNA as it is in a library of books.

Random access in a DNA library is often performed using PCR amplification, where the DNA sequences are indexed in the regions that flank the encoded data. Using a finite number of primers, selected sequences can be amplified by PCR from a population of sequences and then sequenced to read the encoded information. Although PCR has demonstrated successful data retrieval, it has limitations (ref. 8). For example, increasing the number of indexes decreases the amount of data that fits within any one sequence. In addition, the amplification step consumes an aliquot of the sequence library, necessitating periodic library amplification. Proposed methods to reduce these limitations include toehold-barcoded magnetic particles to separate specific files and super-resolution microscopy to read out the encoded data (refs. 9,10). While these approaches are PCR-free, the data extraction efficiency (ref. 9) and density (ref. 10) are currently limited. Overcoming these limitations is key to making random access scalable to the growing demands of the field (ref. 11).
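The trade-off between indexing and payload in PCR-based addressing can be made concrete with a little arithmetic. The sketch below is illustrative only: the strand length of 150 bases and the index length of 20 bases are assumed round numbers, not values taken from the work discussed here, and `payload_bases` is a hypothetical helper name.

```python
def payload_bases(strand_length: int, index_length: int, n_indexes: int) -> int:
    """Bases left for encoded data once index (primer-binding) regions are reserved.

    All three arguments are illustrative assumptions, not measured values.
    """
    reserved = index_length * n_indexes
    if reserved > strand_length:
        raise ValueError("index regions do not fit within the strand")
    return strand_length - reserved


# Assumed, illustrative values: a 150-base strand with 20-base index regions.
STRAND_LENGTH = 150
INDEX_LENGTH = 20

for n_indexes in (2, 4, 6):
    remaining = payload_bases(STRAND_LENGTH, INDEX_LENGTH, n_indexes)
    print(f"{n_indexes} indexes -> {remaining} bases available for data")
```

Under these assumptions, every additional index removes a fixed number of bases from the data region, which is why richer addressing comes at the cost of storage density per sequence.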

Banal and co-workers overcome the limitations of PCR amplification by encapsulating data-encoded files in barcoded silica beads, creating a random access data storage system (ref. 7). Instead of amplifying the target sequence from the library through PCR, they select data files physically by sorting silica beads. The silica beads include multi-functional barcodes that enable conditional sorting based on Boolean logic (Fig. 1). This is equivalent to selecting an arbitrary book from a library, with the added benefit of selecting collections of books of similar subject, author, year and so on. The outcome is a DNA library that has the functionality of a Google-like search engine. As a proof of concept, 20 image files (0.1 KB each) were encoded into DNA plasmids. A single image file was inserted in each plasmid and then encapsulated on the surface of the silica beads, using the DNA chemical encapsulation method developed by Grass and colleagues (ref. 12). On the outer shell of the encapsulation, three unique single-stranded DNA sequences were linked, serving as barcodes to enable the sorting of selected beads and their files. Using dye-labelled single-stranded DNA that was complementary to the barcodes, selected beads were fluorescently tagged and separated using fluorescence-activated sorting (FAS). The sorted beads were chemically de-encapsulated and the plasmid was sequenced to decode the data file. In contrast to PCR, this approach directly selects the file of interest, leaving all others intact so that they can be easily recombined into the original library.

Fig. 1: Random access data storage system without PCR amplification.

The query of interest is retrieved by selecting specific fluorescent barcode signals (here, ‘president’ and ‘18th century’ are probes), and only the file that carries both tags is sorted out from the random library. The file consists of a plasmid encapsulated in an amorphous layer (grey corona) on the surface of silica beads (green circle). The beads present, on their encapsulation surface, three short DNA sequence barcodes (gold, yellow and orange). FAS allows conditional sorting of the barcode combinations, so conditional logic can be performed on the retrieved files.

An important aspect of the file-indexing method of Banal and colleagues is its conditional sorting feature. Controlling multiple fluorescence channels in the FAS, with three barcode tags per bead, enables Boolean logic operations on retrieval of the image files. NOT, OR and AND operations are demonstrated efficiently, offering a first view of conditional logic retrieval in a DNA archival storage system. Importantly, attaching the barcodes to the outer shell of the encapsulation prevents crosstalk between barcodes and files, which can lead to spurious amplification in PCR-based systems.
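The logic of conditional sorting can be sketched in software terms. The following is a toy model, not the authors' protocol: each bead is reduced to a file name plus a set of three barcode tags, and a sorting gate is a Boolean predicate over those tags. The tags ‘president’ and ‘18th century’ come from the figure; every other tag, file name and function name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Gate = Callable[[FrozenSet[str]], bool]


@dataclass(frozen=True)
class Bead:
    """Toy stand-in for a barcoded bead: one file plus its barcode tags."""
    file_id: str
    barcodes: FrozenSet[str]


def has(tag: str) -> Gate:
    """Gate that passes beads carrying the given barcode tag."""
    return lambda barcodes: tag in barcodes


def AND(p: Gate, q: Gate) -> Gate:
    return lambda barcodes: p(barcodes) and q(barcodes)


def OR(p: Gate, q: Gate) -> Gate:
    return lambda barcodes: p(barcodes) or q(barcodes)


def NOT(p: Gate) -> Gate:
    return lambda barcodes: not p(barcodes)


def sort_beads(library: List[Bead], gate: Gate) -> Tuple[List[Bead], List[Bead]]:
    """Split the library into beads that pass the gate and beads that do not.

    The remainder is returned untouched, mirroring the idea that unselected
    files stay intact and can be recombined into the library.
    """
    selected = [bead for bead in library if gate(bead.barcodes)]
    remainder = [bead for bead in library if not gate(bead.barcodes)]
    return selected, remainder


# Hypothetical three-tag beads; only 'president' and '18th century' are from the figure.
library = [
    Bead("file_a", frozenset({"president", "18th century", "portrait"})),
    Bead("file_b", frozenset({"president", "20th century", "portrait"})),
    Bead("file_c", frozenset({"scientist", "18th century", "portrait"})),
]

query = AND(has("president"), has("18th century"))
selected, remainder = sort_beads(library, query)
print([bead.file_id for bead in selected])
print([bead.file_id for bead in remainder])
```

In this model the gate plays the role of the fluorescence channel settings: combining `has`, `AND`, `OR` and `NOT` corresponds to choosing which probe signals must be present or absent for a bead to be collected.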

DNA is an emerging memory material because of its information density, information retention, energy of operation and programmability (ref. 1). From Mikhail Neiman’s vision to Joe Davis’ realization, the information-bearing properties of DNA make it ideal for storing biological and/or digital information. While serious economic, scalability and sustainability challenges must be overcome to bring DNA storage into real-life applications, Banal and co-workers have taken an important step forward by describing a PCR-free random access method that allows for conditional logic to sort data files made from DNA. The simplicity of their concept is both powerful and scalable, and the quality of their engineering approach is outstanding.