CRISPR-Cas9 has become a widely adopted tool for gene editing and for suppressing or activating gene expression. First discovered in bacteria as a defense mechanism against phage infection, CRISPR-Cas9 can also be introduced into mammalian systems because DNA-repair mechanisms are evolutionarily conserved across organisms. The real attraction of CRISPR-Cas9 is that the mutation can be pre-designed and targeted to specific locations anywhere in the genome, making gene editing “programmable”. The core of the CRISPR-Cas9 system is composed of the Cas9 endonuclease and a single-guide RNA (sgRNA) with a 20-base user-defined spacer sequence. This sgRNA can lead the Cas9 nuclease to any genomic locus that both matches the spacer sequence and has a protospacer adjacent motif (PAM) immediately downstream. That said, limitations apply. The sgRNA is prone to leading Cas9 to genomic loci that share sequence similarity with the sgRNA spacer. Off-target effects, and how to eliminate them, have therefore become the major hurdle in transforming this valuable discovery into a reliable weapon for fighting human diseases.
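
The targeting rule described above (a 20-base protospacer followed immediately by a PAM) can be illustrated with a minimal Python sketch. It assumes the canonical SpCas9 PAM, 5′-NGG-3′, which is well established but not spelled out in the text; the function names `reverse_complement` and `find_spcas9_sites` are hypothetical and do not come from any published tool.

```python
import re

# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(dna, spacer_len=20):
    """List candidate sites as (strand, start, protospacer, pam).

    Assumption: the PAM is NGG (canonical SpCas9). For the "-" strand,
    `start` is an index into the reverse-complemented sequence.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

Calling `find_spcas9_sites` on a sequence returns every 20-mer that is directly followed by an NGG motif on either strand. Each reported protospacer is only a candidate spacer; it says nothing yet about how many similar sequences exist elsewhere in the genome.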

It is understood that Cas9 off-target nuclease activity is likely a result of the evolutionary arms race between bacteria and viruses. Bacteria that survive viral infection store characteristic viral genetic sequences as spacers, which can be transcribed into RNA to guide the Cas nuclease to degrade viral genetic elements containing the same sequence. This process provides bacteria with an adaptive immune system to resist repeated phage invasion. Because phage genomes inherently mutate, some of the resulting variants would escape Cas nuclease attack. In response, the bacterial CRISPR-Cas system has evolved to allow the Cas nuclease to target sequences that share a certain degree of homology with the spacer sequence.

Off-target genome editing not only introduces uncertainty into scientific discoveries about gene function, but also confounds and weakens the therapeutic applications of CRISPR-Cas9. The strategies to combat off-target genome editing can be grouped into five categories: (1) computational and mathematical prediction, (2) experimental off-target cleavage validation, (3) Cas9-sgRNA delivery modification, (4) high-fidelity SpCas9 engineering, and (5) guide RNA engineering.

The early computational methods to predict potential off-target loci in the genome were built upon data collected using SpCas9, the most commonly used type II CRISPR-associated nuclease (Fu et al. 2013; Hsu et al. 2013; Jinek et al. 2012; Lin et al. 2014; Pattanayak et al. 2013; Tsai et al. 2015). Though some details of these findings contradict each other, a consensus has been reached: (1) off-target activity decreases as the number of mismatches (including both base mismatches and bulges) between the sgRNA and the target sequence increases; (2) Cas9 is less tolerant of mismatches proximal to the PAM. Later methods achieved better prediction because they incorporated new knowledge of Cas9 domains, especially the related energetics parameters (Alkan et al. 2018; Klein et al. 2018; Xu et al. 2017; Young et al. 2019; Zhang et al. 2019). In 2017, using an innovative computer algorithm, we scanned the entire protein-coding regions of the human genome for the best sgRNA designs (Zhou et al. 2017). Of almost two million sgRNAs designed, only two passed a stringent off-target filter. However, if we exclude off-target loci outside gene sequences and take into consideration the weaker activity of the secondary PAMs, 89% of protein-coding genes can have at least 1, and 54% can have at least 10, designed sgRNAs free of potential off-target loci.
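
The two consensus rules above can be made concrete with a toy Python sketch. It is not any of the cited scoring schemes: the linear position weighting and the `max_penalty` parameter are assumptions chosen purely for illustration, the function names are hypothetical, and bulges are not modeled (only base mismatches between equal-length sequences).

```python
def mismatch_positions(spacer, site):
    """Return 1-based mismatch positions, counted from the PAM-distal 5' end.

    With a 20-base spacer, position 20 is the base adjacent to the PAM.
    """
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have equal length")
    pairs = zip(spacer.upper(), site.upper())
    return [i + 1 for i, (a, b) in enumerate(pairs) if a != b]


def toy_offtarget_score(spacer, site, max_penalty=0.9):
    """Toy score in (0, 1]; 1.0 means a perfect match.

    Assumption (illustrative only): each mismatch multiplies the score by
    a factor that shrinks linearly as the mismatch moves toward the PAM,
    so PAM-proximal mismatches are tolerated less than PAM-distal ones.
    """
    n = len(spacer)
    score = 1.0
    for pos in mismatch_positions(spacer, site):
        score *= 1.0 - max_penalty * pos / n
    return score
```

Under these assumptions, every additional mismatch lowers the score, and a mismatch next to the PAM lowers it more than one at the 5′ end. A practical predictor would replace the linear weights with empirically fitted or energetics-based parameters, as the later methods cited above do.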

Computational tools can exhaustively enumerate possible off-target sites, but such predictions must be further validated experimentally, since a large portion of the off-target sequences that are predicted computationally or identified in vitro do not result in cleavage in vivo (Fu et al. 2013; Hsu et al. 2013; Pattanayak et al. 2013). Whole genome sequencing is the most comprehensive approach to detecting off-target mutations, but its high cost makes it impractical. GUIDE-Seq and DISCOVER-Seq both provide reliable and efficient alternatives for genome-wide off-target cleavage detection based on next-generation sequencing, although the two methods apply different techniques to capture the DNA fragments generated from double-strand breaks (Tsai et al. 2015; Wienert et al. 2019). For an even more cost-effective validation, we have created a searchable database of PCR primer pairs that span the potential off-target cleavage sites inside protein-coding sequences across the human genome (www.pbsgweb.com).

The third strategy is based on the speculation that, in a cell, only one Cas9-sgRNA complex can occupy the on-target site, so any additional complexes can only bind off-target substrates. Thus, limiting the concentration of Cas9, sgRNA, or both can reduce off-target effects. This hypothesis is supported by several studies (Fu et al. 2013; Hsu et al. 2013; Tsai et al. 2015). In another attempt, Kim et al. premixed Cas9 protein and sgRNA in vitro to form a ribonucleoprotein (RNP) complex, which was then delivered directly into mammalian cells (Kim et al. 2014). The authors attributed the higher specificity achieved to the rapid degradation of the RNP in cells. Taking a different approach, Davis and collaborators fused a protein into Cas9 to disrupt its activity, so that its nuclease function is activated only in the presence of a small molecule. They found that this inducible Cas9 system can reduce off-target effects at the expense of some on-target efficacy (Davis et al. 2015).

Within the fourth strategy, the double nicking technique employs two different Cas9 nickases, each engineered to cut a separate DNA strand. Both nickases must be simultaneously present and proximal to each other to achieve a pseudo double-stranded DNA break. Since two off-target sites are unlikely to lie near each other, double nicking was the first method to greatly reduce off-target genome editing (Cho et al. 2014; Ran et al. 2013), though it also significantly reduces on-target efficacy. That said, a recent publication reported unexpected on-target mutagenesis when applying this method, raising safety concerns (Alateeq et al. 2018). Another line of work, the rational design of high-fidelity SpCas9, aims to reduce the energy of the interaction between the Cas9-sgRNA complex and its target DNA by introducing amino acid substitution(s) into the Cas9 protein. The first two high-fidelity SpCas9 mutants are eSpCas9 (Slaymaker et al. 2016) and SpCas9-HF1 (Kleinstiver et al. 2016), both of which significantly improve target specificity while retaining robust on-target cleavage. They cannot, however, completely abolish genome-wide off-target activity, which has motivated the creation of other high-fidelity SpCas9 variants including HypaCas9, HeFSpCas9, xCas9, SpCas9-NG, EvoCas9, and Sniper-Cas9 (Casini et al. 2018; Chen et al. 2017; Hu et al. 2018; Kulcsár et al. 2017; Lee et al. 2019; Nishimasu et al. 2018). Among them, SpCas9-NG can recognize NG PAMs and xCas9 can recognize NG, GAA, and GAT PAMs. These two variants broaden the DNA-targeting spectrum.
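
How the relaxed PAM requirements broaden the targeting spectrum can be seen in a short Python sketch. The PAM sets for SpCas9-NG and xCas9 are taken from the text above; the NGG entry for wild-type SpCas9 is the canonical PAM and is added here as an assumption for comparison. The table `PAM_PATTERNS` and the function `pam_matches` are hypothetical names, and the sketch says nothing about cleavage efficiency at each PAM.

```python
import re

# N stands for any base (IUPAC notation).
PAM_PATTERNS = {
    "SpCas9": ["NGG"],             # canonical PAM, assumed for comparison
    "SpCas9-NG": ["NG"],           # from the text above
    "xCas9": ["NG", "GAA", "GAT"], # from the text above
}


def pam_matches(variant, downstream):
    """Return True if `downstream` starts with a PAM the variant recognizes.

    `downstream` is the sequence immediately 3' of the protospacer.
    """
    downstream = downstream.upper()
    for pam in PAM_PATTERNS[variant]:
        regex = pam.replace("N", "[ACGT]")
        # re.match anchors at the start of the string, i.e. right after
        # the protospacer.
        if re.match(regex, downstream):
            return True
    return False
```

For example, a protospacer followed by TGA is rejected for SpCas9 under the NGG assumption but accepted for SpCas9-NG and xCas9, and one followed by GAA is accepted only for xCas9. A broader PAM also enlarges the pool of potential off-target sites, so such a check would normally be paired with an off-target filter.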

Engineering guide RNAs provides another feasible strategy. Using 5′-truncated sgRNAs can improve specificity but also undermines the on-target effect (Fu et al. 2014). Kocak et al. extended the 5′ end of the sgRNA spacer to obtain hp-sgRNAs that can form a short hairpin structure at the 5′ terminus (Kocak et al. 2019). The hairpin structure raises the energetic requirement for R-loop formation, which is a critical step for DNA cleavage. Because mispaired RNA-DNA hybrids at off-target sites are energetically weaker, hp-sgRNAs can significantly increase sgRNA-Cas9 specificity. In another example, Ryan et al. chemically modified the ribose-phosphate backbone at selected sites of the 20-base guide RNA spacer and found that the modified sgRNAs can reduce off-target effects (Ryan et al. 2018). The increased specificity is likely due to the lowered stability of the pairing between the guide RNA and mismatched off-target DNA sequences.

The search continues. It seems that no single strategy discussed in this review can win the battle by itself, although high-fidelity Cas9 mutants have perhaps been the most promising. In our view, it is becoming clear that a combination of these strategies, for instance computational modeling backed by repeated validation together with high-fidelity Cas9s, may soon remove the uncertainty incurred by off-target genome editing.