Introduction

Plant breeders have traditionally relied on selection of suitable genotypes from populations, or after wide crosses to leverage the naturally occurring variability in the genome. Mutagenesis agents such as radiation or chemical mutagens have routinely been used to increase genetic variability. Genome editing tools (TALENs, CRISPR/Cas) allow enhanced precision and efficiency in creating genetic diversity and are among the tools increasingly being applied in plant sciences research and to improve plant varieties (Bailey-Serres et al. 2019; Varshney et al. 2020). Such technology has also been termed gene editing. However, edits can be made in genes, regulatory sequences, untranslated regions, or intergenic regions. Therefore, the term genome editing will be used throughout this article.

The need for detection tools begins with optimization of the genome editing process itself and continues with characterization of tissue cultures and/or regenerated plants. Detection tools are also used throughout the breeding process, as well as for seed production and the subsequent launch of products. The analytical techniques employed to achieve these aims vary between the different stages, but the methods often rely on the in-house knowledge of the sequences and genomes. When genome-edited crops are deployed as commercial products, then there will be a need to track valuable crops and products, to mitigate trade risks, and in some cases to achieve approval if the product is regulated.

Detection and identification of genome edits employ a wide range of analytical approaches throughout the development process. These include phenotypic characterization, PCR, digital PCR, and sequencing methods. Optimization of genome editing processes requires a range of analytical approaches, as detection in tissue cultures or in early phase plant populations can be challenging when a mixture of edits may be present in an unedited germplasm background. Application of detection tools to single plants at early stages of the genome editing process may be less challenging, and presence/absence or zygosity assays (via PCR or digital PCR) are sufficient for this purpose. Once breeding populations are developed and seed is being readied for commercialization, analytical sensitivity again becomes more critical. Identification of a low level of genome-edited product with a small sequence difference in bulk seed or grain samples, either for testing purity of edited populations or to ensure the edit is absent in non-edited materials, can be challenging in many cases.

This article will cover the detection tools used in optimization of genome editing and the early stages of research and in breeding. We will also describe the general issues surrounding the detection of genome edits present at low levels in large seed, plant, or grain populations. The detection of genome edits in many cases also implies characterization of the modification.

What are genome edits

Genome editing is a term commonly used to describe site-directed mutagenesis techniques that allow the introduction of targeted changes in specific DNA sequences at a defined location in the genome of an organism (Zhan et al. 2021). Details of the process are described in the next section. The result of such an operation subsequently leads to “genome-edited plants” as described in other articles in this issue. Oligonucleotide-directed mutagenesis (ODM) and site-directed nucleases (SDNs) allow enhanced precision and efficiency of edits to be applied. Early genome editing applications involved the use of meganucleases (Epinat et al. 2003), zinc finger nucleases (ZFNs) (Porteus and Baltimore 2003), and transcription activator-like effector nucleases (TALENs) (Bogdanove and Voytas 2011; Chen and Lin 2013). More recently, the advent of SDN editing based on Clustered Regularly Interspaced Short Palindromic Repeats and their associated nucleases (CRISPR/Cas) has enabled an increase in the application of genome editing tools to plant modification and breeding (Chen et al. 2019).

Introduction of genome edits

Genome editing is achieved by the introduction of a single-stranded nick or DNA double-strand break (DSB) at a chosen specific chromosomal location using different types of SDNs. SDNs of the CRISPR/Cas class use only a short protospacer sequence within their CRISPR RNA (crRNA) to identify the target sequence, while for meganucleases, ZFNs, and TALENs, a new enzyme complex has to be generated for each target. DNA double-strand breaks induced by the SDN can be repaired via two pathways, either non-homologous end joining (NHEJ) or homologous recombination (HR). Higher plants predominantly use the error-prone NHEJ pathway for DNA repair. Depending on the desired edit, nicks and breaks can be introduced at a single site or at multiple sites. The repair mechanisms for SDN-1 are based on NHEJ, which due to its lack of fidelity can result in randomly inserted or deleted nucleotide(s). Conversely, repair mechanisms for SDN-2 and SDN-3 are based on HR between the genome target sequence and an exogenously delivered donor DNA engineered with the intended edits. Genome-edited plants are usually obtained by delivering SDNs, and a donor DNA in the case of homologous recombination, into cells or explants in culture. Plants are regenerated from these cultures by standard approaches. Currently, Agrobacterium tumefaciens and particle bombardment remain the main methods of choice for delivery of nuclease reagents into plant cells or explants. A selectable marker can be used to select for plant cells or tissues with integrated constructs. These constructs are intended to induce the desired edits and can then be removed from the recovered genome-edited plants by segregation in the progeny. CRISPR nucleases or other effectors can also be transiently expressed in plant cells, or CRISPR reagents can be delivered in a DNA-free manner using ribonucleoproteins (RNPs) (Hamada et al. 2018).

Protoplast systems have seen a recent revival for the introduction of CRISPR/Cas components as DNA and RNPs (Woo et al. 2015; Liang et al. 2017; Lin et al. 2018). Nevertheless, regeneration of plants from protoplasts remains a major bottleneck for many important crop species. Plant protoplasts also provide a robust platform for rapid validation of genome editing reagents via transient expression and for screening for optimal editing performance. An advantage is that the whole procedure from isolation of protoplasts until assessment of the activity of the nuclease reagents can be done rapidly.

When CRISPR or other effector systems need to be expressed in plants, efficient promoters and plant codon optimized versions of these nuclease genes are generally used (Zhan et al. 2021). RNA polymerase (Pol) II–dependent promoters with strong expression in reproductive cells or corresponding ancestor cells are good candidate promoters to express the nuclease genes. The non-coding single-guide RNAs (sgRNAs) are more suited for transcription by Pol III–dependent promoters, such as U6 and U3 promoters. Multiplex editing is possible by the delivery of a set of sgRNAs typically assembled in tandem in plant expression vectors using either identical or different Pol III–dependent promoters (Zhang et al. 2015a). Another possibility is to assemble the multiple sgRNAs into a single transcription unit under the control of a single Pol II promoter. In this case, the primary transcript must be processed to generate multiple mature sgRNAs by making use of self-cleaving RNAs or cleavable RNA molecules such as csy4, ribozyme, and tRNA sequences (Xie et al. 2015; Čermák et al. 2017).

To make genome editing broadly applicable, effective DNA delivery methods and efficient cell and tissue culture procedures allowing the manipulation and regeneration of large numbers of cells need to be available. For many crop species, these have still to be further developed or improved (Altpeter et al. 2016; see also other articles in this Journal issue). Developmental genes such as Baby Boom, Wuschel, and Grf5 or a chimeric Grf5 protein have been shown to increase regeneration and transformation in a variety of plants, tissue types, and genotypes previously identified as recalcitrant for transformation and regeneration (Lowe et al. 2016; Gordon-Kamm et al. 2019; Debernardi et al. 2020; Kong et al. 2020). Tissue culture-free procedures for genome editing have also been described. Maher et al. (2020) describe a non-tissue culture method for the generation of genome-edited plants through de novo induction of meristems using delivery of developmental regulators and gene editing reagents in somatic cells of whole plants. Other reports on “in planta” genome editing methods are based on infecting plants with an engineered virus expressing the genome editing reagents (Ma et al. 2020) or on bombardment of wheat embryos (Liu et al. 2021). Further advances in the field of tissue culture and plant regeneration, as well as tissue-culture-independent genome editing technologies combined with an increased knowledge of which gene(s) to edit in order to improve a particular trait, will gradually broaden the applications of genome editing for crop breeding.

Types of genome-editing outcomes

Genome editing tools can generate several different outcomes, which are commonly referred to as SDN-1, SDN-2, and SDN-3 edits as described in Table 1. These classifications of genome editing are based on whether the genome editing approach makes use of a repair template and on the extent to which the edited sequences differ from their predecessor sequences and from genome variations found in populations. The types of DNA alterations produced include insertions (both of cis and exogenous DNA), deletions, combinations of the two (indels), and edits of a few or single base pairs (Table 1).

Table 1 Genome edits classified according to the edit type and mechanism of repair.

Selection and design of appropriate detection tools depend on the analytical purpose, the type and size of the genome edit, the frequency of the edit in the population, a priori knowledge of the endogenous sequence, and the nature of the surrounding sequence.

  1. Single or few base changes

    Single (or few base) changes in sequence may be relatively easy or more difficult to identify depending on the surrounding sequences, especially when using traditional approaches such as PCR. There are several PCR-based methods that use a combination of cleavage enzymes or DNA mismatches. Single-nucleotide polymorphism (SNP) detection using allele-specific PCR has been extensively utilized and optimized within genomics. In addition, single base detection with TaqMan® or KASP™ (Kompetitive Allele Specific PCR) has become commonplace. Use of TaqMan for this purpose requires specialty primers and/or probes that incorporate enhancements such as minor groove binders (MGB; Kutyavin et al. 2000; Davalieva et al. 2014) or locked nucleic acids (LNA; Johnson et al. 2004; Karkare and Bhatnagar 2006; Maertens et al. 2006), which increase the melting temperature of short sequence probes, allow for increased specificity, and correspondingly lower the background signal. Detection of small changes is further discussed in the context of each detection application.

  2. Deletions

    Detection of a DNA deletion, for example in Waxy corn (Qi et al. 2020), depends upon the size of the deletion and a priori knowledge of the endogenous sequence prior to editing. Small deletions may occur during the SDN-1 repair process. In this case, primer design will need to be adapted to the specific change in sequence at the site of the edit. Large deletions may lead to the proximity of sequences that are not normally co-located. As with insertions, the alterations provide a target for PCR primer and probe design to differentiate the edit from the original sequence, especially if they can be positioned at or around the site of the deletion. Alternatively, PCR primers and probes designed to identify DNA sequences intended for deletion can also be used as a diagnostic for absence of the intended deletion indicating an unsuccessful attempt at editing, or to differentiate seed or grain not containing the edit within an edited population.

  3. Small insertions

    Small insertions (SNPs, amino acid substitutions, etc.) are typically generated via homology-directed repair using DNA donor oligos. With maximum insert lengths for use with a DNA donor oligo restricted to approximately 50 nucleotides, these changes may be relatively easy or more difficult to identify depending on the surrounding sequences, especially when using approaches such as PCR. Whether the region is AT or GC rich or repetitive can affect the ability to find good locations for primers and/or probes, as can the precise nature of the change. There is insufficient information available at present to make predictions as to whether each specific sequence change will be difficult to detect.

  4. Larger insertions

    Sites containing an insertion of larger sizes are relatively easy to detect using standard techniques such as PCR. Where the insertion is large enough to allow the positioning of primer and/or a probe on the inserted sequence, the procedure is the same as with transgenic events. There is not yet enough research done to determine the minimum size of the insertion that is required to develop a robust assay. However, the robustness and sensitivity of the assay may be lower when a small insertion is present than when a large insertion is present.

    Should the insertion be cis-DNA (i.e., from the same species), the detection challenge is not increased as the novel juxtaposition of two sequences is the critical factor for detection, not necessarily the origin of the sequences.
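The role of the novel junction in distinguishing an edit from the original sequence can be illustrated with a short Python sketch. It works on plain strings only and does not model primer chemistry (melting temperature, secondary structure, or mismatch tolerance). The sequence, the deletion coordinates, and the function names `deletion_junction` and `is_diagnostic` are hypothetical and chosen purely for illustration.

```python
def deletion_junction(reference: str, start: int, end: int, flank: int = 10) -> str:
    """Return the junction created when reference[start:end] is deleted.

    The junction joins `flank` bases upstream of the deletion to `flank`
    bases downstream of it; these two stretches are not adjacent in the
    unedited sequence.
    """
    left = reference[max(0, start - flank):start]
    right = reference[end:end + flank]
    return left + right


def is_diagnostic(probe: str, wild_type: str, edited: str) -> bool:
    """True if the probe sequence occurs in the edited allele only."""
    return probe in edited and probe not in wild_type


# Hypothetical 54 bp locus with a 14 bp deletion (positions 20 to 33).
wild_type = "ATGGCTAGCTAGGATCCGATTACAGGCTTAACCGGTTAAGCTTGCATGCCTGCA"
start, end = 20, 34
edited = wild_type[:start] + wild_type[end:]

# Probe spanning the new junction: present only in the edited allele.
junction_probe = deletion_junction(wild_type, start, end, flank=10)

# Probe inside the deleted region: present only in the unedited allele,
# so it reports absence of the intended deletion.
deleted_region_probe = wild_type[22:32]

print(junction_probe, is_diagnostic(junction_probe, wild_type, edited))
```

The same string-level check applies to insertions: a probe that spans the boundary between the inserted and the flanking sequence is specific to the edited allele regardless of whether the inserted DNA originates from the same species.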

A significant caveat concerning detection of DNA changes is that these modifications can also occur spontaneously in populations. De Maagd et al. (2020) stated that “many types of structural changes can occur spontaneously during cultivation and breeding, especially during conventional mutagenesis” and that most structural variations that arise by natural genome evolution are unnoticed. There are many similar observations, both without intervention (Gorbunova and Levy 1997) and as described in radiation induced mutation (Jo and Kim 2019), and hybridization (Bashir et al. 2018). In some plants such as maize, changes in the genome can occur at high frequency and in locations that vary extensively between populations due to the presence of transposons and other active genome altering mechanisms (Bennetzen 2000).

Tools for detection of genome edits

Detection tools are an important consideration when developing genome-edited plants. To obtain the most efficient system, it is important that the success or lack thereof of a genome-editing procedure is determined as quickly as possible. A particular genome editing system can be initially tested and optimized using traits successfully described in the literature and/or having an obvious phenotype. The proof of the desired outcome can therefore be repeated and visualized. However, confirmation of the intended edit will need to be established at the nucleotide level and if a non-selection approach is used, it will be important to identify the cells or plants carrying the edits of interest within the population.

Once an edited plant is produced and selected, detection tools can be used to confirm that the plant has the desired edit, evaluate any off-target changes, and track the intended genetic change throughout the entire breeding process. Proof of the effectiveness of the edit in producing the desired phenotype is monitored through the next phases of product development.

PCR

In the past, PCR methods were traditionally carried out as end-point reactions—where the desired result is a simple presence/absence with visualization of the results routinely performed via gel electrophoresis (Lipp et al. 2005). However, such methods have been superseded for many purposes (Alarcon et al. 2019) and are limited in the sensitivity and types of genome edits that can be detected (Zischewski et al. 2017). We will therefore concentrate on PCR approaches that are the most applicable to this goal.

PCR is an appropriate choice for low cost and ease of use with a wide range of insertion and deletion sizes. It can be applied either as an end-point fluorescence reaction (e.g., T7E1, Surveyor, RFLP, CAPS) or as a quantitative approach, where the amplification curve is monitored. These offer the ability to illuminate guide RNA efficacy by distinguishing mutant from wild-type cells (Zischewski et al. 2017; Lomov et al. 2019). Capillary PCR fragment size analysis can also be used to detect differences as small as a few base pairs. Depending upon the size of the edit, PCR may be followed by polyacrylamide or agarose gel-based visualization (a few base pairs up to kilobases). The band intensities can provide information regarding the ratio of mutated to unmutated cells (Lomov et al. 2019).

Development of PCR methods requires a priori knowledge of the nucleotide sequence around the area to be amplified. The sequence in the immediate edited region is known to the developers of the edit, as it is required for design of guide RNAs used in the editing process. However, this may not be publicly available, and/or extensive sequence information, as might be associated with a large insertion, may not be readily available. Sanger sequencing of an amplicon coupled with primer walking can be used to elucidate uncharacterized regions adjacent to indels. Regardless of edit type, sequencing should be applied for verification of an edit. Once the intended edit is established in a population, the shift to high-throughput and more sensitive PCR methods can be made.

Both real-time qPCR (e.g., TaqMan) and endpoint PCR (e.g., KASP) are widely applied to the detection of SNPs, indels, and insertion sites for numerous mutagenesis and other biotechnology approaches. A product of genome editing is essentially no different. The development and validation of such assays has become highly streamlined with the advent of design algorithms and proprietary chemistries. The use of such methods for SNP genotyping involves a competitive PCR. The native and edited alleles are targeted in the same reaction with two separate primers or probes and an associated fluorophore. PCR efficiency must be equivalent for each allele to avoid amplification bias. These techniques are considered qualitative or semi-quantitative and typically applied to samples taken from individual plants or single seeds. Although it is possible to detect specific single-nucleotide polymorphisms (SNPs) in breeding populations using PCR-based methods or targeted sequencing, it is usually carried out on a single plant/seed basis and with the use of multiple markers. The SNPs used in breeding for example in varietal identification are chosen due to the ease of assay design and validation. In the case of introduced small changes, the SNP and resultant assay cannot be freely chosen—it is the change that led to the characteristic expressed in the desired end product, and thus the design of the PCR-based detection method is constrained to this specific sequence.

Real-time PCR may be applied in a quantitative manner. A large insertion (or deletion) can be easily and sensitively quantified as is the practice for genetically modified organisms. PCR is also used to enumerate SNP allele frequencies (Germer et al. 2000). In certain polyploid crops such as canola and wheat, there may be multiple sub-genomes each containing a copy of the target. More than one series of PCR reactions can be applied in a nested fashion to increase specificity. The first series is dedicated to amplification of the relevant genome/s or genic region of interest. Subsequently, the product from the first round is used as the template to amplify a smaller and more specific region of interest in a secondary PCR. This nested PCR approach is however more prone to false positive results (Wanger et al. 2017). If the assay is robust and reliable enough, a small edit can be quantified (as % edited content) at ratios of 1 in 1000 or even 1 in 3000. Advanced techniques and incorporation of additional chemical modifications of primers and probes (e.g., peptide nucleic acids) can afford greater specificity and thereby enhance sensitivity (Zhou et al. 2018b). It is critical to use an appropriate reference and control material (e.g., known origin and sequence composition) for comparison. Relative quantification can be deduced with the use of a standard curve from previously characterized edited reference samples.
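Standard-curve quantification can be sketched in a few lines of Python. The sketch assumes the usual linear relationship between the quantification cycle (Ct) and the base-10 logarithm of the edited content, fits it by ordinary least squares, and interpolates an unknown sample. The Ct values and dilution levels below are invented for illustration and are not data from any study cited here.

```python
def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Interpolate the quantity of an unknown sample from its Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical reference samples: 10 %, 1 %, 0.1 % and 0.01 % edited content.
log10_percent_edited = [1.0, 0.0, -1.0, -2.0]
ct_standards = [25.00, 28.32, 31.64, 34.96]

slope, intercept = fit_standard_curve(log10_percent_edited, ct_standards)
efficiency = amplification_efficiency(slope)

# Hypothetical unknown sample measured at Ct 30.0.
unknown_percent = quantity_from_ct(30.0, slope, intercept)
print(f"slope={slope:.2f}, efficiency={efficiency:.1%}, unknown={unknown_percent:.2f} %")
```

A slope near -3.32 corresponds to an efficiency close to 100%. Results interpolated outside the range spanned by the reference samples should be treated with caution.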

Digital PCR

Like real-time PCR, digital PCR (dPCR) technology utilizes polymerase, primers, and TaqMan probes within a standard end-point PCR reaction to amplify specific targets. dPCR works, however, by partitioning nucleic acid samples into thousands of single parallel PCR reactions, each separated into small volume compartments using droplets or chambers. PCR amplification occurs simultaneously in each partition. At the end of the run, each droplet or chamber is individually assessed for the presence (positive) or absence (negative) of a specific fluorescent signal. Using Poisson statistical analysis, the ratio of positive to negative partitions yields absolute quantification of the initial number of copies of the target sequence. As such, dPCR allows for absolute quantification of a target without a reference or the need for running standard curves.
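The Poisson step can be written out explicitly. If a fraction p of partitions is positive, the mean number of target copies per partition is estimated as -ln(1 - p), because a partition is negative only when it received zero copies. The Python sketch below applies this correction. The partition counts and the partition volume of 0.85 nL are assumed values for illustration, since the actual volume depends on the instrument.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean number of target copies per partition."""
    if total <= 0 or not 0 <= positive < total:
        # If every partition is positive the sample is saturated and the
        # Poisson estimate is undefined; the sample must be diluted.
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copies_per_microliter(positive: int, total: int, partition_volume_nl: float) -> float:
    """Absolute target concentration in the partitioned reaction."""
    lam = copies_per_partition(positive, total)
    return lam / (partition_volume_nl * 1e-3)  # convert nL to uL


# Hypothetical run: 20,000 partitions, 2,000 of them positive.
lam = copies_per_partition(2000, 20000)
conc = copies_per_microliter(2000, 20000, partition_volume_nl=0.85)
print(f"{lam:.4f} copies per partition, {conc:.1f} copies per uL")
```

With 10% positive partitions the corrected estimate (about 0.105 copies per partition) is slightly higher than the raw positive fraction, reflecting partitions that received more than one copy.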

Other benefits of the dPCR technology include the high level of precision and sensitivity. By removing the amplification efficiency reliance of qPCR, error rates are strongly reduced, enabling reliable and accurate measurement of small target concentration differences among samples. The massive sample partitioning also results in an increased signal-to-noise ratio since high-copy templates and PCR inhibitors are strongly diluted, effectively enriching template concentration in target-positive partitions. Moreover, dPCR assays can be readily adapted to any target of interest and, given that reaction volumes are in the pico- to nanoliter ranges, can obtain absolute quantifications from very small amounts of DNA. Together, these features make dPCR ideally suited for the rapid and systematic quantification of genome editing outcomes at endogenous loci. A well-designed dPCR can be used to predict the number of targeted loci at the single plant level.

Over the past few years, dPCR has been used to reliably measure gene editing frequencies across a wide range of organisms (Findley et al. 2016; Miyaoka et al. 2016; Mock et al. 2016; Falabella et al. 2017). While dPCR applications in genome-edited plant cells are still limited, the methodology clearly provides novel opportunities to quantitate small genome edits or low copy number targets in bulk populations of edited cells. Recently, Jouanin et al. (2020) used dPCR mutation assays to detect indels (1 to 50 bp) and large deletions (>300 bp) in wheat and concluded that dPCR is suitable for high-throughput screening of copy number variation and gene editing–induced mutations in large gene families. Consonant with this are the findings of Peng et al. (2020), who developed a duplexed dPCR-based method for the detection and evaluation of gene editing frequencies in rice and allotetraploid rapeseed. The authors not only showed that their dPCR-based method is sensitive to different kinds of gene editing mutations but also demonstrated its applicability to polyploid plants and processed food samples containing low initial concentrations of DNA. Moreover, compared with qPCR and NGS-based methods, the duplexed dPCR assay can yield a lower limit of detection (LOD) and was able to distinguish homozygous from heterozygous mutations with superior levels of precision and sensitivity. Owing to its ease of use, reduced complexity, repeatability, and superior precision, dPCR will continue to increase in popularity (Miyaoka et al. 2016) and contribute to rapid and quantitative plant genome editing workflows.

Sequencing

Rapid evolution in next (short read)- and third (long read)-generation sequencing technologies has led to sequencing emerging as a powerful tool for the detection of DNA changes. The latest advancements in methods for target enrichment, library preparation, and tools for bioinformatics analysis resulted in increased accuracy of variant detection, higher throughput, and faster turnaround times (Salk et al. 2018).

Applications of next-generation sequencing to detect sequence variants cover whole-genome sequencing, whole-exome sequencing, and targeted sequencing. Targeted sequencing focuses on the set of genes or targets of interest allowing higher read coverage with reduced cost and dataset size. Furthermore, a larger number of plants can be pooled in one sequencing run without affecting coverage depth.

Targeted sequencing consists of two major approaches, either PCR-based target amplification (such as amplicon-based sequencing) or target capture using biotinylated hybridization probes (such as hybrid capture sequencing; Bewicke-Copley et al. 2019). Amplicon-based sequencing is the most efficient of the two methods, with usually smaller target regions and higher percentage of on-target reads, while hybrid capture sequencing results in more uniform coverage of typically larger target regions. Both are powerful approaches for accurate detection of genome edits. Multiple target regions can be assessed across many samples in parallel.

Generally, using a sequencing approach for detection of genome edits has a higher cost compared to PCR-based approaches. Therefore, a targeted sequencing approach is typically used when detection via PCR cannot be developed due to technical limitations and/or in cases of high throughput analysis. During early stages of development, targeted sequencing can be applied to characterize tissue cultures or plants for which the outcome of the editing is unknown or needs to be confirmed. Traditional amplicon-based sequencing using two locus-specific primers is suitable for the characterization of single base pair variants or smaller indels. However, its application in the case of larger indels and translocations may be restricted. Newer single primer technologies based on ligation-mediated PCR have alleviated those limitations (Zheng et al. 2014). Since amplicons up to approx. 600 bp can be targeted for amplicon-based sequencing using short read technology, and even larger target regions in the case of hybrid capture, target enrichment approaches can be applied when primer design for PCR or dPCR methods is complicated. Reasons for complicated primer design with respect to PCR or dPCR are related to the upper limit of the amplicon size and the nature of the sequences flanking the edit. Unfavorable G-C content, repetitive sequence, or presence of polymorphisms not related to the trait in certain haplotypes may complicate the development of PCR-based detection methods. For genome edits targeted in highly repetitive regions (edit in conserved gene family, in polyploid species), long fragments may be captured and sequenced using long-read sequencing platforms to allow the level of specificity required.

Given the high read coverage across the target regions, targeted deep sequencing is more sensitive for detecting and quantifying low-frequency variants in heterogeneous samples in contrast to Sanger sequencing. The read coverage required is defined by the detection limit to be achieved. However, the lower limit of detection with respect to the sequencing technology is defined by the error rate, of which the absolute number increases along with the higher read depth. Errors accumulate throughout the different steps of the targeted sequencing protocols including damage during DNA sample preparation and introduction of errors during PCR amplification and sequencing. To differentiate low-frequency genome edits from technical artifacts, caution is needed when the variant allele frequency is near or below the limit of detection related to the targeted sequencing protocols, especially when sequence data is generated from low-quality DNA. The sensitivity of routine NGS approaches for the detection of low frequency variants is estimated to be approx. 1% (Salk et al. 2018).
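The link between read coverage, detection limit, and error rate can be made concrete with a simple binomial model. The Python sketch below assumes that each read independently carries the edit with probability equal to the variant allele frequency, which ignores PCR duplicates and sampling of the input DNA. The function names and the example numbers are illustrative assumptions.

```python
import math


def prob_at_least(min_reads: int, depth: int, freq: float) -> float:
    """P(X >= min_reads) for X ~ Binomial(depth, freq)."""
    p_fewer = sum(
        math.comb(depth, i) * freq ** i * (1.0 - freq) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - p_fewer


def min_depth(freq: float, min_reads: int, power: float = 0.95,
              max_depth: int = 1_000_000) -> int:
    """Smallest depth at which min_reads variant reads are seen with
    probability >= power (simple linear search)."""
    for depth in range(min_reads, max_depth + 1):
        if prob_at_least(min_reads, depth, freq) >= power:
            return depth
    raise ValueError("target not reachable within max_depth")


def expected_error_reads(depth: int, error_rate: float) -> float:
    """Expected number of reads showing a given erroneous base at one position."""
    return depth * error_rate


# Depth needed to see at least one read of a 1 % variant with 95 % probability.
print(min_depth(freq=0.01, min_reads=1))
# At the same per-base error rate, deeper sequencing yields more error reads.
print(expected_error_reads(1_000, 0.001), expected_error_reads(100_000, 0.001))
```

Under this model, raising the depth increases the chance of sampling a rare edit but also raises the absolute number of error-containing reads, so a variant is only credible when its read count clearly exceeds what the error rate alone would produce.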

Recent developments to increase the accuracy of next-generation sequencing protocols include computational and statistical measures to reduce the background error rate, adjustments to library preparation protocols to maintain the intactness of the DNA templates, and most effectively, single-molecule consensus sequencing. Unique molecular identifiers are attached to the target DNA fragments and remain attached throughout enrichment and sequencing. PCR duplicate reads are removed with higher accuracy which eliminates biases from variable PCR amplification and results in more precise quantification of DNA templates. In addition, sequencing errors are corrected based on the majority vote within a pool of reads originating from the same DNA fragment (Jabara et al. 2011; Kinde et al. 2011; Hong and Gresham 2017; Xu et al. 2017). Using duplex unique molecular identifiers, true variants present on both strands of the DNA fragment can be further differentiated from false positive variants only present on one of the two strands (Schmitt et al. 2012; Kennedy et al. 2014). As the result of accurate removal of PCR duplicates and precise error correction, target-enriched sequencing using duplex unique molecular identifiers is suitable for confident detection of small sequence edits with a frequency of approximately 0.1 to 0.2% in genetically heterogeneous samples (Peng et al. 2019). Base pair errors which remain unresolved with the latest targeted sequencing protocols define the limit of sensitivity of sequencing-based detection methods. Unresolved errors are typically incorporated before or during the attachment of dual unique molecular identifiers. It is recognized that the level of accuracy of sequencing is affected by different factors, including the sequencing platform, chemistry version, and sequence context, as well as by experimental variation such as the degree of DNA damage.
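The majority-vote correction within a pool of reads sharing a unique molecular identifier (UMI) can be sketched as follows. This is a simplified single-strand illustration in Python, not the duplex procedure of the cited studies: reads are assumed to be already aligned and of equal length, and the thresholds `min_family_size` and `min_agreement` are arbitrary illustrative values.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=3, min_agreement=0.6):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI.

    Families with fewer than min_family_size reads are discarded. At each
    position the most common base is kept if it reaches min_agreement of
    the family; otherwise the position is masked with 'N'.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        if len({len(s) for s in seqs}) != 1:
            continue  # simplification: skip families with unequal read lengths
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus


# Hypothetical reads: (UMI, aligned sequence).
reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACCT"),  # one sequencing error
    ("GGC", "ACTT"), ("GGC", "ACTT"), ("GGC", "ACTT"),  # consistent variant
    ("TTA", "ACGT"),                                     # family too small
]
result = umi_consensus(reads)
print(result)
```

In this toy input, the isolated C in the first family is outvoted and removed, whereas the T shared by every read of the second family is retained as a candidate true variant. Counting consensus sequences rather than raw reads also removes PCR duplicates from the quantification.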

Digital PCR is a robust technology for detection and quantification of genome edits which are typically known a priori and present at low frequency in heterogeneous samples. Since the recent implementation of single-molecule consensus sequencing, next-generation sequencing may achieve an accuracy comparable to dPCR and can be applied when the resulting genome edit is unknown. Single-molecule consensus sequencing–based methods for the quantification or detection of genome-edited material present at low frequency in heterogeneous samples such as in vitro cultures, grain, or food have not been reported. However, the approach has been applied for reliable quantification and detection of rare variants in diverse clinical fields including cancer, aging, and metagenomics (Salk et al. 2018).

Despite the effective implementation of single-molecule consensus sequencing to increase the accuracy of sequencing technology, sequencing-based detection of small genome edits with low frequency requires high read coverage. The consequence is a higher cost compared to qPCR or dPCR methods, unless high throughput analysis is required (Aloisio et al. 2016).

Isothermal DNA detection and CRISPR-Cas-mediated edit detection

Isothermal amplification of nucleic acids is beginning to provide a rapid, sensitive, and specific diagnostic to replace the more time-consuming traditional PCR amplification methods. Much like PCR, isothermal amplification uses a polymerase to enzymatically amplify a nucleic acid sequence, but it does not require variable temperature cycling. These methods are beginning to provide the sensitivity and specificity needed for detecting single nucleotide changes (Zhou et al. 2018a; Shen et al. 2020). One recent innovation is to couple the amplification power of isothermal DNA polymerases with CRISPR-Cas specificity (Kellner et al. 2019).

The CRISPR-Cas system contains programmable endonucleases. In 2017, it was reported that CRISPR-Cas editing components could also be an effective diagnostic tool for detection of specific nucleic acid changes. For example, Cas13a has a crRNA-programmed collateral activity that can detect a specific RNA target sequence. This collateral activity has been used in a detection method named SHERLOCK, for Specific High-sensitivity Enzymatic Reporter unLOCKing (Gootenberg et al. 2017).

Several other platforms have been developed using this collateral cleavage activity. Such platforms are typically combined with an isothermal nucleic acid amplification step using Recombinase Polymerase Amplification (RPA), Loop-mediated isothermal AMPlification (LAMP), or Helicase Dependent Amplification (HDA). CRISPR-Cas13, Cas12a, and Csm6 (Gootenberg et al. 2018) as well as Cas9 (Wang et al. 2019) systems have been reported to show this collateral activity. In the SHERLOCK assay, a pre-amplification of the target DNA or RNA is additionally needed (Kellner et al. 2019).

Such platforms can use different indicators including a fluorescently labeled reporter, visual detection using liquid-liquid phase separation or lateral flow detection by using antigen-labeled reporters (Kellner et al. 2019). CRISPR-Cas13 cleaves RNA but CRISPR-Cas12 cleaves DNA, so application of the appropriate enzyme can be used for either RNA or DNA detection (Gao et al. 2021). Cas13 is ultra-sensitive in detecting mutations but if the target is present in very low numbers (600K molecules), it cannot be used to achieve single nucleotide detection; a pre-amplification would be necessary (Kellner et al. 2019). Cas12a from Acidaminococcus sp. BV3L6 (AsCas12a) may not produce any signals from targets at lower concentrations and therefore pre-amplification of the target may be necessary.

This limitation will apply unless a very sensitive Cas system can be found. However, it has been shown that by choosing an optimal crRNA that yields a higher fluorescence level, and/or combining multiple crRNAs in one reaction, less input target is needed (Gao et al. 2021). Gootenberg et al. (2018) have combined Csm6 with Cas13 detection to increase the sensitivity and enhance the fluorescent signal. This was done to eliminate the need for pre-amplification using RPA for lateral flow detection methods.

The advantage of using a very sensitive detection method would be to eliminate the need for amplifications done through PCR or detection through sophisticated fluorophore detection instruments. Therefore, adding the isothermal pre-amplification would be a drawback to the CRISPR-Cas detection methods and limit their portability in low-resource settings (Kellner et al. 2019). However, the isothermal amplification can be accomplished using handheld devices or smartphone-based detection platforms without a need for any complex instrumentation (Song et al. 2018; Tsaloglou et al. 2018). This makes the CRISPR-Cas-mediated detection method very attractive for detection of single base mutations (SDN1) in a non-laboratory setting, even when the pre-amplification step is taken into account.

The CRISPR-Cas13a and 13b families have also been used to investigate the possibility of creating a multiplexed platform. Gootenberg et al. (2018) found that the activities of LwaCas13a, Cas13b from Capnocytophaga canimorsus Cc5 (CcaCas13b), LbaCas13a, and PsmCas13b can be combined independently and measured with the four dinucleotide reporters AU, UC, AC, and GA, respectively. Using these cleavage specificities, they could detect Zika virus using HEX and dengue virus using FAM. Later, using multiplexed SHERLOCK with PsmCas13b and LwaCas13a, they could detect ZIKV and DENV RNA dilutions as well as perform allele-specific genotyping of human saliva samples in one reaction. This advancement allows for multiple target detection at scale and at lower cost.

CRISPR-Cas biosensing methods have the advantage of detecting single base variations (SDN1) occurring in femtomolar or even attomolar concentrations. The simplicity of this method and the availability of instrument-free readouts such as lateral flow make it very helpful for field-based DNA or RNA diagnostics. However, some limitations and challenges need to be resolved before this method becomes the mutation detection method of choice. Cas9 and Cas12 require protospacer adjacent motif (PAM) sequences adjacent to target dsDNA to be able to cleave it. The protospacer flanking site (PFS) in Cas13a creates a similar limitation since the first base following the protospacer should be a non-G base. Making CRISPR-Cas detection methods quantitative is difficult unless they are combined with quantitative PCR or digital PCR methods, which makes them less attractive (Li et al. 2019). Since CRISPR-Cas systems are very new, most of these limitations may be resolved in the near future.

In summary, the various tools each offer advantages and disadvantages relative to one another. Cost, and whether sequence information is needed to develop the tool, are critical considerations. Throughput and sensitivity requirements are tied to the stage of the edit process and therefore vary widely depending on the context. An estimate of the present situation is shown in Table 2. The cost, throughput, and sensitivity will change over time.

Table 2 Characterization of detection tools.

Specific applications of genome edit detection methods

If a population of edits is being produced as part of a research program, then each edit will need to be characterized individually at the molecular level to compare it to the desired or expected phenotype. Detection tool(s) suitable for high-throughput analyses similar to those used for varietal identification are needed for this process.

Detection tools can be used to estimate the efficiency of editing methods, to subsequently identify edits in regenerated plants and to identify off-target edits during the development and breeding process. Tools for detection of genome edits in a research context cover a wide range of platforms including PCR, dPCR, and sequencing approaches. During the early stages of development, the challenge of detection is based on the edit frequency within a given cell population.

The frequency at which edits occur depends on many variables, for example the edit template sequence design, the genic location of the target, the delivery technique, and the biology of the organism of interest. Once the edit is recovered at the whole plant level, these applications only require the detection of the characteristic sequence difference in a single organism, so sensitivity is not a challenge. Detailed characterization of the intended change may also be important for and/or used in the development of intellectual property rights around the improved variety. Depending on the stage of product development and the type of edit, different technologies might be the appropriate detection tool (Table 3).

Table 3 Detection tools applicable to the genome editing process.

Optimization of genome editing methods

The detection tools used to evaluate the performance of genome editing methods should allow high-throughput assessment of editing efficiencies in the large numbers of samples generated (e.g., from protoplasts, in vitro cultures). In the initial phase of method development, they should also be able to detect very low levels (<1%) of editing.

Large-scale screening of genome-edited events is essential to target the right edit before further regeneration or evaluation of the in vivo activity of the genome editing reagents (Nadakuduti et al. 2019). Targeted sequencing can detect SDN1 as well as SDN2 and SDN3 edits. The primary consideration for using targeted sequencing for detection of rare edits in tissue culture is whether the depth of sequencing provides the statistical power needed to detect low frequency edits (Délye et al. 2015). Increasing the sequencing depth will increase the statistical power of detection of rare mutations, but also increases the cost of sequencing (Liu et al. 2014). Therefore, this approach may not be the best option when the sequence of the edits is known, in which case PCR would be the best tool for rare event detection.
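The trade-off between sequencing depth and statistical power can be made concrete with a simple binomial sampling model. The sketch below is illustrative only: the function names are hypothetical, reads are assumed to be independent draws from the template pool, and sequencing error is ignored, so real experiments will need more depth and a higher read threshold than this suggests.

```python
import math


def detection_probability(depth, edit_fraction, min_reads=1):
    """Probability of observing at least min_reads edited reads at a given depth."""
    p_fewer = sum(
        math.comb(depth, k)
        * edit_fraction**k
        * (1 - edit_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


def depth_for_single_read(edit_fraction, power=0.95):
    """Smallest depth giving the target probability of seeing one or more edited reads."""
    return math.ceil(math.log(1 - power) / math.log(1 - edit_fraction))


for fraction in (0.01, 0.001):
    depth = depth_for_single_read(fraction)
    print(f"edit fraction {fraction:.1%}: depth {depth}, "
          f"power {detection_probability(depth, fraction):.3f}")
```

Requiring several supporting reads (min_reads greater than 1) guards against calling a sequencing error as an edit, at the price of a correspondingly higher depth.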

Detection of rare edits with large modification types (e.g., inserts or re-arrangements) can be done via PCR-based methods, with appropriate targeting to generate a PCR product only when the desired edit is present. In case of a sequence insertion by HDR using a repair DNA, this can be achieved using one primer located outside of the edited region and one primer within the insert. In case of low levels of editing, nested PCR is often needed to detect the edits. Special care should be taken to minimize the formation of PCR artifacts, especially recombinant products resulting from template switching, by limiting the number of PCR cycles, lowering the amount of input DNA, and using an appropriate polymerase. This is especially important when working in crops with complex genomes that typically have several homologous copies of the target gene or when extremely high levels of (viral) repair DNA are being used.

For detection of smaller modifications (small indels, few base changes), dPCR is a convenient method that provides excellent specificity and sensitivity. Once a good dPCR assay is validated, it typically will allow for the detection and quantification of small changes below the 1% level.
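Quantification by dPCR rests on a Poisson correction of the fraction of positive partitions. The following Python sketch shows that calculation for a hypothetical duplex assay in which one probe detects the edited allele and a second probe detects the unedited allele in the same partitions. The function names and the partition counts are illustrative assumptions, not data from any cited study.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean copies per partition: -ln(1 - positive/total)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; saturated runs cannot be quantified")
    return -math.log(1.0 - positive / total)


def edited_fraction(edit_positive, ref_positive, total):
    """Fraction of edited copies among all copies of the target locus."""
    lam_edit = copies_per_partition(edit_positive, total)
    lam_ref = copies_per_partition(ref_positive, total)
    return lam_edit / (lam_edit + lam_ref)


# Hypothetical run: 20,000 partitions, 100 positive for the edit probe,
# 10,000 positive for the unedited (reference) probe.
print(f"{edited_fraction(100, 10_000, 20_000):.2%}")
```

With these made-up counts the edited allele comes out at roughly 0.7% of target copies, which is the kind of sub-1% level the text refers to.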

Even when taking care to minimize PCR-based artifacts, final proof that the desired editing has happened requires sequencing of the edited allele. Amplicon-based sequencing may be the method of choice where the exact edit is unknown or not well defined. This may be the case where the edits are imprecise, such as insertions or deletions based on NHEJ. PCR can offer a fast and efficient initial screen to identify edited lines. Amplicon-based sequencing is then used to characterize the edit sequence.

Selection of edited products

Depending on the transformation and tissue culture system used for genome editing, selection of edited products can happen at various stages, such as in vitro cultures or tissue explants. The analytical throughput needed to detect a sufficient number of edits depends on the tissue culture system, the stage of testing, the efficiency of the editing, and whether the edit can be selected directly or selection for a co-introduced selectable marker gene is applied. Genome edits may be introduced in systems where the edit itself confers a directly selectable phenotype. An example would be where herbicide tolerance is being introduced. In these cases, the functional edited cells and resulting plants can be identified relatively easily as the cells and/or plants will be resistant to herbicide and can be selected based on phenotype.

Genome editing is often desired in cases where the phenotype is not directly selectable, for example in tissue culture, and the edit can only be identified by genotype. In these cases, a population of cells or a large number of regenerated plants must be screened. Such situations call for a more powerful and efficient approach, and many of the same detection tools as used for optimization can be applied.

In cases where no direct selection via a co-introduced selectable marker gene is applied and/or editing efficiencies are very low, detection techniques can be combined: large numbers of tissues or explants can be pre-screened by PCR to identify rare edits, or pools of plants can be analyzed using dPCR to identify those pools that contain the highest editing levels.

With higher editing efficiencies, screening of individual plants for the desired edits by PCR is a viable option. When creating specific edits, this can be done via an edit-specific PCR assay. When creating indels, individual plants having the desired changes can be identified by characterizing PCR products covering the target site for changes in fragment length, for loss of susceptibility to a restriction enzyme that has a recognition site at the target, or for loss of susceptibility to cleavage in vitro by a CRISPR RNP that has the same crRNA as used for the editing. SDNs that typically create larger deletions, such as Cas12a, have the advantage that a larger fraction of the indel alleles can be detected based on fragment length.

Final proof that selected plants are edited requires targeted sequencing of the edited allele. This can be done either via cloning and Sanger sequencing, or by NGS of PCR products covering the edited allele. The level of edits detected by sequencing also gives an indication of whether the edit is chimeric or present in most of the plant. Inheritance of the edited allele by the next generation provides confirmation of the edit. Additional edits may be generated if functional SDN and gRNA are still present, and plants should be analyzed in these cases for the presence of newly formed edits.

Breeding

Although application of molecular analysis tools has increased the speed of breeding, it is still reliant upon variation that is generated via spontaneous or induced mutations (Lyzenga et al. 2021). Genome editing can be applied to breeding to target and produce desirable alleles. Once these changes have been introduced and regenerated into plants, there is a need for detection methods to follow the desired change through the breeding process to generate commercial varieties. A genome edit can be treated in the breeding process in much the same way as any small or large change or SNP with the caveat that it may be more or less easy to detect than a conventional SNP as explained in the “Tools for detection of genome edits” section. Such edits can usually be routinely followed using PCR analysis of large numbers of single plants, and sequencing may be used at critical control points to confirm the identity and integrity of the edit.

Detection of gene edits in bulk seed and grain

Some jurisdictions (Dederer and Hamburger 2020) are requiring that gene-edited products be subject to a regulatory process that includes molecular characterization of edits and a way to positively identify the edits. Such detection methods typically require that one seed or grain be detectable in between 1000 and 3000 seeds. Achieving this level of sensitivity for a small sequence difference in a bulk seed or grain sample is difficult based on currently available tools. There is anecdotal information that suggests that single base changes can be detected in bulk seed or grain at single percent or tenth percent levels; there is however no published data available to date.
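The sensitivity implied by such requirements can be worked out with simple arithmetic. The sketch below assumes, for illustration only, that every seed contributes the same amount of DNA to the bulk extract and that the edited seed carries the edit on all copies of the locus; the function name is hypothetical.

```python
def required_fraction(n_seeds, edited_seeds=1):
    """Fraction of edited target copies in a bulk of n_seeds.

    Assumes equal DNA contribution per seed and a homozygous edited seed.
    """
    return edited_seeds / n_seeds


for n in (1000, 3000):
    print(f"1 seed in {n}: {required_fraction(n):.3%} of target copies")
```

Under these assumptions one seed in 1000 corresponds to 0.1% of target copies and one seed in 3000 to about 0.033%, so a method that reaches only the tenth-percent level would cover the lower end of this range but not the upper end.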

PCR methods to differentiate single base pair or small edits from the background sequence are highly dependent on the surrounding DNA sequence, and the specific base mismatch concerned. Therefore, no general statement can be made as to whether such methods could be developed for specific use cases (Herrera et al. 2021 in press).

Whereas marker-assisted selection routinely employs SNP analysis for use in breeding, the selected polymorphisms are typically those easiest to assay in single plants or seeds. Where the intent is to identify specific single base pair edits, there is no freedom to select the best SNP markers, which limits assay flexibility. While that is not a big challenge to using PCR detection in breeding, detection of a single nucleotide (or other small) change among thousands of otherwise similar copies (such as when analyzing bulk samples of 1000 to 3000 seeds) can be a challenge.

When using PCR, the reaction rate for differentiation based on mismatch at the 3′ end of the primer will be dependent on the specific base change (Rejali et al. 2018). Various tactics can be employed to improve differentiation of the target sequence difference. These include use of PNA clamps or LNA technology (Karkare and Bhatnagar 2006), chemistries such as BHQ®-Plus or TaqMan™ MGB probes, and altered cycling conditions such as higher annealing temperatures (Rejali et al. 2018).

An alternative approach can be amplicon-based sequencing. This approach requires a priori knowledge of the sequence around the area of the edit. Primers can be designed to amplify the edited region, and the resulting products sequenced. The amount of the edited product in the bulk can be estimated from the ratio of the sequences present.
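Estimating the edited fraction from read counts can be sketched as a proportion with a confidence interval. The example below uses the Wilson score interval as one reasonable choice; the function name and the read counts are hypothetical, and the estimate ignores sequencing error and amplification bias, both of which matter at low frequencies.

```python
import math


def edited_read_fraction(edited_reads, total_reads, z=1.96):
    """Point estimate and Wilson score interval for the edited fraction.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if total_reads <= 0 or not 0 <= edited_reads <= total_reads:
        raise ValueError("need 0 <= edited_reads <= total_reads and total_reads > 0")
    p = edited_reads / total_reads
    denom = 1 + z**2 / total_reads
    centre = (p + z**2 / (2 * total_reads)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / total_reads + z**2 / (4 * total_reads**2))
        / denom
    )
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: 50 edited reads out of 100,000 covering the target site.
estimate, low, high = edited_read_fraction(50, 100_000)
print(f"estimate {estimate:.4%}, interval {low:.4%} to {high:.4%}")
```

The width of the interval shows directly how much read depth is needed before a low-level estimate in a bulk sample becomes meaningful.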

Operational factors

Detection of off-target edits and editing components

Commercial crop production relies on several generations of backcrossing and other traditional breeding steps to attain final varieties that combine the desirable phenotypes of both parents. Gene editing is a new technique for plant breeding with the benefit that it can be targeted to specific gene(s). There has however been a great deal of discussion of the possibility of off-target editing in plants.

The specificity of CRISPR systems seems to be of minimal concern in plants as very few off-target mutations are detected (Tang et al. 2018; Young et al. 2019; Graham et al. 2020) and potential off-target mutations can be removed through backcrossing in plants. Any potential off-target sites can be largely avoided by designing gRNAs with high specificity.

Off-target editing, or RNA-guided endonuclease (RGEN)–induced mutations, can occur at sites with sequence similarity to the on-target sequence. The nuclease that causes the double-stranded break at a specific target can potentially cause unintended double-stranded breaks at locations with significant homology in the genome. Off-target edits have been observed in clinical and therapeutic applications (Zhang et al. 2015b), sometimes resulting in gene function disruptions or genomic instability. These issues can create some concern for utilization of gene editing in clinical therapies, as unintended mutations can pose a health risk such as unregulated somatic cell proliferation. Plants, however, seem to be very resilient to somatic cell mutations (Graham et al. 2020). Off-target edits may be observed phenotypically in some cases as “off-types” or can be detected by molecular techniques and removed during the breeding process.

While gene editing processes are designed to minimize off-target changes, off-target mutations may be attributed to five factors (Modrzejewski et al. 2020): (1) number of mismatches, (2) position of mismatches, (3) G-C content of the targeted sequence, (4) altered nuclease variants, and (5) delivery method(s). Today, most major crops have comprehensive reference genomes available. Design of gene editing components and selection of unique target sites using the reference genome sequence and bioinformatic algorithms prior to the application of gene editing can help to minimize the possibility of the occurrence of off-target events (Young et al. 2019).

Detection of off-target edits in plants can be challenging because the number and position of off-target mutations cannot be fully predicted. Bioinformatic tools can be used to predict where edits may occur. Detection methods that can be used, provided that the specific sites are known, vary in cost, complexity, equipment, and limitations, but are the same as for on-target editing. Examples are sequencing, PCR, loss of primer binding site, mismatch cleavage assay, high-resolution melt curve analysis, and modified sequencing approaches (Zischewski et al. 2017; Blondal et al. 2021). Approaches for identification of off-target edits continue to develop.
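As a rough illustration of how candidate off-target sites might be predicted from a reference sequence, the Python sketch below scans for sites that differ from the guide by a limited number of mismatches and are followed by a PAM. It is a toy model, not one of the published prediction tools: the function name is hypothetical, an NGG PAM is assumed, only the forward strand is searched, and bulges are not considered.

```python
def find_candidate_sites(genome, guide, max_mismatches=3):
    """Return (position, site, mismatch_positions) for each candidate site.

    A candidate is a guide-length window followed by an NGG PAM (assumed)
    with at most max_mismatches mismatches against the guide.
    """
    genome, guide = genome.upper(), guide.upper()
    g = len(guide)
    hits = []
    for i in range(len(genome) - g - 2):
        pam = genome[i + g : i + g + 3]
        if pam[1:] != "GG":
            continue  # no NGG PAM directly after this window
        site = genome[i : i + g]
        mismatch_positions = [
            j for j, (a, b) in enumerate(zip(site, guide)) if a != b
        ]
        if len(mismatch_positions) <= max_mismatches:
            hits.append((i, site, mismatch_positions))
    return hits


# Toy sequences: one perfect site and one site with a single mismatch.
guide = "ACGTACGTACGTACGTACGT"
off_site = "ACGTACGTACGTACGTACGA"
genome = "TT" + guide + "TGG" + "CC" + off_site + "AGG" + "TTT"
for position, site, mismatches in find_candidate_sites(genome, guide):
    print(position, site, mismatches)
```

Reporting the mismatch positions as well as their number reflects the first two factors listed above; sites found this way would still need to be checked experimentally with one of the detection methods named in the text.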

It has been asserted that the use of gene editing techniques may result in a genomic disruption (so-called scars) at the target cut site (Elison and Acar 2018). When relevant, such products can generally be avoided through the use of non-integrating editing components. For example, the use of a 2-step CRISPR approach avoids unintended changes by temporally separating the guide RNA from the desired product in a stepwise approach, preventing the final edit from being targeted by Cas9. In cases where the DNA coding for gene editing components has been introduced either transiently, or as an integrated gene construct, the absence of such sequences in the product must be verified. If the genes coding for the genome editing components, e.g., the site-directed nucleases, are stably integrated into the genome of the recipient, the initially regenerated plant will contain this foreign DNA. As the sequence of the components is known, PCR can be used for detection. If transient expression is used, e.g., through the use of TALEN proteins (Grohmann et al. 2019), no external DNA is used or expected to be in the plant, and PCR can additionally be used to verify its absence.

While off-target events are considered unacceptable in clinical therapeutic applications of genome editing (Lee et al. 2016), off-target events in the context of plant breeding are comparable in nature but less numerous than the variation that is introduced by conventional or mutation breeding. In addition, commercial varieties undergo continuous selection for the best phenotypes during breeding and commercial variety development. This selection process also applies to variety development that includes gene edits and can be used to remove any off-types.

Differentiation of genome edits from conventional mutations

While single base and other small changes can be made intentionally, such small changes are also constantly occurring in plant populations (Grohmann et al. 2019) and almost any non-deleterious mutation may be found in a commercial crop field (Sainsbury 2021). This is important from a regulatory point of view because in countries where GMO regulations are applied to gene-edited plants, the same changes selected as variants in plant breeding populations or induced by mutagenesis may not generally be regulated.

A challenge of applying detection methods to these products is that no currently available analytical tool(s) are able to determine whether a mutation occurred spontaneously in the population, or due to application of radiation or mutagens. Thus, there will always be a question as to the background level of “detection” of such changes that are not induced by laboratory methods. Grohmann et al. (2019) suggest that the minimum length of a random sequence for uniqueness in plants is between 14 and 17 base pairs, depending on the genome size. As the DNA sequence may be the same whether the changes are directed or undirected, the safety of such a product will be the same.
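The order of magnitude of the 14 to 17 base pair range can be reproduced with a simple back-of-the-envelope calculation, which is not necessarily how the cited authors derived it. The sketch below assumes a random sequence with equal base frequencies and asks for the shortest length L at which the number of possible sequences, 4^L, reaches the genome size. The function name and the approximate genome sizes are illustrative assumptions.

```python
def min_unique_length(genome_size_bp):
    """Smallest L such that 4**L >= genome_size_bp (random-sequence model)."""
    length = 0
    while 4**length < genome_size_bp:
        length += 1
    return length


# Approximate genome sizes, used here only as examples.
for name, size in (("Arabidopsis, ~135 Mb", 135_000_000),
                   ("bread wheat, ~17 Gb", 17_000_000_000)):
    print(name, min_unique_length(size))
```

For these two example sizes the function returns 14 and 17. Real genomes are repetitive rather than random, so a sequence of this length is a lower bound for uniqueness, and a short edit can easily match variation that already exists elsewhere.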

Conclusions

Many different types of detection tools are used in the field of genome editing. Most applications focus on detection of DNA changes, specifically by PCR methods and sequencing. The preferred detection tool depends on the specific context in which it will be used: to optimize the process, to sort through products of genome editing, or to follow the edited products through breeding and product development and onward to the farmer and food, feed, and fiber. Detection tools will continue to evolve as this technology develops.