A large-scale genome and transcriptome sequencing analysis reveals the mutation landscapes induced by high-activity adenine base editors in plants

Feb 10, 2022, Genome Biology

Genome and gene activity sequencing reveal mutations caused by highly active adenine base editors in plants

Abstract

High expression of adenine base editors (ABEs) in rice is associated with increased off-target A-to-G RNA mutations.

ABEs engineered with TadA9 result in a higher number of off-target A-to-G single-nucleotide variants (SNVs) compared to other variants.
The use of CRISPR/SpCas9n-NG with ABEs leads to a greater total number of off-target SNVs in the rice genome.
On-target mutations may occur before or after T-DNA integration into plant genomes, with more off-target A>G SNVs appearing post-integration.
Off-target A>G RNA mutations are more prevalent in plants with high ABE expression, while low expression does not show these mutations.
Off-target A>G RNA mutations tend to cluster, whereas off-target A>G DNA mutations occur less frequently in clusters.

BACKGROUND: The high-activity adenine base editors (ABEs), engineered with the recently developed tRNA adenosine deaminases (TadA8e and TadA9), show robust base editing activity but raise concerns about off-target effects.

RESULTS: In this study, we perform a comprehensive evaluation of ABE8e- and ABE9-induced DNA and RNA mutations in Oryza sativa. Whole-genome sequencing analysis of plants transformed with four ABEs, including SpCas9n-TadA8e, SpCas9n-TadA9, SpCas9n-NG-TadA8e, and SpCas9n-NG-TadA9, reveals that ABEs harboring TadA9 lead to a higher number of off-target A-to-G (A>G) single-nucleotide variants (SNVs), and that those harboring CRISPR/SpCas9n-NG lead to a higher total number of off-target SNVs in the rice genome. An analysis of the T-DNAs carrying the ABEs indicates that the on-target mutations could be introduced before and/or after T-DNA integration into plant genomes, with more off-target A>G SNVs forming after the ABEs had integrated into the genome. Furthermore, we detect off-target A>G RNA mutations in plants with high expression of ABEs but not in plants with low expression of ABEs. The off-target A>G RNA mutations tend to cluster, while off-target A>G DNA mutations rarely cluster.

CONCLUSION: Our findings that Cas proteins, TadA variants, temporal expression of ABEs, and expression levels of ABEs contribute to ABE specificity in rice provide insight into the specificity of ABEs and suggest alternative ways to increase ABE specificity besides engineering TadA variants.

Key numbers

higher number and percentage of A>G SNVs

Higher with TadA9

TadA9-based ABEs induce more A>G SNVs than TadA8e-based ABEs.

one-third of plants with high expression

Off-target RNA mutations cluster

One-third of plants exhibited A>G RNA mutations due to high expression levels.

Key figures

Fig. 1

Four base editors and the experimental design for sequencing rice genomes and transcriptomes

Sets up the comparison of off-target mutation profiles across different base editors and sequencing approaches

Panel a
Gene structures of four base editors (rBE46b, rBE49b, rBE50, rBE53) showing promoters, cassettes, TadA variants, Cas9 nickase types, and terminators
Panel b
Experimental workflow with groups of rice plants for genome and transcriptome sequencing, highlighting which plants had both or only genome sequenced

Fig. 2

Genomic mutations including indels and SNVs in plants with different ABEs and controls

Highlights higher off-target A>G mutation rates in TadA9 and SpCas9n-NG base editors compared to controls

Panel a
Number of indels in plants after tissue culture (C1), Agrobacterium infection (C2), and with four ABEs (rBE46b, rBE49b, rBE50, rBE53); no significant differences observed
Panel b
Number of SNVs in the same groups; rBE49b, rBE50, and rBE53 appear to have higher SNV counts than controls and rBE46b
Panel c
Comparison of SNVs, A>G SNVs, and percentage of A>G SNVs between TadA8e (rBE46b, rBE50) and TadA9 (rBE49b, rBE53); TadA9 groups show higher A>G SNVs and percentages
Panel d
Comparison of SNVs, A>G SNVs, and percentage of A>G SNVs between SpCas9n (rBE46b, rBE49b) and SpCas9n-NG (rBE50, rBE53); SpCas9n-NG groups show higher SNV and A>G SNV numbers
Panel e
Ratio, number, and percentage of A>G SNVs in gene regions (gene, exon, intron, UTRs) and intergenic regions for control and four ABEs; A>G SNVs ratio is higher in gene and intergenic regions for ABEs, especially rBE53

Fig. 3

DNA mutations at T-DNA insertion sites and mutation counts in plants with different T-DNA insertions

Highlights higher mutation counts and A>G mutation percentages in plants with whole rBE53 T-DNA insertions versus partial ones.

Panel a
IGV browser views of read coverages at T-DNA insertion sites for lines 46bM, 49bM, and 49bAG; red rectangles highlight T-DNA insertion regions; visually, read coverage gaps correspond to insertion sites.
Panel b
Bar graphs showing numbers of unique and overlapping SNVs, A>G SNVs, and percentages of A>G SNVs in 46bM, 49bM, and 49bAG lines; Set 1 and Set 2 represent unique SNVs in paired samples, Overlap shows shared SNVs.
Panel c
Bar graphs comparing number of SNVs, A>G SNVs, and percentage of A>G SNVs in plants with partial versus whole T-DNA insertions of rBE50 or rBE53; rBE53 whole insertion group appears to have higher values; statistical significance indicated by * (p < 0.1) and ns (p > 0.1).

Fig. 4

Off-target RNA mutations induced by ABEs in plants with different Cas9 and TadA variants

Highlights higher off-target RNA mutation ratios and expression levels in plants with TadA9-based editors versus controls.

Panel a
Number of total SNVs, number of A>G SNVs, and percentage of A>G SNVs in plants with SpCas9 (Cas), SpCas9n-TadA8e (rBE46b), and SpCas9n-TadA9 (rBE49b); rBE49b plants appear to have higher counts and percentages of A>G SNVs.
Panel b
Scatterplot of A>G SNV ratios in two rBE49b lines (R49bAG_s2 and R49bAG_s3) with a Pearson correlation coefficient of 0.33 and a diagonal reference line.
Panel c
Sequence logo showing nucleotide conservation around edited adenines from all RNA-seq data, highlighting a strong preference for adenine at the edited position.
Panel d
Boxplot of A>G mutation ratios at RNA A>G SNV loci for plants with Cas, rBE46b, and rBE49b; rBE46b and rBE49b plants show higher A>G ratios with -log10 p-values indicating significance.
Panel e
Bar plot of average RPM (reads per million) values of ABEs in plants without and with RNA mutations; plants with RNA mutations have visibly higher ABE RPM levels with significant difference (p < 0.001).
Panel f
Left boxplot shows A>G mutation ratios in one versus four rBE49bAG_s2 plants; middle bar plot shows -log10 Wilcoxon p-values comparing five rBE49bAG_s2 plants to Cas plants; right bar plot shows ABE RPM levels in these plants, with T-DNA insertion status indicated.

Fig. 5

RNA and DNA A>G mutation clustering patterns in plants with ABEs

Highlights higher numbers and clustering of A>G RNA mutations in plants with adenine base editors versus controls.

Panel a
IGV genome browser view of clustered A>G RNA mutations at representative loci in transcriptomes for lines R49bAG_s2, R49bAG_s3 (with RNA mutations), and RCas_s1 (SpCas9 only); colored bars indicate nucleotide differences from reference.
Panel b
Graphs showing ratios of A>G mutations in 30-bp flanking regions (5′ and 3′) centered at A>G RNA SNV loci for lines R49bAG_s2, R49bAG_s3, and RCas_s1.
Panel c
Boxplot comparing number of A>G SNVs in flanking 5′ and 3′ 30-bp regions for RNA SNVs found in many (3–8) versus few (1–2) plants, with the many-plants group showing higher counts (significant at p < 0.001).
Panel d
IGV genome browser views of representative DNA SNV loci with clustered A>G SNVs in whole-genome sequencing for line 53DEP_s1, s2, and s3; colored bars indicate nucleotide differences from reference.
Panel e
Bar graph showing ratio of clustered SNVs located in genic regions compared to all SNVs.
Panel f
Boxplots comparing number of total SNVs, A>G SNVs, and percentage of A>G SNVs between plants with clustered SNVs (Group 1) and without clustered SNVs (Group 2); Group 1 shows significantly higher A>G SNVs and percentage (p < 0.01 and p < 0.001).

Full Text

What this is

This research investigates the mutation landscapes caused by high-activity adenine base editors (ABEs) in rice.
It focuses on the off-target DNA and RNA mutations induced by two specific ABEs: ABE8e and ABE9.
The study employs whole-genome and transcriptome sequencing to assess the specificity and potential risks of these gene-editing tools.

Essence

High-activity ABEs, particularly those with TadA9, induce more off-target A-to-G mutations in rice than other variants. The expression level of ABEs also influences the frequency of these mutations.

Key takeaways

ABEs with TadA9 lead to a higher number of A>G single-nucleotide variants (SNVs) compared to those with TadA8e. This suggests that the choice of TadA variant significantly affects mutation outcomes.
Off-target A>G RNA mutations cluster in plants with high ABE expression, indicating that expression levels play a crucial role in mutation patterns.
The study reveals that on-target mutations can occur before T-DNA integration into the rice genome, which may have implications for the timing of gene editing applications.

Caveats

The study primarily focuses on rice, which may limit the generalizability of the findings to other species or crops. Different genomes may respond differently to ABEs.
The potential for off-target effects remains a concern, and further research is needed to fully understand the implications of these mutations in practical applications.

Definitions

adenine base editors (ABEs): Gene-editing tools that convert A•T base pairs to G•C base pairs without causing double-stranded DNA breaks.
single-nucleotide variants (SNVs): Alterations in a single nucleotide in the genome, which can impact gene function and traits.

Background

Single-nucleotide variants (SNVs), a universal feature of plant, animal, and human genomes, have been widely identified in association with agronomic traits and human diseases [1–3]. Various clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas)-mediated base editing tools (e.g., ABEs and cytosine base editors), which efficiently produce desired point mutations in genomic DNA without causing double-stranded DNA breaks [4], have been used widely in laboratory research, crop and animal breeding, as well as human gene therapy [5–7]. Since the mutation of G•C base pairs to A•T base pairs is the primary form of de novo mutations [8], ABEs that catalyze the conversion of A•T base pairs to G•C base pairs have great potential to correct human pathogenic point mutations [9]. However, potential DNA and RNA off-target mutations remain a serious concern and threaten to limit the application of ABEs.

The pioneering ABE7s, which are composed of a tRNA adenosine deaminase (TadA7.10) and CRISPR/Cas systems, perform remarkably clean and efficient A•T to G•C conversions in the genomes of a variety of species, including human, mouse, and rice, without inducing obvious genome-wide off-target DNA mutations [10–14]. However, the editing efficiency of ABE7s varies in a locus-dependent manner [11, 13]. Subsequently, high-activity ABEs, such as those containing TadA8.17, TadA8.20, TadA8e, and TadA9, have been developed, engineered with various PAM-flexible Cas variants, and tested in different organisms [15–18], but a whole-genome assessment of the off-target DNA mutations induced by TadA8e and TadA9 has not yet been performed.

The tRNA adenosine deaminase TadA, a key component of ABEs, induces site-specific inosine formation on RNAs [19]. Recently, it was reported that TadAs, ABE7s, and ABE8es induced a significantly higher number or higher mutation ratio of RNA A-to-G (A>G) SNVs when compared to Cas proteins or GFP [9, 20, 21] and that ABE8.17 and ABE8.20 induced very low levels of adenosine deamination in mRNAs if ABEs were delivered as messenger RNAs in mammalian cells [17]. Thus, several labs have developed improved TadA variants with reduced RNA activity [20, 22]. However, the analysis of RNA A>G mutations induced by ABEs is complicated by the large genomes of heterogeneous mammalian cells as well as by the conversion of adenosines into inosines mediated by the endogenous adenosine deaminase RNA specific (ADAR) family. In addition, ABE-induced RNA mutations have not yet been reported in plants.

The relatively small genome (~ 0.4 Gb) of self-pollinated rice and the absence of endogenous ADAR family make rice an ideal model organism to examine the DNA and RNA specificity of gene editing tools. Here, we investigated the off-target DNA and RNA mutations induced by ABE8es and ABE9s in rice through whole-genome sequencing (WGS) and transcriptome sequencing.

Results

ABEs induced sgRNA-independent heterozygous DNA mutations

A few homozygous SNVs and indels were detected in all sequenced plants (Additional file 2: Fig. S5a, b). We counted the number of plants with the same mutation sites and found that the homozygous mutations tended to be present in more than one plant, while the heterozygous mutations tended to be present in a single plant (Additional file 2: Fig. S5c, d). These homozygous mutations could be the remaining background mutations or mutations induced by tissue culture, Agrobacterium infection, or ABEs. If mutations are induced, the events in the two alleles are independent and follow a binomial distribution: assuming that the induced mutation ratio for each allele is p and the ratio of the wild-type (WT) allele is 1 − p, the probability of a homozygous mutation is p², the probability of being WT is (1 − p)², and the probability of a heterozygous mutation is 2p(1 − p). A binomial test for all loci of homozygous SNVs or indels revealed that these loci did not follow a binomial distribution (Additional file 2: Fig. S5e and Additional file 1: Tables S4 and S5), indicating that these homozygous mutations are remaining background SNVs and indels. These data suggest that ABEs induce sgRNA-independent heterozygous DNA mutations.
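The binomial reasoning above can be illustrated with a short Python sketch. The function names, the per-allele ratio of 5%, and the plant counts are hypothetical values chosen for illustration. The text does not state how p was estimated or which form of binomial test was applied, so the upper-tail probability used here is an assumption.

```python
from math import comb

def genotype_probabilities(p):
    """Return (P(WT), P(heterozygous), P(homozygous)) for a per-allele mutation ratio p."""
    return (1 - p) ** 2, 2 * p * (1 - p), p ** 2

def binomial_tail(k, n, q):
    """P(X >= k) for X ~ Binomial(n, q): chance that k or more of n plants are homozygous."""
    return sum(comb(n, i) * q ** i * (1 - q) ** (n - i) for i in range(k, n + 1))

# Hypothetical locus: per-allele mutation ratio of 5%, 48 plants, 6 homozygous carriers.
p = 0.05
wt, het, hom = genotype_probabilities(p)
tail = binomial_tail(6, 48, hom)
print(f"WT={wt:.4f} het={het:.4f} hom={hom:.4f} P(>=6 homozygous)={tail:.2e}")
```

Under these assumed numbers, independent induction makes six homozygous carriers extremely unlikely, which is the kind of mismatch that points to a shared background variant rather than an induced one.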

Fig. 1

Profiling of off-target effects caused by ABE-mediated base editing in rice. (a) The gene architecture of four base editors: rBE46b, rBE49b, rBE50, and rBE53. Ubi-P, maize ubiquitin 1 promoter; NLS, nuclear localization sequence; NOS, nopaline synthase terminator. (b) Diagram of the experimental design. For plants in pink rectangles, both genomes and transcriptomes were sequenced. For plants in blue rectangles, only genomes were sequenced.

Genome-wide analysis of ABE-induced single-nucleotide mutations

We next examined whether Cas proteins or TadA variants play distinct roles in inducing off-target DNA mutations by comparing plants harboring rBE46b with those harboring rBE49b as well as rBE50 to rBE53 to characterize TadA8e and TadA9, and compared plants with rBE46b to rBE50 and rBE49b to rBE53 to characterize the role of SpCas9n and SpCas9n-NG in off-target effects. Although there was no significant difference between TadA8e and TadA9 when the total number of SNVs was considered, plants harboring TadA9 had a higher number and a higher percentage of A>G SNVs (Fig. 2c), indicating that TadA9-based ABEs lead to a higher number of A>G SNVs. Plants harboring SpCas9n-NG had a higher number of SNVs as well as a higher number of A>G SNVs, but not a higher percentage of A>G SNVs (Fig. 2d), indicating that SpCas9n-NG-based ABEs lead to a higher number of SNVs.
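The group comparisons above are reported in the figure legends as one-tailed Wilcoxon tests. As a minimal sketch, the following Python code computes an exact one-tailed rank-sum p-value by enumerating group assignments. It assumes the unpaired (rank-sum) form of the test, and the per-plant A>G SNV counts are hypothetical. The source does not say whether an exact or approximate p-value was used.

```python
from itertools import combinations

def rank_sum_one_tailed(x, y):
    """Exact one-tailed p-value for the hypothesis that values in x exceed those in y."""
    pooled = sorted(x + y)
    # Midranks handle ties: each value gets the average of the ranks it occupies.
    rank = {}
    for v in set(pooled):
        positions = [i + 1 for i, w in enumerate(pooled) if w == v]
        rank[v] = sum(positions) / len(positions)
    ranks = [rank[v] for v in x + y]
    observed = sum(ranks[:len(x)])
    # Enumerate every way of assigning len(x) of the pooled ranks to group x.
    hits = total = 0
    for idx in combinations(range(len(ranks)), len(x)):
        total += 1
        if sum(ranks[i] for i in idx) >= observed - 1e-9:
            hits += 1
    return hits / total

# Hypothetical A>G SNV counts per plant for two editors.
tada9_counts = [30, 41, 38, 45]
tada8e_counts = [12, 18, 15, 20]
print(rank_sum_one_tailed(tada9_counts, tada8e_counts))
```

Exhaustive enumeration is only practical for small groups like the handful of plants per construct assumed here; larger samples would normally use a normal approximation.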

We classified all SNVs into six types and calculated the percentage of each type of SNV versus the total number of SNVs. We observed a higher percentage of C>A/G>T SNVs in plants harboring TadA8e (Additional file 2: Fig. S7). We further mapped all SNVs and A>G SNVs to different genic and intergenic regions and calculated the ratio of SNVs in given regions versus in the whole genome. As a result, the number of A>G SNVs and the total number of SNVs were higher at all genic and intergenic regions in plants for all four types of ABEs, while A>G SNVs were enriched in genic regions and depleted in intergenic regions (Fig. 2e and Additional file 2: Fig. S8). In addition, we mapped total SNVs as well as A>G SNVs to the 12 rice chromosomes and established that they were distributed throughout the rice genome (Additional file 2: Fig. S9).
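The six-type classification can be sketched as follows. This Python example folds each substitution together with its reverse complement, which is an assumption consistent with the "C>A/G>T" label used above. The function names and the input calls are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def snv_type(ref, alt):
    """Collapse a substitution and its reverse complement into one of six labels."""
    if ref in ("G", "T"):
        # Re-express the change on the opposite strand so the reference base is A or C.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}/{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"

def type_percentages(snvs):
    """Percentage of each SNV type relative to the total number of SNVs."""
    counts = Counter(snv_type(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

# Hypothetical calls given as (reference base, alternate base).
calls = [("A", "G"), ("T", "C"), ("C", "A"), ("G", "T"), ("C", "T")]
print(type_percentages(calls))
```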

Fig. 2

Characterization of ABE-induced genomic mutations. (a, b) Number of indels, SNVs, and A>G SNVs, and percentage of A>G SNVs identified for plants that had undergone tissue culture (C1) or Agrobacterium infection (C2) and plants harboring SpCas9n-TadA8e (rBE46b), SpCas9n-TadA9 (rBE49b), SpCas9n-NG-TadA8e (rBE50), and SpCas9n-NG-TadA9 (rBE53). In each plot, each dot represents the number of indels, SNVs, and A>G SNVs, and the percentage of A>G SNVs from an individual plant; each middle line represents the median value; and each upper line and lower line represent the standard errors. (c) Number of SNVs and A>G SNVs, and percentage of A>G SNVs were compared for ABE-edited plants harboring TadA8e or TadA9: rBE46b versus rBE49b, and rBE50 versus rBE53. (d) Number of SNVs and A>G SNVs, and percentage of A>G SNVs were compared for ABE-edited plants harboring SpCas9n or SpCas9n-NG: rBE46b versus rBE50, and rBE49b versus rBE53. (e) Percentage of A>G SNVs at given regions for plants in control groups or carrying one of the four ABEs. Each bar represents the mean value, and each error bar represents the standard error. (ns) denotes p-value > 0.1, (*) denotes p-value < 0.1, (**) denotes p-value < 0.01, and (***) denotes p-value < 0.001 (one-tailed Wilcoxon test).

T-DNA insertion influences the single-nucleotide mutations

We next examined the integrity of T-DNA regions containing both a complete left border (LB) and right border (RB) and identified four plants with a partial T-DNA insertion characterized by the missing TadA8e, TadA9, or SpCas9n-NG fragment (Additional file 2: Fig. S11a). However, desired on-target mutations were detected in three out of four plants (Additional file 2: Fig. S3), suggesting that sgRNA-dependent on-target A>G editing could occur before T-DNA integration into the rice genome. We further checked the off-target SNVs between plants with or without complete T-DNA insertion and found that plants with a complete T-DNA insertion had a higher number of total SNVs, a higher number of A>G SNVs, and a higher percentage of A>G SNVs when compared to those with partial T-DNA insertion (Fig. 3c).

T-DNAs are known to integrate into the rice genome in more than one copy [28], so we divided the plants into two groups based on whether one copy or multiple copies of T-DNAs were integrated. We examined the number of total SNVs, the number of A>G SNVs, and the percentage of A>G SNVs in plants with rBE46b, rBE49b, rBE50, and rBE53 separately and did not observe a consistent influence of the copy number of T-DNA insertion (Additional file 2: Fig. S12).

Fig. 3

ABE-induced DNA mutations in different T-DNA insertion events. (a) IGV browser views showing the read coverages at T-DNA insertion sites. Lines 46bM_s2 and 46bM_s3, 49bM_s2 and 49bM_s3, and 49bAG_s3 and 49bAG_s4 were germinated from the same calli. Regions in red rectangles are the T-DNA insertion sites. (b) Number of SNVs and A>G SNVs, and percentage of A>G SNVs. Set 1 represents the unique SNVs only in 46bM_s2, 49bM_s2, and 49bAG_s3. Set 2 represents the unique SNVs only in 46bM_s3, 49bM_s3, and 49bAG_s4. Overlap represents the overlapping SNVs in 46bM_s2 and 46bM_s3, 49bM_s2 and 49bM_s3, and 49bAG_s3 and 49bAG_s4. (c) Number of SNVs and A>G SNVs, and percentage of A>G SNVs in plants with partial or whole T-DNA insertions of rBE50 or rBE53. Each bar represents the mean value, each error bar represents the standard error, and each dot represents the number of SNVs, the number of A>G SNVs, and percentage of A>G SNVs of each plant. (ns) denotes p-value > 0.1, (*) denotes p-value < 0.1 (one-tailed Wilcoxon test).

ABEs induce transcriptome-wide A>G RNA mutations

Fig. 4

Transcriptome-wide ABE-induced off-target mutations. (a) Number of SNVs and A>G SNVs, and percentage of A>G SNVs in plants harboring SpCas9 (Cas), SpCas9n-TadA8e (rBE46b), and SpCas9n-TadA9 (rBE49b). (b) Ratios of A>G mutations were calculated for A>G SNV loci detected in lines R49bAG_s2 and R49bAG_s3 and shown in the scatterplot. The Pearson correlation coefficient (r) was also calculated, and the red line is the diagonal line. (c) A sequence logo derived from edited adenines from all RNA-seq data. Bits account for how much each column is conserved and how much the nucleotide frequencies obtained in the profile differ from those that would have been obtained by aligning oligonucleotides chosen at random. (d) Boxplot showing ratios of A>G mutations at all RNA A>G SNV loci for plants harboring SpCas9, rBE46b, and rBE49b. A Wilcoxon test was conducted between every plant harboring ABEs versus plants harboring Cas only, and the -log10 p-value is shown. (e) Bar plot showing the average RPM values of ABEs for plants without RNA mutations and plants with RNA mutations. Each bar represents the mean value, each error bar represents the standard error, and each dot represents the ABE RPM value of each plant. (***) denotes p-value < 0.001 (one-tailed Wilcoxon test). (f) Ratios of A>G mutations of all A>G RNA SNV loci were calculated for one 49bAG_s2 T0 plant and four 49bAG_s2 T1 plants (left). -log10 p-value of Wilcoxon test on A>G ratios between five 49bAG_s2 plants versus plants harboring SpCas9 (middle). RPMs of ABEs are shown in the bar plot (right). N1 and N2 are T1 49bAG_s2 plants with a T-DNA insertion, while N3 and N4 are T1 49bAG_s2 plants without a T-DNA insertion.

ABEs induce clustered off-target editing

We performed similar studies on DNA off-target SNVs but did not observe general patterns of flanking A>G editing. However, we did identify 25 loci with more than one A>G SNV from 12 plants (Additional file 1: Table S10); some loci contained 5–10 A>G SNVs, and others contained 2–3 A>G SNVs (Fig. 5d and Additional file 2: Fig. S18). Overall, 45% of these SNVs were located in the genic region, which is higher than the 30% observed for all A>G SNVs in the genic region, consistent with the tendency of off-target A>G SNVs to occur in the genic region (Fig. 5e). We classified these 12 plants into group 1, and the remaining 36 plants carrying ABEs into group 2. The number of SNVs and A>G SNVs and the percentage of A>G SNVs were significantly higher for plants in group 1 compared to plants in group 2 (Fig. 5f).
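One way to locate loci carrying more than one A>G SNV is to group nearby positions on each chromosome. The Python sketch below does this with a hypothetical maximum gap of 30 bp, borrowed from the flanking-region size used for the RNA analysis. The source does not state the distance used to define a DNA cluster, and the coordinates are invented.

```python
from collections import defaultdict

def find_clusters(snvs, max_gap=30):
    """Group SNV positions on the same chromosome that lie within max_gap of a neighbor.

    snvs is an iterable of (chromosome, position); only groups with more than
    one SNV are returned.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snvs:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)
            else:
                if len(current) > 1:
                    clusters.append((chrom, current))
                current = [pos]
        if len(current) > 1:
            clusters.append((chrom, current))
    return clusters

# Hypothetical A>G SNV coordinates.
a_to_g = [("Chr1", 100), ("Chr1", 125), ("Chr1", 112), ("Chr1", 5000),
          ("Chr2", 40), ("Chr2", 900)]
print(find_clusters(a_to_g))
```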

Fig. 5

ABE-induced clustered RNA and DNA A>G SNVs. (a) An IGV genome browser view showing representative loci with clustered A>G SNVs in transcriptomes. (b) Ratios of A>G mutations were calculated in flanking 5′ and 3′ 30-bp regions centered at A>G RNA SNV loci. Lines R49bAG_s2 and R49bAG_s3 with RNA mutations and line RCas_s1 with SpCas9 only are shown. (c) Boxplot showing number of A>G SNVs in the flanking 5′ and 3′ 30-bp regions separately for RNA SNVs in many (3–8) or few (1–2) plants. (d) IGV genome browser views showing representative SNV loci with flanking A>G SNVs in whole-genome sequencing. (e) Ratios of clustered SNVs located in genic regions. (f) Plants with ABEs were classified into two groups: group 1 with clustered SNVs and group 2 without clustered SNVs. Number of SNVs and A>G SNVs, and percentage of A>G SNVs are shown separately for plants in group 1 and plants in group 2. (**) denotes p-value < 0.01, and (***) denotes p-value < 0.001 (one-tailed Wilcoxon test). In IGV genome browser views, the grey bar represents a sequenced nucleotide that is the same as the reference genome, while bars in other colors represent sequenced nucleotides that are partially or totally different from the reference genome: red represents nucleotide A, green represents nucleotide T, orange represents nucleotide G, and blue represents nucleotide C. The height of each color bar represents the relative composition of each nucleotide.

Discussion

The targeting specificity of CRISPR tools in applications remains a considerable concern. It is well known that Cas nucleases mediate highly specific genome editing with rare off-target mutations in plants [29, 30], and high-activity CBEs cause genome-wide off-target mutations in rice and mouse [14, 31, 32]. ABE8s and ABE9s have been developed by several groups to overcome the limitation of ABE7s [15–17]. Their robust editing efficiency raised another question: how specific are the high-activity ABEs engineered with TadA8e and TadA9 deaminases? Compared to mouse and human genomes (each ~ 3 Gb), the rice genome (~ 0.4 Gb) is small, making WGS of individuals more feasible. In addition, rice is self-pollinating, circumventing the challenges of population heterogeneity of human cells, and lacks innate A-to-I RNA editing, facilitating analyses of ABE-induced RNA editing. Therefore, we performed a comprehensive evaluation of ABE8- and ABE9-induced genetic mutations through WGS and transcriptome sequencing in rice.

Cas proteins and TadA variants play different roles in ABE-induced DNA off-target mutations: ABEs harboring SpCas9n-NG, an engineered SpCas9 protein recognizing a flexible protospacer adjacent motif (PAM) [33–38], result in a higher number of total SNVs; those harboring TadA9, a TadA variant with robust activity [16], lead to a higher number of specific A>G SNVs. Plants transformed with the ABE rBE46b (SpCas9n-TadA8e) did not have more SNVs or a higher percentage of A>G SNVs than plants subjected to Agrobacterium infection, suggesting that selection of SpCas9n and TadA8e eliminates most sgRNA-independent DNA mutations induced by ABEs. Given that no sgRNA-dependent off-target mutations were observed, we conclude that optimization of sgRNA design is an efficient way of eliminating sgRNA-dependent off-target mutations.

Using deeply sequenced genomes and transcriptomes, we systematically studied ABE-induced RNA mutations. ABEs induced RNA A>G mutations in one-third of the plants, those with high ABE expression, but not in the remaining two-thirds, which had low ABE expression. When ABEs segregated out, RNA mutations diminished. In addition, T-DNA integration analysis suggested that stably integrated ABEs induce more off-target SNVs than those whose T-DNA has not been integrated into the genome. Together, these data highlight the importance of controlling the expression of ABEs in future applications, such as using inducible or photoactivatable transcription systems, ribonucleoprotein-based delivery in clinical gene therapy [39, 40], and transgene-free gene-edited plants in crop breeding.

Without the noise from A-to-I mutations mediated by ADAR proteins, we were able to obtain a clean set of ABE-induced RNA mutations and discovered that ABEs induced clustered A>G mutations, which provided useful information for defining and characterizing true ABE RNA targets. Furthermore, given the existence of common and unique mutations in plants regenerated from the same callus, we provide robust experimental evidence that plants with different on-target editing could be derived from the same T-DNA insertion event with a shared set of off-target SNVs. Therefore, we highly recommend using two independent transgenic lines from separated calli (with two different T-DNA insertion sites and two sets of non-overlapping SNVs) in gene function studies.

Conclusions

The small genome, self-pollination, and the absence of ADAR proteins make rice a suitable model organism for evaluating ABEs' off-target activity with large-scale sequencing approaches. This comprehensive analysis of ABE-induced DNA and RNA mutations using whole-genome and transcriptome sequencing in rice sheds light on defining and characterizing ABEs' specificity. The discovery that Cas proteins, TadA variants, transient expression, and the expression level of ABEs contribute to ABEs' specificity in rice points to alternative ways of improving ABEs' specificity, including combinatorial optimization of Cas/deaminase (SpCas9n-TadA8e) and temporal control of ABEs' expression, besides the traditional protein engineering of deaminases.

Materials and methods

Plasmid construction

In this study, five rice (Oryza sativa) genomic loci (OsACC, OsGS1, OsMPK13, OsGSK3, and OsGSK4) and four rice genomic loci (OsACC, OsGS1, OsMPK13, and OsTms9) were targeted by rBE46b and rBE49b, respectively. Three genes (OsSERK2, OsDEP2, and OsGSK4) were targeted by both rBE50 and rBE53. Plant IDs and their corresponding information are described in Additional file 1: Table S1. The rBE46b, rBE49b, rBE50, and rBE53 expression plasmids were constructed as previously reported [16]. The empty entry vector without any spacer was cloned into pUbi:rBE46b, pUbi:rBE49b, pUbi:rBE50, and pUbi:rBE53 using Gateway technology to yield ABEs without sgRNAs (Additional file 1: Table S1).

Agrobacterium-mediated rice transformation and plant growth

The genome editing constructs were individually introduced into the Agrobacterium tumefaciens strain EHA105 via the freeze-thaw transformation method, and 2-week-old calli derived from immature seeds of the Geng rice variety Kitaake were infected by each Agrobacterium strain. After 4 weeks of culture on MSD medium supplemented with 50 mg/L hygromycin (Roche, Germany), the resistant callus lines were transferred onto RM plates to generate transgenic rice seedlings. All information on target gene mutations of each seedling examined in this study is given in Additional file 1: Table S1.

To eliminate background mutations, 10 individual Kitaake plants grown from seeds were used directly. Seedlings were regenerated from rice calli without Agrobacterium infection (namely C1) and regenerated from calli co-cultured with the empty EHA105 strain (namely C2). Also, seedlings were regenerated from calli infected with EHA105 strains harboring SpCas9 only (namely Cas). All rice materials were grown in the greenhouse under a 16-h-light/8-h-dark photoperiod, 28/25 °C temperature cycle, and 75% humidity.

DNA and RNA extractions

Genomic DNA of 4-week-old rice plants was extracted using the CTAB method (Li et al., 2016). Approximately 200 mg of fresh rice leaves was collected in a 2-ml centrifuge tube containing disposable metal balls. After being quickly frozen in liquid nitrogen, samples were ground to a fine powder using a tissue grinding apparatus (Jingxin, China). Following chloroform extraction, isopropanol precipitation, and 70% EtOH washing, genomic DNAs were eluted with 50 μL of double-distilled water supplemented with 1 μL of 10 U/μL RNase I (Thermo Fisher Scientific, USA) and stored at − 80 °C for later experiments.

RNA was extracted with TRIzol reagent (Takara, Japan) according to the manufacturer's instructions. Briefly, 100 mg of fresh rice leaves was sampled, quickly frozen in liquid nitrogen, and ground to a powder with a tissue grinding apparatus. Then, 1 ml of TRIzol reagent was added to the sample followed by chloroform and isopropanol treatment. Finally, RNA pellets were dissolved in 50 μL of RNase-free water (0.1% DEPC-treated) and stored at − 80 °C for later experiments.

Detection and validation of on-target and off-target mutations

The on-target genomic regions were amplified using Phanta Max Super-Fidelity DNA Polymerase (Vazyme, China) and locus-specific primers (Additional file 1: Tables S1, S11, and S12) with genomic DNAs and cDNAs used as templates. PCR amplicons were subjected to Sanger sequencing, and BioEdit software was used for sequence data analysis.

Whole-genome analysis of genetic mutations

RNA-free genomic DNAs (0.2 μg) from each sample were used to construct the DNA libraries using a NEBNext Ultra DNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer's instructions. DNA libraries were sequenced on the Illumina platform in the 150-nt paired-end mode with an average coverage depth of 40× (Additional file 1: Table S2).

The clean reads were mapped to the Kitaake genome V3 from Phytozome (https://data.jgi.doe.gov/refine-download/phytozome) via BWA [41] and sorted using samtools (v1.9) [42]. The Genome Analysis Toolkit (GATK v4.2) was used to mark duplicated reads and recalibrate base qualities [25]. To identify high-quality genetic changes at the genomic scale, we applied three independent germline variant-calling methods: GATK, LoFreq [23], and Strelka2 [23]. We documented SNVs identified by all three methods and indels identified by GATK and Strelka2. All genetic changes identified by the three methods in the 10 Kitaake plants were combined and used as background mutations. Sanger sequencing was performed to validate the overlapping set of SNVs called by the three methods (Additional file 2: Fig. S19). The genetic mutation ratios were calculated using an in-house R program and the 'AC' value from GATK's results. Both background mutations and homozygous mutations were removed from the SNVs as well as indels. The IGV browser was used to demonstrate sgRNA-directed on-target mutations [43]. Then, the on-target mutations were removed for off-target analysis. sgRNA-dependent off-target mutations were discovered using Crisflash [26], and the genetic on-target mutations were assessed using the IGV browser. A gene annotation file (OsativaKitaake_499_v3.1.gene_exons.gtf) from the Phytozome website was used to define different genomic regions, such as gene regions, exon regions, and intergenic regions. The ggpubr, ggbio, and VennDiagram R libraries were used to draw the graphs.
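The filtering logic described above, keeping SNVs called by all three methods and then removing background, homozygous, and on-target calls, can be sketched with Python sets. This is an illustration only: the function names, the tuple layout (chromosome, position, reference base, alternate base), and every coordinate are hypothetical, and parsing of the callers' VCF output is omitted.

```python
def consensus_snvs(gatk, lofreq, strelka2):
    """Keep only SNVs reported by all three variant callers."""
    return set(gatk) & set(lofreq) & set(strelka2)

def off_target_snvs(calls, background, homozygous, on_target):
    """Drop background, homozygous, and on-target SNVs from a consensus call set."""
    return set(calls) - set(background) - set(homozygous) - set(on_target)

# Hypothetical call sets keyed by (chromosome, position, ref, alt).
gatk = {("Chr1", 1050, "A", "G"), ("Chr2", 77, "C", "T"),
        ("Chr3", 912, "A", "G"), ("Chr4", 5, "T", "C")}
lofreq = {("Chr1", 1050, "A", "G"), ("Chr2", 77, "C", "T"),
          ("Chr3", 912, "A", "G"), ("Chr5", 31, "G", "A")}
strelka2 = {("Chr1", 1050, "A", "G"), ("Chr2", 77, "C", "T"), ("Chr3", 912, "A", "G")}

background = {("Chr2", 77, "C", "T")}   # e.g. pooled calls from untransformed plants
homozygous = set()                      # homozygous calls in this plant
on_target = {("Chr3", 912, "A", "G")}   # edit at the sgRNA target site

shared = consensus_snvs(gatk, lofreq, strelka2)
print(off_target_snvs(shared, background, homozygous, on_target))
```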

Analysis of T-DNA insertion sites and ABE transcripts

The clean reads were mapped to T-DNA sequences using BWA and sorted using samtools. The T-DNA insertion sites were located through T-LOC (Li et al. in preparation). The coverage of T-DNAs between the left border (LB) and right border (RB) was assessed using the R library ShortRead. The expression of ABEs was quantified as the average raw read number of Cas proteins and TadA variants normalized by the total read number in millions. Since we used T0 plants, the copy number of T-DNA integration was calculated as the relative T-DNA coverage versus half coverage of the rice genome.
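The two normalizations in this paragraph reduce to simple arithmetic, sketched below in Python. The function and parameter names are hypothetical, and the sketch assumes that read counts and coverage depths have already been computed upstream.

```python
def abe_expression(cas_reads: float, tada_reads: float,
                   total_reads: float) -> float:
    """Average raw read count of the Cas and TadA parts per million total reads."""
    mean_reads = (cas_reads + tada_reads) / 2
    return mean_reads / (total_reads / 1_000_000)


def tdna_copy_number(tdna_coverage: float, genome_coverage: float) -> float:
    """T-DNA coverage relative to half the genome coverage.

    Half the genome coverage is used as the baseline because a single
    insertion in a T0 plant is expected to sit on one of two homologs.
    """
    return tdna_coverage / (genome_coverage / 2)


# Toy numbers, invented for illustration only.
expr = abe_expression(cas_reads=300, tada_reads=100, total_reads=20_000_000)
copies = tdna_copy_number(tdna_coverage=20, genome_coverage=40)
```

With these toy values, `expr` is 10.0 reads per million and `copies` is 1.0, that is, one T-DNA copy.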

Analysis of ABE-induced RNA mutations

DNA-free RNAs (0.2 μg) were used to construct the RNA-seq libraries using a NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer's instructions. RNA-seq libraries were sequenced on the Illumina platform in the 150-nt paired-end mode (Additional file: Table S8).

The clean reads were mapped to the Kitaake V3 genome and annotation from Phytozome via the STAR aligner with a maximum of eight mismatches per paired-end read [44]. GATK was used to mark duplicate reads, split reads that contained Ns in their CIGAR string, and recalibrate base qualities. SNVs were called by GATK, LoFreq, and Strelka2 for each transcriptome dataset and the corresponding genome dataset. The SNVs identified by all three methods in the transcriptome data but not in the genome data were kept for later analysis. Sanger sequencing was performed to validate the overlapping set of SNVs called by the three methods (Additional file 2: Fig. S20). All the genetic changes identified by the three methods in three Agrobacterium-infected plants were combined, used as background mutations, and removed from the SNVs identified in plants transformed with SpCas9, rBE46b, and rBE49b. The A>G mutation ratios of off-target RNA loci were calculated using in-house Python programs. The 30- and 3-bp flanking sequences of the off-target RNA SNVs were extracted from the Kitaake reference genome and subjected to motif prediction using WebLogo3 (http://weblogo.threeplusone.com/) [45].

Calculation of flanking A>G mutations in genome and transcriptome data

We combined all A>G off-target SNVs obtained from plants with RNA off-target activities. For each A>G SNV, we calculated the number of reads with nucleotide A, T, G, and C separately in the 5′ and 3′ 30-bp region with a read coverage larger than 10. The genetic change ratio was calculated as the number of Gs divided by the total number of As and Gs if the reference is A. The genetic change ratio was calculated as the number of Cs divided by the total number of Cs and Ts if the reference is T. Positions with an A>G mutation ratio of higher than 0.05 were used as the numerator, while positions of A/T with a read coverage larger than 10 were used as the denominator. Similarly, we combined all A>G off-target SNVs obtained from plants through WGS and calculated the percentage of A>G mutations at the 5′ and 3′ 30-bp flanking regions.
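As a rough illustration of the ratio and percentage defined above, the following Python sketch works from per-position base counts. It is not the authors' program: the function names and the input layout (a reference base plus a dictionary of read counts per base) are assumptions made for this example.

```python
from typing import Dict, Iterable, Optional, Tuple


def ag_ratio(ref_base: str, counts: Dict[str, int],
             min_coverage: int = 10) -> Optional[float]:
    """A>G ratio at one position, or None if the position is not usable.

    Reference A: G / (A + G). Reference T (A on the opposite strand):
    C / (C + T). Positions need a read coverage larger than min_coverage.
    """
    if sum(counts.values()) <= min_coverage:
        return None
    if ref_base == "A":
        edited, unedited = counts.get("G", 0), counts.get("A", 0)
    elif ref_base == "T":
        edited, unedited = counts.get("C", 0), counts.get("T", 0)
    else:
        return None  # only A/T reference positions are considered
    total = edited + unedited
    return edited / total if total else None


def flanking_fraction(positions: Iterable[Tuple[str, Dict[str, int]]],
                      threshold: float = 0.05) -> Optional[float]:
    """Fraction of usable A/T positions whose A>G ratio exceeds threshold."""
    ratios = [r for r in (ag_ratio(ref, c) for ref, c in positions)
              if r is not None]
    if not ratios:
        return None
    return sum(r > threshold for r in ratios) / len(ratios)


# Toy flanking window, invented for illustration.
window = [
    ("A", {"A": 18, "G": 2}),   # ratio 0.10, counted as mutated
    ("T", {"T": 19, "C": 1}),   # ratio 0.05, not above the threshold
    ("C", {"C": 30}),           # not an A/T position, ignored
    ("A", {"A": 5, "G": 5}),    # coverage of 10 is not larger than 10
]
fraction = flanking_fraction(window)
```

In this toy window, two positions are usable and one of them exceeds the 0.05 threshold, so `fraction` is 0.5.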

Parameters of boxplots used in this study

The horizontal line in the box represents the median value, and the bottom and top of the box are the lower (Q1) and upper (Q3) quartiles, respectively. The upper whisker is min(max(x), Q3 + 1.5 × IQR), and the lower whisker is max(min(x), Q1 − 1.5 × IQR), where IQR (interquartile range) = Q3 − Q1. Black dots located outside the whiskers are outliers.
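The whisker formulas above can be checked numerically with a short Python sketch. The quartile method is an assumption: the text does not state which one was used, so the sketch uses the standard library's "inclusive" method, which matches the default quantile definition in R. The function name `boxplot_stats` is hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Union


def boxplot_stats(x: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Median, quartiles, whisker ends, and outliers as defined in the text."""
    # Assumption: 'inclusive' quartiles (linear interpolation between ranks).
    q1, median, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    upper = min(max(x), q3 + 1.5 * iqr)
    lower = max(min(x), q1 - 1.5 * iqr)
    outliers = [v for v in x if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower": lower, "upper": upper, "outliers": outliers}


# Toy data, invented for illustration.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

For this toy vector, Q1 = 3, the median is 5, Q3 = 7, and IQR = 4, so the upper whisker is min(100, 13) = 13, the lower whisker is max(1, −3) = 1, and 100 is the only outlier.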

Supplementary Information

Additional file 1: Supplementary tables. Table S1, Summary of plants with ABEs. Table S2, Mapping statistics of whole-genome sequencing. Table S3, Summary of sgRNA-dependent on-target and off-target loci. Table S4, Summary of all the homozygous SNVs. Table S5, Summary of all the homozygous indels. Table S6, Summary of genomic SNVs detected through WGS. Table S7, Summary of the overlapping SNVs between each of the plants with whole-genome sequencing. Table S8, Mapping statistics of whole-transcriptome sequencing. Table S9, Summary of all the transcriptomic SNVs. Table S10, Summary of clustered A>G DNA SNVs. Table S11, Primers used to verify DNA SNVs by Sanger sequencing. Table S12, Primers used to verify RNA SNVs by Sanger sequencing.

Additional file 2: Supplementary figures. Fig. S1. Sanger sequencing chromatograms of on-target mutations in plants harboring rBE46b and rBE49b. Fig. S2. Sanger sequencing chromatograms of on-target mutations in plants harboring rBE50 and rBE53. Fig. S3. IGV browser views showing the on-target mutations for 36 plants harboring ABEs. Fig. S4. Analysis of SNVs and indels identified by whole-genome sequencing. Fig. S5. Analysis of the remaining background homozygous DNA mutations. Fig. S6. Characterization of ABE-induced genomic mutations. Fig. S7. Distribution of six types of SNVs. Fig. S8. Distribution of SNVs at given regions of the genome. Fig. S9. Chromosomal distribution of SNVs. Fig. S10. On-target and off-target mutations in plants from the same calli. Fig. S11. Off-target SNVs in plants with incomplete T-DNA insertions. Fig. S12. Distribution of SNVs with different copy numbers of T-DNA insertions. Fig. S13. Transcriptome-wide distribution of ABE-induced off-target mutations. Fig. S14. Heatmap demonstrating A>G mutations in transcriptomes with more than 5 A>G SNVs detected. Fig. S15. The 5′ and 3′ flanking A>G mutations in transcriptomes with ABEs containing A>G RNA SNVs and in transcriptomes with SpCas9 only lacking A>G RNA SNVs. Fig. S16. The 5′ and 3′ flanking A>G mutations in transcriptomes with ABEs but without A>G RNA SNVs. Fig. S17. IGV genome browser views showing the off-target RNA mutations. Fig. S18. IGV genome browser views showing A>G mutations with flanking A>G SNVs in genome sequencing data. Fig. S19. Sanger sequencing chromatograms of off-target A>G DNA mutations. Fig. S20. Sanger sequencing chromatograms of off-target A>G RNA mutations.

Additional file 3: Review history.
