What this is
- This research investigates the mutation landscapes caused by high-activity adenine base editors (ABEs) in rice.
- It focuses on the off-target DNA and RNA mutations induced by two specific ABEs: ABE8e and ABE9.
- The study employs whole-genome and transcriptome sequencing to assess the specificity and potential risks of these gene-editing tools.
Essence
- High-activity ABEs, particularly those containing TadA9, induce more off-target A-to-G mutations in rice than those containing TadA8e. The expression level of ABEs also influences the frequency of these mutations.
Key takeaways
- ABEs with TadA9 lead to a higher number of A>G single-nucleotide variants (SNVs) than those with TadA8e. This suggests that the choice of TadA variant significantly affects mutation outcomes.
- Off-target A>G RNA mutations cluster in plants with high ABE expression, indicating that expression levels play a crucial role in mutation patterns.
- The study suggests that on-target mutations can occur before T-DNA integration into the rice genome, which may have implications for the timing of gene editing applications.
Caveats
- The study primarily focuses on rice, which may limit the generalizability of the findings to other species or crops. Different genomes may respond differently to ABEs.
- The potential for off-target effects remains a concern, and further research is needed to fully understand the implications of these mutations in practical applications.
Definitions
- adenine base editors (ABEs): Gene-editing tools that convert A•T base pairs to G•C base pairs without causing double-stranded DNA breaks.
- single-nucleotide variants (SNVs): Alterations in a single nucleotide in the genome, which can impact gene function and traits.
Background
Single-nucleotide variants (SNVs), a universal feature of plant, animal, and human genomes, have been widely identified in association with agronomic traits and human diseases [1–3]. Various clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas)-mediated base editing tools (e.g., ABEs and cytosine base editors), which efficiently produce desired point mutations in genomic DNA without causing double-stranded DNA breaks [4], have been used widely in laboratory research, crop and animal breeding, as well as human gene therapy [5–7]. Since the mutation of G•C base pairs to A•T base pairs is the primary form of de novo mutations [8], ABEs that catalyze the conversion of A•T base pairs to G•C base pairs have great potential to correct human pathogenic point mutations [9]. However, potential DNA and RNA off-target mutations remain a serious concern and threaten to limit the application of ABEs.
The pioneer ABE7s, which are composed of a tRNA adenosine deaminase (TadA7.10) and CRISPR/Cas systems, perform remarkably clean and efficient A•T to G•C conversions in the genomes of a variety of species, including human, mouse, and rice, without inducing obvious genome-wide off-target DNA mutations [10–14]. However, the editing efficiency of ABE7s varies in a locus-dependent manner [11, 13]. Subsequently, high-activity ABEs, such as those containing TadA8.17, TadA8.20, TadA8e, and TadA9, have been developed, engineered with various PAM-flexible Cas variants, and tested in different organisms [15–18], but the off-target DNA mutations induced by TadA8e and TadA9 have not yet been assessed genome-wide.
The tRNA adenosine deaminase TadA, a key component of ABEs, induces site-specific inosine formation on RNAs [19]. Recently, it was reported that TadAs, ABE7s, and ABE8es induced a significantly higher number or higher mutation ratio of RNA A-to-G (A>G) SNVs when compared to Cas proteins or GFP [9, 20, 21] and that ABE8.17 and ABE8.20 induced very low levels of adenosine deamination in mRNAs when ABEs were delivered as messenger RNAs in mammalian cells [17]. Thus, several labs have developed improved TadA variants with reduced RNA activity [20, 22]. However, the analysis of ABE-induced RNA A>G mutations in mammalian cells is complicated by their large genomes and cellular heterogeneity, as well as by the conversion of adenosines into inosines mediated by the endogenous adenosine deaminase RNA specific (ADAR) family. In addition, ABE-induced RNA mutations have not yet been reported in plants.
The relatively small genome (~ 0.4 Gb) of self-pollinated rice and the absence of endogenous ADAR family make rice an ideal model organism to examine the DNA and RNA specificity of gene editing tools. Here, we investigated the off-target DNA and RNA mutations induced by ABE8es and ABE9s in rice through whole-genome sequencing (WGS) and transcriptome sequencing.
Results
ABEs induced sgRNA-independent heterozygous DNA mutations
A few homozygous SNVs and indels were detected in all sequenced plants (Additional file 2: Fig. S5a, b). We counted the number of plants sharing the same mutation sites and found that homozygous mutations tended to be present in more than one plant, whereas heterozygous mutations tended to be present in a single plant (Additional file 2: Fig. S5c, d). These homozygous mutations could be remaining background mutations or mutations induced by tissue culture, Agrobacterium infection, or ABEs. If the induced mutation ratio for each allele is p (so that the ratio of the WT allele is 1 − p), mutations in the two alleles are independent events following a binomial distribution: the probability of a homozygous mutation is p², the probability of being wild type (WT) is (1 − p)², and the probability of a heterozygous mutation is 2p(1 − p). A binomial test for all homozygous SNV and indel loci revealed that these loci did not follow a binomial distribution (Additional file 2: Fig. S5e and Additional file 1: Tables S4 and S5), indicating that these homozygous mutations are remaining background SNVs and indels. These data suggest that ABEs induce sgRNA-independent heterozygous DNA mutations.
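The binomial reasoning above can be made concrete with a short sketch. It assumes, for illustration only, that the mutant-allele ratio p at a locus is estimated from genotype counts across plants and that the test asks whether the observed number of homozygous plants is consistent with the expectation n·p². The text does not specify exactly how the authors set up their binomial test, so the function names and this particular formulation are hypothetical.

```python
from math import comb


def genotype_expectations(p: float) -> dict:
    """Expected genotype frequencies if each of the two alleles is
    independently mutated with probability p."""
    return {"hom": p * p, "het": 2 * p * (1 - p), "wt": (1 - p) ** 2}


def binom_pmf(k: int, n: int, q: float) -> float:
    """Probability of exactly k successes in n trials with success probability q."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)


def binom_test_two_sided(k: int, n: int, q: float) -> float:
    """Exact two-sided binomial test: sum the probabilities of every outcome
    that is no more likely than the observed k (small tolerance for float ties)."""
    observed = binom_pmf(k, n, q)
    total = sum(
        binom_pmf(i, n, q)
        for i in range(n + 1)
        if binom_pmf(i, n, q) <= observed * (1 + 1e-7)
    )
    return min(1.0, total)


def locus_binomial_pvalue(n_hom: int, n_het: int, n_wt: int) -> float:
    """Hypothetical per-locus check: estimate the mutant-allele ratio p from
    genotype counts across plants, then test whether the number of homozygous
    plants is consistent with the binomial expectation n * p**2."""
    n = n_hom + n_het + n_wt
    p = (2 * n_hom + n_het) / (2 * n)  # mutant alleles / all alleles
    return binom_test_two_sided(n_hom, n, genotype_expectations(p)["hom"])
```

For instance, a hypothetical locus that is homozygous in 5 of 48 plants and heterozygous in none gives p ≈ 0.10 and an expected count of only about 0.5 homozygous plants, so `locus_binomial_pvalue(5, 0, 43)` returns a very small p-value. Such a pattern departs from the binomial expectation for newly induced mutations.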

Fig. 1 Profiling of off-target effects caused by ABE-mediated base editing in rice. (a) The gene architecture of four base editors: rBE46b, rBE49b, rBE50, and rBE53. Ubi-P, maize ubiquitin 1 promoter; NLS, nuclear localization sequence; NOS, nopaline synthase terminator. (b) Diagram of the experimental design. For plants in pink rectangles, both genomes and transcriptomes were sequenced; for plants in blue rectangles, only genomes were sequenced.
Genome-wide analysis of ABE-induced single-nucleotide mutations
We next examined whether Cas proteins and TadA variants play distinct roles in inducing off-target DNA mutations. To characterize TadA8e versus TadA9, we compared plants harboring rBE46b with those harboring rBE49b, and plants harboring rBE50 with those harboring rBE53; to characterize SpCas9n versus SpCas9n-NG, we compared rBE46b with rBE50, and rBE49b with rBE53. Although there was no significant difference between TadA8e and TadA9 in the total number of SNVs, plants harboring TadA9 had a higher number and a higher percentage of A>G SNVs (Fig. 2c), indicating that TadA9-based ABEs lead to more A>G SNVs. Plants harboring SpCas9n-NG had a higher number of SNVs and of A>G SNVs, but not a higher percentage of A>G SNVs (Fig. 2d), indicating that SpCas9n-NG-based ABEs lead to a higher number of SNVs overall.
We classified all SNVs into six types and calculated the percentage of each type relative to the total number of SNVs. We observed a higher percentage of C>A/G>T SNVs in plants harboring TadA8e (Additional file 2: Fig. S7). We further mapped all SNVs and A>G SNVs to different genic and intergenic regions and calculated the ratio of SNVs in a given region to those in the whole genome. The number of A>G SNVs and the total number of SNVs were higher in all genic and intergenic regions in plants carrying any of the four ABEs, while A>G SNVs were enriched in genic regions and depleted in intergenic regions (Fig. 2e and Additional file 2: Fig. S8). In addition, we mapped total SNVs and A>G SNVs to the 12 rice chromosomes and found that they were distributed throughout the rice genome (Additional file 2: Fig. S9).
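The group comparisons above, and the significance marks in the figure captions, rely on a one-tailed Wilcoxon test. As a minimal sketch, assuming the unpaired rank-sum form and small groups, an exact one-sided p-value can be obtained by enumerating every way of assigning the pooled ranks to the first group. The function names are hypothetical and no study data are included.

```python
from itertools import combinations


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test_greater(x, y):
    """Exact one-sided Wilcoxon rank-sum test of H1: values in x tend to be
    larger than values in y. Enumerates all splits of the pooled ranks, so it
    is only practical for small groups."""
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    observed = sum(ranks[: len(x)])
    n_total = 0
    n_extreme = 0
    for idx in combinations(range(len(pooled)), len(x)):
        n_total += 1
        if sum(ranks[i] for i in idx) >= observed - 1e-9:
            n_extreme += 1
    return n_extreme / n_total
```

For example, `rank_sum_test_greater([5, 6, 7], [1, 2, 3])` returns 0.05, the smallest p-value attainable with three plants per group. For real analyses, established routines such as R's `wilcox.test(x, y, alternative = "greater")` or SciPy's `mannwhitneyu(x, y, alternative="greater")` are preferable.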
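The six-way SNV classification above can be sketched as follows. The six classes are not listed explicitly in the text; this sketch assumes the conventional strand-collapsed scheme (C>A/G>T, C>G/G>C, C>T/G>A, T>A/A>T, T>C/A>G, T>G/A>C), which is consistent with the C>A/G>T class named above. Under this assumption, reference-strand A>G and T>C changes fall into the same class. The function names are hypothetical.

```python
from collections import Counter

# Complement used to collapse each substitution onto the pyrimidine (C/T) strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SIX_TYPES = ["C>A/G>T", "C>G/G>C", "C>T/G>A", "T>A/A>T", "T>C/A>G", "T>G/A>C"]


def snv_type(ref: str, alt: str) -> str:
    """Map a single-nucleotide change to one of six strand-collapsed classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):  # purine reference: report the complementary strand
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    label = f"{ref}>{alt}"
    for snv_class in SIX_TYPES:
        if snv_class.startswith(label):
            return snv_class
    raise ValueError(f"not a valid SNV: {ref}>{alt}")


def type_percentages(snvs):
    """Percentage of each class among (ref, alt) pairs; zeros if the list is empty."""
    counts = Counter(snv_type(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    if total == 0:
        return {snv_class: 0.0 for snv_class in SIX_TYPES}
    return {snv_class: 100 * counts[snv_class] / total for snv_class in SIX_TYPES}
```

The percentage of A>G SNVs reported per plant would then correspond to the `T>C/A>G` entry, under the pooling assumption stated above.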

Fig. 2 Characterization of ABE-induced genomic mutations. (a, b) Number of indels, SNVs, and A>G SNVs, and percentage of A>G SNVs identified for plants that had undergone tissue culture (C1) or Agrobacterium infection (C2) and plants harboring SpCas9n-TadA8e (rBE46b), SpCas9n-TadA9 (rBE49b), SpCas9n-NG-TadA8e (rBE50), and SpCas9n-NG-TadA9 (rBE53). In each plot, each dot represents the number of indels, SNVs, and A>G SNVs, and the percentage of A>G SNVs from an individual plant; each middle line represents the median value; and each upper and lower line represents the standard error. (c) Number of SNVs and A>G SNVs, and percentage of A>G SNVs compared for ABE-edited plants harboring TadA8e or TadA9: rBE46b versus rBE49b, and rBE50 versus rBE53. (d) Number of SNVs and A>G SNVs, and percentage of A>G SNVs compared for ABE-edited plants harboring SpCas9n or SpCas9n-NG: rBE46b versus rBE50, and rBE49b versus rBE53. (e) Percentage of A>G SNVs at given regions for plants in control groups or carrying one of the four ABEs. Each bar represents the mean value, and each error bar represents the standard error. (ns) denotes p-value > 0.1, (*) denotes p-value < 0.1, (**) denotes p-value < 0.01, and (***) denotes p-value < 0.001 (one-tailed Wilcoxon test).
T-DNA insertion influences the single-nucleotide mutations
We next examined the integrity of T-DNA regions flanked by a complete left border (LB) and right border (RB) and identified four plants with a partial T-DNA insertion, characterized by a missing TadA8e, TadA9, or SpCas9n-NG fragment (Additional file 2: Fig. S11a). Nevertheless, desired on-target mutations were detected in three of these four plants (Additional file 2: Fig. S3). Because a partially integrated T-DNA lacking a deaminase or Cas fragment cannot encode a complete ABE, this suggests that sgRNA-dependent on-target A>G editing could occur before T-DNA integration into the rice genome. We further compared off-target SNVs between plants with and without a complete T-DNA insertion and found that plants with a complete T-DNA insertion had a higher number of total SNVs, a higher number of A>G SNVs, and a higher percentage of A>G SNVs than those with a partial T-DNA insertion (Fig. 3c).
T-DNAs are known to integrate into the rice genome in more than one copy [28], so we divided the plants into two groups according to whether one or multiple copies of T-DNA were integrated. We examined the number of total SNVs, the number of A>G SNVs, and the percentage of A>G SNVs separately in plants with rBE46b, rBE49b, rBE50, and rBE53, and did not observe a consistent effect of T-DNA copy number (Additional file 2: Fig. S12).

Fig. 3 ABE-induced DNA mutations in different T-DNA insertion events. (a) IGV browser views showing the read coverages at T-DNA insertion sites. Lines 46bM_s2 and 46bM_s3, 49bM_s2 and 49bM_s3, and 49bAG_s3 and 49bAG_s4 were germinated from the same calli. Regions in red rectangles are the T-DNA insertion sites. (b) Number of SNVs and A>G SNVs, and percentage of A>G SNVs. Set 1 represents the unique SNVs only in 46bM_s2, 49bM_s2, and 49bAG_s3. Set 2 represents the unique SNVs only in 46bM_s3, 49bM_s3, and 49bAG_s4. Overlap represents the overlapping SNVs in 46bM_s2 and 46bM_s3, 49bM_s2 and 49bM_s3, and 49bAG_s3 and 49bAG_s4. (c) Number of SNVs and A>G SNVs, and percentage of A>G SNVs in plants with partial or whole T-DNA insertions of rBE50 or rBE53. Each bar represents the mean value, each error bar represents the standard error, and each dot represents the number of SNVs, the number of A>G SNVs, and the percentage of A>G SNVs of each plant. (ns) denotes p-value > 0.1, (*) denotes p-value < 0.1 (one-tailed Wilcoxon test).
ABEs induce transcriptome-wide A>G RNA mutations

Fig. 4 Transcriptome-wide ABE-induced off-target mutations. (a) Number of SNVs and A>G SNVs, and percentage of A>G SNVs in plants harboring SpCas9 (Cas), SpCas9n-TadA8e (rBE46b), and SpCas9n-TadA9 (rBE49b). (b) Ratios of A>G mutations were calculated for A>G SNV loci detected in lines R49bAG_s2 and R49bAG_s3 and shown in the scatterplot. The Pearson correlation coefficient (r) was also calculated, and the red line is the diagonal line. (c) A sequence logo derived from edited adenines from all RNA-seq data. Bits account for how much each column is conserved and how much the nucleotide frequencies obtained in the profile differ from those that would have been obtained by aligning oligonucleotides chosen at random. (d) Boxplot showing ratios of A>G mutations at all RNA A>G SNV loci for plants harboring SpCas9, rBE46b, and rBE49b. A Wilcoxon test was conducted between every plant harboring ABEs versus plants harboring Cas only, and the -log10 p-value is shown. (e) Bar plot showing the average RPM values of ABEs for plants without RNA mutations and plants with RNA mutations. Each bar represents the mean value, each error bar represents the standard error, and each dot represents the ABE RPM value of each plant. (***) denotes p-value < 0.001 (one-tailed Wilcoxon test). (f) Ratios of A>G mutations of all A>G RNA SNV loci were calculated for one 49bAG_s2 T1 plant and four 49bAG_s2 T1 plants (left). -log10 p-value of Wilcoxon test on A>G ratios between five 49bAG_s2 plants versus plants harboring SpCas9 (middle). RPMs of ABEs are shown in the bar plot (right). N1 and N2 are T1 49bAG_s2 plants with a T-DNA insertion, while N3 and N4 are T1 49bAG_s2 plants without a T-DNA insertion.
ABEs induce clustered off-target editing
We performed similar analyses on DNA off-target SNVs but did not observe general patterns of flanking A>G editing. However, we did identify 25 loci with more than one A>G SNV in 12 plants (Additional file 1: Table S10); some loci contained 5–10 A>G SNVs, and others contained 2–3 (Fig. 5d and Additional file 2: Fig. S18). Overall, 45% of these clustered SNVs were located in genic regions, higher than the 30% observed for all A>G SNVs, consistent with the tendency of off-target A>G SNVs to occur in genic regions (Fig. 5e). We assigned these 12 plants to group 1 and the remaining 36 plants carrying ABEs to group 2. The number of SNVs and A>G SNVs and the percentage of A>G SNVs were significantly higher in group 1 than in group 2 (Fig. 5f).
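The exact rule used to call a locus "clustered" is not stated in this section. A minimal sketch, assuming that A>G SNVs on the same chromosome no more than 30 bp apart (a gap borrowed from the 30-bp flanking windows used in the Methods) are chained into one cluster, and that any plant with at least one cluster falls into group 1, could look like this. The gap threshold and function names are hypothetical.

```python
from collections import defaultdict


def find_clusters(snvs, max_gap=30, min_size=2):
    """Chain (chromosome, position) SNVs into clusters.

    Consecutive SNVs on the same chromosome that are at most max_gap bp apart
    join the same cluster; clusters with at least min_size SNVs are returned
    as (chromosome, [positions]) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snvs:
        by_chrom[chrom].append(pos)
    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)
            else:
                if len(current) >= min_size:
                    clusters.append((chrom, current))
                current = [pos]
        if len(current) >= min_size:
            clusters.append((chrom, current))
    return clusters


def assign_groups(plant_snvs, **kwargs):
    """Group 1 = plants with at least one cluster; group 2 = all others."""
    return {
        plant: 1 if find_clusters(snvs, **kwargs) else 2
        for plant, snvs in plant_snvs.items()
    }
```

Chaining by consecutive gaps, rather than using a fixed window, lets a cluster extend as long as each neighbouring SNV is close, which suits loci carrying 5–10 SNVs.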

Fig. 5 ABE-induced clustered RNA and DNA A>G SNVs. (a) An IGV genome browser view showing representative loci with clustered A>G SNVs in transcriptomes. (b) Ratios of A>G mutations were calculated in flanking 5′ and 3′ 30-bp regions centered at A>G RNA SNV loci. Lines R49bAG_s2 and R49bAG_s3 with RNA mutations and line RCas_s1 with SpCas9 only are shown. (c) Boxplot showing the number of A>G SNVs in the flanking 5′ and 3′ 30-bp regions separately for RNA SNVs in many (3–8) or few (1–2) plants. (d) IGV genome browser views showing representative SNV loci with flanking A>G SNVs in whole-genome sequencing. (e) Ratios of clustered SNVs located in genic regions. (f) Plants with ABEs were classified into two groups: group 1 with clustered SNVs and group 2 without clustered SNVs. Number of SNVs and A>G SNVs, and percentage of A>G SNVs are shown separately for plants in group 1 and plants in group 2. (**) denotes p-value < 0.01, and (***) denotes p-value < 0.001 (one-tailed Wilcoxon test). In IGV genome browser views, the grey bar represents a sequenced nucleotide that is the same as the reference genome, while bars in other colors represent sequenced nucleotides that are partially or totally different from the reference genome: red represents nucleotide A, green represents nucleotide T, orange represents nucleotide G, and blue represents nucleotide C. The height of each color bar represents the relative composition of each nucleotide.
Discussion
The targeting specificity of CRISPR tools in applications remains a considerable concern. It is well known that Cas nucleases mediate highly specific genome editing with rare off-target mutations in plants [29, 30], whereas high-activity CBEs cause genome-wide off-target mutations in rice and mouse [14, 31, 32]. ABE8s and ABE9s have been developed by several groups to overcome the limitations of ABE7s [15–17]. Their robust editing efficiency raised another question: how specific are these high-activity ABEs engineered with TadA8e and TadA9 deaminases? Compared with mouse and human genomes (each ~3 Gb), the rice genome (~0.4 Gb) is small, making WGS of individuals more feasible. In addition, rice is self-pollinating, circumventing the population heterogeneity challenges of human cells, and lacks innate A-to-I RNA editing, facilitating analyses of ABE-induced RNA editing. Therefore, we performed a comprehensive evaluation of ABE8- and ABE9-induced genetic mutations through WGS and transcriptome sequencing in rice.
Cas proteins and TadA variants play different roles in ABE-induced DNA off-target mutations: ABEs harboring SpCas9n-NG, an engineered SpCas9 protein recognizing a flexible protospacer adjacent motif (PAM) [33–38], result in a higher number of total SNVs, whereas those harboring TadA9, a TadA variant with robust activity [16], lead to a higher number of A>G SNVs specifically. Plants transformed with the ABE rBE46b (SpCas9n-TadA8e) did not have more SNVs or a higher percentage of A>G SNVs than plants subjected to Agrobacterium infection, suggesting that the combination of SpCas9n and TadA8e eliminates most sgRNA-independent DNA mutations induced by ABEs. Given that no sgRNA-dependent off-target mutations were observed, we conclude that optimizing sgRNA design is an efficient way to eliminate sgRNA-dependent off-target mutations.
Using deeply sequenced genomes and transcriptomes, we systematically studied ABE-induced RNA mutations. ABEs induced RNA A>G mutations in one-third of the plants, which had high ABE expression, but not in the other two-thirds, which had low ABE expression. When ABEs segregated out, RNA mutations diminished. In addition, T-DNA integration analysis suggested that stably integrated ABEs induce more off-target SNVs than those whose T-DNA has not been integrated into the genome. Together, these data highlight the importance of controlling the expression of ABEs in future applications, for example by using inducible or photoactivatable transcription systems, ribonucleoprotein-based delivery in clinical gene therapy [39, 40], and transgene-free gene-edited plants in crop breeding.
Without the noise from A-to-I mutations mediated by ADAR proteins, we were able to obtain a clean set of ABE-induced RNA mutations and discovered that ABEs induce clustered A>G mutations, which provides useful information for defining and characterizing true ABE RNA targets. Furthermore, the existence of both common and unique mutations in plants regenerated from the same callus provides robust experimental evidence that plants with different on-target edits can be derived from the same T-DNA insertion event and share a set of off-target SNVs. Therefore, we strongly recommend using two independent transgenic lines from separate calli (with two different T-DNA insertion sites and two non-overlapping sets of SNVs) in gene function studies.
Conclusions
The small genome, self-pollination, and absence of ADAR proteins make rice a suitable model organism for large-scale sequencing approaches to evaluate the off-target activity of ABEs. This pioneering comprehensive analysis of ABE-induced DNA and RNA mutations using whole-genome and transcriptome sequencing in rice sheds light on defining and characterizing ABE specificity. The finding that Cas proteins, TadA variants, transient expression, and the expression level of ABEs all contribute to ABE specificity in rice points to alternative ways of improving ABE specificity, including combinatorial optimization of the Cas/deaminase pair (e.g., SpCas9n-TadA8e) and temporal control of ABE expression, in addition to traditional protein engineering of deaminases.
Materials and methods
Plasmid construction
In this study, five rice (Oryza sativa) genomic loci (OsACC, OsGS1, OsMPK13, OsGSK3, and OsGSK4) and four rice genomic loci (OsACC, OsGS1, OsMPK13, and OsTms9) were targeted by rBE46b and rBE49b, respectively. Three genes (OsSERK2, OsDEP2, and OsGSK4) were targeted by both rBE50 and rBE53. Plant IDs and their corresponding information are described in Additional file 1: Table S1. The rBE46b, rBE49b, rBE50, and rBE53 expression plasmids were constructed as previously reported [16]. The empty entry vector without any spacer was cloned into pUbi:rBE46b, pUbi:rBE49b, pUbi:rBE50, and pUbi:rBE53 using Gateway technology to yield ABEs without sgRNAs (Additional file 1: Table S1).
Agrobacterium-mediated rice transformation and plant growth
The genome editing constructs were individually introduced into the Agrobacterium tumefaciens strain EHA105 via the freeze-thaw transformation method, and 2-week-old calli derived from immature seeds of the Geng rice variety Kitaake were infected by each Agrobacterium strain. After 4 weeks of culture on MSD medium supplemented with 50 mg/L hygromycin (Roche, Germany), the resistant callus lines were transferred onto RM plates to generate transgenic rice seedlings. All information on target gene mutations of each seedling examined in this study is given in Additional file 1: Table S1.
To identify background mutations, 10 individual Kitaake plants grown directly from seeds were used. Seedlings were regenerated from rice calli without Agrobacterium infection (C1) and from calli co-cultured with the empty EHA105 strain (C2). Seedlings were also regenerated from calli infected with EHA105 strains harboring SpCas9 only (Cas). All rice materials were grown in a greenhouse under a 16-h-light/8-h-dark photoperiod, a 28/25 °C temperature cycle, and 75% humidity.
DNA and RNA extractions
Genomic DNA of 4-week-old rice plants was extracted using the CTAB method (Li et al., 2016). Approximately 200 mg of fresh rice leaves was collected in a 2-ml centrifuge tube containing disposable metal balls. After being quickly frozen in liquid nitrogen, samples were ground to a fine powder using a tissue grinding apparatus (Jingxin, China). Following chloroform extraction, isopropanol precipitation, and 70% EtOH washing, genomic DNAs were eluted with 50 μL of double-distilled water supplemented with 1 μL of 10 U/μL RNase I (Thermo Fisher Scientific, USA) and stored at − 80 °C for later experiments.
RNA was extracted with TRIzol reagent (Takara, Japan) according to the manufacturer's instructions. Briefly, 100 mg of fresh rice leaves was sampled, quickly frozen in liquid nitrogen, and ground to a powder with a tissue grinding apparatus. Then, 1 ml of TRIzol reagent was added to the sample followed by chloroform and isopropanol treatment. Finally, RNA pellets were dissolved in 50 μL of RNase-free water (0.1% DEPC-treated) and stored at − 80 °C for later experiments.
Detection and validation of on-target and off-target mutations
The on-target genomic regions were amplified using Phanta Max Super-Fidelity DNA Polymerase (Vazyme, China) and locus-specific primers (Additional file 1: Tables S1, S11, and S12), with genomic DNAs and cDNAs used as templates. PCR amplicons were subjected to Sanger sequencing, and BioEdit software was used for sequence data analysis.
Whole-genome analysis of genetic mutations
RNA-free genomic DNAs (0.2 μg) from each sample were used to construct the DNA libraries using a NEBNext Ultra DNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer's instructions. DNA libraries were sequenced on the Illumina platform in the 150-nt paired-end mode with an average coverage depth of 40× (Additional file 1: Table S2).
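As a rough check on the sequencing effort, mean depth is approximately (read pairs × 2 × read length) / genome size. The sketch below inverts this relationship to estimate the number of 150-nt read pairs needed for a target depth. It ignores duplicates, unmapped reads, and trimming, so real requirements are somewhat higher; the function name is illustrative.

```python
def read_pairs_needed(genome_size_bp: float, depth: float, read_length: int = 150) -> float:
    """Read pairs needed for a target mean depth (simple coverage estimate
    ignoring duplicates, unmapped reads and trimming)."""
    bases_needed = genome_size_bp * depth
    return bases_needed / (2 * read_length)  # each pair contributes 2 x read_length bases


rice = read_pairs_needed(0.4e9, 40)     # ~0.4 Gb rice genome at 40x
mammal = read_pairs_needed(3e9, 40)     # ~3 Gb genome at the same depth
```

Here `rice` is about 53 million pairs (16 Gb of sequence) and `mammal` about 400 million pairs, a 7.5-fold difference that illustrates why per-individual WGS is more tractable in rice.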
The clean reads were mapped to the Kitaake genome V3 from Phytozome (https://data.jgi.doe.gov/refine-download/phytozome) via BWA [41] and sorted using samtools (v1.9) [42]. The Genome Analysis Toolkit (GATK v4.2) was used to mark duplicate reads and recalibrate base qualities [25]. To identify high-quality genetic changes at the genomic scale, we applied three independent germline variant-calling methods: GATK, LoFreq [23], and Strelka2 [23]. We retained SNVs identified by all three methods and indels identified by both GATK and Strelka2. All genetic changes identified by the three methods in the 10 Kitaake plants were combined and used as background mutations. Sanger sequencing was performed to validate the overlapping set of SNVs called by the three methods (Additional file 2: Fig. S19). Genetic mutation ratios were calculated using an in-house R program and the 'AC' (allele count) value from GATK's results. Both background mutations and homozygous mutations were removed from the SNVs and indels. The IGV browser was used to visualize sgRNA-directed on-target mutations [43], which were then removed before off-target analysis. Candidate sgRNA-dependent off-target sites were predicted using Crisflash [26], and on-target mutations were assessed using the IGV browser. A gene annotation file (OsativaKitaake_499_v3.1.gene_exons.gtf) from the Phytozome website was used to define different genomic regions, such as genic, exonic, and intergenic regions. The ggpubr, ggbio, and VennDiagram R libraries were used to draw the graphs.
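The set logic of the DNA variant filtering described above can be sketched with Python sets. This assumes each caller's output has already been parsed into (chromosome, position, ref, alt) tuples and that, in a single diploid sample, GATK's AC field is 1 for a heterozygous call and 2 for a homozygous call; how the in-house R program used AC beyond this is not stated. Whether background calls required support from one or all three callers is also an assumption here: the sketch simply takes the union of whatever call sets it is given. All function names are hypothetical.

```python
def consensus_snvs(gatk: set, lofreq: set, strelka2: set) -> set:
    """SNVs called by all three callers (intersection of the call sets)."""
    return gatk & lofreq & strelka2


def build_background(control_callsets) -> set:
    """Union of all calls from the background (seed-grown Kitaake) plants."""
    background = set()
    for callset in control_callsets:
        background |= callset
    return background


def filter_off_target_candidates(calls: set, background: set,
                                 allele_counts: dict, on_target: set) -> set:
    """Drop background variants, homozygous calls (AC == 2) and on-target edits."""
    return {
        v for v in calls
        if v not in background
        and allele_counts.get(v) != 2
        and v not in on_target
    }
```

Keeping only the intersection of three callers trades some sensitivity for a lower false-positive rate, which matters when individual off-target SNVs are being counted per plant.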
Analysis of T-DNA insertion sites and ABE transcripts
The clean reads were mapped to T-DNA sequences using BWA and sorted using samtools. T-DNA insertion sites were located using T-LOC (Li et al., in preparation). The coverage of T-DNAs between the left border (LB) and right border (RB) was assessed using the R library ShortRead. ABE expression was quantified as the average raw read count of the Cas protein and TadA variant normalized by the total read count in millions (reads per million, RPM). Because T0 plants are hemizygous for T-DNA insertions, the T-DNA copy number was calculated as the T-DNA coverage relative to half of the rice genome coverage.
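The normalizations described above reduce to simple arithmetic. The sketch assumes that "average raw read number" means the mean of the Cas and TadA read counts, and that copy number is estimated from mean depths over the T-DNA and over the genome. It also includes a helper that flags low-coverage stretches along the LB-to-RB sequence, the kind of gap that would reveal a partial insertion lacking a TadA or Cas fragment. All function names are hypothetical; the interfaces of T-LOC and ShortRead are not reproduced here.

```python
def abe_rpm(cas_reads: int, tada_reads: int, total_reads: int) -> float:
    """Mean of Cas and TadA read counts, normalised to reads per million (RPM)."""
    return (cas_reads + tada_reads) / 2 / (total_reads / 1e6)


def tdna_copy_number(tdna_mean_cov: float, genome_mean_cov: float) -> float:
    """In hemizygous T0 plants one T-DNA copy is expected at half the diploid
    genome coverage, so copies = T-DNA coverage / (genome coverage / 2)."""
    return tdna_mean_cov / (genome_mean_cov / 2)


def uncovered_segments(coverage, min_depth=1):
    """Half-open (start, end) intervals along the T-DNA where per-base depth
    falls below min_depth, e.g. a missing TadA or Cas fragment."""
    segments, start = [], None
    for i, depth in enumerate(coverage):
        if depth < min_depth and start is None:
            start = i                      # a gap begins
        elif depth >= min_depth and start is not None:
            segments.append((start, i))    # the gap ends before position i
            start = None
    if start is not None:
        segments.append((start, len(coverage)))
    return segments
```

For example, `abe_rpm(300, 500, 20_000_000)` gives 20.0 RPM, and a T-DNA covered at 20× in a genome sequenced at 40× corresponds to one copy.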
Analysis of ABE-induced RNA mutations
DNA-free RNAs (0.2 μg) were used to construct the RNA-seq libraries using a NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer's instructions. RNA-seq libraries were sequenced on the Illumina platform in the 150-nt paired-end mode (Additional file 1: Table S8).
The clean reads were mapped to the Kitaake V3 genome and annotation from Phytozome via the STAR aligner, allowing a maximum of eight mismatches per paired-end read [44]. GATK was used to mark duplicate reads, split reads containing Ns in their CIGAR strings, and recalibrate base qualities. SNVs were called by GATK, LoFreq, and Strelka2 for each transcriptome dataset and the corresponding genome dataset. SNVs identified by all three methods in the transcriptome data but not in the genome data were kept for further analysis. Sanger sequencing was performed to validate the overlapping set of SNVs called by the three methods (Additional file 2: Fig. S20). All genetic changes identified by the three methods in three Agrobacterium-infected plants were combined, used as background mutations, and removed from the SNVs identified in plants transformed with SpCas9, rBE46b, and rBE49b. The A>G mutation ratios of off-target RNA loci were calculated using in-house Python programs. The 30-bp and 3-bp flanking sequences of the off-target RNA SNVs were extracted from the Kitaake reference genome and subjected to motif prediction using WebLogo3 (http://weblogo.threeplusone.com/) [45].
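The "bits" shown in a sequence logo are per-column information content. A minimal sketch, assuming flanking windows are taken from a plus-strand reference string (SNVs in minus-strand transcripts would need reverse-complementing first), extracts a window centred on each edited base and computes R = log2(4) − H for each column. WebLogo3 additionally applies a small-sample correction that is omitted here, so values differ slightly for small alignments. The function names are illustrative.

```python
from math import log2


def extract_window(sequence: str, pos: int, flank: int) -> str:
    """Return the 2*flank+1 nt window centred on 0-based position pos."""
    if pos - flank < 0 or pos + flank + 1 > len(sequence):
        raise ValueError("window extends beyond the sequence")
    return sequence[pos - flank: pos + flank + 1]


def information_content(aligned):
    """Per-column information content in bits for equal-length DNA strings:
    R = log2(4) - H, where H is the Shannon entropy of the column
    (no small-sample correction)."""
    length = len(aligned[0])
    result = []
    for col in range(length):
        column = [s[col] for s in aligned]
        freqs = [column.count(base) / len(column) for base in "ACGT"]
        entropy = -sum(f * log2(f) for f in freqs if f > 0)
        result.append(2.0 - entropy)
    return result
```

A column where every window has the same base scores the maximum of 2 bits, while a column with all four bases equally represented scores 0, which is why the edited adenine stands out in such logos.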
Calculation of flanking A>G mutations in genome and transcriptome data
We combined all A>G off-target SNVs obtained from plants with RNA off-target activity. For each A>G SNV, we counted the reads carrying A, T, G, and C at each position of the 5′ and 3′ 30-bp flanking regions separately, considering only positions with a read coverage greater than 10. If the reference base was A, the genetic change ratio was calculated as the number of Gs divided by the total number of As and Gs; if the reference base was T, it was calculated as the number of Cs divided by the total number of Cs and Ts. The percentage of flanking A>G mutations was then calculated as the number of A/T positions with an A>G mutation ratio higher than 0.05 (numerator) divided by the number of A/T positions with a read coverage greater than 10 (denominator). Similarly, we combined all A>G off-target SNVs obtained from plants through WGS and calculated the percentage of A>G mutations in the 5′ and 3′ 30-bp flanking regions.
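The flanking-ratio calculation above can be sketched as follows. The sketch assumes that per-position base counts have already been extracted (for example from a pileup) and that coverage means the sum of A, T, G, and C reads. The thresholds mirror the text (coverage > 10, ratio > 0.05), and the function names are hypothetical.

```python
def change_ratio(ref: str, counts: dict) -> float:
    """A>G change ratio at one position.

    Reference A: G / (A + G). Reference T (an A>G change on the opposite
    strand): C / (C + T). Returns 0.0 when there are no informative reads.
    """
    if ref == "A":
        edited, unedited = counts.get("G", 0), counts.get("A", 0)
    elif ref == "T":
        edited, unedited = counts.get("C", 0), counts.get("T", 0)
    else:
        raise ValueError("change ratio is defined only for A or T reference bases")
    total = edited + unedited
    return edited / total if total else 0.0


def flanking_ag_percentage(positions, min_cov=10, min_ratio=0.05):
    """positions: (reference_base, {base: read_count}) for every position in
    a 5' or 3' flank. Returns the percentage of A/T positions with coverage
    > min_cov whose change ratio exceeds min_ratio."""
    numerator = denominator = 0
    for ref, counts in positions:
        if ref not in ("A", "T") or sum(counts.values()) <= min_cov:
            continue  # skip C/G references and low-coverage positions
        denominator += 1
        if change_ratio(ref, counts) > min_ratio:
            numerator += 1
    return 100 * numerator / denominator if denominator else 0.0
```

For instance, with one A position carrying 2 G reads out of 20, one fully unedited T position, and one position below the coverage cut-off, the function reports 50.0%.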
Parameters of boxplots used in this study
The horizontal line in the box represents the median value, and the bottom and top of the box are the lower (Q1) and upper (Q3) quartiles, respectively. The upper whisker is min(max(x), Q3 + 1.5 × IQR), and the lower whisker is max(min(x), Q1 − 1.5 × IQR), where IQR (interquartile range) = Q3 − Q1. Black dots located outside the whiskers are outliers.
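The whisker rule above can be expressed as a short sketch. Quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which uses the same linear interpolation as R's default (type 7) quantiles. The quartile method actually used for the figures is not stated, so this choice is an assumption.

```python
from statistics import quantiles


def boxplot_stats(x):
    """Median, quartiles, whiskers and outliers following the rule above."""
    q1, median, q3 = quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    upper = min(max(x), q3 + 1.5 * iqr)
    lower = max(min(x), q1 - 1.5 * iqr)
    outliers = [v for v in x if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3,
            "lower": lower, "upper": upper, "outliers": outliers}
```

For `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]` this gives Q1 = 3.25, Q3 = 7.75, an upper whisker of 14.5, and 100 as the only outlier. Plots drawn with ggplot2-based tools such as ggpubr end the whisker at the most extreme observation inside the fence (here 9), so drawn whiskers can be slightly shorter than the formula implies.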
Supplementary Information
Additional file 1: Supplementary tables. Table S1, Summary of plants with ABEs. Table S2, Mapping statistics of whole-genome sequencing. Table S3, Summary of sgRNA-dependent on-target and off-target loci. Table S4, Summary of all the homozygous SNVs. Table S5, Summary of all the homozygous indels. Table S6, Summary of genomic SNVs detected through WGS. Table S7, Summary of the overlapping SNVs between each of the plants with whole-genome sequencing. Table S8, Mapping statistics of whole-transcriptome sequencing. Table S9, Summary of all the transcriptomic SNVs. Table S10, Summary of clustered A>G DNA SNVs. Table S11, Primers used to verify DNA SNVs by Sanger sequencing. Table S12, Primers used to verify RNA SNVs by Sanger sequencing.
Additional file 2: Supplementary figures. Fig. S1. Sanger sequencing chromatograms of on-target mutations in plants harboring rBE46b and rBE49b. Fig. S2. Sanger sequencing chromatograms of on-target mutations in plants harboring rBE50 and rBE53. Fig. S3. IGV browser views showing the on-target mutations for 36 plants harboring ABEs. Fig. S4. Analysis of SNVs and indels identified by whole-genome sequencing. Fig. S5. Analysis of the remaining background homozygous DNA mutations. Fig. S6. Characterization of ABE-induced genomic mutations. Fig. S7. Distribution of six types of SNVs. Fig. S8. Distribution of SNVs at given regions of the genome. Fig. S9. Chromosomal distribution of SNVs. Fig. S10. On-target and off-target mutations in plants from the same calli. Fig. S11. Off-target SNVs in plants with incomplete T-DNA insertions. Fig. S12. Distribution of SNVs with different copy numbers of T-DNA insertions. Fig. S13. Transcriptome-wide distribution of ABE-induced off-target mutations. Fig. S14. Heatmap demonstrating A>G mutations in transcriptomes with more than 5 A>G SNVs detected. Fig. S15. The 5′ and 3′ flanking A>G mutations in transcriptomes with ABEs containing A>G RNA SNVs and in transcriptomes with SpCas9 only lacking A>G RNA SNVs. Fig. S16. The 5′ and 3′ flanking A>G mutations in transcriptomes with ABEs but without A>G RNA SNVs. Fig. S17. IGV genome browser views showing the off-target RNA mutations. Fig. S18. IGV genome browser views showing A>G mutations with flanking A>G SNVs in genome sequencing data. Fig. S19. Sanger sequencing chromatograms of off-target A>G DNA mutations. Fig. S20. Sanger sequencing chromatograms of off-target A>G RNA mutations.
Additional file 3: Review history