Genome Biology, May 9, 2025

Using deep learning to predict how well adenine base editing works in different types of cells

Updated Mar 31, 2026

Abstract

Strong correlations (Spearman R = 0.83-0.92) were observed between in vitro and in vivo base editing datasets.

Adenine base editors (ABEs) convert A•T to G•C base pairs; the screens targeted a total of 2,195 pathogenic mutations with 12,000 guide RNAs.
High accuracy in predicting adenine base editing efficiencies was achieved with the BEDICT2.0 model, showing correlations of R = 0.60-0.94 in cell lines and R = 0.62-0.81 in the liver.
The findings suggest that adenine base editing may effectively correct many pathogenic mutations.
BEDICT2.0 is designed to identify specific sgRNA-ABE combinations that could provide high on-target editing with minimal bystander editing.

BACKGROUND: Adenine base editors (ABEs) enable the conversion of A•T to G•C base pairs. Since the sequence of the target locus influences base editing efficiency, efforts have been made to develop computational models that can predict base editing outcomes based on the targeted sequence. However, these models were trained on base editing datasets generated in cell lines, and their predictive power for base editing in primary cells in vivo remains uncertain.

RESULTS: In this study, we conduct base editing screens using SpRY-ABEmax and SpRY-ABE8e to target 2,195 pathogenic mutations with a total of 12,000 guide RNAs in cell lines and in the murine liver. We observe strong correlations between in vitro datasets generated by ABE-mRNA electroporation into HEK293T cells and in vivo datasets generated by adeno-associated virus (AAV)- or lipid nanoparticle (LNP)-mediated nucleoside-modified mRNA delivery (Spearman R = 0.83-0.92). We subsequently develop BEDICT2.0, a deep learning model that predicts adenine base editing efficiencies with high accuracy in cell lines (R = 0.60-0.94) and in the liver (R = 0.62-0.81).

CONCLUSIONS: Our work confirms that adenine base editing holds considerable potential for correcting a large fraction of pathogenic mutations. We also provide BEDICT2.0, a robust computational model that helps identify sgRNA-ABE combinations capable of achieving high on-target editing with minimal bystander effects in both in vitro and in vivo settings.

Key numbers

25%

Correction Rate

Percentage of targeted pathogenic mutations corrected with efficiencies above 10%.

R = 0.62-0.81

Prediction Accuracy in Liver

Spearman correlation coefficient for BEDICT2.0 predicting editing efficiencies in the liver.

Full Text

What this is

Adenine base editors (ABEs) convert A•T to G•C base pairs without DNA breaks.
Current predictive models for base editing efficiency are limited to in vitro cell line data.
This study evaluates ABE efficiency in both cell lines and murine liver, developing a new deep learning model, BEDICT2.0, for better predictions.

Essence

Adenine base editing shows promise for correcting pathogenic mutations, and BEDICT2.0 accurately predicts editing efficiencies in various cellular contexts.

Key takeaways

ABEs can correct approximately 25% of targeted pathogenic mutations with efficiencies above 10% and no detectable bystander editing for at least one sgRNA-ABE combination.
BEDICT2.0 achieves high prediction accuracy for adenine base editing efficiencies in cell lines (R = 0.60-0.94) and in the liver (R = 0.62-0.81).
In vivo editing outcomes correlate less strongly with plasmid-based HEK293T datasets; delivering the editor as mRNA in vitro substantially improves the correlation with in vivo outcomes.

Caveats

The predictive accuracy of BEDICT2.0 decreases on in vivo datasets when the model is trained on plasmid-based cell-line data, as is also seen with earlier models.
While SpRY-ABE variants can target a broad range of PAMs, their average on-target editing efficiencies are lower than those of the more PAM-restricted SpCas9- and SpG-based variants.

Definitions

Adenine Base Editors (ABEs): Tools that enable precise conversion of A•T to G•C nucleotides without double-strand breaks.
sgRNA: Single guide RNA that directs the base editor to the target DNA sequence.
PAM: Protospacer adjacent motif, a short sequence required for Cas9 binding to DNA.

Background

Adenine base editors (ABEs) enable the precise conversion of A•T to G•C nucleotides without causing DNA double-strand breaks or requiring homology-directed repair from DNA donor templates [1–3]. They are composed of laboratory-evolved E. coli adenosine deaminases (ecTadA) fused to nuclease-impaired Cas9 (D10A) proteins, and a single guide RNA (sgRNA) which guides the base editor complex to the desired locus in the genome [4, 5]. Among the most frequently used ABE variants are ABEmax, a fusion of Streptococcus pyogenes SpCas9(D10A) and the codon-optimized ecTadA7.10 [6], and ABE8e, in which the processivity of the adenine deaminase was further enhanced by phage-assisted directed protein evolution [7]. As the targeting range of these ABE variants is constrained by the NGG protospacer-adjacent motif (PAM) requirement of SpCas9 [8–12], researchers engineered variants towards extended PAM recognition, such as SpG, which recognizes NGN motifs, or SpRY, which recognizes NRN and to a lesser extent NYN motifs [13–21].

With these PAM-relaxed base editors at hand, nearly any site in the genome can be targeted, which allows the position of the target base within the protospacer to be shifted. While this strategy can be used to maximize on-target editing and minimize unintended bystander editing (conversion of neighbouring adenines) [2, 22], it requires experimental testing of different sgRNA-ABE combinations. This is a laborious and time-consuming process, making computational models that predict base editing efficiencies in silico highly valuable [23–27]. However, currently available models are only trained and tested on base editing datasets generated in vitro in cell lines, and their accuracy for predicting in vivo base editing outcomes in tissues remains uncertain [28].

To address this limitation, we conducted ABE screens not only in cell lines but also in the murine liver, and developed a machine-learning model capable of predicting editing efficiencies with high accuracy in both contexts.

Results

ABE screening in cell lines

We observed high library coverage and a strong correlation of editing rates between the three biological replicates after data processing and filtering (Additional file 1: Figs. S1 and S2). Additionally, we noted a strong correlation in base editing outcomes between the datasets with 5 and 10 days of selection (Spearman’s R = 0.78–0.97, and Pearson’s r = 0.9–0.96; Fig. S1b), prompting us to focus only on the 10-day dataset for further analysis (termed HEK-Plasmid dataset). When we first assessed PAM preferences of the different ABE variants, we found that they closely resembled those of the respective Cas9 nuclease variants (Fig. 1c) [13]. Specifically, SpRY-ABE variants achieved editing on all NRN and to a lesser extent NYN motifs, while SpG-ABE variants were primarily limited to NGN and NAN PAMs and SpCas9-ABE variants were restricted to NGG and NAG PAMs.

To next evaluate the average editing efficiencies of different base editor variants, we filtered the datasets based on the preferred PAM sequences of the Cas9 variants: NRN for SpRY, NGN for SpG, and NGG for SpCas9. Subsequent high-throughput sequencing (HTS) analysis revealed higher average editing rates for SpCas9 ABEs on NGG PAMs (36.0% for SpCas9-ABEmax and 64.9% for SpCas9-ABE8e) compared to SpG and SpRY ABEs on NGN or NRN PAMs, respectively (17.4% for SpG-ABEmax, 34.2% for SpG-ABE8e, 20.2% for SpRY-ABEmax and 33.9% for SpRY-ABE8e; Additional file 1: Fig. S2b, c).

Consistent with previous findings, the analysis of A-to-G conversions across the entire protospacer showed an editing window of approximately 7 bases for ABEmax and 11 bases for ABE8e variants (Fig. 1d) [1, 7, 13]. Consequently, correction of pathogenic mutations often included bystander editing. Since coding bystander mutations can be problematic, especially for translational applications, we evaluated the frequency at which pathogenic mutations could be corrected without inducing bystanders with at least one of the sgRNA-ABE combinations. Our analysis revealed that among the 36.9% of pathogenic A-to-G mutations that could be corrected with efficiencies above 10%, 69.4% did not exhibit bystander editing using a cut-off of ≤ 0.5% (Fig. 1e).
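The two percentages in this paragraph combine multiplicatively. The short calculation below shows that their product is consistent with the roughly 25% of targeted mutations reported elsewhere in this article as correctable without detectable bystander editing. The variable names are illustrative only.

```python
# Fractions reported in the paragraph above (Fig. 1e).
corrected_above_10pct = 0.369  # mutations corrected with > 10% efficiency
without_bystander = 0.694      # share of those with bystander editing <= 0.5%

# Fraction of all targeted mutations corrected without detectable bystanders.
bystander_free_fraction = corrected_above_10pct * without_bystander
print(f"{bystander_free_fraction:.1%}")  # about one quarter of the library
```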

Fig. 1

High-throughput ABE screening in HEK293T cells using target-matched sgRNA libraries. (a) Strategy of correcting pathogenic mutations without bystander editing by sgRNA tiling. The sgRNA not including the coding bystander within the editing window is shadowed darker. (b) Schematics of the ABE screen in HEK293T cells using plasmid transfection for ABE delivery. (c) Total editing efficiencies for each PAM in HEK293T cells after 10 days ABE selection with SpRY-ABE8e and SpRY-ABEmax (top row), SpG-ABE8e and SpG-ABEmax (middle row), SpCas9-ABE8e and SpCas9-ABEmax (bottom row). The y-axis indicates the 1st nucleotide of the PAM motif, the x-axis the 2nd and 3rd nucleotides of the PAM. (d) Editing window for SpRY-ABE8e and SpRY-ABEmax (top row), SpG-ABE8e and SpG-ABEmax (middle row), SpCas9-ABE8e and SpCas9-ABEmax (bottom row). Datasets were filtered for best PAMs (NRN for SpRY, NGN for SpG, and NGG for SpCas9). (e) Correction of pathogenic mutations in the library with or without inducing non-silent bystander mutations for different base editors. Cut-offs were ≥ 10% for on-target editing and ≤ 0.5% for bystander editing. Target sites with on-target editing below 10% were defined as not corrected. Number of target sites (n) for SpRY-ABE8e: 11838, SpRY-ABEmax: 11497, SpG-ABE8e: 10287, SpG-ABEmax: 9400, SpCas9-ABE8e: 7540, SpCas9-ABEmax: 9702, ABE combined: 12000.

ABE screening in the murine liver

Fig. 2

High-throughput ABE screening in liver cells with target-matched sgRNA libraries reveals correlation to cell culture. (a) The sgRNA library was injected in P1 pups prior to ABE injection in juvenile mice. Editing rates were analysed by HTS. (b) Correlation of total A-to-G editing between the mRNA-LNP and AAV dataset with SpRY-ABE8e (n = 2176) and SpRY-ABEmax (n = 7247). The red line represents linear regression. (c) Violin plot of total editing efficiency for SpRY-ABE8e and SpRY-ABEmax in the indicated datasets. Datasets were filtered for most efficient PAMs (NRN) and mean editing efficiency is plotted (grey line). n for SpRY-ABE8e = 7882, 1623, 3459 and SpRY-ABEmax = 7644, 5170, 5852. (d) Total editing efficiency for each PAM present in the library for SpRY-ABE8e (left) and SpRY-ABEmax (right) for the mRNA-LNP and AAV datasets. (e) Editing window in the mRNA-LNP and AAV datasets for SpRY-ABE8e (left) and SpRY-ABEmax (right), filtered for best PAMs (NRN). (f) Proportion of the different tri-nucleotide motifs for loci above mean editing efficiency (top) and below mean editing efficiency (bottom) for SpRY-ABE8e (left) and SpRY-ABEmax (right) of various screening methods.

Fig. 3

Correlation of editing efficiencies between in vitro and in vivo ABE screening datasets. (a) Correlation of total A-to-G editing efficiency between in vivo (mRNA-LNP and AAV) and in vitro (HEK-Plasmid) screening datasets for SpRY-ABE8e (left, n = 2418, 5233) and SpRY-ABEmax (right, n = 7817, 8770). (b) Violin plots of total editing efficiency in mRNA-ABE datasets with SpRY-ABE8e (top) and SpRY-ABEmax (bottom) with 0.2 pmol, 1 pmol or 5 pmol mRNA transfection. Datasets were filtered for best PAMs (NRN) and mean editing efficiency is given (grey line). n for SpRY-ABE8e = 6361, 6424, 5730 and SpRY-ABEmax = 6159, 6322, 5961. (c) Correlation of total A-to-G editing efficiency between in vivo (mRNA-LNP and AAV) and in vitro (HEK-mRNA) screening datasets for SpRY-ABE8e (left, n = 2388, 5018) and SpRY-ABEmax (right, n = 7308, 7897). The red line in all plots represents linear regression.

Development of a deep learning model for predicting adenine base editing

Next, we utilized the ABE datasets to develop and train computational models for predicting adenine base editing efficiencies. We adapted the BE-DICT model architecture [25] by changing its design from an encoder-decoder to an encoder-encoder neural network [38], reducing its computational complexity (denoted as BEDICT1.2). The new model takes both the reference sequence (the target sequence) and an output sequence (each potential editing outcome) as input and estimates the probability of obtaining this output sequence, substantially decreasing the computation time of the model.
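To make the notion of "each potential editing outcome" concrete, the sketch below enumerates every sequence that can arise from converting any subset of adenines in a protospacer to guanine. This is an illustration under stated assumptions, not the candidate-generation step of the published model: the function name `enumerate_outcomes` is hypothetical, and the sketch considers all adenines rather than only those inside an editing window.

```python
from itertools import product


def enumerate_outcomes(protospacer: str) -> list[str]:
    """Return all sequences reachable by A-to-G conversion of any subset of A's.

    Hypothetical helper for illustration; the unedited sequence is included
    as the first element.
    """
    a_positions = [i for i, base in enumerate(protospacer) if base == "A"]
    outcomes = []
    # Each adenine is either left unedited (False) or converted to G (True).
    for choice in product((False, True), repeat=len(a_positions)):
        seq = list(protospacer)
        for pos, edited in zip(a_positions, choice):
            if edited:
                seq[pos] = "G"
        outcomes.append("".join(seq))
    return outcomes


# A toy 4-nt sequence with two adenines yields 2**2 = 4 candidate outcomes.
print(enumerate_outcomes("ACAT"))
```

A model of the kind described above would then be asked for the probability of each (reference, outcome) pair in turn.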

To determine which parts of the target sequences are crucial for predicting editing efficiencies, we trained the model on the HEK-Plasmid datasets (80% train, 10% test, 10% validation; split performed on a gene level to avoid information leakage between nearby pathogenic sequences) using three different input configurations: either only the 20nt protospacer, the protospacer plus the 4nt PAM, or the protospacer plus the 4nt PAM and plus 5nt flanking sequences (Additional file 1: Fig. S8a). As expected, including the PAM in the input significantly improved the predictive accuracy of the model, whereas including the flanking sequences did not affect the model performance. Consequently, we restricted the input sequence to the protospacer plus the PAM.
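The gene-level split mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the record layout `(gene, guide_id)` and the fixed seed are assumptions. Because whole genes are assigned to one partition, the 80/10/10 proportions hold for genes and only approximately for individual guides.

```python
import random
from collections import defaultdict


def split_by_gene(records, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Assign every guide of a gene to the same partition (hypothetical helper).

    records: list of (gene, guide_id) tuples.
    Returns a dict with keys 'train', 'test' and 'validation'.
    """
    genes = sorted({gene for gene, _ in records})
    random.Random(seed).shuffle(genes)  # reproducible shuffle of gene names

    n_train = int(fractions[0] * len(genes))
    n_test = int(fractions[1] * len(genes))
    assignment = {}
    for i, gene in enumerate(genes):
        if i < n_train:
            assignment[gene] = "train"
        elif i < n_train + n_test:
            assignment[gene] = "test"
        else:
            assignment[gene] = "validation"

    splits = defaultdict(list)
    for gene, guide in records:
        splits[assignment[gene]].append((gene, guide))
    return dict(splits)


# Toy library: 10 genes with 3 guides each.
toy = [(f"GENE{g}", f"sg{g}_{i}") for g in range(10) for i in range(3)]
parts = split_by_gene(toy)
print({name: len(items) for name, items in parts.items()})
```

Splitting on genes rather than on guides keeps sgRNAs that tile the same mutation out of different partitions, which is the leakage the paragraph above refers to.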

To investigate whether increasing the size of the base editing dataset could further enhance the prediction accuracy of BEDICT2.0, we subdivided the HEK-mRNA datasets into bins of varying sizes, ranging from 10 to 100% of the total data (Additional file 1: Fig. S8e). Our analysis revealed that larger dataset bins improved editing prediction accuracy. However, the improvements plateaued, suggesting that further increasing the library size would result in only marginal gains in the prediction accuracy of BEDICT2.0.

We then benchmarked the performance of BEDICT2.0 to other machine learning models designed to predict base editing outcomes with ABEmax, including BE-HIVE [23], a deep conditional autoregressive model, and DeepABE [24], a convolutional neural network (CNN) model (Additional file 3: Table S1). To minimize the influence of experimental biases, we compared BEDICT2.0 to BE-HIVE using the DeepABE training dataset, and BEDICT2.0 to DeepABE using the BE-HIVE training dataset. In these comparisons, BEDICT2.0 performed slightly better than DeepABE and comparable to BE-HIVE (Fig. 4e). Subsequently, we tested all three models on datasets in which SpCas9 ABEmax was used to edit endogenous target sites (Song et al., 2020 [24] and Marquart et al., 2021 [25]). While BEDICT2.0 showed a slight advantage on the dataset generated in our laboratory (Marquart et al. [25]) and DeepABE on the dataset generated in their laboratory (Song et al. [24]), overall the performance of the three models was comparable (Fig. 4f,g, Additional file 1: Fig. S8f).

BE-HIVE and DeepABE were trained on in vitro cell line datasets in which the base editor was delivered via plasmid. Therefore, we examined whether their performance, like that of BEDICT2.0 (trained on the HEK-plasmid dataset), would also decrease in in vivo ABEmax datasets. Applying both models to the HEK-plasmid, mRNA–LNP, and AAV SpRY-ABEmax datasets revealed that their performance dropped from Spearman and Pearson of R = 0.6, r = 0.57 (BE-HIVE) and R = 0.49, r = 0.49 (DeepABE) in the HEK-plasmid dataset to R = 0.52–0.59, r = 0.44–0.47 (BE-HIVE) and R = 0.4–0.45, r = 0.37–0.43 (DeepABE) in the in vivo datasets (Fig. 4h).
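Model performance throughout this section is reported as Spearman's R and Pearson's r between measured and predicted editing efficiencies. The dependency-free sketch below shows how the two statistics relate: Spearman's coefficient is Pearson's coefficient computed on ranks, with tied values sharing their average rank. The function names and toy numbers are illustrative and are not taken from the study's analysis scripts.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: a monotonic but non-linear relationship.
measured = [1, 2, 3, 4, 5]
predicted = [1, 4, 9, 16, 25]
print(spearman(measured, predicted), pearson(measured, predicted))
```

In this toy case the rank correlation is perfect while the linear correlation is slightly lower, which is why both values are informative when predictions are monotonic but not linearly calibrated.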

Fig. 4

Establishment and evaluation of BEDICT2.0, a machine learning model predicting ABE activity in vitro and in vivo. (a) Schematics of BEDICT2.0 machine learning algorithm. BEDICT2.0 includes an Efficiency Model (predicts total editing efficiency) and a Proportion Model (predicts distribution within the edited reads). Outputs of both models are combined to predict editing efficiency. (b) Comparison of the performance of BEDICT1.2 or BEDICT2.0 on various HEK-Plasmid test datasets generated in this study. (c) Comparison of the performance of BEDICT2.0 trained on either HEK-Plasmid and tested on the in vivo datasets, trained on HEK-mRNA and tested on the in vivo datasets or trained and tested on the in vivo datasets. (d) Editing efficiency predicted by BEDICT2.0 plotted against the measured efficiency for SpRY-ABE8e (top) and SpRY-ABEmax (bottom) for HEK-mRNA (5 pmol), mRNA-LNP or AAV datasets. The red line represents linear regression. (e) Comparison of BEDICT2.0 to other base editing prediction models on adenine base editing datasets from target-matched sgRNA library screens. Datasets used for comparison are SpCas9-ABEmax (mES-12kChar) [23] and SpCas9-ABE7.10 (HT-ABE Train) [24]. ML-models used for predicting ABE editing outcome: DeepABE [24], BE-Hive-ABE-HEK293T [23] and BEDICT2.0 (this study). (f) Total A-to-G editing efficiency at endogenous loci in various datasets correlated to BEDICT2.0 (trained on the HEK-plasmid dataset) predictions. n for Marquart-HEK293T [25]: 18, Song-HEK293T: 72, Song-U2OS: 22, Song-HCT116: 41 [24]. (g) Spearman and Pearson correlation of measured and predicted editing efficiencies with BE-HIVE [23], DeepABE [24] and BEDICT2.0 (trained on HEK-plasmid) on various datasets generated on endogenous loci. (h) Spearman and Pearson correlation of measured and predicted editing efficiencies of BE-HIVE [23] and DeepABE [24] on the different SpRY-ABEmax datasets. Datasets were filtered for protospacers with NGG PAMs for DeepABE, as the model can only be applied for NGG PAMs.
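The caption of Fig. 4 states that the outputs of the Efficiency Model and the Proportion Model are combined. One plausible way to combine them, shown here purely as an assumption, is to multiply the predicted total editing efficiency by each outcome's predicted share of the edited reads. The function name and the toy values are hypothetical; the exact combination rule used by BEDICT2.0 is not given in this text.

```python
def combine_predictions(total_efficiency, outcome_proportions):
    """Scale per-outcome proportions by the total editing efficiency.

    Assumed combination rule (not confirmed by the source):
    frequency(outcome) = total_efficiency * proportion(outcome).
    Proportions are renormalised so that they sum to 1.
    """
    total = sum(outcome_proportions.values())
    return {
        seq: total_efficiency * share / total
        for seq, share in outcome_proportions.items()
    }


# Toy example: 40% of reads edited; 75% of edited reads carry outcome "ACGT".
per_outcome = combine_predictions(0.4, {"ACGT": 0.75, "GCGT": 0.25})
print(per_outcome)
```

Under this rule the per-outcome frequencies always sum back to the predicted total efficiency.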

Discussion

In this study, we performed high-throughput screens to systematically evaluate the efficiency and accuracy of adenine base editors (ABEs) in correcting pathogenic mutations. We combined six ABE variants (SpCas9-ABEmax, SpG-ABEmax, SpRY-ABEmax, SpCas9-ABE8e, SpG-ABE8e, and SpRY-ABE8e) with 12,000 different sgRNAs to target more than 2,000 pathogenic mutations. In HEK293T cells, these screens revealed that approximately 25% of the targeted pathogenic mutations could be corrected with efficiencies above 10% and no detectable bystander editing for at least one ABE–sgRNA combination. Moreover, although SpRY-based ABEs enabled editing at a far broader range of PAMs than SpCas9- or SpG-based ABEs, their average on-target editing efficiencies were lower. These findings recapitulate a trade-off previously documented in Cas9 nuclease studies [39], in which a broader target scope often accompanies a reduction in average editing rates.

We next assessed how different delivery modalities and cell types influence base-editing outcomes. Screening the same sgRNA library with SpRY-ABEmax or SpRY-ABE8e delivered into the murine liver via AAV or mRNA-LNP, we observed minimal differences in the distribution of editing outcomes between these two delivery methods. In contrast, in vivo results correlated less strongly with datasets derived from plasmid-based transfection of ABEs into HEK293T cells. However, when we adapted our in vitro protocol to deliver the base editor into HEK cells via mRNA electroporation rather than plasmid transfection, the correlations with in vivo datasets improved substantially. A likely explanation is that mRNA delivery better recapitulates the physiological expression levels of the editor after in vivo delivery—with high ABE expression after plasmid transfection resulting in saturating editing rates at many target sites, obscuring meaningful differences among sgRNAs at these loci. Consistent with this hypothesis, we have previously observed that ABE expression in HEK cells after plasmid transfection can be more than 10,000-fold higher than after in vivo delivery via AAV or mRNA–LNP [28].

Building on our comprehensive base-editing datasets, we next developed and validated a deep learning model, BEDICT2.0, to predict ABE editing efficiencies. When trained on plasmid-based cell-line datasets, BEDICT2.0 performed on par with previously developed machine-learning models, such as BE-HIVE and DeepABE, on external datasets. However, as with these earlier models, its performance decreased when applied to in vivo datasets. We therefore also trained BEDICT2.0 on cell line data where ABE was delivered via mRNA, which resulted in a model that maintained high accuracy in vivo.

Conclusion

In conclusion, our work confirms that adenine base editing holds considerable potential for correcting a large fraction of pathogenic mutations. We also provide BEDICT2.0 – a robust computational model that helps identify sgRNA-ABE combinations capable of achieving high on-target editing with minimal bystander effects in both in vitro and in vivo settings.

Methods

Oligo library design

A total of 2100 target loci were selected from the ClinVar [29] and LOVD [30] databases (August 2020; ‘Pathogenic’ and ‘Likely Pathogenic’ mutations, monogenic disorders, G-to-A conversion mutations targetable by ABE). To find PAMs compatible with SpRY, the genomic regions flanking each target site were extracted from the UCSC server (http://genome.ucsc.edu/) and scanned for NGN, NAN, NCA, NCT, NTA and NTG PAMs. For each target locus, 5–6 sgRNAs were selected such that the target base lies at positions 2–12 of the protospacer (starting from position 6, then 5 and 7, 4 and 8, 3 and 9, 2 and 10, 11 and 12), resulting in a total of 12,000 sgRNAs. The custom oligonucleotide pool was purchased from Twist Bioscience and included the following elements: G/20N spacer, SpCas9 optimized scaffold [40, 41], and the corresponding target locus containing the 3 nt PAM and a 30-nucleotide overhang on each side of the region complementary to the spacer binding site.
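The tiling rule described above can be sketched as a short selection routine. This is a simplified illustration and not the script used for the library: it scans one strand only, does not verify that the target base is an adenine, and the names `PRIORITY`, `pam_allowed` and `tile_guides` as well as the default of six guides are assumptions.

```python
# Target-base positions within the protospacer, in the order given above.
PRIORITY = [6, 5, 7, 4, 8, 3, 9, 2, 10, 11, 12]


def pam_allowed(pam: str) -> bool:
    """True for NGN, NAN, NCA, NCT, NTA and NTG PAMs."""
    if len(pam) != 3:
        return False
    return pam[1] in ("A", "G") or pam[1:] in {"CA", "CT", "TA", "TG"}


def tile_guides(seq: str, target_idx: int, max_guides: int = 6):
    """Pick up to max_guides protospacers placing the target base at PRIORITY positions.

    seq: genomic sequence on the protospacer strand (5'->3').
    target_idx: 0-based index of the target base in seq.
    Returns a list of (position, protospacer, pam) tuples.
    """
    guides = []
    for pos in PRIORITY:
        start = target_idx - (pos - 1)  # protospacer start so target sits at `pos`
        end = start + 20                # 20-nt protospacer
        if start < 0 or end + 3 > len(seq):
            continue
        pam = seq[end:end + 3]
        if pam_allowed(pam):
            guides.append((pos, seq[start:end], pam))
        if len(guides) == max_guides:
            break
    return guides


# Toy locus: a single adenine flanked by guanines, so every PAM is GGG.
toy_seq = "G" * 30 + "A" + "G" * 30
selected = tile_guides(toy_seq, target_idx=30)
print([pos for pos, _, _ in selected])
```

In the toy locus every candidate PAM is permitted, so the first six positions of the priority list are returned; at a real locus, positions with an incompatible PAM are skipped and later positions fill the remaining slots.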

Cloning of plasmids

All plasmids were generated either by isothermal assembly (NEBuilder® HiFi DNA Assembly Cloning Kit, NEB) or by restriction digest and ligation using T4 ligase (NEB). PCRs were conducted using NEBNext® High-Fidelity 2X PCR Master Mix (NEB).

Plasmids p2T-CMV-ABEmax-BlastR (Addgene #152989) and ABE8e (Addgene #138489) were gifts from David Liu. Plasmid p2T-ABE8e-SpCas9-BlastR was generated by ligation of the ABE8e transgene (AgeI-NotI digest of pCMV-ABE8e) into the Tol2 compatible backbone (AgeI-NotI-EcoRV digest of p2T-CMV-ABEmax-BlastR). Plasmids p2T-CMV-ABEmax-SpG-BlastR and p2T-CMV-ABE8e-SpG-BlastR were generated by isothermal assembly of either PCR amplified ABEmax or ABE8e with SpG transgene into the Tol2 compatible backbone. Plasmids p2T-CMV-ABEmax-SpRY-BlastR and p2T-CMV-ABE8e-SpRY-BlastR were generated by isothermal assembly of either PCR amplified ABEmax or ABE8e with SpRY transgene into the Tol2 compatible backbone.

Lenti-gRNA-p3-eGFP was generated by inserting the PCR-amplified p3 and eGFP transgenes into the Lenti compatible backbone (ApaI-MluI digest of Lenti-gRNA-puro, a gift from Hyongbum Kim [42], Addgene #84752).

The AAV-library plasmid was generated as follows: linearized pcDNA™3.1/Zeo(+) plasmid (BglII-XbaI, V86020 ThermoFisher) and the PCR-amplified hSyn1-eGFP-WPRE-bGHp(A)-229 transgene, together with the PCR-amplified U6 promoter and plasmid-library cloning site, were combined by isothermal assembly. The generated plasmid was further linearized (NotI-XbaI digest) and ligated into the AAV compatible backbone (NotI-XbaI digest of an AAV plasmid (p3_NLS-(1–1153)-GG)). In a third and fourth step, PCR-amplified RORI and LILO fragments from the template plasmid pT2/PGK-neo were cloned into the newly generated plasmid by isothermal assembly, leading to the final cloned plasmid AAV-RORI-hSyn1-chl-GFP-WPRE-bGHp(A)-hU6-LILO (short: AAV-SB-Library plasmid).

All AAV-BE plasmids were generated over several steps by isothermal assembly of combinations of the PCR-amplified BE split transgene (N-split: 1–573, C-split: 574–1368), PCR-amplified p3 promoter, PCR-amplified P2A and PCR-amplified RFP or SB (amplified from pCMV(CAT)T7-SB100, a gift from Zsuzsanna Izsvak [43], Addgene #34879) into an AAV compatible backbone.

Plasmid-library preparation

For plasmid-library preparation the protocol described by Marquart et al. [25] was followed with minor changes to optimize the workflow. The oligonucleotide pool was PCR-amplified in 12 cycles (primers stated in Supplemental Information) using Q5 High-Fidelity DNA Polymerase (New England Biolabs, NEB) following the manufacturer’s instructions. The resulting amplicons were gel purified using NucleoSpin Gel and PCR Clean-up Mini kit (Macherey–Nagel) following the manufacturer’s instructions. Lenti-gRNA-puro, Lenti-sgRNA-p3-eGFP or AAV-SB-Library were digested with Esp3I restriction enzyme and Shrimp Alkaline Phosphatase (rSAP, NEB) for 12 h at 37°C. After gel purification, the oligo-pool amplicons were assembled into the linearized Lenti-gRNA-Puro, Lenti-sgRNA-p3-eGFP or AAV-SB-Library plasmid using NEBuilder HiFi DNA Assembly Master Mix (NEB) for 1 h at 50°C. The product was further purified by isopropanol precipitation using one volume of isopropanol, 0.02 volume 5 M NaCl and 0.01 volume GlycoBlue coprecipitant (Invitrogen). After precipitation and ethanol wash, the air-dried pellet was resuspended in dH2O. 100 ng of plasmid library were transformed per 25 µL electrocompetent cells (ElectroMAX Stbl4, Invitrogen) using a GenePulser II device (Bio-Rad). Transformed cells were recovered in S.O.C. media and incubated for 14 h at 30°C. Colonies were scraped, pooled and grown in bacterial media for another 6 h before plasmids were purified using a Plasmid Maxiprep kit (Qiagen).

Cell culture

HEK293T cells (ATCC CRL-3216) were maintained in DMEM plus GlutaMax (Thermo Fisher Scientific), supplemented with 10% (vol/vol) fetal bovine serum (FBS, Sigma-Aldrich) and 1 × penicillin–streptomycin (Thermo Fisher Scientific) at 37°C and 5% CO2. Cells were maintained at confluency below 90% and passaged every 2–3 days. N2A cells were maintained in EMEM plus GlutaMax (Thermo Fisher Scientific), supplemented with 10% (vol/vol) fetal bovine serum (FBS, Sigma-Aldrich) and 1 × penicillin–streptomycin (Thermo Fisher Scientific) at 37°C and 5% CO2. Cells were maintained at confluency below 90%, passaged every 2–3 days and tested negative for Mycoplasma contamination. Cells were authenticated by the supplier by short tandem repeat analysis.

Packaging of guide RNA library into lentivirus

HEK293T cells were used for lentivirus production. 2.65 µg pCMV-VSV-G (a gift from B. Weinberg [44], Addgene #8454), 5.3 µg psPAX2 (a gift from D. Trono, Addgene #12260) and 10.8 µg target library plasmid were mixed in 506 µL Opti-MEM (Thermo Fisher Scientific). After addition of 152 µL polyethyleneimine (PEI, 1 mg/mL), the transfection mix was vortexed for 10 s and incubated for 10 min before being added gently to the cells at 70–80% confluency together with 25 mL serum-free DMEM. After 1 day the medium was changed to culture medium, and 2 days later the supernatant was harvested. Prior to ultracentrifugation (20,000 × g, 2 h), the medium was filtered using a Filtropur S 0.4 (Sarstedt) filter. Lentivirus aliquots were stored at -80 °C until use.

Pooled base editor screens

Lentivirus containing the sgRNA pool was transduced into HEK293T cells at 70–80% confluence at an MOI of 0.2 and a calculated coverage of 1000 cells per gRNA. One day after transduction, cells were split and selected with 2.5 µg/mL puromycin for 10 days. Selected HEK293T cells were frozen and, for each new screen, thawed at a coverage of 2000x. The respective base editor plasmid (9.25 µg) and helper plasmid (9.25 µg of pCMV-Tol2, a gift from Stephen Ekker [45], Addgene #31823) were transfected in a 1:3 DNA:PEI ratio per T175 flask at a coverage of 2000x. One day after transfection, cells were split and selected with 2.5 µg/mL puromycin and 7.5 µg/mL blasticidin for 5 or 10 days. Cells were detached and genomic DNA was extracted using a Blood & Cell Culture DNA Maxi kit (Qiagen) according to the manufacturer’s instructions.
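The MOI and coverage figures above translate into a cell number by simple arithmetic. The sketch below assumes that viral integrations per cell follow a Poisson distribution with mean equal to the MOI, and that the stated coverage of 1000 cells per gRNA refers to transduced cells; both are assumptions for illustration, and the function names are hypothetical.

```python
from math import ceil, exp


def transduced_fraction(moi: float) -> float:
    """Fraction of cells with at least one integration (Poisson assumption)."""
    return 1 - exp(-moi)


def single_copy_fraction(moi: float) -> float:
    """Among transduced cells, the fraction carrying exactly one integration."""
    return moi * exp(-moi) / transduced_fraction(moi)


def cells_to_transduce(n_guides: int, coverage: int, moi: float) -> int:
    """Cells to plate so that `coverage` transduced cells per guide are expected."""
    return ceil(n_guides * coverage / transduced_fraction(moi))


n_cells = cells_to_transduce(n_guides=12_000, coverage=1_000, moi=0.2)
print(f"{n_cells:.2e} cells, {single_copy_fraction(0.2):.0%} single-copy")
```

Under these assumptions, a low MOI keeps most transduced cells at one sgRNA each, at the cost of plating several times more cells than the nominal coverage.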

Nucleofections of HEK293T cells were performed using the Neon™ transfection system with 100 µL tips. Cells were harvested and washed 3 × with phosphate-buffered saline (PBS) prior to counting. Cells were repeatedly spun down and resuspended in R buffer (DPBS supplemented with 1 mM MgCl2 and 250 mM sucrose) to a concentration of ~ 3 × 10⁴ cells/µL. Reactions were prepared in PBS by adding 0.2 pmol, 1 pmol or 5 pmol of mRNA. For mRNA, one pulse of 1400 V and 20 ms pulse width was used. After nucleofection, cells were maintained at 37°C and 5% CO2 for 72 h prior to harvesting. Cells were detached and genomic DNA was extracted using a Blood & Cell Culture DNA Maxi kit (Qiagen) according to the manufacturer’s instructions. Modified nucleoside-containing mRNA was generated using N1mΨ-5′-triphosphate (TriLink) instead of UTP. Co-transcriptional addition of the trinucleotide cap1 analog, CleanCap (TriLink), was used to cap the in vitro transcribed mRNAs.

AAV production

AAV vectors were either produced by the Viral Vector Facility of the Neuroscience Center Zurich or in-house. Briefly, AAV vectors were ultracentrifuged and diafiltered.

To generate Pseudotyped AAV9 vectors (AAV2/9), packaging, capsid, and helper plasmids (Addgene #112865 and #112867) were co-transfected in HEK293T cells and incubated for six days until harvest. The vectors were then precipitated using PEG and NaCl and subjected to gradient centrifugation with OptiPrep (Sigma-Aldrich) for further purification, following the previously described method. Subsequently, the concentrated vectors were obtained using Vivaspin® 20 centrifugal concentrators (VWR). Physical titres (vector genomes per milliliter, vg/mL) were determined using a Qubit 3.0 fluorometer (Thermo Fisher Scientific). AAV2/9 viruses were stored at -80°C until they were used. If required, they were diluted using phosphate-buffered saline (PBS) from Thermo Fisher Scientific.

mRNA production and LNP encapsulation

mRNA production and LNP encapsulation were performed as previously described [46]. Briefly, coding sequences of base editors were cloned into an mRNA production plasmid, using HiFi DNA Assembly Master Mix (NEB). mRNAs were transcribed to contain 101 nucleotide-long poly(A) tails. m1Ψ-5′-triphosphate (TriLink) instead of UTP was used to generate modified nucleoside-containing mRNA. Capping of the in vitro transcribed mRNAs was performed co-transcriptionally using the trinucleotide cap1 analog, CleanCap (TriLink). mRNA was purified by cellulose (Sigma-Aldrich) purification as described [47]. All mRNAs were analysed by agarose gel electrophoresis and were stored frozen at − 20°C. The purified mRNAs were encapsulated in LNP, previously described in [28], and stored at -80°C until they were injected into mice.

Animal studies

Animal experiments were performed in accordance with protocols approved by the Kantonales Veterinäramt Zürich (license number ZH159-20) and in compliance with all relevant ethical regulations. C57BL/6J mice were housed in a pathogen-free animal facility at the Institute of Pharmacology and Toxicology of the University of Zurich. Mice were kept in a temperature- and humidity-controlled room on a 12-h light/dark cycle. Mice were fed a standard laboratory chow (Kliba Nafag no. 3437 with 18.5% crude protein).

Unless otherwise noted, newborn animals (P1) received 1.2 × 10¹¹ AAV vector genomes per animal and construct (AAV; 30 µL in total) or a full dose of lentivirus (30 µL in total) via the temporal vein. Adult mice were injected with 3 mg/kg of total RNA (LNP) or 1 × 10¹² AAV vector genomes per animal via the tail vein at 5–6 weeks of age, with total injection volumes of 120 µL. The average weights of neonatal (1 day) and adult mice (5 weeks) were 1.5 and 20 g, respectively. When the library was delivered via lentivirus, mice were euthanized 6–8 weeks after injection or further injected with LNP or AAV. Adult mice were euthanized 1 week (LNP) or 6 weeks (AAV) after injection, if not stated otherwise. When the library was delivered via AAV and SB, mice were euthanized 6 or 12 weeks after injection.
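The doses above can be converted into per-animal amounts using the average body weights given in the same paragraph. The short calculation below is only a worked example; the function names are illustrative and the weights are the stated averages, not individual measurements.

```python
def lnp_rna_dose_ug(dose_mg_per_kg: float, body_weight_g: float) -> float:
    """Total RNA per animal in micrograms (mg/kg multiplied by g gives µg)."""
    return dose_mg_per_kg * body_weight_g


def vector_genomes_per_gram(total_vg: float, body_weight_g: float) -> float:
    """AAV vector genomes per gram of body weight."""
    return total_vg / body_weight_g


adult_rna_ug = lnp_rna_dose_ug(3, 20)                 # adult, 20 g
adult_rna_ug_per_ul = adult_rna_ug / 120              # 120 µL injection volume
neonatal_vg_per_g = vector_genomes_per_gram(1.2e11, 1.5)
adult_vg_per_g = vector_genomes_per_gram(1e12, 20)
print(adult_rna_ug, adult_rna_ug_per_ul, neonatal_vg_per_g, adult_vg_per_g)
```

On these average weights, a 20 g adult receives about 60 µg of RNA, and the neonatal and adult AAV doses are of the same order of magnitude per gram of body weight.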

Primary hepatocyte isolation

Primary hepatocytes were isolated as previously described11. In short, mice were euthanized and immediately perfused with Hank’s Buffer (Hank’s balanced salt solution (Thermo Fisher Scientific), 0.5 mM EDTA) via the inferior vena cava. Mice were further perfused with digestion medium (low-glucose DMEM plus 1 × penicillin–streptomycin (Thermo Fisher Scientific), 15 mM HEPES and freshly added Liberase (Roche)) before isolated livers were gently dissociated in isolation medium (low-glucose DMEM supplemented with 10% (vol/vol) FBS plus 1 × penicillin–streptomycin (Thermo Fisher Scientific) and GlutaMax (Thermo Fisher Scientific)). Isolated hepatocytes were filtered (100 µm filter), washed with isolation medium/PBS and pelleted for DNA isolation.

Library preparation for targeted amplicon sequencing of DNA

Next-generation sequencing (NGS) preparation of genomic DNA was performed as previously described [34]. Briefly, the library was amplified from genomic DNA by a first PCR using primers containing Illumina forward and reverse adaptor sequences (see Supplementary Note for oligonucleotides used in this study). The PCR was optimized for high genomic DNA input using NEBNext® Ultra™ II Q5 polymerase (NEB) and a coverage of 200-1000x, depending on screening method and replicate. PCR products for each replicate were pooled and gel purified before barcodes were added in a second PCR with primers containing unique sets of p5/p7 Illumina barcodes, using Q5 High-Fidelity DNA Polymerase (NEB). PCR products were pooled and cleaned by gel purification before quantification on the Qubit 4 (Invitrogen). Pooled sgRNA screens were sequenced on a NovaSeq 6000 (Illumina, 300 cycles, paired-end). Amplicon sequences were analysed using custom Python scripts.

Supplementary Information

Additional file 1. Contains supplementary figures (Fig. S1 – Fig. S8) and additional information on BEDICT2.0 and SHAP analysis. Additional file 2. Contains analysis of linear correlation (Table S1-S5). Additional file 3. Contains Spearman and Pearson correlations for prediction accuracy of ML prediction models on various datasets (Table S1). Additional file 4. Contains additional information on primers, oligonucleotide pool, ABE sequences and editing efficiency per position for all relevant datasets of this study.
