What this is
- screening was used to explore functions in human primary , key cells in the brain.
- The study identified over 150 -gene interactions that regulate genes linked to Alzheimer's disease.
- AstroREG, a new resource, was developed to predict -gene interactions, enhancing understanding of gene regulation in .
Essence
- This research utilized screening to identify functional in human primary , revealing critical regulatory interactions with genes associated with Alzheimer's disease.
Key takeaways
- screening tested 979 candidate , leading to the discovery of 158 functional -gene pairs, highlighting the regulatory complexity in .
- The study found that 76% of regulatory interactions occurred within 200 kb, with frequently regulating distal genes rather than the nearest ones.
- AstroREG was created to predict -gene interactions, achieving a precision of 0.8 and significantly expanding the catalog of regulatory interactions in .
Caveats
- The study's findings may not generalize to other cell types, as the screen was focused solely on .
- While the AstroREG resource provides valuable predictions, its accuracy is not sufficient to replace experimental validation across the genome.
Definitions
- CRISPRi: A technique that uses CRISPR technology to inhibit gene expression without altering the DNA sequence.
- Enhancer: A regulatory DNA sequence that can increase the likelihood of transcription of a particular gene.
- Astrocyte: A type of glial cell in the brain that supports neurons and modulates synaptic function.
Simplified
Main
CRISPR screening with gene expression readout enables the examination of enhancer function within its native genomic locus1–5. Large-scale screens using this technology to study hundreds to thousands of enhancers have been performed primarily in K562 cells1–3,6, while smaller-scale experiments have been conducted in T cells7, induced pluripotent stem cells (iPSC)-derived neurons8 and as a massively parallel reporter assay (MPRA) validation approach9,10.
Here we set out to uncover transcriptional regulatory circuitry in human primary astrocytes, the most abundant type of glia in the human brain, having key roles in modulating neuronal function and plasticity11,12.
![Click to view full size CRISPRi screen of intergenic enhancers in astrocytes. , Study overview. The selection pipeline (see Results for details) led to a set of 979 candidate enhancer regions. Candidate regions were tested in a CRISPRi screen with scRNA-seq readout, which identified 158 functional enhancer–gene interactions. Functional enhancers and their target genes were extensively characterized and used to construct an enhancer–gene regulatory network controlling astrocyte-specific gene expression. Furthermore, the CRISPRi results were used to train an RF model, which was then applied to predict enhancer–gene regulatory interactions genome wide. The AstroREG resource includes the experimentally tested and predicted enhancer–gene interactions. Panelwas created with BioRender.com., PCA plot of RNA-seq data from NHAs and brain samples immunopanned with cell-type-specific antibodies from ref.. All samples in this study were from healthy donors except peritumor astrocytes and sclerotic astrocytes; the cortex sample did not undergo immunopanning. The PCA plot was generated based on the intersample Spearman correlation matrix., Top, barplot of cell type and developmental stage annotation of NHA CRISPRi screen scRNA-seq data (= 47,577 cells). Reference-based annotation of single cells was performed using SingleR, with reference snRNA-seq from the human brain maturation atlas(). Cells annotated as not assigned did not reach the minimal identity threshold for any cell type. Bottom, UMAP plots of CRISPRi screen scRNA-seq data with cells colored by SingleR cell-type annotation (left), developmental stage annotation (middle) and cell-cycle stage annotation (right); the latter was obtained using the CellCycle Scoring Seurat function. The identity of C1 is characterized in Supplementary Note., Overlap of candidate peaks with ENCODE-annotatedgenomic regions. Distal enhancers,= 4 × 10; proximal enhancers,= 4 × 10; promoters,= 2 × 10. Two-sided Fisher’s exact test., Upset plot displaying the overlap between the selected candidate enhancer set and four datasets used in the selection pipeline—two ATAC–seq datasets of NeuNcells from adult human brain, a dataset of iPSC-derived astrocytesand a dataset of astrocytes immunopanned with anti-HepaCAM or FACS-sorted with anti-GFAP from hCS maintained for up to 595 days in culture., Heatmap of candidate enhancer chromatin accessibility based on pseudobulked snATAC–seq data across cell types and developmental stages from ref.. Rows display enhancers and are sorted by the enhancer’s module membership in the co-accessibility network (). Cell types—astrocytes (astro), excitatory neurons (exc), inhibitory neurons (inh), microglia (micro) and oligodendrocytes and OPCs (oligo). Developmental stages—fetal (gestational age of 22–24 weeks), neonatal (28 days), infancy (86 days–1 year), childhood (1.5–8 years), adolescence (10–16 years) and adulthood (20–40 years)., Co-accessibility network module significance. Cell type and stage—a linear model was used to assess the variation of module eigengene values across all cell types and developmental stages.values were extracted using ANOVA. Astrocyte—a two-sided Wilcoxon rank-sum test was performed for a two-group comparison of module eigengene values between astrocytes and all other cell types. Astrocyte stage—a Spearman correlation coefficient was used to assess the effect of developmental stage within the astrocyte samples. PCA, principal component analysis; OPCs, oligodendrocyte precursor cells; hCS, human cortical spheroids; C0, cluster 0; C1, cluster 1; ANOVA, analysis of variance. a a b c d e f g [16] [17] [23] −259 −44 −27 − , [18] [19] [21] [20] [17] n P P P P Methods 1 Methods Source data](https://europepmc.org/articles/PMC12971484/bin/41593_2025_2154_Fig1_HTML.jpg.jpg)
CRISPRi screen of intergenic enhancers in astrocytes. , Study overview. The selection pipeline (see Results for details) led to a set of 979 candidate enhancer regions. Candidate regions were tested in a CRISPRi screen with scRNA-seq readout, which identified 158 functional enhancer–gene interactions. Functional enhancers and their target genes were extensively characterized and used to construct an enhancer–gene regulatory network controlling astrocyte-specific gene expression. Furthermore, the CRISPRi results were used to train an RF model, which was then applied to predict enhancer–gene regulatory interactions genome wide. The AstroREG resource includes the experimentally tested and predicted enhancer–gene interactions. Panelwas created with BioRender.com., PCA plot of RNA-seq data from NHAs and brain samples immunopanned with cell-type-specific antibodies from ref.. All samples in this study were from healthy donors except peritumor astrocytes and sclerotic astrocytes; the cortex sample did not undergo immunopanning. The PCA plot was generated based on the intersample Spearman correlation matrix., Top, barplot of cell type and developmental stage annotation of NHA CRISPRi screen scRNA-seq data (= 47,577 cells). Reference-based annotation of single cells was performed using SingleR, with reference snRNA-seq from the human brain maturation atlas(). Cells annotated as not assigned did not reach the minimal identity threshold for any cell type. Bottom, UMAP plots of CRISPRi screen scRNA-seq data with cells colored by SingleR cell-type annotation (left), developmental stage annotation (middle) and cell-cycle stage annotation (right); the latter was obtained using the CellCycle Scoring Seurat function. The identity of C1 is characterized in Supplementary Note., Overlap of candidate peaks with ENCODE-annotatedgenomic regions. Distal enhancers,= 4 × 10; proximal enhancers,= 4 × 10; promoters,= 2 × 10. Two-sided Fisher’s exact test., Upset plot displaying the overlap between the selected candidate enhancer set and four datasets used in the selection pipeline—two ATAC–seq datasets of NeuNcells from adult human brain, a dataset of iPSC-derived astrocytesand a dataset of astrocytes immunopanned with anti-HepaCAM or FACS-sorted with anti-GFAP from hCS maintained for up to 595 days in culture., Heatmap of candidate enhancer chromatin accessibility based on pseudobulked snATAC–seq data across cell types and developmental stages from ref.. Rows display enhancers and are sorted by the enhancer’s module membership in the co-accessibility network (). Cell types—astrocytes (astro), excitatory neurons (exc), inhibitory neurons (inh), microglia (micro) and oligodendrocytes and OPCs (oligo). Developmental stages—fetal (gestational age of 22–24 weeks), neonatal (28 days), infancy (86 days–1 year), childhood (1.5–8 years), adolescence (10–16 years) and adulthood (20–40 years)., Co-accessibility network module significance. Cell type and stage—a linear model was used to assess the variation of module eigengene values across all cell types and developmental stages.values were extracted using ANOVA. Astrocyte—a two-sided Wilcoxon rank-sum test was performed for a two-group comparison of module eigengene values between astrocytes and all other cell types. Astrocyte stage—a Spearman correlation coefficient was used to assess the effect of developmental stage within the astrocyte samples. PCA, principal component analysis; OPCs, oligodendrocyte precursor cells; hCS, human cortical spheroids; C0, cluster 0; C1, cluster 1; ANOVA, analysis of variance. a a b c d e f g [16] [17] [23] −259 −44 −27 − , [18] [19] [21] [20] [17] n P P P P Methods 1 Methods Source data
Results
Selection and characterization of candidate enhancers in astrocytes
To identify functional enhancers in human primary astrocytes, we used normal human astrocytes (NHAs) from fetal brain (Methods). The transcriptomes of NHAs with or without dCas9–KRAB integration (Methods) were highly correlated (r = 0.95, ρ = 0.9; Extended Data Fig. 1), and clustered with fetal astrocytes (ref. 16; Fig.1b). We further confirmed that the developmental stage of NHAs was preserved at transcriptome and chromatin state level—99.7% of dCas9–KRAB–NHA cells were annotated as fetal astrocytes based on single-cell RNA-seq (scRNA-seq) with a human brain reference dataset17 (Fig.1c), and ATAC–seq data from dCas9–KRAB–NHAs clustered with snATAC–seq from fetal astrocytes17 (Extended Data Fig. 1). For the rest of the article, we use ‘NHAs’ to refer to dCas9–KRAB–NHAs.
Candidate regions were selected from intergenic PsychENCODE13 candidate enhancers (>2 kb from any gene) and required to be in an open chromatin state in NHAs and in at least two other astrocytes or NeuN− datasets18–21. In addition, we required candidate enhancers to reside in a topologically associated domain (TAD)22 containing at least one gene highly expressed in NHAs (Methods). This yielded 979 intergenic candidate enhancers (Fig. 1a, Supplementary Table 1 and Extended Data Fig. 2), none of which overlapped with copy number variations identified in NHAs (Supplementary Table 1).
The candidate regions were enriched for distal enhancers23 (odds ratio (OR) = 43.8, P = 4 × 10−259, Fisher’s exact test), depleted of promoters (OR = 0.02, P = 2 × 10−27, Fisher’s exact test) and proximal enhancers (OR = 0.05, P = 4 × 10−44, Fisher’s exact test; Fig.1d). Among the four ATAC–seq datasets used for candidate selection18–21, the chromatin state of candidate regions best resembled that of brain organoid astrocytes (93% overlap; Fig.1e) consistent with NHA’s fetal origin. Most candidate regions (61%, n = 602) were accessible in both fetal-derived and adult-derived astrocytes (Fig.1e).
To further characterize the regulatory landscape captured by candidate enhancers, we constructed a co-accessibility network using snATAC–seq from the human brain across developmental stages17 (Methods). Candidate enhancers clustered into six modules, and 86% of enhancers belonged to modules enriched in astrocytes or more broadly glial cells (Fig.1f,g and Supplementary Table 1). Five modules showed developmental stage-specific chromatin state (analysis of variance P < 0.05; Fig.1f,g), and one captured developmental variation within astrocytes (P = 0.03; Fig.1f,g). Bulk brain-derived ATAC–seq and H3K27ac from ref. 24 revealed cell-type-specific profiles consistent with those observed in snATAC–seq data, with module M3 showing the strongest astrocyte-specific signal, while H3K4me3 showed low coverage in astrocytes, as expected (Extended Data Fig. 3).
Overall, our results demonstrate that NHAs retain a fetal astrocyte identity, and that the candidate enhancer set captures astrocyte-specific regions, as well as regions shared across brain cell types and developmental stages. Although our selection prioritized broadly relevant candidate enhancers, that is, in open state across fetal and adult astrocytes, the candidate set also includes regions with developmentally regulated chromatin states.
CRISPRi screening of candidate enhancers in astrocytes
We constructed an sgRNA library using the CROP-seq-opti-GFP backbone containing 2,478 sgRNAs targeting the 979 candidate enhancers, 20 positive and 50 negative control sgRNAs (Supplementary Table 2), as well as separate positive and negative control libraries (n = 250 sgRNAs each; Methods). Candidate enhancers varied in length between 87 and 1,136 bp with a median of 298 bp (Extended Data Fig. 3), and were tiled with 1 sgRNA per 200 bp and ≥2 sgRNAs per enhancer (Extended Data Fig. 3).
sgRNAs were transduced into NHAs at a low multiplicity of infection1 (MOI) of 5, to preserve viability. scRNA-seq (Fig.1a,c) was performed 7 days post-transduction (12.5B reads, 47,577 high-quality cells, median depth of 52,472 unique molecular identifiers (UMIs) per cell; Methods). Each enhancer was targeted in an average of 327 cells (range = 65–1,301), and a median of 7.4 guides was detected per cell (Extended Data Fig. 4).
For each enhancer, differential expression (DE) analyses compared the targeted and nontargeted cells, using SCEPTRE25, assessing all expressed genes within 500 kb from the enhancer midpoint (Methods; Extended Data Fig. 4). Significant effects were defined using an empirical false discovery rate (FDR) of <0.1 as previously described1 (Methods). Robust dCas9–KRAB silencing activity was confirmed by significant downregulation of 122 of 125 positive control genes (Supplementary Table 3), while a well-calibrated statistical approach without P-value inflation was confirmed based on the negative control library (Extended Data Fig. 4).
To identify well-powered nonfunctional candidate regions (that is, regions showing no effect in the CRISPRi screen despite sufficient statistical power), we used a simulation-based method14, and identified 605 regions with >80% power to detect a 15% expression change for at least one gene (Methods; Supplementary Table 3). This set was used for downstream comparisons of functional and nonfunctional candidate regions.
Functional enhancers were enriched for prior CRISPRi/MPRA hits in K562↗ (ref. 67) (P = 2.2 × 10−3, OR = 2.71, Fisher’s exact test), and for predicted NHA superenhancers26 (P = 1.0 × 10−6, OR = 4.0, Fisher’s exact test; Fig.2c). Functional enhancer–gene pairs (EGPs) were depleted of interactions crossing TAD boundaries (OR = 0.28, P = 4.0 × 10−6, Fisher’s exact test; Fig.2c). Furthermore, enhancer-regulated genes significantly overlapped genes previously predicted to contribute to the core regulatory circuitry in astrocytes27 (P = 2.7 × 10−6, OR = 4.40, Fisher’s exact test).
To validate the results with an independent method, we individually silenced 19 functional enhancers (corresponding to 20 EGPs), prioritizing those that controlled genes dysregulated in AD (Fig. 5b) and spanned a range of effect sizes. Eighteen of the twenty EGPs (90%) showed significant effects (FDR < 0.05) on the target gene expression measured with NanoString nCounter (Methods; Supplementary Table 3). All 20 EGPs showed effects in the same direction as in the single-cell screen, with high correlation between effect sizes measured with the two methods (r = 0.86; Fig.2d). Seven EGPs were further tested through qRT–PCR in an independent NHA line of a different sex (Methods), confirming six of seven (Extended Data Fig. 4). To assess the effect of enhancer activity on protein expression, we performed a western blot for FTH1 after individually silencing Enh210, which regulates FTH1 (Methods). This led to a 24% decrease in FTH1 protein level (Extended Data Fig. 4).
These findings demonstrated that the identified enhancer–gene interactions aligned with known enhancer properties and were well-validated across methods and cell lines.
![Click to view full size CRISPRi screening identifies functional regulatory elements in astrocytes. , Volcano plot of log(FC) (axis) versus empiricalvalue (axis) for 7,759 EGPs tested. Each dot represents an EGP. Orange, significant pairs (empirical FDR < 0.1); blue, nonsignificant pairs., Histograms of the number of regulated genes per functional enhancer (= 145; left) and number of functional enhancers per regulated gene (= 116; right)., Percentage of EGPs overlapping experimentally validated enhancers in K562 cells (data from ref.), superenhancers (NHA data from ref.) and crossing at least one TAD boundary (Hi-C data from iPSC-derived astrocyte data from ref.). Statistics were calculated using a two-sided Fisher’s exact test (for superenhancers and K562 enhancers,= 145 and 605 functional enhancers and inactive candidates, respectively; for EGPs crossing TAD boundaries,= 158 and 2,168 functional and inactive EGPs, respectively)., Scatterplot of FCs in the scRNA-seq CRISPRi screen (axis) and CRISPRi silencing of individual enhancers with NanoString readout of the target gene expression (axis). a b c d 2 x P y n n n n x y [67] [26] [22] Source data](https://europepmc.org/articles/PMC12971484/bin/41593_2025_2154_Fig2_HTML.jpg.jpg)
CRISPRi screening identifies functional regulatory elements in astrocytes. , Volcano plot of log(FC) (axis) versus empiricalvalue (axis) for 7,759 EGPs tested. Each dot represents an EGP. Orange, significant pairs (empirical FDR < 0.1); blue, nonsignificant pairs., Histograms of the number of regulated genes per functional enhancer (= 145; left) and number of functional enhancers per regulated gene (= 116; right)., Percentage of EGPs overlapping experimentally validated enhancers in K562 cells (data from ref.), superenhancers (NHA data from ref.) and crossing at least one TAD boundary (Hi-C data from iPSC-derived astrocyte data from ref.). Statistics were calculated using a two-sided Fisher’s exact test (for superenhancers and K562 enhancers,= 145 and 605 functional enhancers and inactive candidates, respectively; for EGPs crossing TAD boundaries,= 158 and 2,168 functional and inactive EGPs, respectively)., Scatterplot of FCs in the scRNA-seq CRISPRi screen (axis) and CRISPRi silencing of individual enhancers with NanoString readout of the target gene expression (axis). a b c d 2 x P y n n n n x y [67] [26] [22] Source data
Functional enhancers often regulate distal genes, are transcribed and show cell-type-specific and developmental-stage-specific chromatin state
Although enhancers were significantly more likely to regulate the nearest gene than expected by chance (P < 2.2 × 10−16, OR = 11.7, Fisher’s exact test), only 38.6% EGPs involved the nearest gene. An additional 15.8% involved in a more distal gene with no intervening gene, 29.1% skipped a highly expressed gene to regulate a more distal gene and 16.4% skipped only lowly or nonexpressed genes (Fig.3b). When not targeting their nearest gene, enhancers bypassed up to 34 genes (mean = 7.4).
To assess enhancer RNA transcription in astrocytes, we applied transient-transcriptome sequencing (TT-seq29) to NHA cells (Extended Data Fig. 5 and Supplementary Table 1; Methods). Functional enhancers were strongly associated with transcription (P = 1.3 × 10−11, OR = 3.7, Fisher’s exact test), with 73.1% being transcribed (Fig. 3c,d). Among transcribed enhancers, functional enhancers were more likely to be bidirectional (P = 0.01, OR = 1.8, Fisher’s exact test; Fig. 3d), and had higher transcription levels (P = 0.045, Wilcoxon test; Fig. 3e).
Using snATAC–seq data across cell types and developmental stages (Fig.1f,g), we found that functional enhancers showed higher module membership (kME values) than inactive candidates for the astrocyte-enriched module M2 (P = 0.04, Wilcoxon test). Moreover, 48% of functional enhancers (n = 70), considered individually, showed significant variation in chromatin state across cell types, 4% (n = 7) across developmental stages and 34% (n = 50) across both cell type and developmental stages (Fig. 3f,g; analysis of variance adjusted P < 0.05).
When comparing H3K27ac coverage in astrocytes24 between functional and nonfunctional candidate regions, there was no significant difference (P = 0.21, t test), indicating that, among open chromatin regions, higher H3K27ac coverage values are not predictive of enhancer function. Furthermore, when considering functional enhancers only, there was no significant difference in H3K27ac between astrocytes and other cell types (P = 0.66, t test). However, H3K27ac at functional enhancers belonging to the astrocyte-specific co-accessibility module M3 was significantly enriched in astrocytes compared to other cell types (P = 3.6 × 10−7, t test). The promoter mark H3K4me3 (ref. 24) at functional enhancers was significantly depleted in astrocytes compared to other cell types, as expected (P = 0.005, t test).
Together, these data add several critical layers to the AstroREG resource, including eRNA expression data and characterization of chromatin state across cell types and developmental stages, and underscore the limitation of noncoding variant annotation to the nearest gene.
![Click to view full size Genomic and transcriptional properties of functional enhancers. , Density plot of the distance between enhancer midpoint and the TSS of the tested gene. Dotted vertical lines represent median for each category., EGP categories based on the presence or absence of intervening genes between the enhancer and its target gene. Left, schematic representation of EGP categories, where black arcs link the enhancer to its target gene. Right, barplot of the percent of inactive and functional EGPs within each category, with colors corresponding to the left panel. Panelwas created with BioRender.com., UCSC genome browsertracks displaying TT-seq and ATAC–seq data, as well as fine-mapped GTEx eQTLsfor three enhancers regulating LGALS3., Percent of inactive candidate enhancers and functional enhancers (= 605 and 145, respectively) that are transcribed., Distribution of TT-seq read counts for transcribed functional enhancers and candidate peaks (= 106 and 318, respectively). For the outer violin, width corresponds to the density of points. For the inner boxplot, the top, middle and bottom horizontal lines correspond to the 25th, 50th and 75th quantiles of the data, while whiskers extend the range of the data up to a maximum of 1.5× the interquartile range; no outliers beyond this range were plotted individually.values = two-sided Wilcoxon rank-sum test., Chromatin accessibility of functional enhancers across cell types and developmental stages. The heatmap displays pseudobulked snATAC–seq data from ref.. Rows—enhancers. Row side colors display adjusted ANOVAvalues for cell-type (p.CT) and developmental stage (p.Stage). Cell types—astrocytes (astro), excitatory neurons (exc), inhibitory neurons (inh), microglia (micro) and oligodendrocytes and OPCs (oligo). Developmental stages—fetal (gestational age of 22–24 weeks), neonatal (28 days), infancy (86 days–1 year), childhood (1.5–8 years), adolescence (10–16 years) and adulthood (20–40 years)., Barplot of the number of functional enhancers that show significant variation (adjusted ANOVA< 0.05) across cell types, developmental stage (stage), both CT and stage, or neither. a b b c d e f g [68] [50] [17] n n P P P Source data](https://europepmc.org/articles/PMC12971484/bin/41593_2025_2154_Fig3_HTML.jpg.jpg)
Genomic and transcriptional properties of functional enhancers. , Density plot of the distance between enhancer midpoint and the TSS of the tested gene. Dotted vertical lines represent median for each category., EGP categories based on the presence or absence of intervening genes between the enhancer and its target gene. Left, schematic representation of EGP categories, where black arcs link the enhancer to its target gene. Right, barplot of the percent of inactive and functional EGPs within each category, with colors corresponding to the left panel. Panelwas created with BioRender.com., UCSC genome browsertracks displaying TT-seq and ATAC–seq data, as well as fine-mapped GTEx eQTLsfor three enhancers regulating LGALS3., Percent of inactive candidate enhancers and functional enhancers (= 605 and 145, respectively) that are transcribed., Distribution of TT-seq read counts for transcribed functional enhancers and candidate peaks (= 106 and 318, respectively). For the outer violin, width corresponds to the density of points. For the inner boxplot, the top, middle and bottom horizontal lines correspond to the 25th, 50th and 75th quantiles of the data, while whiskers extend the range of the data up to a maximum of 1.5× the interquartile range; no outliers beyond this range were plotted individually.values = two-sided Wilcoxon rank-sum test., Chromatin accessibility of functional enhancers across cell types and developmental stages. The heatmap displays pseudobulked snATAC–seq data from ref.. Rows—enhancers. Row side colors display adjusted ANOVAvalues for cell-type (p.CT) and developmental stage (p.Stage). Cell types—astrocytes (astro), excitatory neurons (exc), inhibitory neurons (inh), microglia (micro) and oligodendrocytes and OPCs (oligo). Developmental stages—fetal (gestational age of 22–24 weeks), neonatal (28 days), infancy (86 days–1 year), childhood (1.5–8 years), adolescence (10–16 years) and adulthood (20–40 years)., Barplot of the number of functional enhancers that show significant variation (adjusted ANOVA< 0.05) across cell types, developmental stage (stage), both CT and stage, or neither. a b b c d e f g [68] [50] [17] n n P P P Source data
Uncovering regulatory circuitry controlling gene expression in astrocytes
To investigate astrocyte-specific regulatory circuits, we performed TF footprinting using TOBIAS37 on the deeply sequenced NHA ATAC–seq data (>800 million reads). Footprinting identifies TF motifs overlapping ATAC–seq footprints, regions of depleted signal within open chromatin indicative of protein binding. Motifs are classified as ‘bound’ or ‘unbound’ based on the footprint overlap, although overlapping TF motifs cannot be resolved to a specific TF. Despite this limitation, ATAC–seq footprinting enables genome-wide detection of candidate TF binding events. All ‘bound’ TF calls for screened regions are listed in Supplementary Table 5.
Functional enhancers were significantly more likely to be bound by TFs expressed in astrocytes and showed a stronger footprinting signal (Fig. 4c). A total of 83 TFs showed enriched binding to functional enhancers compared to nonfunctional enhancers (FDR < 0.05, Fisher’s exact test; Extended Data Fig. 7 and Supplementary Table 5), including 13 with particularly strong enrichment (OR > 3), of which FOXK1 is involved in astrocyte activation34 while RUNX2 mediates astrocyte maturation38.
We then constructed a distal regulatory network by integrating TF footprinting, CRISPRi screening and human brain snRNA-seq and snATAC–seq17. We defined astrocyte-specific enhancers using snATAC–seq, and astrocyte-specific genes and TFs using snRNA-seq from the same samples17 (Methods). Functional EGPs were more likely to be coregulated across cell types (P = 0.01, OR = 1.64, Fisher’s exact test), and called astrocyte-specific (P = 4.8 × 10−9, OR = 3.87, Fisher’s exact test). The regulatory network consisted of (1) 18 astrocyte-specific genes regulated by 19 astrocyte-specific enhancers (Fig.4d,e) and (2) 36 astrocyte-specific TFs belonging to 20 motif clusters from the JASPAR database39, which bind at least one of the 19 astrocyte-specific enhancers (Methods; Fig.4e, Extended Data Fig. 7 and Supplementary Table 5). Our data allowed to construct network edges that are experimentally supported by CRISPRi screen data (enhancer–gene edges), and based on ATAC–seq footprinting for TF–enhancer edges (Fig.4e).
To determine TF–enhancer interactions supported by chromatin immunoprecipitation followed by sequencing (ChIP–seq) data from glial cells, we integrated ChIP–seq data from the following two complementary resources: ref. 40 and the ChIP–Atlas41 (Methods). The data in ref. 40 include TF–ChIP–seq from bulk brain tissue and sorted nuclei (neurons (NeuN+), oligodendrocytes (Olig2+) and astrocytes/microglia (NeuN−/Olig2−)) for the following six TFs in our network: FOS, ZBTB7B, MEIS2, NFIB, REST and SOX9. The ChIP–Atlas provides additional glia-relevant data, including TP53 in iPSC-derived astrocytes, TEAD1 in glioblastoma (GBM) cells and JUN in RPE-1 cells (a glial-like retinal cell type of neuro-ectodermal origin).
Across 33 predicted TF–enhancer pairs for which ChIP–seq data were available, 24 (72%) were supported by binding of the relevant TF to the predicted enhancer in at least one dataset (Fig. 4e, Extended Data Fig. 7 and Supplementary Table 5). Notably, all TF–enhancer overlaps identified in ref. 40 were detected in the astrocyte-enriched fraction (NeuN−/Olig2−) or bulk tissue, with none observed exclusively in neurons or oligodendrocytes—providing further support for the astrocyte-specific nature of this regulatory network.
The AP1 complex components FOS and JUN are known primarily as immediate early genes, upregulated in response to neuronal activity and have a role in cellular proliferation in astrocytes42. Our data demonstrate that, in a resting state, astrocytes show the highest expression of JUN and FOS among brain cell types, and these TFs bind to enhancers that regulate astrocyte-specific genes (Fig.4e, Extended Data Fig. 7 and Supplementary Table 5). Notably, 11 of 12 FOS/JUN–enhancer interactions detected by ATAC–seq footprinting were confirmed by TF–ChIP, with 7 in both datasets (ref. 40 and ChIP–Atlas).
Overall, these results uncover a regulatory network involved in astrocyte-specific gene expression, highlighting key transcription factors (TFs) and distal enhancers that contribute to the regulatory landscape in astrocytes.
![Click to view full size Regulatory circuitry controlling gene expression in astrocytes. , Gene Ontology terms and functional gene sets over-represented among enhancer-regulated genes (seeandfor the description of functional gene sets)., Visualization of enhancer-regulated genes belonging to functional gene sets. Gene nodes are linked to functional terms they belong to. Plots were generated using Cytoscape with a prefuse force-directed layout adjusted for best display. Node size represents node degree. Edge length is arbitrary. Edge color displays whether the gene is upregulated or downregulated with astrocyte activation and aging, noting that astrocyte-specific genes are all upregulated in astrocytes relative to other cell types., TF footprinting of ATAC–seq data from NHA. Top, ATAC–seq signal at bound and unbound TF motifs. Bottom left, number of bound and unbound TF motifs per enhancer. Bottom right, fraction of bound TF motifs per enhancer. Orange = functional enhancers (= 145), blue = inactive candidate enhancers (= 645). Dark dots display mean values for each group.values were calculated via two-sided Wilcoxon rank-sum test., Heatmap of astrocyte-specific gene expression and astrocyte-specific enhancer open chromatin state. Heatmaps display pseudobulked snRNA-seq data for genes and snATAC–seq data for enhancers (data from ref.). Expression and ATAC–seq coverage values are logtransformed with a 0.1 offset and scaled across rows. Astrocyte-enriched genes and enhancers were identified using a linear model controlling for developmental stage, at FDR < 0.05. Asterisks mark genes dysregulated in AD (Fig.)., Regulatory network displaying the interactions between astrocyte-specific enhancers and genes listed shown in, and astrocyte-specific TFs shown in Extended Data Fig.. Network plot was generated through Cytoscapeusing a prefuse force-directed layout adjusted for best network display. Node size represents the node degree. Edge length is arbitrary. TFs with similar motifs are grouped into TF clusters (JASPAR;) with individual TF names separated by ‘/’. Red edges, TF–enhancer interactions supported by TF–ChIP data./**,///; adolesc, adolescence; rsp., response; ext., external; stim, stimulation. a b c d e d Results Methods 5b 7 Methods n n P FOS JUN FOS JDP2 JUN JUNB [17] [69] 2 Source data](https://europepmc.org/articles/PMC12971484/bin/41593_2025_2154_Fig4_HTML.jpg.jpg)
Regulatory circuitry controlling gene expression in astrocytes. , Gene Ontology terms and functional gene sets over-represented among enhancer-regulated genes (seeandfor the description of functional gene sets)., Visualization of enhancer-regulated genes belonging to functional gene sets. Gene nodes are linked to functional terms they belong to. Plots were generated using Cytoscape with a prefuse force-directed layout adjusted for best display. Node size represents node degree. Edge length is arbitrary. Edge color displays whether the gene is upregulated or downregulated with astrocyte activation and aging, noting that astrocyte-specific genes are all upregulated in astrocytes relative to other cell types., TF footprinting of ATAC–seq data from NHA. Top, ATAC–seq signal at bound and unbound TF motifs. Bottom left, number of bound and unbound TF motifs per enhancer. Bottom right, fraction of bound TF motifs per enhancer. Orange = functional enhancers (= 145), blue = inactive candidate enhancers (= 645). Dark dots display mean values for each group.values were calculated via two-sided Wilcoxon rank-sum test., Heatmap of astrocyte-specific gene expression and astrocyte-specific enhancer open chromatin state. Heatmaps display pseudobulked snRNA-seq data for genes and snATAC–seq data for enhancers (data from ref.). Expression and ATAC–seq coverage values are logtransformed with a 0.1 offset and scaled across rows. Astrocyte-enriched genes and enhancers were identified using a linear model controlling for developmental stage, at FDR < 0.05. Asterisks mark genes dysregulated in AD (Fig.)., Regulatory network displaying the interactions between astrocyte-specific enhancers and genes listed shown in, and astrocyte-specific TFs shown in Extended Data Fig.. Network plot was generated through Cytoscapeusing a prefuse force-directed layout adjusted for best network display. Node size represents the node degree. Edge length is arbitrary. TFs with similar motifs are grouped into TF clusters (JASPAR;) with individual TF names separated by ‘/’. Red edges, TF–enhancer interactions supported by TF–ChIP data./**,///; adolesc, adolescence; rsp., response; ext., external; stim, stimulation. a b c d e d Results Methods 5b 7 Methods n n P FOS JUN FOS JDP2 JUN JUNB [17] [69] 2 Source data
Identification of enhancers controlling genes involved in AD
Several enhancer-regulated genes were differentially expressed in AD cases versus controls, and had expression levels associated with neuropathological features and cognitive function (Fig. 5b). Among these, LGALS3 is an astrocyte-enriched gene (Fig. 4b,d) encoding Galectin-3, which modulates microglial activation during neurodegeneration48. LGALS3 also mediates the astrocyte-microglia crosstalk involved in the neuropathological effects of APOE4, the strongest genetic risk factor for AD49, and was downregulated in the brains of female AD donors carrying the APOE4 allele49. We identified three enhancers controlling the expression of LGALS3, two showing astrocyte-specific open chromatin state, thereby serving as potential targets for cell-type-specific manipulation of LGALS3 expression in the brain. In addition to LGALS3, we uncovered functional enhancers that show cell-type-specific chromatin state for 11 other genes dysregulated in AD astrocytes (Fig. 5b).
Individual enhancer silencing experiments with NanoString readout (summarized in Fig. 2d) confirmed the regulatory effects of 16 enhancers on genes dysregulated in AD, including all three enhancers controlling LGALS3 expression (Fig. 5c).
Collectively, these data reveal that enhancer–gene regulatory interactions identified through CRISPRi screening are relevant to diseases, particularly in the context of AD.
![Click to view full size Identification of enhancers that control genes dysregulated in AD. , Over-representation of enhancer-regulated genes (= 116) versus well-powered nonregulated genes (= 953) among disease-associated gene sets. Left, disease associations from DisGeNet. Right, genes differentially expressed in astrocytes between cases and controls (seeandfor details on each published dataset). Point size indicates the number of intersecting genes. Dotted vertical line,= 0.01, two-sided Fisher’s exact test., Network plots displaying genes dysregulated in AD based on data from the ROSMAP consortium(significance threshold, FDR < 0.05). AD_vs_controls—genes DE in two-group comparisons of AD versus controls. Cognitive—genes showing significant expression correlation with measures of cognitive function. Neuropathology—genes showing expression correlation with measures of neuropathology. Network plots were generated using Cytoscape, with a prefuse force-directed layout adjusted for best display; genes were further grouped by degree using a circular layout. Gene node color represents node degree. Edge length is arbitrary. Red labels—genes for which we identified enhancers with astrocyte-specific chromatin state (Fig.)., Effect of individual enhancer silencing on gene expression measured with NanoString digital RNA counting. The figure displays data for a subset of the enhancers shown in Fig., which control genes dysregulated in AD. Dotplots display NanoString normalized probe counts. Each plot displays the enhancers tested for a given gene. Individual enhancer silencing experiments using CRISPRi were performed in biological duplicates. The control group consists of nontargeted samples in the same batch (; Supplementary Table). Statistics were calculated using a linear model with multiple testing correction (). Horizontal line—median; vertical line extends from the 25th to 75th quantiles of the data. CNS, central nervous system; MS, multiple sclerosis; ASD, autism spectrum disorder. a b c n n P [43] [45] Results Supplementary Methods 4d 2c Methods 3 Methods Source data](https://europepmc.org/articles/PMC12971484/bin/41593_2025_2154_Fig5_HTML.jpg.jpg)
Identification of enhancers that control genes dysregulated in AD. , Over-representation of enhancer-regulated genes (= 116) versus well-powered nonregulated genes (= 953) among disease-associated gene sets. Left, disease associations from DisGeNet. Right, genes differentially expressed in astrocytes between cases and controls (seeandfor details on each published dataset). Point size indicates the number of intersecting genes. Dotted vertical line,= 0.01, two-sided Fisher’s exact test., Network plots displaying genes dysregulated in AD based on data from the ROSMAP consortium(significance threshold, FDR < 0.05). AD_vs_controls—genes DE in two-group comparisons of AD versus controls. Cognitive—genes showing significant expression correlation with measures of cognitive function. Neuropathology—genes showing expression correlation with measures of neuropathology. Network plots were generated using Cytoscape, with a prefuse force-directed layout adjusted for best display; genes were further grouped by degree using a circular layout. Gene node color represents node degree. Edge length is arbitrary. Red labels—genes for which we identified enhancers with astrocyte-specific chromatin state (Fig.)., Effect of individual enhancer silencing on gene expression measured with NanoString digital RNA counting. The figure displays data for a subset of the enhancers shown in Fig., which control genes dysregulated in AD. Dotplots display NanoString normalized probe counts. Each plot displays the enhancers tested for a given gene. Individual enhancer silencing experiments using CRISPRi were performed in biological duplicates. The control group consists of nontargeted samples in the same batch (; Supplementary Table). Statistics were calculated using a linear model with multiple testing correction (). Horizontal line—median; vertical line extends from the 25th to 75th quantiles of the data. CNS, central nervous system; MS, multiple sclerosis; ASD, autism spectrum disorder. a b c n n P [43] [45] Results Supplementary Methods 4d 2c Methods 3 Methods Source data
Functional annotation of noncoding variants
Using all those significant eQTLs reported by each dataset, only 27 of the 158 regulatory interactions (17%) were captured by any of the eQTL datasets (Fig.6b and Extended Data Fig. 8). In addition, 43% had no significant eQTL reported, and, in 40% of cases, enhancers were linked to a different gene by eQTL than by CRISPRi screening in astrocytes. Using the more stringent fine-mapped eQTLs, 20 of the 158 regulatory interactions were captured (13%), 57% had no eQTL assignment and 30% were assigned to a different gene. Notably, eQTLs linking the enhancer to the same gene as identified in the CRISPRi screen data were more likely to be replicated across datasets (OR = 5.1, P = 2 × 10−4, Fisher’s exact test; Fig.6c). Moreover, astrocyte-specific eQTL calls obtained by deconvolution of bulk RNA-seq52 or scRNA-seq53 were markedly underpowered, capturing none of the 158 regulatory interactions (Extended Data Fig. 8).
For example, we identified the following two enhancers regulating ID3 (inhibitor of DNA binding, also known as inhibitor of differentiation): Enh53 (chr1: 23,554,524–23,555,279, 4.6-kb downstream) and Enh54 (chr1: 23,562,311–23,562,640, 3.0-kb upstream; Extended Data Fig. 8). ‘ID3’ encodes a transcriptional regulator highly expressed in neural progenitors, where it controls proliferation54, while in differentiated cells it is most highly expressed in astrocytes (Fig.4d). Enh53 and Enh54 overlap superenhancers55, were previously shown to control ‘ID3’ expression in human primary macrophages55 and we confirmed their regulatory effects in astrocytes by NanoString and qRT–PCR in an independent cell line (Fig. 5c and Extended Data Fig. 4). However, no brain eQTL dataset captured these interactions, possibly due to a combination of low ID3 expression, low astrocyte abundance and/or the lack of common variants within enhancers.
Together, these data demonstrate the power of CRISPRi screening of enhancers to provide a new functional annotation for noncoding regions in brain cell types.
To progress from region-level to nucleotide-level annotation, we performed in silico mutagenesis using a deep-learning framework (Beluga) in ref. 56, predicting the effect of sequence changes on chromatin state in astrocytes (Methods). We also obtained disease-impact scores (DISs)56 for all SNPs located within 1 kb of the tested enhancers, which estimates the probability of a variant being pathogenic56, and found that SNPs within 1 kb from functional enhancers were more likely to have a significant DIS than those in nonfunctional candidates (P = 0.009, OR = 1.16, Fisher’s exact test).
Using WGS data from the AD sequencing project (ADSP)57, we found that 57% of the functional enhancers contained AD-associated variants in at least one ADSP analysis (Methods). One such enhancer, Enh427 (chr17: 34,252,507–34,253,084), regulates CCL2 (Extended Data Fig. 4), a chemokine dysregulated in AD astrocytes and associated with AD neuropathology (Fig. 5b), which recruits microglia and monocytes to amyloid-β plaques, thereby amplifying the inflammatory response and contributing to neuronal damage58,59. The CCL2 enhancer contains three nominally AD-associated variants (Fig. 6d), and the variant with the most significant association with AD exhibited the most significant DIS (Fig. 6d), demonstrating the added value of variant-level prioritization by in silico mutagenesis. Supplementary Table 8 provides DIS data for all functional enhancers, and the in silico mutagenesis data are available at https://voineagulabunsw.github.io/astrocyte_crispri_resource/↗.
![Click to view full size CRISPRi screening provides new functional annotation for regulatory elements and sequence variants. , Distribution of the number of overlapping, nearby and LD-linked SNPs per enhancer (). SNP data from dbSNP153; nearby—within ±1 kb of the enhancer’s endpoints; LD—> 0.8 with any overlapping or nearby SNPs. Violin plot width is proportional to the density of points. Horizontal line = median., Number of functional EGPs identified in the CRISPRi screen captured by brain eQTL data. Significant—eQTLs with adjusted< 0.05 pooled across the following four datasets: GTEx, mmQTL, MetaBrainand Bryois. Fine-mapped—eQTL data pooled from GTEx and mmQTL CAVIAR output. Allvalues were obtained from their respective original publications., Barplot displaying the percentage (axis) of eQTLs reproduced across studies, which is called significant in ≥2 datasets of four brain resources (GTEx, mmQTL, Metabrain and Bryois). Same gene targeted—eQTLs linking SNPs to the same gene as the CRISPRi screen; different gene targeted—eQTLs linking SNPs to a different gene from the CRISPRi screen.value were calculated via a two-sided Fisher’s exact test., Top, UCSC genome browsertracks displaying ATAC–seq data at thelocus, as well as TT-seq data and the UCSC GTEx CAVIAR fine-mapped eQTL track for the region around Enh427. Please note that there are no significant eQTLs overlapping or in the vicinity of the enhancer. Bottom, the heatmap displays in silico mutagenesis data generated with DeepSEA (Beluga) for Enh427. Columns correspond to nucleotide positions, while rows represent chromatin state marks. The heatmap depicts the effect of mutating each nucleotide on the predicted value for each chromatin mark in astrocytes. Specifically, the values show the log(FC) in prediction between the reference and mutated sequences, taking the maximum of all three possible nucleotide substitutions at each position; the values arescore scaled across rows. DIS = −log(DISvalue), thus DIS > 1.3 corresponding tovalue < 0.05 is considered significant. ADvar—purple bars depict nucleotide positions with significantvalues in the ADSP consortium WGS data. AD WGS—the lowest associationvalue reported in the ADSP consortium WGS data at the corresponding nucleotide position (Supplementary Table). AD WGSvalues were downloaded fromLD, linkage disequilibrium. a b c d Methods 8 r P P x P CCL2 z e e P P P P 2 [50] [51] [52] [53] [68] [56] 2 10 https://dss.niagads.org/open-access-data-portal/ Source data](https://europepmc.org/articles/PMC12971484/bin/41593_2025_2154_Fig6_HTML.jpg.jpg)
CRISPRi screening provides new functional annotation for regulatory elements and sequence variants. , Distribution of the number of overlapping, nearby and LD-linked SNPs per enhancer (). SNP data from dbSNP153; nearby—within ±1 kb of the enhancer’s endpoints; LD—> 0.8 with any overlapping or nearby SNPs. Violin plot width is proportional to the density of points. Horizontal line = median., Number of functional EGPs identified in the CRISPRi screen captured by brain eQTL data. Significant—eQTLs with adjusted< 0.05 pooled across the following four datasets: GTEx, mmQTL, MetaBrainand Bryois. Fine-mapped—eQTL data pooled from GTEx and mmQTL CAVIAR output. Allvalues were obtained from their respective original publications., Barplot displaying the percentage (axis) of eQTLs reproduced across studies, which is called significant in ≥2 datasets of four brain resources (GTEx, mmQTL, Metabrain and Bryois). Same gene targeted—eQTLs linking SNPs to the same gene as the CRISPRi screen; different gene targeted—eQTLs linking SNPs to a different gene from the CRISPRi screen.value were calculated via a two-sided Fisher’s exact test., Top, UCSC genome browsertracks displaying ATAC–seq data at thelocus, as well as TT-seq data and the UCSC GTEx CAVIAR fine-mapped eQTL track for the region around Enh427. Please note that there are no significant eQTLs overlapping or in the vicinity of the enhancer. Bottom, the heatmap displays in silico mutagenesis data generated with DeepSEA (Beluga) for Enh427. Columns correspond to nucleotide positions, while rows represent chromatin state marks. The heatmap depicts the effect of mutating each nucleotide on the predicted value for each chromatin mark in astrocytes. Specifically, the values show the log(FC) in prediction between the reference and mutated sequences, taking the maximum of all three possible nucleotide substitutions at each position; the values arescore scaled across rows. DIS = −log(DISvalue), thus DIS > 1.3 corresponding tovalue < 0.05 is considered significant. ADvar—purple bars depict nucleotide positions with significantvalues in the ADSP consortium WGS data. AD WGS—the lowest associationvalue reported in the ADSP consortium WGS data at the corresponding nucleotide position (Supplementary Table). AD WGSvalues were downloaded fromLD, linkage disequilibrium. a b c d Methods 8 r P P x P CCL2 z e e P P P P 2 [50] [51] [52] [53] [68] [56] 2 10 https://dss.niagads.org/open-access-data-portal/ Source data
Expanding the enhancer–gene regulatory map of astrocytes using prediction models
To expand the enhancer–gene regulatory map, we evaluated four predictive models (ABC4,60, TAP-seq RF15, ENCODE’s rE2G-extended and rE2G-DNase) that outperform a broad range of deep-learning and standard machine learning models14. Because these models were developed using data from K562 cells4,14,15,61, we assessed their performance on both astrocyte and K562 CRISPRi screen datasets as ground truth.
We then developed an RF model of enhancer–gene regulation in astrocytes (EGrf), using the following seven variables: three enhancer chromatin state features (ChIP H3K27ac, ChIP H3K4me3, and ATAC–seq), gene stability index36, enhancer–gene distance, the number of transcription start sites (TSSs) between the gene and enhancer, and a binary indicator of whether the gene in the enhancer–gene pair is the closest gene to the enhancer. Of these, the gene stability index and the closest gene variables had not been included in the other predictive models we evaluated.
EGrf achieved an AUPRC of 0.57 and 35% precision at 70% recall, a higher predictive accuracy compared to the rE2G-DNase model in astrocytes (P = 0.04; Fig. 7a,b). While several other variables were predictive on their own (enhancer transcription, Hi-C contacts, maximum DeepSEA-based variant scores), their inclusion in the EGrf model did not improve the performance alongside the existing seven variables (Extended Data Fig. 9). Notably, replacing ATAC–seq with DNase-seq significantly reduced the model’s performance (Extended Data Fig. 9). In Supplementary Note 2, we describe the contribution of each variable to EGrf, and introduce an RF model to predict EG pairs in K562 cells, which performs as well as rE2G-extended, while using fewer variables.
We then applied EGrf to intergenic ATAC–seq peaks not tested by CRISPRi to assess their effects on expressed genes within a 500 kb region. At a stringent threshold of 80% precision (prediction score = 0.535), EGrf-predicted 1,348 enhancer–gene regulatory interactions (Supplementary Table), involving 1,223 unique enhancers and 805 unique genes, substantially expanding the catalog of predicted functional regulatory interactions in astrocytes. 9
When assessing the properties of the predicted EGPs, we found that they were enriched for enhancers previously identified by CRISPRi and MPRA58, and for regions carrying superenhancer marks (Fig. 7c). The median enhancer–gene distance was 26.69 kb. The predicted enhancer-regulated genes were depleted of housekeeping genes, while the predicted functional enhancers were much more likely to be transcribed as measured by TT-seq (P = 8.1 × 10−217, OR = 6.6), supporting the validity of the predicted EGPs.
We then carried out individual CRISPRi experiments for five enhancers predicted by EGrf to control genes implicated in AD58,62–64 and neurodegeneration65, including an enhancer controlling CLU, a major risk gene for late onset AD. Each enhancer was targeted by two sgRNAs and each sgRNA was tested in duplicate, with the expression of the predicted target gene assessed by qRT–PCR. These data validated four of the five predicted regulatory interactions (Fig. 7d), consistent with EGrf’s 0.8 precision.
Together, our data establish a framework for functional characterization of enhancer–gene regulatory networks in a given cell type, through CRISPRi screening followed by genome-wide prediction using an RF approach.
![Click to view full size Prediction of enhancer–gene regulatory interactions in astrocytes using EGrf. , Performance comparison of enhancer–gene prediction models in astrocytes. Boxplots display AUPRC values across 1,000-fold bootstraps. The top, middle and bottom horizontal lines correspond to the 25th, 50th and 75th percentiles of the data, while whiskers extend the range of the data up to a maximum of 1.5× the interquartile range; any outliers beyond this range were plotted individually. *= 0.02, **= 0.001, two-sidedtest comparing AUPRC values for each model with those of EGrf., Precision–recall curves for all models shown in. Dotted line, recall = 0.7. The dot marks the recommended threshold for rE2G., Percentage of enhancers overlapping experimentally validated enhancers from K562 cellsand superenhancers, and percentage of EGPs crossing at least one TAD boundary (Hi-C data from iPSC-derived astrocyte data from ref.). Statistics were calculated using a two-sided Fisher’s exact test comparing EGrf-predicted functional enhancers with all intergenic NHA ATAC–seq peaks, and EGrf-predicted EGPs with all EGPs considered for prediction (; Supplementary Table)., Functional assessment of EGrf-predicted EGPs. Each enhancer was individually targeted by CRISPRi with two sgRNAs, and each sgRNA was transduced in independent duplicate experiments (= 4 per enhancer). The expression of the predicted target gene was assessed by qRT–PCR. For each plot, the control group consists of samples transduced with negative control sgRNAs (= 3, triangles) and additional samples transduced with sgRNAs targeting the other five enhancers, depending on RNA sample availability (= 2–15, circles).values were calculated via a two-tailed unpairedtest, error bars = s.d. sgRNA sequences and qRT–PCR primers are listed in Supplementary Tablesand, respectively. a b a c d P P t n n n P t [14] [67] [26] [22] Methods 9f 2 10 Source data](https://europepmc.org/articles/PMC12971484/bin/41593_2025_2154_Fig7_HTML.jpg.jpg)
Prediction of enhancer–gene regulatory interactions in astrocytes using EGrf. , Performance comparison of enhancer–gene prediction models in astrocytes. Boxplots display AUPRC values across 1,000-fold bootstraps. The top, middle and bottom horizontal lines correspond to the 25th, 50th and 75th percentiles of the data, while whiskers extend the range of the data up to a maximum of 1.5× the interquartile range; any outliers beyond this range were plotted individually. *= 0.02, **= 0.001, two-sidedtest comparing AUPRC values for each model with those of EGrf., Precision–recall curves for all models shown in. Dotted line, recall = 0.7. The dot marks the recommended threshold for rE2G., Percentage of enhancers overlapping experimentally validated enhancers from K562 cellsand superenhancers, and percentage of EGPs crossing at least one TAD boundary (Hi-C data from iPSC-derived astrocyte data from ref.). Statistics were calculated using a two-sided Fisher’s exact test comparing EGrf-predicted functional enhancers with all intergenic NHA ATAC–seq peaks, and EGrf-predicted EGPs with all EGPs considered for prediction (; Supplementary Table)., Functional assessment of EGrf-predicted EGPs. Each enhancer was individually targeted by CRISPRi with two sgRNAs, and each sgRNA was transduced in independent duplicate experiments (= 4 per enhancer). The expression of the predicted target gene was assessed by qRT–PCR. For each plot, the control group consists of samples transduced with negative control sgRNAs (= 3, triangles) and additional samples transduced with sgRNAs targeting the other five enhancers, depending on RNA sample availability (= 2–15, circles).values were calculated via a two-tailed unpairedtest, error bars = s.d. sgRNA sequences and qRT–PCR primers are listed in Supplementary Tablesand, respectively. a b a c d P P t n n n P t [14] [67] [26] [22] Methods 9f 2 10 Source data
Discussion
Here we used CRISPRi screening to test the function of 979 intergenic PsychENCODE candidate enhancers13 in human primary astrocytes, thereby uncovering genomic regions that control gene expression in the most abundant glial cell type.
The CRISPRi screen showed that most regulatory interactions (76%) occurred within 200 kb, and that enhancers often don’t target the closest gene—29% functional EGPs involved a distal gene while skipping at least one expressed gene, consistent with previous data from K562 (ref. 1). In contrast, a recent K562 screen reported >90% nearest-gene interactions6, likely because 70% of tested regions were intronic, suggesting that, under the assumption that CRISPRi itself does not impede transcription, intronic enhancers preferentially regulate their host gene.
Most functional EGPs were not captured by eQTLs (Fig. 6b). Similarly, 80% of the regulatory interactions identified by a CRISPRi screen in the blood-derived K562 cells were not captured by blood eQTL data6, despite high eQTL sample sizes for blood. Consistent with this, recent data showed that eQTLs primarily capture variants with large effect sizes, around promoters66.
The CRISPRi screen in astrocytes provided an opportunity to assess the accuracy of enhancer–gene prediction models outside of K562 cells, a lymphoblastoid line commonly used due to the availability of CRISPRi screen data. Cancer cells have altered epigenetic landscapes, and enhancer–gene regulation is highly specific to cell type. Existing models developed and benchmarked on K562 data performed less accurately in astrocytes, highlighting difficulties in generalizing regulatory rules learned from a cancer cell line to a different cell type in a primary state. Although EGrf improved prediction performance, accuracy remains insufficient to replace experimental validation on a genome-wide scale. We therefore applied a conservative threshold (precision = 0.8) when including predicted interactions in the AstroREG resource. Thus, the AstroREG resource is a high-confidence, albeit not exhaustive, set of predicted interactions. Recognizing that different users may prioritize sensitivity over specificity, we provide the full prediction dataset in Supplementary Table, allowing users to apply more permissive thresholds. 9
Our CRISPRi screen and validation experiments were focused on astrocytes, and we complemented this functional information with chromatin state data across cell types and developmental stages (snATAC–seq17). Through this approach, we identified enhancers that control 12 genes dysregulated in AD, are active in astrocytes as demonstrated by the CRISPRi screen and show astrocyte-specific chromatin state based on ATAC–seq (Fig. 4d, asterisk, and Fig.5b, red labels); these data highlight them as promising targets for cell-type-specific modulation of enhancer function and gene expression manipulation.
Future experiments assessing enhancer function in distinct functional states, such as reactive astrocytes, astrocytes cocultured with neurons or brain-region-specific astrocytes, will be critical for expanding our understanding of gene expression regulation in astrocytes. Furthermore, while CRISPRi has the advantage of avoiding off-target effects and DNA-damage responses caused by CRISPR-based deletions, it carries the caveat of potential incomplete silencing. Thus, experiments deleting individual enhancers would complement our data, and genome editing of specific SNVs will be important for demonstrating the functional consequences of individual sequence variants.
Overall, our study presents the largest set of experimentally validated enhancers in primary astrocytes, providing new annotations of noncoding variants and identifying regulatory elements controlling AD-linked genes. To facilitate future use of the astrocyte CRISPRi screen data, we provide the AstroREG data in a user-friendly web browser available at: https://voineagulabunsw.github.io/astrocyte_crispri_resource/↗.
Methods
All experimental work was carried out in accordance with ethics requirements at the University of New South Wales (UNSW). Lentiviral work and CRISPRi were approved by the UNSW Gene Technology Research Committee (NLRD22-21).
Experimental methods
Additional details on experimental methods are provided in: characterization of NHA and NHA–dCas9–KRAB cells (genotyping, ATAC-seq, RNA-seq and TT-seq library preparation and sequencing) and validation experiments. Supplementary Methods
General experimental methods
NHA cell lines and cell culture
Two NHA lines were obtained from Lonza (CC-2565, 18WG), that is, one female (lot 0000612736) and one male (lot 0000672445). The female line was used in all experiments except for genotyping, where both lines were used, and CRISPRi screen validation with individual CRISPRi and qRT–PCR readouts in a distinct NHA line, where the male line was used. Lenti-X 293T cells were obtained from Takara Bio (632180).
Cells were grown in complete DMEM (Thermo Fisher Scientific, 11995073) supplemented with 10% FBS (Scientifix, SFBS-AU) and 1% streptomycin–penicillin (Sigma-Aldrich, P4333-100ML) at 37 °C, 5% CO2.
We characterized NHAs in terms of their genetic background (SNP genotyping), chromatin state (ATAC–seq) and gene expression profile (bulk RNA-seq in NHAs with or without stable dCas9–KRAB expression; Fig. 1 and Extended Data Fig. 1).
RNA extraction
Cells were dissociated using trypsin and neutralized with complete DMEM. The cells were then centrifuged at 300g for 3 min. Cells were washed with PBS and pelleted again at 300g for 3 min. A total of 700-µl QIAzol (Qiagen, 79306) were added to the pellet and homogenized by pipetting. For RNA samples prepared for NanoString validation, cell lysates in QIAzol were heated to 55 °C for 10 min in a thermomixer (Eppendorf) with 1,000 rpm agitation, to improve isolation of NEAT1 isoforms70. RNA was then extracted using the miRNeasy Mini Kit (Qiagen, 217004) according to the manufacturer’s protocol, including on-column DNase digestion (Qiagen, 79254). RNA concentration was measured using the Qubit 2.0 Fluorometer (Life Technologies) with the BR RNA kit (Thermo Fisher Scientific, Q10210↗) according to the manufacturer’s instructions.
cDNA synthesis and qRT–PCR
RNA and DNA concentrations of RNA samples extracted as above were measured using the Qubit 2.0 Fluorometer (Life Technologies) with the BR RNA kit (Thermo Fisher Scientific, Q10210↗) and HS DNA kit (Thermo Fisher Scientific, Q33230↗), respectively, according to the manufacturer’s instructions. cDNA was generated using the high-capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific, 4368814) according to the manufacturer’s specifications. In parallel, a reaction was set up for a high DNA/RNA percentage sample without reverse transcriptase as a negative control. The reaction was run in a thermocycler for 10 min at 25 °C, 30 min at 37 °C, then 5 min at 85 °C. qRT–PCR was performed using the iTaq Universal SYBR Green Supermix (Bio-Rad, 1725124). A master mix was created with 5-µl iTaq Universal SYBR Green Supermix, 0.4 µl 10 µM forward primer, 0.4 µl 10 µM reverse primer and 2.2 µl UltraPure water per well (primer sequences available in Supplementary Table 10; Integrated DNA Technologies (IDT)). A total of 8 µl master mix and 2 µl cDNA were combined in each well of a MicroAmp Optical 384-Well Reaction Plate (Thermo Fisher Scientific, 4309849). A total of 8 µl master mix and 2 µl UltraPure water were combined per well for the nontargeting control. The qRT–PCR was performed on the QuantStudio 7 Pro (Thermo Fisher Scientific, A43183) using default settings for ‘Comparative Ct with Melt’, in ‘Fast’ run mode. For technical replicates with an s.d. of >0.4, the replicate with a threshold cycle (Ct) with the largest difference from other replicates was discarded as an outlier if its removal improved the s.d. Primer efficiency was assessed by performing a qRT–PCR as above with five serial dilutions of cDNA per primer pair and the slope of the linear regression was calculated for Ct values compared to the log of the relative amount of cDNA per well. Primer efficiency was then calculated using the following formula: efficiency (%) = (10(−1/slope) − 1) × 100. Relative gene expression was calculated as discussed in ref. 71, with ACTB and B2M used as reference genes.
Lentivirus-based vector generation and transduction
Five hundred microliters Opti-MEM (Thermo Fisher Scientific, 31985062) media and either 43 µl of Lipofectamine 3000 (Thermo Fisher Scientific, L3000015) or 50 µl of Lipofectamine LTX (Thermo Fisher Scientific, 15338100)—depending on availability—were combined in one tube, and 3.8 µg of pMD2.G, 5.9 µg of pMDLg/pRRE, 2.7 µg of pRSV-rev and 8.7 µg of the plasmid to be transduced were combined with 500 µl of Opti-MEM with either 40 µl of P3000 or 20 µl of Plus, depending on Lipofectamine reagent, in a separate tube. The two tubes were combined and incubated at room temperature for 15 or 30 min, depending on the Lipofectamine reagent, before adding dropwise to Lenti-X 293T cells at 70% confluency. The cells were incubated overnight before culture media was replaced. Lentiviral particles were collected 24 and 48 h after media replacement. The collected lentiviral particles were centrifuged at 300g for 3 min and the supernatant was filtered using a 0.45 µm filter (MilliporeSigma, SLHV033RS) to remove dead cells and debris. The collected lentiviral particles were concentrated tenfold using either the Lenti-X Concentrator (Takara Bio, 631231) or by adding 15 ml of lentiviral-containing media to the Amicon Ultra-15 Centrifugal Filter Unit at 100 KDa (MilliporeSigma, UFC910024) and centrifuging at 4,000g for 12 min or until 1.5 ml remained.
To transduce cells, concentrated lentiviral particles were diluted to the required concentration in complete DMEM, supplemented with 8 µg ml−1 polybrene (Merck, TR-1003-G), and added to cells. This media was replaced after 24 h. Cells were selected with an appropriate antibiotic for the transduced vector 48 h after transduction.
Determining lentiviral particle concentration and MOI
To determine the concentration of generated lentiviral particles, cells were transduced with three serial 1:3 dilutions of the lentiviral preparation. To determine the number of cells at transduction, cells from an identically seeded plate were detached with trypsin, neutralized with media and counted using a Countess II FL Automated Cell Counter (Thermo Fisher Scientific). The media on transduced cells was replaced the following day. A single-cell suspension was prepared 4 days post-transduction by detaching cells with trypsin (Sigma-Aldrich, T4174-100ML) and neutralizing with complete DMEM. Cells were centrifuged at 300g for 3 min and washed with PBS (Thermo Fisher Scientific, 10010049), before being centrifuged again at 300g for 3 min. Cells were resuspended in PBS supplemented with 25 mM HEPES (Sigma-Aldrich, H3537-100ML) and 2 mM EDTA (Thermo Fisher Scientific, 15575020) and passed through a cell strainer (Corning, BDAA352235). Cells were then analyzed for GFP fluorescence compared to an untransduced control. Flow cytometry was carried out using the BD LSRFortessa X-20 with BD FACSDiva (v8.0.1) software at the Mark Wainwright Analytical Centre (UNSW). Cells were selected with two single-cell gates to ensure live, single cells, then a GFP gate was used to determine percentage of cells expressing GFP. Lentiviral concentration in transduction units per ml (TU ml−1) was calculated using the following formula: concentration (TU ml−1) = number of cells transduced × % fluorescence cells as a decimal × dilution factor/transduction volume (ml)72. MOI was then calculated using the following formula: MOI = concentration (TU ml−1) × total transduction volume (ml)/number of cells73.
Generation of a stable NHA–dCas9–KRAB line
NHA cells were transduced with Lenti–dCas9–KRAB–blast (a gift from G. Hon; Addgene, 89567) and selected with 10 µg ml−1 blasticidine S hydrochloride (Sigma-Aldrich, 15205-25MG) after 48 h to create a stable cell line expressing dCas9–KRAB.
Cloning CROP-seq-opti-GFP
CROP-seq-opti-GFP was cloned by inserting the sequence for green fluorescence protein (GFP) and a T2A linker connected to the puromycin sequence in CROP-seq-opti-GFP (a gift from J. Shendure; Addgene, 106280). A synthetic gBlock (IDT) was generated with the sequence for GFP and a T2A linker, with flanking regions to recapitulate the CROP-seq-opti lost due to restriction enzyme digestion (Supplementary Table 10). The gBlock and CROP-seq-opti-GFP were digested with BsiWI (New England Biolabs, R3553S). The digested plasmid was dephosphorylated with Shrimp Alkaline Phosphatase (New England Biolabs, M0371S), and ligated with the digested gBlock using T4 DNA ligase (New England Biolabs, M0202S). The ligated plasmid was transformed into NEB 5-α competent Escherichiacoli (high efficiency) (New England Biolabs, C2987H) and grown on LB agar plates (UNSW; Lowy Solutions, UCS-BIO-0038) overnight at 30°C with 100 µg ml−1 ampicillin (Sigma-Aldrich, A5354-10ML). Colonies were screened by culturing picked colonies in LB broth (UNSW; Lowy Solutions, UCS-BIO-0046) with 100 µg ml−1 ampicillin overnight at 30°C and isolating plasmids using the QIAprep Spin Miniprep Kit (Qiagen, 27106) before Sanger sequencing across the cloned region (Ramaciotti Centre for Genomics, UNSW; Supplementary Table 10). CROP-seq-opti-GFP is available upon request.
CRISPRi screen
sgRNA library design
Three libraries of sgRNAs in CROP-seq-opti-GFP were generated at the Monash Functional Genomics Platform (Monash) as discussed in ref. 74—an enhancer library containing 2,478 sgRNAs targeting the 979 candidate enhancers, 20 positive control sgRNAs targeting 10 promoters and 50 negative control sgRNAs, a positive control library containing 250 sgRNAs targeting 125 promoters1 and a negative control library containing 250 negative control sgRNAs.
The enhancer-targeting library, positive control and negative control libraries were independently transduced. Data analysis was performed within each library transduction experiment to avoid batch effects. The positive control library allowed to assess the CRISPRi silencing efficiency without the confounding effects of a large number of promoter-targeting guides in the enhancer-targeting library, while the negative control library allowed to assess our statistical approach for P-value hyperinflation25; however, the enhancer-targeting library was used to identify the effects of enhancer silencing.
sgRNAs for candidate peaks were designed and scored using the GPP sgRNA designer75 (Supplementary Table 2). For peaks of <200 bp, peak width was first expanded to a 200-bp window around the midpoint, and two guides were chosen as those with the best combined rank in on-target and off-target effects. For peaks of >200 bp in size, the top two guides were chosen as above, with additional guides selected using an iterative tiling approach. Here a silencing window of ±100 bp was added around the start position of each guide. If the enhancer was fully tiled by these two guides (that is, its entire sequence was covered by these silencing windows), no further guides were selected. If not, starting from the next highest-ranked guide, the following process was iterated until the enhancer was fully tiled: (1) check if the start position of the guide is >50 bp from the start positions of all previously selected guides; (2) draw a ±100-bp silencing window around its start position, and check if this overlaps the sequence the other guides’ silencing windows failed to cover; and (3) if both (1) and (2) are fulfilled, keep this guide and check if the enhancer is now fully tiled.
Positive control guides targeting promoters were selected from those validated in ref. 1 (Supplementary Table 2). To prioritize from this list, guides were selected to target genes highly expressed in NHA cells. The positive control library of 250 guides was selected from the list mentioned in ref. 1 as the 125 genes with the lowest average expression rank in NHA cells. Additionally, 20 positive control guides were added to the NHA guide library, selected from the top ten highest-expressed genes in NHA. Two guides were selected for each promoter.
Negative control guides are listed in Supplementary Table, and consisted of nontargeting sequences and those targeting AAVS1. 2
sgRNA library transduction
Each library was individually transduced into NHA–dCas9–KRAB cells (passage 22) at an MOI of 5, while the positive control library was also transduced at 0.2 MOI, allowing for the assessment of the effect of MOI on CRISPRi silencing efficiency. On day 2 post-transduction, antibiotic selection began with cell media replaced with media containing 3 µg ml−1 of puromycin (Sigma-Aldrich, P833-25MG) and 10 µg ml−1 blasticidine S hydrochloride (Sigma-Aldrich, 15205-25MG). On day 4 post-transduction, the separately transduced cells were combined at a ratio of 20.4 : 2 : 1 : 5 for the enhancer library, negative control library, positive control library at MOI 5, and positive control library at MOI 0.2, respectively, to account for the relative abundance of sgRNAs in each library at each MOI.
scRNA-seq library preparation
On day 7 post-transduction, a single-cell suspension was prepared by detaching cells (passage 24) with trypsin and neutralizing with complete DMEM. Cells were pelleted at 300g for 5 min and resuspended in 2 ml PBS with 0.04% BSA (Sigma-Aldrich, A3311). Cells were then counted and passed through a cell strainer, before being pelleted by centrifugation at 300g for 5 min and resuspended in 200 μl PBS with 0.04% BSA. Single cell capture and library preparation were carried out at the Garvan–Weizmann Centre for Cellular Genomics (Garvan Institute of Medical Research) using the 10x Chromium Next GEM Single Cell 3’ kit (v3.1), using eight capture reactions aiming to capture 10,000 cells per reaction (Supplementary Fig. 1). The NHA_6 reaction and resulting sequencing data were filtered out due to a wetting failure during capture.
Sequencing was performed at the Ramaciotti Centre for Genomics (UNSW), using an Illumina NovaSeq 6000 with a run format of NovaSeq S4 200 cycles (v1.5) flow cell, with a read structure of 28:10:10:90, to obtain a total of 12.5 billion sequencing reads across the seven libraries.
Hemi-nested PCR for sgRNA enrichment
A hemi-nested PCR was performed on cDNA produced for scRNA-seq to amplify reads from the sgRNAs. cDNA generated from the scRNA-seq library preparation (after the amplification step but before fragmentation) was amplified in three PCR steps, as discussed in ref. 76 using primers from refs. 74,76 (Supplementary Table 10; IDT). Briefly, NEBNext High Fidelity PCR Master Mix was used for each PCR, with 10 ng of cDNA input for the first PCR, and 5 ng of DNA for subsequent PCRs. DNA was purified with the Qiagen MinElute PCR Purification Kit (Qiagen, 28004) according to the manufacturer’s protocol after each PCR. DNA concentration at each step was measured with a Qubit 2.0 Fluorometer using the Qubit dsDNA BR assay kit according to the manufacturer’s instructions. The amplified cDNA was cleaned up with 0.8× AMPureXP beads and sequenced on an Illumina NextSeq 500 to generate 2× 75 bp reads at the Ramaciotti Centre for Genomics (UNSW), at a total depth of 19.3 million reads.
Data analysis methods
Additional details on data analysis methods are provided in Supplementary Methods:Bulk ATAC–seq, RNA-seq and TT-seq data processingSNP genotyping and copy number variation callingNanoString data analysisProcessing and analysis of data from ref. 17Functional annotation of genesFunctional annotation of enhancersAnnotation of SNPs overlapping candidate enhancerseQTL data processingTF footprinting analysisRegulatory network constructionProcessing and analysis of TF–ChIP–seq dataDeep-learning-based in silico mutagenesis and DISsProcessing of WGS data from the ADSPPredictive models of enhancer–gene regulatory interactionComparison of chromatin state at intergenic regions across ENCODE astrocyte linesLarge language models
Candidate enhancer selection
Candidate NHA regulatory regions were selected by filtering NHA ATAC–seq peaks on five criteria (Extended Data Fig. 2):ATAC–seq read count (that is, MACS2 Pileup) >2 counts per million (CPM).Intergenic, that is, at least 2,000 bp from gene bodies (GENCODE, v32).Overlapping with the PsychENCODE consortium’s list of brain-active regulatory regions13.Overlapping with a TAD in cultured induced glial cells22, where the TAD contained at least one gene expressed above a Seurat-normalized scRNA-seq expression of 0.3 (log(((counts/total count) × 10,000) + 1)). This expression threshold was selected based on a small-scale pilot experiment assessing the minimal gene expression required for detecting silencing induced with CRISPRi (Extended Data Fig. 2).Overlapping with peak calls from at least two of the four ATAC–seq datasets generated from the following: cultured hiPSC-derived astrocytes21; GFAP+ or HEPACAM+ cells isolated from cortico-spheroids20; NeuN− nuclei isolated from the prefrontal cortex19; and NeuN− nuclei isolated from the dorsolateral prefrontal cortex, ventrolateral prefrontal cortex and orbitofrontal cortex18. All datasets were filtered to include peaks with pileup counts per >2 million.
All intersections were performed using intersectBed from the Bedtools suite77, requiring at least 1-bp overlap with no window. Where necessary, peak coordinates were lifted over to hg38 using rTracklayer78 with the UCSC hg19-to-hg38 chain (https://hgdownload.cse.ucsc.edu/goldenpath/hg19/liftOver/hg19ToHg38.over.chain.gz↗).
scRNA-seq data processing
FASTQ demultiplexing, mapping and quantification of gene expression were performed using CellRanger (v6.0.2; ref. 79). Default parameters were followed, with mapping to the CellRanger 2020-A transcriptome reference (hg38 and GENCODE32).
Cells were called using the EmptyDrops package80, with a threshold of an FDR of <0.01. Data were processed using Seurat (v4.3.0.1; ref. 81). To remove potential doublets, we excluded cells whose total UMIs were above the 99.8th percentile within their library or the 99th percentile of all cells with only one assigned gRNA. Low-quality cells were identified and removed when mitochondrial genes, ribosomal genes or MALAT1 accounted for >20% of total UMIs, or dropletQC82 annotated them as damaged (Supplementary Fig. 1).
Associations between cell barcodes and guide RNAs were determined using the get_barcodes Python script (https://github.com/shendurelab/single-cell-ko-screens↗) from ref. 83. This script was applied to BAM files from both the 10x libraries and hemi-nested PCR-enriched sequencing runs. Default parameters were used, but with the following flags: --no_swalign --barcode_length 16, and --seach_seq GTGGAAAGGACGAAACACC. Cell–gRNA assignments required a minimum of three UMIs to be called, while also accounting for >2% of total gRNA UMIs within that cell.
Cell-type annotation of scRNA-seq data
scRNA-seq data generated in the CRISPRi screen were annotated to validate the cellular and developmental origin of NHAs. Cell-type annotation was based on normalized cell-level expression data, which were generated using the Seurat pipeline—log(((counts/total count) × 10,000) + 1)81. The SingleR algorithm (v1.8.1; ref. 84), which provides cell-type annotation based on external reference data using an unsupervised correlation-based algorithm with iterative tuning, was applied with default parameters. The reference dataset was constructed using an snRNA-seq time course of human brain maturation17. Cells were pseudobulked across batches according to their broad cell type (excitatory neurons, inhibitory neurons, astrocytes, oligodendrocytes, oligodendrocyte precursor cells and microglia), and developmental stage (fetal, neonatal, infancy, childhood, adolescence and adult). Pseudobulked data were converted to CPM and log transformed (offset of +0.5).
CRISPRi screen scRNA-seq data analysis
DE was performed using the improved negative binomial regression model implemented in SCEPTRE25 (v0.1.0), an algorithm designed for high MOI, single-cell-level CRISPR screens. DE analyses tested the effect of silencing each enhancer on all expressed genes with a TSS within ±500 kb of the enhancer midpoint, as in previous studies6,85. Genes were considered expressed provided that the mean Seurat-normalized expression was greater than 0.015625 (2−6), equivalent to ~1.64 CPM and the gene being detected in at least ~5.6% of cells (Supplementary Fig. 2). This threshold was chosen based on the distribution of mean gene expression in the scRNA-seq library (Supplementary Fig. 2). Because some TADs were longer than 500 kb, 22 enhancers had no expressed gene within 500 kb, and thus 957 enhancers were tested in the DE analyses.
We thus tested the effect of 957 enhancers on 3,125 genes, for a total of 7,759 EGPs tested. DE analyses were performed at the enhancer level, comparing the targeted group (that is, cells expressing any guide targeting the enhancer) with the nontargeting group (that is, all other cells in the library with at least one other guide) as previously described6. Guide-level DE analyses were carried out for negative control sgRNAs, comparing each to all unique genes tested in the enhancer-level analyses. Positive control analyses were conducted by pooling both guides for the given TSS and testing against only the TSS’s gene, noting that one TSS was excluded due to a lack of cells expressing its guides (Pos_EIF3A, with only one cell expressing either guide).
The improved negative binomial test implemented in SCEPTRE was applied with default parameters, as a two-sided test and with the number of resamplings set to 1,000. The following cell-level covariates were adjusted for: log total UMIs; log MOI (where MOI is defined as the number of distinct gRNAs detected in each cell); sequencing batch; percent of UMIs aligning to mitochondrial genes; percent of UMIs aligning to ribosomal genes; nuclear fraction, as calculated using DropletQC82; and the first principal component of cell-cycling gene expression, determined using Tricycle86.
To avoid P-value inflation in the EGP tests, we considered the following two approaches: SCEPTRE’s conditional randomization test25 and the empirical P-value correction proposed in ref. 1—Pempirical = ((number of negative control guide–gene pairs with a smaller P value) + 1)/((the total number of negative control guide–gene pairs + 1)). Multiple testing correction was then applied using the Benjamini–Hochberg FDR approach87. Significant EGPs were defined at FDR < 0.1 as previously described1,25. We found the SCEPTRE approach to have lower sensitivity, identifying 90 significant EGPs versus 158 identified using the empirical P-value approach, of which 88 overlapped (Supplementary Table 3). To determine if the EGPs identified using the approach mentioned in ref. 1, not the SCEPTRE conditional randomization test, were valid, we included two such enhancers in our validation experiments, and found both to be experimentally validated (Enh219 and Enh220 regulating NEAT1; Fig. 5c). Thus, we selected the empirical P value with the FDR correction approach for our final result set (Supplementary Table 3). We do acknowledge that this more permissive set may, overall, possess a higher false positive rate and thus also specify the more stringent set in Supplementary Table 3 by including the SCEPTRE results.
Power simulations to define well-powered EGPs
A simulation framework inspired by the study in ref. 14 was used to define well-powered EGP tests in the CRISPRi screen dataset. For each EGP in the CRISPRi screen, where the enhancer was silenced in k cells and the effect was assessed on gene g, we evaluated the ability to detect perturbations of FC of −0.15 and −0.25 as follows. A random set of cells were selected from the CRISPRi scRNA-seq dataset, in which the counts of gene g were randomly downsampled across cells to reduce the sum of its expression by fc. The number of cells selected for downsampling was k rounded to the nearest multiple of 50 for 50 < k < 500, or to the nearest multiple of 100 for 500 < k < 1,000. We then carried out DE analyses comparing the downsampled cells to the nondownsampled group in the same way as the DE analysis of the CRISPRi screen data, with the same covariates, SCEPTRE parameters and significance threshold (Supplementary Fig. 4). A total of 20 simulated DE analyses were carried out for each k–g combination. Power was calculated as the proportion of simulations in which a significant DE result was obtained. EGPs were considered well-powered if power was above 80% for fc = −0.15 (Supplementary Fig. 4). Furthermore, a candidate enhancer or gene was considered well-powered if at least one of the EGPs in which it participated was well-powered.
Statistics and reproducibility
All statistical analyses were performed in R (v4.1.2).
No statistical methods were used to predetermine sample sizes. However, a power simulation was used post hoc to remove low-powered enhancers, genes or EGPs from statistical background distributions in bioinformatic analyses.
Enhancers were excluded from the CROP-seq scRNA-seq analyses if no expressed genes were detected within 500 kb (n = 22). In qRT–PCR, where technical replicates had a s.d. of >0.4, the replicate with a Ct with the largest difference from other replicates was discarded as an outlier if its removal reduced the s.d.
Data from the CROP-seq scRNA-seq were assumed to follow a negative binomial distribution. Data from qPCR and log2-transformed NanoString experiments were assumed to be normally distributed. For all Fisher’s exact tests, both variables were categorical, independent and mutually exclusive.
Randomization is inherent to the CROP-seq experimental design, as single cells are randomly transduced with sgRNAs from the pooled sgRNA library. For all other experiments involving NHAs, randomization was not performed as all cells were derived from the same parent population.
Major statistical findings from the CROP-seq experiment assessed for robustness by using two algorithms (SCEPTRE and the empirical FDR), with validation of key results sought using multiple independent orthogonal approaches (for example, NanoString and qPCR). Where possible, multiple independent resources were consulted from the literature (for example, four resources of brain eQTLs and three lists of genes dysregulated in astrocytes in AD).
Data collection and analysis were not performed blindly to the experimental conditions.
Reporting summary
Further information on research design is available in thelinked to this article. Nature Portfolio Reporting Summary
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41593-025-02154-3.
Supplementary information
Supplementary Information Supplementary Notes 1 and 2, Supplementary Methods (Experimental methods and Data analysis methods) and Supplementary Figs. 1–5. Reporting Summary Supplementary Table 1 Annotation of candidate enhancers. Supplementary Table 2 List of sgRNAs. Supplementary Table 3 Differential expression analyses of CRISPRi screen data. Supplementary Table 4 Functional annotation of enhancer-regulated genes. Supplementary Table 5 Chromatin state and regulatory network data. Supplementary Table 6 Disease associations of enhancer-regulated genes. Supplementary Table 7 Variant annotation to functional enhancers. Supplementary Table 8 Deep learning-derived DISs for variants in functional enhancers and their relevance to WGS data from ADSP. Supplementary Table 9 Predictive models of enhancer–gene regulatory interactions. Supplementary Table 10 Oligonucleotide sequences for cloning and primers. Supplementary Table 11 Quality control measures of sequencing data. Supplementary Table 12 MAGMA analyses.
Source data
Source Data Fig. 1 Statistical source data for Fig. 1d,f,g. Source Data Fig. 2 Statistical source data for Fig. 2c,d. Source Data Fig. 3 Statistical source data for Fig. 3d–g. Source Data Fig. 4 Statistical source data for Fig. 4a,c,d. Source Data Fig. 5 Statistical source data for Fig. 5a,c. Source Data Fig. 6 Statistical source data for Fig. 6a–c. Source Data Fig. 7 Statistical source data for Fig. 7a,c,d. Source Data Extended Data Fig. 4 Uncropped original western blot for Extended Data Fig. 4j.