What this is
- This research utilizes Bayesian Genome-wide Transcriptome-wide Association Studies (BGW-) to identify risk genes for Alzheimer's disease (AD) dementia.
- By integrating both cis- and trans-expression quantitative trait loci () from brain and blood tissues, the study enhances gene mapping.
- A total of 141 significant risk genes were identified, providing insights into the genetic underpinnings of AD dementia.
Essence
- BGW- identified 141 significant risk genes for Alzheimer's disease dementia by leveraging both cis- and trans- data from brain and blood tissues. This approach revealed additional risk genes not previously mapped in existing studies.
Key takeaways
- The study identified 85 significant genes in the prefrontal cortex, 82 in the cortex, and 76 in whole blood associated with AD dementia. Combining results across these tissues led to the discovery of 141 significant risk genes.
- Among the identified genes, 34 were primarily driven by trans-, indicating the importance of distant regulatory mechanisms in AD dementia. Additionally, 35 genes were previously mapped in the GWAS Catalog.
- The findings emphasize the utility of BGW- in uncovering risk genes that traditional methods may overlook, enhancing the understanding of AD's biological mechanisms and potential therapeutic targets.
Caveats
- The study's findings are limited to European ancestry samples, which may restrict the generalizability of the results to other populations.
- Only three tissues were analyzed, potentially missing other relevant tissues that could contribute to the understanding of AD dementia.
- The BGW- method assumes a sparse model, which may not accurately capture all genetic influences on gene expression.
Definitions
- Transcriptome-wide association study (TWAS): A method that associates gene expression levels with complex diseases, using genetic data to estimate the effects of genetic variants on gene expression.
- eQTL (expression quantitative trait loci): Genetic variants that influence gene expression levels, categorized as cis (near the gene) or trans (distant from the gene).
- Bayesian Genome-wide TWAS (BGW-TWAS): An advanced TWAS method that incorporates both cis- and trans-eQTL to improve the identification of risk genes.
AI simplified
Background
Alzheimer's disease (AD) dementia is a complex neurodegenerative disorder characterized by progressive cognitive decline and memory loss, currently affecting ~7 million Americans aged 65 and older. AD dementia is listed as the seventh-leading cause of death in the United States of America in 2021 [1]. Despite extensive research, the underlying biological mechanisms of AD dementia remain elusive, and effective treatments are still lacking [2]. Recent studies have highlighted the important roles of genomic risk factors of AD dementia [3, 4]. Recent genome-wide association studies (GWAS) identified a total of ~75 genetic risk loci for AD dementia [5]. However, these risk loci still only explain a small portion of the heritability of AD dementia, suggesting additional risk genes or loci might contribute to the genetic etiology of AD dementia. Also, the biological mechanisms underlying the majority of the mapped risk genes remain unknown. Transcriptome-wide association study (TWAS) has emerged as an influential tool for identifying risk genes associated with complex diseases, particularly those with genetic effects mediated through transcriptome [6, 7]. For example, the recent TWAS of AD dementia by Sun et al. identified 53 risk genes by standard two-stage TWAS methods [8].
Standard two-stage TWAS methods first train gene expression prediction models by using reference genetic and transcriptomic data profiled of the same training samples, taking quantitative gene expression traits as response variables and the proximal cis-acting genetic variants as predictors (Stage I). Estimated effect sizes of genetic variants in the gene expression prediction models could be viewed as effect sizes of a broad sense of expression quantitative trait loci (eQTL), which will then be taken as variant weights in gene-based association test to map risk genes with GWAS data (Stage II). TWAS association test in Stage II is equivalent to testing the association between predicted genetically regulated gene expression (GReX) levels and the phenotype of interest in the GWAS data [9].
However, one limitation of most existing TWAS methods is that they only consider cis-eQTL [6, 10, 11], i.e., proximal genetic variants (e.g., located within the 1Mb region around the target gene region), in the gene expression prediction models. The limitation is mainly due to the computational bottleneck of considering genome-wide genetic variants to fit gene expression prediction models for transcriptome-wide ~20K genes per tissue type. The resulting cavity is failing to account for trans-eQTL, i.e., located distal from the target gene region, that have been found playing important roles in biological processes and disease susceptibility [12, 13]. Incorporating trans-eQTL in TWAS is essential as they can reveal additional regulatory mechanisms, help identify additional risk genes, and further our understanding of the underlying biological mechanisms of complex diseases.
To overcome this limitation in studying AD dementia, we employed the Bayesian Genome-wide TWAS (BGW-TWAS) method that incorporated both cis- and trans-eQTL for TWAS [14]. We applied BGW-TWAS to the reference Genotype-Tissue Expression (GTEx) V8 dataset [15] of three tissues –– prefrontal cortex, cortex, and whole blood. The selection of prefrontal cortex and cortex tissues was based on substantial evidence linking their involvement to the progression of AD dementia [16]. The selection of whole blood tissue was due to three reasons: i) a large sample size exists (n=574) for the whole blood tissue in the reference GTEx V8 dataset; ii) gene expression in the whole blood and that of the brain's cortex were found correlated [17]; iii) recent studies have demonstrated that gene expressions in whole blood could serve as biomarkers for AD dementia [18, 19]. For example, multiple studies showed that blood-based transcriptomics biomarkers predict AD risks, such as in Korukonda et. al. Alzheimer’s & Dementia 2021 [20]; Shigemizu et. al. Alzheimer’s Research & Therapy 2020 [21], Abdullah et. al. Informatics in Medicine 2022 [22],and Lee et. al Scientific Reports 2020 [23]. Also, a recent paper by Angioni et. al. J Prev Alzheimers Dis. 2022 [24] discussed the success, challenges, and future directions of deriving blood biomarkers from research use to clinical practice.
As a result, we identified 85 significant genes with the prefrontal cortex, 82 significant genes with the cortex, and 76 significant genes with the whole blood reference transcriptomic data. Combining BGW-TWAS p-values across these three tissues by ACAT-O, we obtained a total of 141 significant TWAS risk genes for AD dementia, including several well-known AD risk genes and risk genes that not mapped in the GWAS Catalog. Through protein-protein interaction network analysis [29] with these 141 significant TWAS risk genes, we detected gene networks comprised of known AD risk genes such as APOC2, BIN1, CASS4, MS4A4A, MS4A6A, SLC24A4, and MAPT and additional risk genes that are not mapped in the GWAS Catalog. In addition, these 141 significant TWAS risk genes were found enriched with known risk genes of several AD related phenotypes.

Study workflow
Methods
Bayesian genome-wide TWAS (BGW-TWAS)
BGW-TWAS [14] is a recently proposed TWAS method that incorporates both cis- and trans- genetic variants of the target genes as predictors in the gene expression prediction models (Stage I) as follows:1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal{E}}_{g}={{\varvec{X}}}_{cis}{{\varvec{w}}}_{cis}+{{\varvec{X}}}_{trans}{{\varvec{w}}}_{trans}+{\varvec{\epsilon}}, {\epsilon }_{i}\sim N\left(0, {\sigma }_{\epsilon }^{2}\right),$$\end{document}Eg=Xciswcis+Xtranswtrans+ϵ,ϵi∼N0,σϵ2,where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal{E}}_{g}$$\end{document}Eg denotes the target gene expression quantitative trait, (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{X}}}_{cis}, {{\varvec{X}}}_{trans}$$\end{document}Xcis,Xtrans) denotes the genotype data matrix of cis- and trans- genetic variants; (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{w}}}_{cis}$$\end{document}wcis, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{w}}}_{trans}$$\end{document}wtrans) denotes the corresponding cis- and trans- effect sizes; and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\epsilon}}$$\end{document}ϵ denotes the errors following a normal distribution with mean 0 and variance \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\sigma }_{\epsilon }^{2}$$\end{document}σϵ2. Here, we briefly describe the BGW-TWAS method.
BGW-TWAS assumes the Bayesian variable selection regression (BVSR) model [30] to enforce sparse eQTL models, by assuming specific spike-and-slab prior distributions for cis- and trans- effect sizes given by2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{array}{c}{w}_{cis,i} \sim {\pi }_{cis}N\left(0, {\sigma }_{cis}^{2}{\sigma }_{\epsilon }^{2}\right)+\left(1-{\pi }_{cis}\right){\delta }_{0}\left({w}_{cis,i}\right); \\ {w}_{trans,i} \sim {\pi }_{trans}N\left(0, {\sigma }_{trans}^{2}{\sigma }_{\epsilon }^{2}\right)+\left(1-{\pi }_{trans}\right){\delta }_{0}\left({w}_{cis,i}\right).\end{array}$$\end{document}wcis,i∼πcisN0,σcis2σϵ2+1-πcisδ0wcis,i;wtrans,i∼πtransN0,σtrans2σϵ2+1-πtransδ0wcis,i.
Here,denote the respective probability that the corresponding effect size is normally distributed, andis the point mass density function that takes value 0 whenand 1 when. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$({\pi }_{cis}, {\pi }_{trans})$$\end{document} ( , ) π cis π trans \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\delta }_{0}\left({w}_{i}\right)$$\end{document} δ 0 w i \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${w}_{i}\ne 0$$\end{document} w i ≠ 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${w}_{i}=0$$\end{document} w i = 0
BGW-TWAS employs multiple computational strategies to enable efficient computation to account for genome-wide genetic variants as predictors in the gene expression prediction models as in Eq. 1. A previously developed scalable expectation-maximization Markov Chain Monte Carlo (EM-MCMC) algorithm [31] is adapted and used by BGW-TWAS. Genome-wide variants are first segmented into approximately independent genome blocks. Genome blocks containing cis- genetic variants or containing top trans-eQTL with single variant test p-values < \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${10}^{-5}$$\end{document}10-5 will be selected for implementing the EM-MCMC algorithm to estimate eQTL effect sizes in the joint multiple regression model as in Eq. 1. The posterior causal probabilities (PCP) for “eQTL” with non-zero posterior effect size estimates will also be estimated. The product of estimated PCPs and effect sizes will represent the expected posterior effect sizes and be used as variant weights in the follow-up gene-based association tests (Stage II). The details of the BGW-TWAS method are referred to the BGW-TWAS paper [14].
The BGW-TWAS tool is memory efficient for first generating single variant eQTL summary data with individual-level reference transcriptomic and genetic data, and then using only these summary eQTL data in the MCMC algorithm. Also, the BGW-TWAS tool implements MCMC algorithm in parallel for multiple genome blocks, which can best utilize high-performance computing clusters with multiple computation cores. In this study, the BGW-TWAS method finished analyzing an average gene with sample size 184 in the reference transcriptomic dataset within 5 minutes in 16 cores. A total of 16 Gb memory was requested and sufficient for all analyses in this study.
Gene-based association test by S-PrediXcan test statistic
With cis- and trans- eQTL weights estimated by BGW-TWAS method and summary-level GWAS data (i.e., Z-scores by single variant tests), we employed the S-PrediXcan [25] approach to calculate the burden type TWAS Z-score test statistic \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${Z}_{g}$$\end{document}Zg per gene as follows:3\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{array}{c}{Z}_{g}=\sum_{l\in {\text{ Model}}_{g}} \widetilde{{\text{w}}_{\text{lg }}}\frac{\widehat{{\sigma }_{l}}}{\widehat{{\sigma }_{g}}}\frac{\widehat{{\beta }_{l}}}{SE\left(\widehat{{\beta }_{l}}\right)}=\sum_{l\in {\text{ Model}}_{g}} \widetilde{{\text{w}}_{\text{lg }}}\frac{\widehat{{\sigma }_{l}}}{\widehat{{\sigma }_{g}}}{Z}_{l}=\sum_{l\in {\text{ Model}}_{g}} \frac{\left(\widetilde{{\text{w}}_{\text{lg }}}\widehat{{\sigma }_{l}}\right){Z}_{l}}{\sqrt{{\widehat{w}}^{\,\prime}{\varvec{V}}\widehat{w}}},\\ \widehat{{\sigma }_{l}^{2}}=\text{Var}\left({x}_{l}\right), \widehat{{\sigma }_{g}^{2}}={\widetilde{{\varvec{w}}}^{\,\prime}}V\widetilde{{\varvec{w}}}, V=Cov({{\varvec{X}}}_{{\varvec{r}}{\varvec{e}}{\varvec{f}}}),\end{array}$$\end{document}Zg=∑l∈Modelgwlg~σl^σg^βl^SEβl^=∑l∈Modelgwlg~σl^σg^Zl=∑l∈Modelgwlg~σl^Zlw^′Vw^,σl2^=Varxl,σg2^=w~′Vw~,V=Cov(Xref),where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{{\beta }_{l}}$$\end{document}βl^ denotes the genetic effect size of variant \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l$$\end{document}l from GWAS, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${Z}_{l}$$\end{document}Zl denote the corresponding Z-score statistic by single variant GWAS test, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{{\text{w}}_{\text{lg }}}=\widehat{P{CP}_{l}}{\widehat{w}}_{l}$$\end{document}wlg~=PCPl^w^l is the eQTL weight of variant \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l$$\end{document}l estimated by BGW-TWAS. Here, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{X}}}_{{\varvec{r}}{\varvec{e}}{\varvec{f}}}$$\end{document}Xref denotes the genotype matrix with genotype covariance matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{V}}$$\end{document}V from a reference panel. Two-tailed BGW-TWAS p-values can then be obtained from the TWAS Z-score test statistics as in Eq. 3.
Omnibus aggregated cauchy association test (ACAT-O)
ACAT-O is an omnibus test that can combine p-values of multiple tests with respect to the same null hypothesis [27, 28], which employs a linear combination of transformed p-values as the test statistic. Particularly, the ACAT-O method is a general statistical method, which is suitable for combining correlated p-values such as the ones generated with correlated transcriptomic data of multiple tissues. As shown in the following formula,4\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${T}_{ACAT-O}=\frac{1}{K}\sum_{i=1}^{K} \text{tan}\left\{\left(0.5-{p}_{i}\right)\pi \right\},$$\end{document}TACAT-O=1K∑i=1Ktan0.5-piπ,where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{{p}_{i}, i=1,\dots , K\}$$\end{document}{pi,i=1,⋯,K} denotes the p-values of K tests and the ACAT-O statistic \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${T}_{ACAT-O}$$\end{document}TACAT-O approximately follows a standard Cauchy distribution under the null hypothesis. The approximate standard Cauchy distribution allows analytical calculations of the ACAT-O p-values, which were shown to be more accurate for combining small/significant p-values [27, 28].
As shown in Fig. 1, in this study, ACAT-O method was used to combine BGW-TWAS p-values (per gene) across three considered tissues to obtain combined TWAS p-values, under the null hypothesis of the combined genetic effects are mediated through the transcriptome of one of these three tissues. Combined TWAS p-values were obtained by using the “ACATO” function from the R package “sumFREGAT” [32].
Since the ACAT-O method only provides an omnibus combined p-value for each test gene, one could refer to the TWAS Z-score test summary data of each tissue to gain information about which tissue contributed most to the significant TWAS risk gene by ACAT-O.
Genotype-tissue expression (GTEx) data
The publicly available GTEx V8 (dbGaP phs000424.v8.p2) data [15] contain whole genome sequencing (WGS) data of 838 human donors and RNA sequencing (RNA-seq) transcriptomic data of 17,382 normal samples from 52 human tissues and two cell lines. We use the transcriptomic data of the prefrontal cortex (n=158), cortex (n=184), and whole blood (n=574) tissues and the corresponding WGS data as reference panel to estimate cis- and trans- eQTL weights by BGW-TWAS. Samples in the GTEx V8 dataset are of European ancestries. The same fully processed, filtered, and normalized transcriptomic data used in the GTEx eQTL analysis were downloaded from the GTEx portal and use in this study. For each tissue, samples with <10 million mapped RNA-seq reads were excluded. For samples with replicates, the replicate with the greatest number of reads was selected. Gene read counts from each sample were normalized using size factors calculated by DESeq2 and log-transformed with an offset of 1. Genes with log-transformed value \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$>1$$\end{document}>1 in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$>10\%$$\end{document}>10% samples were considered. The resulting gene expression values were centered with mean 0 and standardized with standard deviation 1. The resulting matrix was then hierarchically clustered (based on average and cosine distance), and a chi2 p-value was calculated based on Mahalanobis distance. Clusters with ≥60% samples with Bonferroni-corrected p-values <0.05 were marked as outliers, and their samples were excluded. Genetic variants with missing rate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$<20\%$$\end{document}<20%, minor allele frequency \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$>0.01$$\end{document}>0.01, and Hardy-Weinberg equilibrium p-value \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$>{10}^{-5}$$\end{document}>10-5 were considered for fitting the gene expression prediction models.
The fully processed, filtered, and normalized transcriptomic data were adjusted for top five genotype principal components, top probabilistic estimation of expression residuals (PEER) factors [9], sequencing protocol (PCR-based or PCR-free), sequencing platform (Illumina HiSeq 2000 or HiSeq X), and sex, as suggested by the GTEx eQTL data analysis guidelines [15]. The number of top PEER factors [33] used to adjust the gene expression traits depends on the sample sizes in the reference transcriptomic data cohort. We used 30 factors for gene expression traits of the prefrontal cortex and cortex tissues, and 60 factors for the whole blood tissue. Only samples with complete data of these covariates were included in the analyses. Adjusted gene expression quantitative traits were then taken as response variables in the gene expression prediction model.
Summary-level GWAS data of AD dementia
The summary-level GWAS data of AD dementia (i.e., single variant Z-score test statistics obtained by meta-analysis) were generated by the latest GWAS by Bellenguez et al. [5]. The summary-level GWAS data (n=~487K; European ancestry) were generated by meta-analysis with the European Alzheimer & Dementia Biobank consortium (with samples from 15 European countries) and UK Biobank dataset. GWAS of each cohort was conducted with clinically diagnosed cases, ancestry proxy cases, and controls.
Protein-protein interaction network and pathway analysis
STRING (version 12.0) [34] is a bioinformatics web tool that provides information on protein-protein interactions and networks, as well as functional characterization of genes and proteins. The tool integrates different types of evidence from public databases, such as genomic context, high-throughput experiments, and previous knowledge from other databases, to generate reliable predictions of protein interactions and build networks and pathways. Provided with a list of gene names, STRING will construct networks based on the protein-protein interactions of the corresponding proteins, as well as identify phenotypes that have risk genes enriched in the provided list. Proteins corresponding to provided genes are considered as nodes in the protein-protein interaction network. Protein-protein edges represent the predicted functional associations, colored differently to indicate seven categories –– computational interaction predictions from co-expression, “text-mining” of scientific literature, databases of interaction experiments (biochemical/genetic data), known protein complexes or pathways from curated resources, gene co-occurrence, gene fusion and gene neighborhood. Gene co-occurrence, fusion, and neighborhood represent association predictions are based on whole-genome comparisons [35]. Interactions from these resources are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information [34].
Particularly, the “text-mining” channel is the result of parsing full-text articles from the PMC Open Access Subset (up to April 2022), PubMed abstracts (up to August 2022), as well as summary texts from OMIM [36] and Saccharomyces genome database [37] entry descriptions. These texts are all parsed for co-mentions of protein pairs and assessed against the frequencies of all separate mentions of the respective proteins. An improved deep learning-based relation extraction text mining model was used by STRING v12 [34]. The ‘textmining’ channel significantly increases the number of protein–protein interactions.
Results
BGW-TWAS results of AD dementia
Using BGW-TWAS method, we trained gene expression prediction models for 23,724 genes of prefrontal cortex tissue, 23,900 genes of cortex tissue, and 19,519 genes of whole blood tissue. We identified 85, 82, and 76 TWAS risk genes with significant p-values (with Bonferroni correction) for prefrontal cortex, cortex, and whole blood tissues, respectively. Of these, 20 genes were significant for prefrontal cortex and cortex, 4 were significant for prefrontal cortex and whole blood, 5 were significant for cortex and whole blood, and 13 were significant for all three tissue types (Supplemental Figure 1). Detailed information of the significant TWAS risk genes of these three tissues were provided in Supplemental Files 2-4. Manhattan plots of BGW-TWAS results of these three tissues were presented in Supplemental Figures 2-4.
We also summarized the proportion of trans-eQTL with non-zero weights for all significant genes of three tissues in Supplemental Figure 5. There were 27 (31.8%), 19 (23.2%), and 21 (27.6%) significant genes whose association were driven by ≥50% trans-eQTLs in prefrontal cortex, cortex, and whole blood tissues, respectively. These results demonstrated that trans-eQTL had important contributions to significant TWAS risk genes of all three tissues.
ACAT-O results of AD dementia

Manhattan plot of ACAT-O TWAS-values for studying AD dementia. The horizontal dashed line represents the Bonferroni corrected significance threshold of the-values. Orange dots indicate genes that were significant in only one tissue, while red dots highlight genes that were significant in more than one tissue p p
| TWAS risk genes driven bycis-eQTL | TWAS risk genes driven bytrans-eQTL | ||||
|---|---|---|---|---|---|
| Gene | CHR | ACAT-Op-value | Gene | CHR | ACAT-Op-value |
| CR1ab | 1 | 6.85e-31 | NRP2a | 2 | 0.0000016 |
| ICA1Lab | 2 | 8.25e-7 | HAUS3a | 4 | 0.00000118 |
| HLA-DQA2ab | 6 | 5.49e-10 | GTF2H4b | 6 | 0.00000286 |
| PVRIGab | 7 | 6.56e-10 | CUL7b | 6 | 0.00000133 |
| STAG3ab | 7 | 5.66e-10 | TUBBb | 6 | 2.03e-7 |
| AP4M1ab | 7 | 3.23e-10 | DDR1a | 6 | 7.83e-10 |
| NDUFAF6ab | 8 | 2.98e-8 | TRERF1c | 6 | 3.04e-10 |
| TSPAN14ab | 10 | 1.67e-9 | SNORD22a | 11 | 0.00000183 |
| LACTBabc | 15 | 1.21e-7 | VWCEb | 11 | 0.00000132 |
| SEZ6L2ab | 16 | 2.31e-7 | SLC3A2c | 11 | 1.13e-9 |
| PRSS36ab | 16 | 2.33e-9 | AP001350.4b | 11 | 1.74e-13 |
| INO80Eabc | 16 | 9.48e-10 | C16orf58a | 16 | 2.95e-7 |
| YPEL3abc | 16 | 7.34e-10 | SLC7A5P1c | 16 | 1.26e-9 |
| ARHGAP27ab | 17 | 1.93e-7 | RP11-196G11.3a | 16 | 1.21e-9 |
| ARL17Aab | 17 | 1.04e-7 | NUPR1c | 16 | 6.68e-11 |
| PLEKHM1abc | 17 | 7.82e-8 | TGFB1I1b | 16 | 8.45e-12 |
| DND1P1abc | 17 | 5.87e-8 | EFTUD2b | 17 | 0.00000256 |
| LRRC37A2abc | 17 | 5.83e-8 | NBR1a | 17 | 0.00000217 |
| MAPK8IP1P2abc | 17 | 4.89e-8 | C17orf53a | 17 | 0.00000163 |
| LINC02210abc | 17 | 4.84e-8 | AOC2c | 17 | 0.00000149 |
| LRRC37A4Pabc | 17 | 4.78e-8 | RP5-882C2.2a | 17 | 1.98e-7 |
| MAPK8IP1P1abc | 17 | 3.68e-8 | RUNDC1c | 17 | 1.86e-7 |
| RP11-259G18.3abc | 17 | 2.45e-8 | SP2c | 17 | 5.04e-8 |
| WNT3abc | 17 | 1.02e-9 | RAB5Ca | 17 | 8.57e-9 |
| GRNab | 17 | 3.75e-12 | SP2-AS1a | 17 | 3.84e-9 |
| ACEabc | 17 | 9.99e-16 | CADM4a | 19 | 0.00000116 |
| POLR2Eab | 19 | 1.81e-7 | PRKD2b | 19 | 2.59e-7 |
| GPR4ab | 19 | 1.15e-8 | DOT1La | 19 | 2.03e-7 |
| APOC2ab | 19 | 5.87e-27 | CTB-12A17.2a | 19 | 4.45e-9 |
| a: Genes significant in prefrontal cortex b: Genes significant in cortex c: Genes significant in whole blood | TMEM160c | 19 | 3.91e-9 | ||
| PPP5Ca | 19 | 2.33e-10 | |||
| SLC1A5c | 19 | 5.63e-11 | |||
| ZNF226c | 19 | 3.82e-11 | |||
| CTD-2233K9.1a | 19 | 3.23e-12 | |||
Contributing eQTL of significant TWAS risk genes
It is noticeable that all the trans-eQTL of the example genes (and for most of the genes we studied) are still on the same chromosome as the test gene. This might be due to both biological mechanisms and methodological limitations. Because the BGM-TWAS method assumes a sparse eQTL model for gene expression and often can only estimate eQTL with relatively large effect sizes, it is possible that trans-eQTL located on the same chromosome as the test gene have relatively larger effect sizes. Thus, the BGW-TWAS method only estimated non-zero weights for these trans-eQTL that were used in the follow-up gene-based association studies.

Scatter plots of eQTL weights estimated by BGW-TWAS of example TWAS risk genes. Column A: gene(chr17) in three tissues; Column B: gene(chr11),(chr11)and(chr11), in prefrontal cortex, cortex, and whole blood tissues, respectively. Y-axis depicts the values of eQTL weights estimated by BGW-TWAS, and the x-axis shows the order of base pair position of the corresponding eQTL. Solid circles denote-eQTL, and triangles refer to-eQTL. Color legend denotes the -log (GWAS-value) of the corresponding eQTL ACE SNORD22 AP001350.4 , SLC3A2 cis trans p
Known mapped GWAS risk genes in GWAS catalog for AD dementia
Compared to previous GWAS results, 35 out of the 141 TWAS risk genes were reported as mapped risk genes of AD dementia in GWAS Catalog [39], such as known risk genes of ACE, APOC2, BIN1, and CR1 (Supplemental File 1). Particularly, the GReX of CR1 and APOC2 were both found to be negatively associated with risk of AD dementia in prefrontal cortex and cortex. The GReX of BIN1 was positively associated with risk of AD dementia in both cortex and whole blood tissue. The GReX of ACE was positively associated with risk of AD dementia in all three tissues. These findings are consistent with previous studies about genes CR1 [40], APOC2 [41], BIN1 [42], and ACE [43]. Information of upregulation and downregulation of the complete lists of significant genes in the 3 tissues is provided in the Supplemental Files 2-4.
Additional TWAS risk genes of AD dementia
Besides these 35 mapped risk genes reported in GWAS catalog, we identified additional 106 significant TWAS risk genes of AD dementia, including 34 genes whose significance were primarily due to trans-eQTLs. Also, our findings replicated risk genes identified by recent studies of AD that integrated eQTL data with GWAS summary data. For instance, genes OSBP, ZNF296, and ZNF284 were identified by the TWAS tool TIGAR [9, 11] using only cis-eQTL of the prefrontal cortex tissue [44] and GWAS summary data of AD dementia generated by Wightman D.P. et. al in 2021 [26]. A recent study integrating summary-level GWAS and meta-analytic cis-eQTL data found genes NDUFS2 and PRSS36 were related to AD risk [45]. Recent splicing TWAS analyses found that causal splicing introns of gene WDR33 and LRRC37A4P were associated with AD in multiple brain tissues [46]. A recent study that integrated eQTL data of blood tissue and GWAS of late-onset AD (LOAD) by a Bayesian statistical method identified risk gene ZNF226 [47].
We also looked at cis-SNPs within ±1Mb region of our identified TWAS risk genes, using the GWAS results by Bellenguez et al. [5]. We found that 114 genes have at least one cis-SNP with GWAS p-value less than the genome-wide significance threshold (5E-8) and 27 genes have no GWAS significant cis-SNPs (Supplemental File 1). Because our significant TWAS risk genes were identified by testing both cis- and trans-SNPs with non-zero eQTL weights. Those genes without GWAS significant cis-SNPs will not be identified by standard GWAS with the same data. Our findings could be due to the consideration of trans-eQTL and/or burden tests with multiple eQTL. Previous biological studies provided insights of some of our findings with no significant cis-SNPs. For example, it was found that the expression of NRP2, a gene coding the modulating receptors of the vascular endothelial growth factor signaling family, was associated with better cognitive outcomes [48]. Gene OTULIN was found to affect NF-κB-activity in AD patients, subsequently leading to the shrinkage of the entorhinal cortex and the limbic system in early stages of AD [49]. Gene DDR1 is associated with reduced inflammation and vascular fibrosis in AD [50]. In addition, gene TUBB encodes a β-tubulin protein that forms a dimer with alpha tubulin and acts as a structural component of microtubules, where β-tubulin was found aggregating in AD cases [51].
Protein-protein interaction network analysis by STRING
By pathway enrichment analyses conducted by the STRING tool, five pathways were found to be enriched with our identified TWAS risk genes. Three of these pathways were involved with the CD20-like family such as Cranial nerve maturation, Peptidase A22B, signal peptide peptidase and GTPase GIMA/IAN/Toc. These three pathways included the key known AD risk genes MS4A4A, SLC24A4, MS4A6A, BIN1, and CASS4. The fourth pathway was ADP-ribosylation factor-like protein 17, and LRRC37A/B like protein 1, C-terminal, which included TWAS risk genes LRRC37A2, ARHGAP27, and ARL17A on chromosome 17. The fifth pathway was Cranial nerve maturation, and His Kinase A (phosphoacceptor) domain, including known AD risk genes SLC24A4, CASS4, and BIN1.

Protein-protein interaction networks identified with 141 significant TWAS risk genes by the STRING tool. Edge colors show different sources of the identified protein-protein interactions
Phenotype enrichment analysis
Interestingly, the enriched phenotypes of functional laterality, intracerebral hemorrhage, white matter microstructure measurement, complete blood cell count, leukocyte count, myeloid white cell count, and eosinophil count have been reported to be associated with AD by previous studies. According to a high-resolution MRI study, lack of the laterality shift in limbic system and early loss of asymmetry in entorhinal cortex could be biomarkers to identify preclinical AD [55]. Several studies found that the risk of dementia increased significantly after intracerebral hemorrhage [56, 57]. A study observed microstructural damage of white matter in AD brains [58]. Chronic inflammation has been proposed as a significant risk factor in AD pathogenesis [59]. Increased levels of complete blood cell count and immune cell count, including leukocyte, myeloid white cell, and eosinophil, have been observed in the brains of AD patients [60–63].
Further, we conducted separate phenotype enrichment analyses by using tissue-specific TWAS risk genes that are of upregulated (with positive TWAS Z-score) and downregulated (with negative TWAS Z-score) groups (Supplemental Figure 7). We found that mental or behavioral disorders and functional laterality were enriched in the upregulated genes in prefrontal cortex, cortex, and whole blood. Additionally, family history of AD was enriched in the upregulated genes in cortex and whole blood, and the downregulated genes in whole blood; AD biomarker measurement was enriched in the downregulated genes in whole blood. No phenotype was found enriched in the downregulated genes in prefrontal cortex and cortex.

Phenotype enrichment analysis with 141 significant TWAS risk genes by the STRING tool. The -log10 of the false discovery rates (FDR, x-axis) for testing the enrichment of known risk genes of the corresponding phenotype (y-axis) were plotted
Discussion
In this study, we applied the BGW-TWAS method to the reference GTEx V8 data of three tissues (prefrontal cortex, cortex, and whole blood) to leverage both cis- and trans-acting eQTL for studying AD dementia. With the latest GWAS summary data of AD dementia [5], we identified 85 significant genes with the prefrontal cortex, 82 significant genes with the cortex, and 76 significant genes with the whole blood reference transcriptomic data. Combining BGW-TWAS p-values by ACAT-O, we obtained 141 significant TWAS risk genes of AD dementia, including 35 mapped GWAS risk genes in GWAS Catalog and 34 TWAS risk genes primarily driven by trans-eQTL. As expected, most of the genes exhibiting significance due to trans-eQTL were not detected by previous GWAS and TWAS which failed to account for trans-eQTL. Interestingly, previous studies reported AD-related biological functions for our identified TWAS risk genes that have no cis-SNP with significant GWAS p-value, such as NRP2, OTULIN, DDR1, and TUBB [48–51].
Through protein-protein interaction network analysis using the STRING tool, we identified network clusters containing mapped GWAS risk genes of AD in GWAS Catalog and our identified TWAS risk genes that are either driven by trans-eQTL or/and have no cis-SNP with significant GWAS p-value. A total of 46 identified TWAS risk genes that are not reported as mapped risk genes in GWAS catalog, are connected with mapped AD GWAS risk genes in the identified protein-protein interaction clusters. Our findings are consistent with previous studies highlighting the critical involvement of APOC2, BIN1, and MAPT in AD [52–54], as evidenced by their extensive connectivity with other mapped GWAS risk genes of AD and TWAS risk genes within the major network cluster. The identified protein-protein interaction networks also underline the important function of genes in the CD20-like family. Since the “text-mining” channel (see Methods) used by the STRING tool identified the majority of the protein-protein interactions among our identified TWAS risk genes, further functional studies by biological experiments are still needed to validate these interactions.
We also identified 14 phenotypes whose known risk genes were enriched in our identified 141 TWAS risk genes. Especially, brain lesions such as functional laterality, intracerebral hemorrhage, and white matter microstructure have been reported to be associated with AD by previous studies [55–58]. Biomarkers such as complete blood cell count, leukocyte count, myeloid white cell count, and eosinophil count were shown to be related to AD [60–63], suggesting a complex interplay among genetic, transcriptomic, metabolic, and inflammatory risk factors in the pathogenesis of AD. Shared risk genes between these biomarkers and AD have also been reported by previous studies. For example, gene BCL3 and INPP5D were reported with pleiotropy effects on AD and inflammation biomarkers [64, 65]. Further biological experiments are still needed to understand the roles of these TWAS risk genes in the biological mechanism of AD.
Nevertheless, our study has several limitations. First, due to the computation burden of running BGW-TWAS tool, we only applied BGW-TWAS to reference GTEx V8 data of three tissues (prefrontal cortex, cortex, and whole blood). Other tissues within and outside the brain such as hippocampus, muscle, and spinal cord, are also known to play crucial roles in the biological mechanisms of AD dementia [66–70]. Failing to consider all available tissues in GTEx V8 data may fail to capture a full spectrum of genetically regulated gene expression associated with AD dementia. Our ongoing work includes further improving the computation efficiency of the BGW-TWAS tool and applying it to all available tissues in GTEx V8.
Second, since samples in the summary-level GWAS data of AD dementia (generated by Bellenguez et. al.) and the GTEx reference dataset are all European ancestries, our TWAS significant gene findings are also limited to the European population. Conducting similar TWAS analyses with reference transcriptomic data and GWAS summary data of other populations would be needed to study the disparity of AD risks in different populations.
Third, the BVSR model employed in the BGW-TWAS method inherently assumes a sparse model implying that only a small number of eQTL have true causal effects on gene expression. Although this assumption can be computationally advantageous, it may not always accurately represent the underlying genetic architecture of complex gene expression quantitative traits. In our future work, we will also investigate using other statistical models for genome-wide TWAS analysis.
Conclusions
In conclusion, our study highlights the importance of considering both cis- and trans-eQTL in TWAS analysis as it can help identify significant risk genes that would have been missed by using only cis-eQTL. We identified several well-known AD risk genes as well as genes that are not reported in GWAS Catalog but interconnected with known AD risk genes in the same protein-protein interaction networks. As a genome-wide TWAS, our study is the first to utilize both cis- and trans- eQTLs of multiple tissues for AD risk gene identification. Our results provide further insights into the underlying biological mechanisms of AD dementia and a list of potential gene targets for the development of therapeutics for treating AD dementia.
Supplementary Information
Supplementary Material 1. Supplementary Material 2. Supplementary Material 3. Supplementary Material 4. Supplementary Material 5. Supplementary Material 6. Supplementary Material 7. Supplementary Material 8. Supplementary Material 9. Supplementary Material 10. Supplementary Material 11.