PLOS Computational Biology

Using combined genetic and biological data to identify genes that may cause Long COVID

Abstract

Essence

An integrative multi-omics analysis suggests 32 putative causal genes and three symptom-based subtypes.

Evidence

An integrative genomics analysis for Long COVID, combining Transcriptome-Wide Mendelian Randomization (TWMR) and Control Theory (CT) with eQTL, GWAS, RNA-seq, and PPI data, prioritized 32 candidate genes, including 13 novel ones, and derived three symptom-based subtypes from causal gene expression profiles.

Caveat

These are inference-based gene and subtype predictions from integrated omics datasets, not experimental or clinical validation.

Key numbers

32
Candidate Genes Identified
Total number of putative causal genes identified for Long COVID.
19
Previously Reported Genes
Number of candidate genes confirmed by existing literature.
13
Novel Genes Discovered
Count of new candidate genes identified for Long COVID.

Key figures

Fig 1
Multi-omics data integration and analysis steps for identifying causal genes in Long COVID
Frames a clear multi-omics approach integrating genetic and network data to prioritize causal genes in Long COVID.
pcbi.1013725.g001
  • Panel A
    Input data include eQTL (SNPs linked to gene expression), GWAS (SNPs linked to outcome), gene expression, and the human Protein-Protein Interaction (PPI) network.
  • Panel B
    The integrative method combines TWMR and CT scores to evaluate risk/preventive genes and network driver genes, with a parameter α controlling their weighting.
  • Panel C
    Output ranks significant causal genes by weighted final scores, showing a network of top causal genes for Long COVID with α varying from 0 (all network driver genes) to 1 (all risk/preventive genes).
  • Panel D
    Downstream analyses include enrichment analysis (dot plot of pathway significance), literature validation (document icon), and identification of Long COVID subtypes (bar chart).
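The α weighting described for Fig 1 can be illustrated with a minimal sketch. It assumes the final score is a simple convex combination of a risk/preventive score and a network driver score. The source does not state the exact formula, and the function names, gene labels, and score values here are hypothetical.

```python
def combine_scores(risk, driver, alpha):
    """Blend per-gene scores (assumed form, not the paper's confirmed formula).

    alpha = 1 uses only the risk/preventive score;
    alpha = 0 uses only the network driver score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    genes = set(risk) | set(driver)
    # Genes missing from one score source contribute 0 for that source.
    return {
        g: alpha * risk.get(g, 0.0) + (1.0 - alpha) * driver.get(g, 0.0)
        for g in genes
    }


def rank_genes(risk, driver, alpha):
    """Return gene names sorted by descending final score (ties by name)."""
    final = combine_scores(risk, driver, alpha)
    return sorted(final, key=lambda g: (-final[g], g))


if __name__ == "__main__":
    # Hypothetical, made-up scores for three placeholder genes.
    risk = {"GENE_A": 0.9, "GENE_B": 0.2, "GENE_C": 0.5}
    driver = {"GENE_A": 0.1, "GENE_B": 0.8, "GENE_C": 0.5}
    for alpha in (1.0, 0.0):
        print(alpha, rank_genes(risk, driver, alpha))
```

Sweeping α between the two extremes shows how a gene with a strong direct effect but little network influence moves down the ranking as α decreases, which is the behavior the panels describe.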
Fig 2
Top causal genes, ranked and classified by disease risk and network roles
Highlights how gene roles shift from disease risk to network control as α decreases
pcbi.1013725.g002
  • Panels across alpha values 1.00 to 0.00
    Genes are sorted horizontally by increasing absolute effect size and vertically by alpha values balancing direct disease effect and network controllability; red, green, and yellow boxes distinguish the gene classes.
Fig 3
Enrichment of biological processes and pathways for putative causal genes
Highlights key biological processes and pathways enriched in Long COVID genes, spotlighting immune and signaling functions with higher significance
pcbi.1013725.g003
  • Panel A
    Top 20 enriched terms in the Gene Ontology (GO) Biological Process and Molecular Function categories, with dot size showing gene count and color indicating adjusted p-value (blue = more significant)
  • Panel B
    Top 20 enriched KEGG pathways ranked by adjusted p-value, with dot size representing gene count and color gradient showing significance
  • Panel C
    Top 20 enriched Reactome pathways with dot size reflecting gene count and color indicating adjusted p-value (blue denotes higher significance)
Fig 4
Risk and protective gene expression effects for Long COVID across tissues
Highlights contrasting gene expression effects and tissue-specific risk patterns linked to Long COVID susceptibility.
pcbi.1013725.g004
  • Single panel
    Forest plot of 16 genes showing effect sizes (standardized beta) with 95% confidence intervals; red bars represent lung and other tissues, blue bars represent non-lung tissues; effect sizes range from negative (decreased risk) to positive (increased risk).
Fig 5
Network connections of a key driver gene CREBBP in gene interactions
Highlights CREBBP’s extensive network connectivity and critical role in Long COVID gene interactions
pcbi.1013725.g005
  • Network plot
    CREBBP is shown as a large red central node with 273 total interactions (153 incoming, 120 outgoing).
    Connected genes are drawn in three shapes: ellipses, diamonds, and round rectangles.
    The three most enriched pathways are highlighted in green, purple, and blue, with node sizes proportional to their degree (connectivity).
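The interaction counts quoted for CREBBP are directed degree counts. As a minimal sketch, assuming the network is available as a list of (source, target) edges, in-degree and out-degree can be tallied as follows. The edge list and node names are made up for illustration and are not taken from the study.

```python
def degree_summary(edges, gene):
    """Count directed edges touching `gene` in an edge list.

    `edges` is an iterable of (source, target) pairs. A self-loop
    would be counted once as incoming and once as outgoing.
    """
    edges = list(edges)
    incoming = sum(1 for _src, dst in edges if dst == gene)
    outgoing = sum(1 for src, _dst in edges if src == gene)
    return {"incoming": incoming, "outgoing": outgoing, "total": incoming + outgoing}


if __name__ == "__main__":
    # Hypothetical toy network with one hub node.
    toy_edges = [("HUB", "A"), ("HUB", "B"), ("C", "HUB"), ("A", "B")]
    print(degree_summary(toy_edges, "HUB"))
```

Applied to the figure's numbers, the same bookkeeping gives 153 incoming plus 120 outgoing edges, for 273 in total.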

Full Text

What this is

  • Long COVID, or Post-Acute Sequelae of SARS-CoV-2 infection (PASC), causes persistent symptoms in 10-20% of COVID-19 patients.
  • This research developed a multi-omics framework to identify causal genes linked to Long COVID.
  • The approach integrated several methodologies, including Transcriptome-Wide Mendelian Randomization (TWMR) and Control Theory (CT), to prioritize candidate genes.

Essence

  • This framework identified 32 candidate genes associated with Long COVID, including 19 previously reported and 13 novel genes. It also revealed three symptom-based subtypes of Long COVID, enhancing understanding of disease mechanisms.

Key takeaways

  • The framework prioritized 32 candidate genes for Long COVID, including 19 confirmed by existing literature. These genes are implicated in immune regulation, viral response, and cell cycle control.
  • Three distinct symptom-based subtypes of Long COVID were identified, indicating variability in disease mechanisms and potential for personalized treatment strategies.
  • The authors developed an open-source web application for interactive exploration of the findings, enabling further research and validation of the identified candidate genes.

Caveats

  • The study relies on publicly available datasets, which may limit the generalizability of findings across diverse populations.
  • The framework's predictions are based on gene expression data, which may not fully capture the complexity of Long COVID pathology.

Definitions

  • Long COVID: Persistent symptoms following SARS-CoV-2 infection, lasting beyond the acute phase, affecting multiple organ systems.
  • Transcriptome-Wide Mendelian Randomization (TWMR): A method that uses genetic variants to infer causal relationships between gene expression and disease outcomes.
  • Control Theory (CT): A framework used to analyze and identify key regulatory genes within biological networks.
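To make the Mendelian randomization idea concrete, here is a minimal sketch of the standard univariable estimators: the single-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimate. This is a simplified illustration and not the TWMR procedure used in the study, which the source does not specify in detail. All function names and numbers are hypothetical.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal effect estimate from one variant: outcome effect / exposure effect."""
    if beta_exposure == 0:
        raise ValueError("variant has no effect on the exposure")
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate across variants.

    beta_exposure: SNP effects on gene expression (e.g. from eQTL data)
    beta_outcome:  SNP effects on the disease outcome (e.g. from GWAS)
    se_outcome:    standard errors of the outcome effects
    """
    triples = list(zip(beta_exposure, beta_outcome, se_outcome))
    numerator = sum(bx * by / se ** 2 for bx, by, se in triples)
    denominator = sum(bx ** 2 / se ** 2 for bx, _by, se in triples)
    if denominator == 0:
        raise ValueError("no variant has a non-zero exposure effect")
    return numerator / denominator


if __name__ == "__main__":
    # Made-up summary statistics in which every outcome effect is
    # twice the exposure effect, so the estimate should be close to 2.
    bx = [0.1, 0.2, 0.3]
    by = [0.2, 0.4, 0.6]
    se = [0.05, 0.05, 0.1]
    print(ivw_estimate(bx, by, se))
```

A positive estimate would suggest that higher expression raises risk and a negative one that it lowers risk, matching the risk/preventive distinction used above.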
