Communications Medicine, May 5, 2026

Using data from different sources improves machine learning to identify long COVID


Updated Jun 27, 2026

Abstract

Essence

Adding survey and genomic data modestly improved machine-learning identification of long COVID beyond EHR data alone.

Evidence

This cohort-based machine-learning study used more than 17,200 SARS-CoV-2-infected NIH All of Us participants and compared multi-scale versus EHR-only models.

Caveat

The AUROC gain was small, from 0.736 to 0.748, so the added cost of collecting survey and genetic data may limit implementation.

Simplified

BACKGROUND: Long COVID affects a substantial proportion of the over 778 million individuals infected with SARS-CoV-2, yet predictive models remain limited in scope. While existing efforts, such as the National COVID Cohort Collaborative (N3C), have leveraged electronic health record (EHR) data for risk prediction and identification, accumulating evidence points to additional contributions from social, behavioral, and genetic factors.

METHODS: Using a diverse cohort of SARS-CoV-2-infected individuals (n > 17,200) from the NIH All of Us Research Program, we investigated whether integrating EHR data with survey-based and genomic information improves model performance.

RESULTS: Our multi-scale approach outperforms the EHR-only model, raising the area under the receiver operating characteristic curve from 0.736 (95% CI: 0.730, 0.741) to 0.748 (0.741, 0.755). Among the top predictors, active-duty service status and self-reported fatigue are the most informative survey features.

CONCLUSIONS: These findings highlight the importance of incorporating multi-scale data to improve risk stratification and inform personalized interventions for long COVID. However, the relative increase in accuracy is modest, and the cost of collecting genetic and survey data should be considered before implementation.
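
To put the "modest" gain in concrete terms, the arithmetic behind the two reported AUROC values can be sketched as follows. This is an illustrative calculation only, not the authors' analysis: the `Auroc` class and the helper function names are hypothetical, and the only data used are the point estimates and 95% confidence intervals quoted in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Auroc:
    """Hypothetical container for a reported AUROC and its 95% CI."""
    estimate: float
    ci_low: float
    ci_high: float


# Values as quoted in the abstract.
ehr_only = Auroc(estimate=0.736, ci_low=0.730, ci_high=0.741)
multi_scale = Auroc(estimate=0.748, ci_low=0.741, ci_high=0.755)


def absolute_gain(base: Auroc, new: Auroc) -> float:
    """Difference between the two point estimates."""
    return new.estimate - base.estimate


def relative_gain(base: Auroc, new: Auroc) -> float:
    """Absolute gain expressed as a fraction of the baseline estimate."""
    return absolute_gain(base, new) / base.estimate


def intervals_touch_or_overlap(a: Auroc, b: Auroc) -> bool:
    """True if the two reported intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


print(f"absolute gain: {absolute_gain(ehr_only, multi_scale):.3f}")
print(f"relative gain: {relative_gain(ehr_only, multi_scale):.1%}")
print(f"intervals touch or overlap: {intervals_touch_or_overlap(ehr_only, multi_scale)}")
```

At the three-decimal precision reported, the upper bound of the EHR-only interval equals the lower bound of the multi-scale interval (both 0.741). Comparing intervals this way is only a rough check and does not replace a formal test of the difference between models.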

Full Text

Full text is available at the source.


Featured in

Long Covid, Issue #36

Long COVID linked to 2x higher heart disease risk in Stockholm study of 9,000 patients
