A machine learning model for predicting obesity risk in patients with diabetes mellitus: analysis of NHANES 2007–2018

Sep 8, 2025, Frontiers in Public Health

Using machine learning to predict obesity risk in people with diabetes

AI simplified

Abstract

Among 3,794 participants with type 2 diabetes, 57.0% were classified as obese.

  • LASSO regression identified 19 key variables related to obesity risk.
  • The logistic regression model outperformed other algorithms in predicting obesity.
  • The logistic regression model achieved an area under the ROC curve (AUC) of 0.751 in the training set and 0.781 in the test set.
  • Calibration and decision curve analysis indicated favorable clinical utility for the logistic regression model.
  • A nomogram was created to assist in personalized risk prediction for obesity.
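To make the reported AUC values concrete, here is a minimal sketch of how an area under the ROC curve can be computed: it is the fraction of (obese, non-obese) pairs in which the obese participant receives the higher predicted risk. The labels and scores below are hypothetical illustration values, not data from the study, and `roc_auc` is an illustrative helper rather than the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = obese) and predicted risks, for illustration only.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported test-set value of 0.781 sits between the two.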


Key numbers

2,163 of 3,794
Obesity Prevalence
Total participants with obesity vs. total participants with diabetes
0.781
AUC for Logistic Regression
Area under the ROC curve in the test set for the best-performing model
19
Key Predictors Identified
Number of significant variables associated with obesity risk

Key figures

Figure 1
Participant selection and exclusion process for obesity analysis in diabetes patients
Sets up clear participant filtering and group division essential for analyzing obesity risk in diabetes patients
fpubh-13-1606751-g001
  • Panel single
    Flowchart starting with the total population (n=59,842), excluding those under 18 years (n=23,262) and those without a diabetes diagnosis (n=30,408), resulting in eligible participants (n=6,172); further excluding participants with >10% missing data (n=2,378) yields the final cohort (n=3,794), split into obesity (n=2,163) and non-obesity (n=1,631) groups based on a BMI threshold of 30 kg/m²
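The counts in the Figure 1 flowchart can be checked with simple arithmetic. The short sketch below uses only the numbers reported in the flowchart; the variable names are illustrative.

```python
# Counts as reported in the Figure 1 flowchart.
total_population = 59842
under_18 = 23262
no_diabetes = 30408
missing_data = 2378
obese, non_obese = 2163, 1631

# Apply the exclusions in order.
eligible = total_population - under_18 - no_diabetes
final_cohort = eligible - missing_data

# Share of the final cohort classified as obese.
prevalence = obese / final_cohort

print(f"eligible = {eligible}, final cohort = {final_cohort}")
print(f"obesity prevalence = {prevalence:.1%}")
```

The exclusions reproduce the eligible and final cohort sizes, and the obesity share rounds to the 57.0% quoted in the abstract.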
Figure 2
Correlations among variables and their importance for predicting obesity risk in diabetes patients
Highlights age as the strongest predictor and reveals clusters of related variables influencing obesity risk in diabetes
fpubh-13-1606751-g002
  • Panel A
    Correlations between variables, with a cluster of strong positive correlations visible among several related factors
  • Panel B
    Bar chart ranking variables by their predictive importance values, with Age having the highest importance
Figure 3
LASSO coefficient changes and optimal lambda selection for obesity predictors
Highlights how key obesity-related variables are selected and weighted using LASSO regression in diabetes data
fpubh-13-1606751-g003
  • Panel A
    LASSO coefficient path plot showing 19 variable coefficients changing as log lambda decreases
  • Panel B
    Least angle regression path plot with error bars and optimal lambda selection
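The coefficient paths in Figure 3 reflect how the LASSO penalty shrinks coefficients and drops weak predictors as lambda grows. A minimal sketch of that mechanism follows, under the simplifying assumption of orthonormal predictors, where the LASSO solution is the soft-thresholded unpenalized coefficient. The feature names and coefficient values are hypothetical and are not taken from the study.

```python
def soft_threshold(beta, lam):
    """Shrink beta toward zero by lam; return 0.0 when |beta| <= lam."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


def lasso_path(coefs, lambdas):
    """Map each penalty value to the shrunken coefficients at that penalty."""
    return {
        lam: {name: soft_threshold(b, lam) for name, b in coefs.items()}
        for lam in lambdas
    }


# Hypothetical unpenalized coefficients (assumed, not the study's values).
unpenalized = {"feature_a": -0.80, "feature_b": 0.45, "feature_c": 0.05}

path = lasso_path(unpenalized, [0.0, 0.1, 0.5, 1.0])
for lam, coefs in path.items():
    kept = [name for name, b in coefs.items() if b != 0.0]
    print(f"lambda={lam}: kept {kept}")
```

In practice the penalty is usually chosen by cross-validation, which is consistent with the optimal lambda selection shown in Panel B.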
Figure 4
Performance of nine models for predicting obesity in diabetes patients
Highlights logistic regression's better calibration and lower Brier score compared to other models in obesity prediction
fpubh-13-1606751-g004
  • Panels A–B
    ROC curves for nine models in training (A) and test (B) sets with AUC values shown in the legend
  • Panels C–D
    Decision curve analysis (DCA) for nine models in training (C) and test (D) sets showing net clinical benefit across threshold probabilities
  • Panel E
    Calibration plots of each model in the test set with red line for observed probability and diagonal black line for perfect calibration
  • Panel F
    Brier scores for each model in the test set; lower scores indicate better predictive accuracy
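Two of the quantities shown in Figure 4 have short standard definitions: the Brier score is the mean squared difference between predicted probability and observed outcome, and the net benefit used in decision curve analysis weighs true positives against false positives at a chosen threshold probability. The sketch below illustrates both on hypothetical labels and probabilities; it is not the authors' evaluation code.

```python
def brier_score(labels, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def net_benefit(labels, probs, threshold):
    """Net benefit of flagging everyone whose predicted risk >= threshold."""
    n = len(labels)
    true_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = obese) and predicted risks, for illustration only.
labels = [1, 0, 1, 0]
probs = [0.8, 0.3, 0.6, 0.7]

score = brier_score(labels, probs)
benefit = net_benefit(labels, probs, threshold=0.5)
print(f"Brier score = {score:.3f}, net benefit at 0.5 = {benefit:.3f}")
```

A decision curve repeats the net benefit calculation over a range of thresholds, which is what Panels C and D plot for each model.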
Figure 5
Regression coefficients of 10 significant variables from logistic regression for diabetic diagnosis
Highlights key variables with positive and negative associations for predicting obesity risk in diabetes patients.
fpubh-13-1606751-g005
  • Panel single
    Regression coefficients for variables including Total Fat, Age, Depression, EH=1, and Sex=1; Depression and EH=1 have positive coefficients, and Sex=1 has a negative coefficient.

Full Text

What this is

  • This research analyzes data from the National Health and Nutrition Examination Survey (NHANES) from 2007 to 2018.
  • It aims to identify predictors of obesity in patients with type 2 diabetes mellitus (T2DM) using machine learning.
  • The study develops a predictive model to assist in early identification and management of obesity risk in this population.

Essence

  • Obesity affects 57.0% of individuals with T2DM. A logistic regression model, identified as the best-performing of those compared, predicts obesity risk based on key clinical features.

Key takeaways

  • Logistic regression outperformed other models in predicting obesity risk among T2DM patients, achieving an AUC of 0.781 in the test set.
  • Nineteen key variables were identified as significant predictors of obesity, including age, gender, and uric acid levels.
  • A nomogram was created for clinical use, allowing healthcare providers to estimate obesity risk based on individual patient characteristics.
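A nomogram is essentially a graphical way of evaluating a fitted logistic regression: each characteristic contributes to a linear score, which is then mapped to a probability. The sketch below shows that mapping. The intercept, feature names, and coefficients are hypothetical placeholders, not the study's fitted values.

```python
import math


def predicted_risk(intercept, coefs, features):
    """Logistic model: probability = 1 / (1 + exp(-(intercept + sum(b * x))))."""
    linear = intercept + sum(coefs[name] * features[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical model parameters (assumed for illustration, NOT from the paper).
intercept = 0.2
coefs = {"feature_a": -0.3, "feature_b": 0.4}

# Hypothetical patient characteristics on the same scale as the coefficients.
patient = {"feature_a": 6.0, "feature_b": 1.0}

risk = predicted_risk(intercept, coefs, patient)
print(f"Estimated risk = {risk:.1%}")
```

A printed nomogram replaces the multiplication and the logistic transform with point scales that can be read by hand, but the underlying calculation has this form.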

Caveats

  • The study's cross-sectional design limits the ability to draw causal inferences from the observed associations.
  • External validation of the predictive model was not conducted, which may affect its generalizability.
  • Some relevant variables, such as genetic and environmental factors, were not included in the dataset.

Definitions

  • Obesity: A condition characterized by excessive body fat, typically defined by a BMI ≥30.0 kg/m².
  • Machine learning: A subset of artificial intelligence that uses algorithms to analyze data, identify patterns, and make predictions.
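The BMI-based obesity definition above can be written out as a short calculation: BMI is weight in kilograms divided by the square of height in metres. The function names below are illustrative, and the example weights and heights are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m²."""
    return weight_kg / height_m ** 2


def is_obese(weight_kg, height_m, threshold=30.0):
    """Apply the BMI >= 30.0 kg/m² cutoff used in the definition above."""
    return bmi(weight_kg, height_m) >= threshold


# Hypothetical examples, for illustration only.
print(f"{bmi(90.0, 1.70):.1f}", is_obese(90.0, 1.70))
print(f"{bmi(70.0, 1.75):.1f}", is_obese(70.0, 1.75))
```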
