Frontiers in bioinformatics

Using machine learning to improve three-part peptide treatments for metabolic disease

Abstract

Essence

A machine learning framework suggests triple-agonist peptide sequences can be computationally optimized for metabolic-disease drug discovery.

Evidence

This computational modeling study trained a graph attention network on 234 peptides with measured receptor binding, tested it by 5-fold cross-validation and 24-sequence validation, and used a genetic algorithm to generate 20 candidate peptides with mean predicted binding probabilities above 0.5 across all three targets.

Caveat

The results are in silico, with limited training data for GIPR and mixed receptor-specific performance, so therapeutic value still needs experimental validation.

Key numbers

0.942
Prediction Accuracy Increase
RMSE for GAT vs. 1.209 for CNNs on GCGR prediction
4.0%
Fitness Enhancement
Percentage increase from baseline fitness score
0.915 ± 0.050
AUC-ROC for GCGR
Area under the receiver operating characteristic curve

Key figures

FIGURE 1
Pipeline combining a graph attention network and a genetic algorithm for peptide design optimization
Highlights a computational approach that visibly improves peptide binding fitness through iterative optimization
  • Panel 1
    Input peptide sequences are processed by the model with layers including graph convolution and attention heads for receptor-specific predictions
  • Panel 2
    Genetic algorithm evolves a population of 100 peptide sequences through evaluation, selection, crossover, and mutation operations
  • Panel 3
    Graph shows a progressive increase in fitness scores over 50 generations, indicating optimization progress
  • Panel 4
    Final output is optimized peptide sequences ready for experimental validation
FIGURE 2
Binding affinity, sample size, and agonist type distributions in peptide dataset
Highlights variation in binding affinity and sample sizes across receptors, spotlighting GCGR+GLP1R as the dominant multi-agonist type
  • Panel a
    Percentage of sequences meeting the ≤1000 pM threshold for GCGR (49.0%), GLP1R (74.8%), and GIPR (57.1%)
  • Panel b
    Number of sequence measurements for GCGR (206), GLP1R (234), and GIPR (56), with GLP1R having the largest sample size
  • Panel c
    Distribution of multi-agonist sequence types showing GCGR+GLP1R (41.5%) as the most common, followed by GLP1R+GIPR (13.2%), GCGR+GIPR (10.3%), and Triple Agonist (10.3%)
FIGURE 3
Performance metrics for GAT versus CNN ensemble on prediction targets
Highlights stronger prediction accuracy and correlation for GAT on EC50_LOG_T1 compared to CNN ensemble
  • Panel a
    Root mean square error (RMSE) comparison showing significantly lower prediction error for GAT on EC50_LOG_T1 and comparable error on EC50_LOG_T2
  • Panel b
    Coefficient of determination (R²) comparison showing superior explained variance for GAT on EC50_LOG_T1 and equivalent performance on EC50_LOG_T2
  • Panel c
    Pearson correlation coefficients indicating stronger linear relationships for GAT predictions on EC50_LOG_T1 and comparable correlations on EC50_LOG_T2
FIGURE 4
Model performance metrics across receptors and training stages
Highlights consistently high and stable prediction accuracy across receptors
  • Panel (a)
    AUC-ROC scores for GCGR, GLP1R, and GIPR receptors across three training stages, all above 0.84; GCGR shows highest AUC-ROC at Stage 3 (0.915)
  • Panel (b)
    F1-scores for the three receptors across stages, all exceeding 0.81; GLP1R shows highest at Stage 2 (0.90)
  • Panel (c)
    Box plots of AUC-ROC score distributions across five folds at final unified stage, with median values above 0.9 for GCGR and GIPR, and 0.85 for GLP1R
  • Panel (d)
    Performance progression trajectories showing stable or improved AUC-ROC scores from initial to unified training stages for all receptors, with GCGR and GIPR reaching near or above 0.9
FIGURE 5
Novel sequences vs complete validation set: model performance metrics for three receptors
Highlights that model performance is generally lower for novel sequences, especially in F1-score for GCGR and GLP1R, spotlighting challenges in predicting new peptide sequences
  • Panel a
    F1-score comparison for GCGR, GLP1R, and GIPR receptors showing lower scores for novel sequences (red) than complete set (blue) except GIPR where novel is slightly higher
  • Panel b
    Area under the precision-recall curve comparison showing GCGR and GIPR values slightly lower or similar for novel sequences (red) versus complete set (blue), but GLP1R novel sequences appear higher

Full Text

What this is

  • This research focuses on optimizing peptide therapeutics targeting multiple receptors for metabolic diseases using machine learning.
  • Triple agonist peptides that activate the glucagon receptor (GCGR), the glucagon-like peptide-1 receptor (GLP1R), and the glucose-dependent insulinotropic polypeptide receptor (GIPR) show potential advantages over single-target therapies.
  • A novel Graph Attention Network (GAT) framework is proposed to improve predictive performance and facilitate peptide design, addressing limitations of previous methods.

Essence

  • The study introduces a GAT-based framework for designing triple agonist peptides targeting GCGR, GLP1R, and GIPR. This approach enhances predictive accuracy and facilitates systematic optimization of peptide sequences.
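The optimization step can be pictured with a small genetic algorithm. The sketch below is a minimal illustration, not the study's implementation: it borrows only the population size (100), the generation count (50), and the final shortlist of 20 candidates from this summary. Everything else is an assumption made for the example, including the tournament selection, single-point crossover, mutation rate, elitism, fixed sequence length of 30, and the `toy_predict` function, which is a hypothetical stand-in for the trained model and has no biological meaning.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical stand-in for the trained predictor: three arbitrary residue
# groups, one per receptor. These groups are NOT biological, they only give
# the optimizer something to climb.
TOY_GROUPS = ("HQGTFSDYAE", "HAEGTFKLVW", "YAEGTFISDM")


def toy_predict(seq):
    """Return three pseudo-probabilities in [0, 1], one per receptor."""
    return [sum(ch in group for ch in seq) / len(seq) for group in TOY_GROUPS]


def fitness(seq):
    """Mean pseudo-probability across the three receptors (assumed objective)."""
    probs = toy_predict(seq)
    return sum(probs) / len(probs)


def tournament(pop, scores, rng, k=3):
    """Pick the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Single-point crossover between two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(seq, rng, rate=0.05):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else ch for ch in seq
    )


def evolve(pop_size=100, generations=50, length=30, top_n=20, seed=0):
    rng = random.Random(seed)
    pop = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(pop_size)
    ]
    best_per_generation = []
    for _ in range(generations):
        scores = [fitness(s) for s in pop]          # evaluation
        best_per_generation.append(max(scores))
        elite = pop[scores.index(max(scores))]      # elitism: keep the best as-is
        children = [elite]
        while len(children) < pop_size:
            p1 = tournament(pop, scores, rng)       # selection
            p2 = tournament(pop, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = children
    # Rank unique sequences by fitness, breaking ties alphabetically.
    ranked = sorted(set(pop), key=lambda s: (-fitness(s), s))
    return ranked[:top_n], best_per_generation


candidates, history = evolve()
```

Because the best individual is carried over unchanged, the best fitness per generation never decreases in this sketch. In a real pipeline, `toy_predict` would be replaced by the trained model's per-receptor binding probabilities.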

Key takeaways

  • GAT outperformed traditional CNNs for GCGR prediction, achieving a root mean square error (RMSE) of 0.942 vs. 1.209, indicating improved accuracy in binding affinity predictions.
  • The genetic algorithm optimization led to a 4.0% fitness enhancement, generating 20 candidate peptides with mean predicted binding probabilities exceeding 0.5 for all target receptors.
  • The model demonstrated robust performance in cross-validation, achieving an AUC-ROC of 0.915 ± 0.050 for GCGR, 0.853 ± 0.059 for GLP1R, and 0.907 ± 0.083 for GIPR.
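For readers unfamiliar with the metrics quoted above, the following sketch shows how an RMSE, an AUC-ROC, and a "mean ± standard deviation across folds" summary are typically computed. It uses only the Python standard library and made-up toy values, not the study's data. The function names are illustrative, and the use of the sample standard deviation is an assumption, since this summary does not say which variant was reported.

```python
import math
from statistics import mean, stdev


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(mean(squared_errors))


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive outscores a
    random negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_values):
    """Mean and sample standard deviation of a per-fold metric."""
    return mean(fold_values), stdev(fold_values)


# Toy values for illustration only.
toy_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
toy_auc = auc_roc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.65])
toy_mean, toy_sd = summarize_folds([0.9, 0.8, 1.0])
```

A lower RMSE means predictions sit closer to the measured values, while an AUC-ROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of binders from non-binders.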

Caveats

  • The training dataset's variability across laboratories may introduce biases affecting model predictions, limiting generalizability.
  • The model's reliance on static physicochemical properties may overlook critical three-dimensional structural features influencing binding affinity.
  • Experimental validation of the computationally generated sequences is necessary to confirm their biological activity and therapeutic relevance.

Definitions

  • Graph Attention Networks (GAT): A type of neural network that uses attention mechanisms to model relationships in graph-structured data, allowing for flexible input sizes and interpretability.
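To make the definition concrete, here is a minimal single-head graph attention layer in plain Python, following the standard GAT formulation: each node scores its neighbors with a LeakyReLU over a learned vector applied to the concatenated transformed features, normalizes the scores with a softmax, and takes the weighted sum. This is a sketch under stated assumptions, not the study's architecture: treating a peptide as a chain of residue nodes, the two-dimensional toy features, and the values of `W` and `a` are all made up for illustration.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer.

    features:  list of node feature vectors
    neighbors: dict mapping node index -> neighbor indices (self included)
    W:         weight matrix, out_dim x in_dim
    a:         attention vector of length 2 * out_dim
    Returns (new_features, attention), where attention[i][j] is the
    weight node i places on neighbor j.
    """
    z = [matvec(W, h) for h in features]  # shared linear transform
    new_features, attention = [], []
    for i, zi in enumerate(z):
        # Unnormalized score for each neighbor j: LeakyReLU(a . [z_i || z_j])
        logits = [
            leaky_relu(sum(ak * v for ak, v in zip(a, zi + z[j])))
            for j in neighbors[i]
        ]
        # Numerically stable softmax over the neighborhood
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention.append(dict(zip(neighbors[i], alphas)))
        # Attention-weighted sum of transformed neighbor features
        new_features.append(
            [
                sum(al * z[j][d] for al, j in zip(alphas, neighbors[i]))
                for d in range(len(zi))
            ]
        )
    return new_features, attention


# Toy 4-residue chain; each node attends to itself and adjacent residues.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
W = [[0.5, -0.2], [0.1, 0.3]]
a = [0.4, -0.1, 0.2, 0.3]
new_features, attention = gat_layer(features, neighbors, W, a)
```

The same `W` and `a` apply to a graph of any size, which is what allows flexible input lengths, and the returned `attention` weights can be inspected per node, which is the source of the interpretability mentioned in the definition.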
