A transformer model predicted lipid nanoparticle transfection efficiency by fusing molecular features of ionizable lipids.
Evidence
This machine-learning benchmark used more than 10,000 experimentally measured transfection-efficiency values and reported an average Pearson correlation of 0.845 for regression and an AUC-ROC of 0.818 for multi-class classification across multiple datasets.
Caveat
The abstract reports retrospective benchmark and robustness tests, not prospective experimental deployment of model-selected LNP formulations.
Simplified
RNA-based technologies have demonstrated significant potential for diverse applications, ranging from vaccination to gene editing. However, their widespread adoption is limited by the critical challenge of efficient delivery. Lipid nanoparticles (LNPs) have emerged as a widely utilized RNA delivery system, yet their formulation design and optimization primarily rely on empirical trial-and-error, which is labor-intensive, time-consuming, and cost-prohibitive, thus hindering the rapid development of RNA therapeutics. To facilitate the early-stage design and optimization of LNPs for enhanced delivery efficiency, in this study, we construct LNPs-, a benchmark dataset comprising over 10 000 experimentally measured transfection efficiency (TE) values, and introduce LNPs integrated feature fusion Transformer (LIFT), a deep learning framework for LNP TE prediction. Comprehensive experiments demonstrate that LIFT effectively integrates multidimensional molecular representations of ionizable lipids, the key component in LNP formulation, achieving superior predictive performance, with an average Pearson correlation coefficient of 0.845 for regression and an area under the receiver operating characteristic curve (AUC-ROC) of 0.818 for multi-class classification across multiple datasets. Through scaffold-based splitting and activity cliff tasks, we further validated the exceptional generalization ability and robustness of LIFT, which achieved an improvement of over 10% in the coefficient of determination (R²) compared with state-of-the-art baseline models, highlighting its potential as a practical and stable approach for the virtual screening of efficient LNP formulations. The relevant data, model and code are made publicly available at https://github.com/U12458/LIFT.
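The abstract names scaffold-based splitting as a generalization test but does not describe the procedure. As a rough illustration only, and not the authors' code, a scaffold split keeps every molecule that shares a core scaffold on the same side of the train/test boundary, so the test set contains only scaffolds the model has never seen. The sketch below assumes the scaffold keys have already been computed elsewhere (for example as Bemis-Murcko scaffold strings); the function name `scaffold_split`, the 20% test fraction, and the placeholder keys are all hypothetical.

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Split indices so that no scaffold key appears in both train and test.

    `scaffolds` is a list with one precomputed scaffold key per molecule
    (an assumption of this sketch; the keys are not derived here).
    """
    # Group molecule indices by their scaffold key.
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    # Common heuristic: place the largest scaffold groups in train first,
    # so the rarer scaffolds end up in the held-out set.
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_cap = (1 - test_fraction) * len(scaffolds)

    train_idx, test_idx = [], []
    for members in ordered:
        if len(train_idx) + len(members) <= train_cap:
            train_idx.extend(members)
        else:
            test_idx.extend(members)
    return train_idx, test_idx


# Placeholder scaffold keys for ten hypothetical ionizable lipids.
scaffolds = ["A", "A", "A", "B", "B", "C", "D", "A", "B", "E"]
train_idx, test_idx = scaffold_split(scaffolds)
```

Because whole groups are assigned at once, the realized test fraction only approximates the requested one when group sizes are uneven.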
Key numbers
0.845
Average Pearson Correlation Coefficient
Achieved in regression tasks across multiple datasets.
Over 10%
Improvement in Coefficient of Determination (R²)
Compared to state-of-the-art baseline models.
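For readers unfamiliar with the two regression metrics listed here, the following is a minimal sketch of how each is conventionally defined. It is not taken from the LIFT repository; the function names and the toy values are illustrative assumptions. The example is chosen to show why the two numbers are not interchangeable: predictions with a constant offset still correlate perfectly with the measurements, yet R² penalizes the offset.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (not from the paper): every prediction is 0.5 too high.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

r = pearson_r(measured, predicted)   # 1.0: the ranking and spacing are preserved
r2 = r_squared(measured, predicted)  # 0.8: the constant offset is penalized
```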