What this is
- SenPred is a pipeline designed to classify senescent dermal fibroblast cells using (scRNA-seq).
- The pipeline addresses the challenge of accurately detecting , which varies by cell type and context.
- By utilizing scRNA-seq data from fibroblasts grown in both 2D and 3D environments, SenPred achieves high accuracy in identifying senescent cells.
- The findings suggest that 3D culture conditions improve the detection of senescent cells in vivo compared to traditional 2D methods.
Essence
- SenPred accurately classifies deeply senescent fibroblasts using scRNA-seq data, achieving over 99% true positives. The model demonstrates that 3D culture conditions enhance the detection of senescent cells in vivo, addressing limitations of 2D cultures.
Key takeaways
- SenPred predicts fibroblast with over 99% true positives using scRNA-seq data. This high accuracy indicates the model's effectiveness in distinguishing senescent cells.
- Fibroblasts grown in 3D environments yield better detection than those grown in 2D. This suggests that 3D culture conditions more accurately reflect in vivo cellular behavior.
- The model's application to in vivo datasets reveals a significant discrepancy between predicted senescent cell burdens and existing literature, indicating potential limitations in current detection methods.
Caveats
- The model's performance in vivo is limited by the lack of established ground truth for markers. This uncertainty complicates the evaluation of the model's accuracy in real-world contexts.
- SenPred's ability to detect is context-specific, as it performed well for dermal fibroblasts but not for lung fibroblasts or whole skin, indicating potential limitations in its generalizability.
Definitions
- Senescence: A stable cell cycle arrest characterized by changes in cell function and morphology, often associated with aging and tissue damage.
- Single-cell RNA sequencing (scRNA-seq): A technique that allows for the examination of gene expression at the individual cell level, providing insights into cellular heterogeneity.
- Machine learning (ML): A subset of artificial intelligence that uses algorithms to analyze data, learn patterns, and make predictions without explicit programming for each task.
AI simplified
Background
Senescence is a fundamental cellular programme defined as a stable cell cycle arrest. The lack of universal senescence markers is a well-recognised challenge within the field. Hallmarks of senescence are both cell-type and trigger dependent, and no single marker alone is sufficient to confirm senescence in all contexts [1, 2]. It is, therefore, necessary to assess multiple markers encompassing cellular morphology, effectors of cell cycle arrest, the senescence-associated secretory phenotype (SASP), mitochondrial changes, chromatin remodelling, and lysosomal markers such as senescence-associated beta-galactosidase (SA-ÎČ-Gal) [1â3]. However, choosing appropriate markers for a particular experiment often relies on either literature precedent using the specific cell type and trigger, or lengthy optimisation and marker validation. Recent work has sought to address this by focusing on the development of accessible, stepwise, and standardised workflows for the classification and subsequent characterisation of senescence [2, 4]. Kohli et al. proposed a two-step approach to senescence detection, firstly focusing on more general lysosomal changes (SA-ÎČ-Gal) and cell cycle arrest (lack of EdU incorporation, and immunocytochemistry for Ki67, p16 and p21), and secondly on subtype-specific proinflammatory SASP changes [4].
This seminal work has allowed for a more systematic and streamlined approach to senescence detection. The outcome of these approaches continues to depend upon the senescence context, meaning prior knowledge of the subtype is required to determine a senescent-cell state. This renders in vivo senescent cell detection a current challenge, due to a lack of âground truthâ. Further, using a small number of selected markers could bias senescent cell detection towards a specific subtype and overlook additional heterogeneous subpopulations. Therefore, understanding the nuanced changes between senescence subtypes could allow for context-specific senescence detection and would enable unbiased classification of datasets where the senescent cell status is unknown. Furthermore, this understanding would remove the likelihood of detecting false positive and false negative data, which is crucial for progression within the senescence field. With the emergence of high-throughput screening methodologies, and âomicsâ-based sequencing technologies, our understanding of senescent cell subtypes is rapidly expanding. This necessitates the development of tools which can utilise this data to classify senescence in novel datasets.
Machine learning (ML) methodologies are widely used to classify distinct cellular states [5]. The data must have a state of ground truth, so prior knowledge of the input class is required. The data can then be divided into training and testing datasets. The training dataset can be used to build the model, and the unseen testing dataset can be used to evaluate model performance based on a variety of metrics such as classification accuracy [6]. These ML models can then be applied to classify novel datasets. ML has been successfully applied to senescence classification in a small number of contexts (reviewed in Hughes, Wallis & Bishop 2023 [5]). For example, Heckenbach et al. have developed their model âXceptionâ using neural networks, which can accurately classify a cell as senescent based on its nuclear morphology [7]. Alongside morphology, RNA sequencing data has also been used to build predictive models based on mRNA expression levels of senescent versus proliferative cells [8â11]. Jochems et al. used an elastic net model to accurately classify therapy-induced senescent cancer cells grown in a 2D monolayer [11]. Importantly, the group stratified the cells into those which had undergone short- and long-term treatments and found that treatment time strongly influenced the transcriptome of the cells. When applying their model to in vivo patient-derived xenografts treated with the senescence inducer SHP099, all xenografts were classified as ânot senescentâ. However, the group described that the use of bulk RNA sequencing datasets could explain why they were unable to detect senescence signatures in vivo.
Therefore, we asked if using single-cell transcriptomics alongside ML could improve senescent cell detection, particularly within heterogeneous in vivo contexts. Initially, we performed single-cell RNA sequencing of deeply senescent human dermal fibroblast (HDF) cells from a two-dimensional monolayer to build a ML model of senescence. We established that the model can accurately detect fibroblast senescence in independent 2D in vitro datasets, and in line with work by Jochems et al., we demonstrate that temporal kinetics is an important consideration when building transcriptomic senescent cell models. Interestingly, we find that fibroblasts grown in a two-dimensional monolayer cannot generate an accurate model of in vivo senescence. Therefore, to more faithfully replicate conditions in vivo, we performed single-cell RNA sequencing of cells grown in a three-dimensional matrix. It is widely reported that cells grown in 3D are more similar to cells in vivo, and can better recapitulate cellular morphologies, gene expression, and most notably the extracellular matrix [12â15]. We demonstrate that the 3D ML model enabled the detection of senescent dermal fibroblasts in vivo. Here, we present SenPred, an exploratory proof-of-concept pipeline, which lays the foundation for a new approach to detect senescent cells in vivo and could allow for scRNA-seq-based prediction of alternative disease states.
Methods
Cells and reagents
Primary human dermal fibroblasts (HDFs) were a kind donation from anonymous healthy patients under standard ethical practice, reference LREC No. 09/HO704/69. HDFs were cultured in DMEM with 4 mM L-glutamine (41,966â029, Life Technologies) supplemented with 10% foetal bovine serum (1001/500, Biosera), in the absence of antibiotics, and maintained at 37 °C, 5% CO2, 95% humidity. Cells were seeded at 4000 cells/cm2 for passages 1â19, 7500 cells/cm2 for passage 20â25, and 10,000 cells/cm2 for passage 26 onwards. HDFs were serially passaged and were classified as early proliferative (EP) up to passage 13, and Deeply Senescent (DS) beyond passage 39 + 3. All cells were routinely tested for mycoplasma and were shown to be negative.
Dermal gels were generated by suspending HDFs in collagen 1 (354,236, Corning) supplemented with Matrigel (7,341,100, Corning), FCS (1001/500, Biosera), 10X DMEM (11,430â030, Invitrogen), and pH balanced with 4 M NaOH in the absence of antibiotics. HDF-gel mix was added dropwise to transwells (CC405, Corning) to give a final cell density of 10,000 HDFs per gel. Nine gels were constructed per condition. Dermal gels were maintained at 37 °C, 5% CO2, and 95% humidity.
Single-cell RNA sequencing
For 2D samples, cells were grown in 6-well plates at 7000/cm2 for 72 h. Cells were collected, and resuspended in HBSS (14,025,092, Thermofisher), and immediately sorted into single cell oil droplet suspension. To extract the cells from 3D dermal gels, gels were incubated with collagenase D (Sigma) at 37 °C for 1 h. Collagenase activity was subsequently inhibited by HBSS addition. Gels were strained and centrifuged to capture cell pellets, which were resuspended in HBSS before sorting into single-cell oil droplet suspension.
Library generation and sequencing were performed using the v2 Chromium Single Cell 3âČ Kit (10X genomics). Samples were sequenced using the Illumina NextSeq 500 High Output Run Sequencing platform using paired-end sequencing and aligned to the human genome (GRCh37) using Cell Ranger (10âĂâGenomics).
External data acquisition
For external model testing, data was acquired from the published papers listed in Table 1.
The datasets from Chan et al. and Teo et al. were used to apply and test the 2D SenPred pipeline on in vitro scRNA-seq experiments, whereas the datasets from Tabib et al., Solé-Boldo et al., Ganier et al., and Sikkema et al. were used to test the 3D SenPred pipeline. The number of cells analysed for the in vitro studies was as follows:
The number of samples used for each in vivo study after QC and any additional sample details are listed in Tables 2, 3, 4, and 5. The ethics approval and data acquisition of human tissue were acquired by the respective authors within the in vivo references listed in Table 1.
Additional preprocessing steps were carried out on both the Tabib et al. [19] and Solé-Boldo et al. [20] datasets, to combine donors and generate a single Seurat Object for each. Cell level filtering and normalisation of raw data from Tabib et al. [19] and Solé-Boldo et al. [20] were performed using Seurat R package version 3.2.2 in R version 4.0.3 [26]. To remove low-quality cells, the data was filtered using the following metrics:
Data was normalised using ScTransform [27], and the sctransform normalized counts for each sample were then integrated [26], using the top 3000 most variable genes in the SelectIntegrationFeatures() function. Then, the FindIntegrationAnchors() function was applied with default parameters to identify the anchors across the different samples in each dataset. The anchors were then passed to the IntegrateData() function to create a single normalized and integrated Seurat object for each dataset.
| Reference | Accession | Sample | Context |
|---|---|---|---|
| Chan et al. [] [16] | GEO #[] GSE175533 [17] | WI-38 replicative senescent cells | In vitro |
| Teo et al. [] [10] | GEO #[] GSE115301 [18] | OIS and paracrine ER:IMR90 fibroblasts | In vitro |
| Tabib et al. [] [19] | The pre-processed dataset, including the raw UMI data matrix and associated metadata from Tabib et al., was acquired from the corresponding author, Robert Lafyatis (lafyatis@pitt.edu) in 2018 | Whole skin | In vivo |
| Solé-Boldo et al. [] [20] | GEO #[] GSE130973 [21] | Whole skin | In vivo |
| Ganier et al. [] [22] | ArrayExpress: âE -MTAB- 13085â [] [23] | Whole skin | In vivo |
| Sikkema et al. [] [24] | Integrated human lung cell atlas:. []. https://cellxgene.cziscience.com/collections/6f6d381a-7701â4781-935c-db10d30de293 [25] | Whole lung | In vivo |
| Donor age | Sex | Sample site | Total number of cells | Number of fibroblast cells |
|---|---|---|---|---|
| 23 | Female | Dorsal mid-forearm | 1185 | 648 |
| 24 | Male | Dorsal mid-forearm | 861 | 43 |
| 54 | Male | Dorsal mid-forearm | 1055 | 268 |
| 62 | Female | Dorsal mid-forearm | 1347 | 808 |
| 63 | Male | Dorsal mid-forearm | 791 | 253 |
| 66 | Female | Dorsal mid-forearm | 1659 | 538 |
| Donor age | Sex | Sample site | Total number of cells | Number of fibroblast cells |
|---|---|---|---|---|
| 25 | Male | Inguinoiliac region | 2769 | 666 |
| 27 | Male | Inguinoiliac region | 2395 | 813 |
| 53 | Male | Inguinoiliac region | 3320 | 1188 |
| 69 | Male | Inguinoiliac region | 4531 | 2034 |
| 70 | Male | Inguinoiliac region | 2139 | 592 |
| Donor age | Sex | Sample site | Total number of cells | Number of fibroblast cells |
|---|---|---|---|---|
| 56 | Male | Nose | 12579 | 5114 |
| 59 | Male | Temple | 11751 | 2597 |
| 62 | Male | Cheek | 2335 | 617 |
| 70 | Male | Forehead | 6209 | 1805 |
| 73 | Male | Forehead | 7953 | 620 |
| 77 | Male | Ear | 9205 | 1914 |
| 78 | Male | Temple | 4740 | 2184 |
| 80 | Male | Forehead | 12,119 | 1266 |
| 81 | Female | Cheek | 7439 | 2062 |
| 85 | Male | Forehead | 13,454 | 3153 |
| 86 | Male | Ear | 6982 | 2166 |
| 90 | Male | Cheek | 13,784 | 2503 |
| Donor age or median of age group | Sex | Sample site | Total number of cells | Number of fibroblast cells |
|---|---|---|---|---|
| 25 | Male | Lung | 49,574 | 3606 |
| 29 | Male | Lung | 17,533 | 361 |
| 30 | Maleâ+âfemale | Lung | 35,652 | 301 |
| 32.5 | Female | Lung | 3771 | 211 |
| 34 | Female | Lung | 9134 | 173 |
| 35 | Maleâ+âfemale | Lung | 17,711 | 173 |
| 42.5 | Male | Lung | 16,899 | 357 |
| 49 | Male | Lung | 14,613 | 125 |
| 50 | Female | Lung | 16,187 | 146 |
| 52 | Male | Lung | 41,505 | 3889 |
| 52.5 | Male | Lung | 2425 | 139 |
| 55 | Maleâ+âfemale | Lung | 25,789 | 346 |
| 57 | Male | Lung | 33,202 | 221 |
| 57.5 | Male | Lung | 7971 | 1358 |
| 59 | Male | Lung | 23,027 | 347 |
| 62 | Male | Lung | 7286 | 339 |
| 64 | Maleâ+âfemale | Lung | 16,258 | 263 |
| 67.5 | Female | Lung | 14,728 | 718 |
Cell-level filtering and normalisation
All subsequent data analysis was performed using R (version 4.2.2) in Rstudio (2022.07.2 Build 576), running under macOS Monterey 12.0.1. Matrix, gene, and barcode.gz data files were loaded and converted into a Seurat object [28]. Unless otherwise stated, default parameters from Seurat V4.3.0.1 were applied. To remove low-quality cells, the data was filtered using the following metrics (Supplementary Fig. S1):
Data was normalised using NormalizeData(), which normalises individual gene expression of a cell to total gene expression of this cell and multiplies by the default scale factor (10,000). This data is then log transformed.
To subset fibroblasts from whole skin data for the Tabib [19] and Solé-Boldo [20] datasets, pre-defined metadata was used. For Ganier et al. [22], dot plots were generated to visualise the mean cluster expression of a pre-defined list of genes from the original paper, using the Seurat function DotPlot(). To identify lung fibroblasts in Sikkema et al., data was subset for pre-defined fibroblasts, from healthy donors, where the age was reported. Donors were removed if they had less than 100 fibroblasts recorded.
Dimensionality reduction and differential gene expression analysis
The top 2000 variable genes were identified using the FindVariableFeatures function [26], and these were subsequently scaled to ensure a mean expression of 0 and variance of 1 across all cells. K-nearest neighbour algorithms were used based on PCA distance, followed by the Louvain algorithm to determine clusters. The resolution of all cluster-based functions was determined systematically using the Clustree package [29]. These clusters were visualised on UMAP plots. To identify differentially expressed genes (DEGs) in the clusters, FindAllMarkers default Seurat function was implemented. Genes must be detected in > 25% of cells in at least 1 cluster and have a logFC greater than 0.25.
Machine learning senescence classifier
The ScPred package developed by the Powell group was used to build ML models to classify senescence (https://github.com/powellgenomicslab/scPred/)â [30]. Unless otherwise stated, default parameters of ScPred V1.9.2 were used. Data was separated into training (80%) and testing (20%) datasets. ScPred ML models were then built on training data using all principal components, using a variety of methods including Support Vector Machines with Radial Basis Function Kernel (SVM), Mixture Discriminant Analysis (MDA), K-nearest neighbours (KNN), and Generalised Linear Model (GLM). The models were evaluated based on their classification of the testing dataset through a variety of methods, including ROC, sensitivity, and specificity, and the best performing model (MDA) was taken forwards. This utilised the CRAN package mda_0.5â4. For the MDA model, a fivefold cross-validation was used. These models could then be applied to classify senescence in alternative datasets.
Trajectory analysis
Monocle3 V1.3.1 was used to analyse the trajectory of the PDL50 cells [31â33]. All default parameters were used unless otherwise stated. The earliest principal node was calculated using the function get_earliest_principal_node, where the node most surrounded by EP cells was selected. The cells were then ordered based on pseudotime.
Matrisome analysis
NABA_MATRISOME human gene set was obtained from Petrov et al., and the median log normalised gene count expression was calculated for the 2D, 3D, Tabib et al. [19] and Solé-Boldo et al. [20] datasets. Genes which had a median count of zero for all conditions were removed, and the final list of 147 matrisome genes is reported in Supplementary Table S2. The gene counts were converted into a heatmap using heatmap.2(), and Ward hierarchical clustering was applied.
Marker correlation plots
Normalised gene counts were extracted from the Tabib and Solé-Boldo data, using a list of pre-defined senescence genes: CDKN1A, CDKN2A, IGFBP3, SERPINE1, IGF1, CXCL12, IL6, MMP3 and TP53. The Pearson correlation coefficient was calculated between all pairs of genes, and Ward hierarchical clustering was applied to generate both x- and y-axis dendrograms. The results were then visualised on a heatmap using heatmap.2().
Digital PCR
Early Proliferative (EP â P10 â P12) and Deeply Senescent (DS â P39 + 3) FLF1 HDF cell pellets were lysed with QIAzol (217,004, Qiagen miRNeasy Mini Kit), periodically vortexed for 5 min and snap frozen at -80C. Total RNA extraction was performed using the miRNeasy Mini Kit (217,004, Qiagen), following the manufacturerâs guidelines (âQuick-Start protocol miRNeasy Mini Kitâ guide, Qiagen). Following this, RNA ethanol precipitation was performed to increase sample purity, where 30 mL of RNA was mixed with 75 ÎŒl of 100% ethanol (34,852, Sigma Aldrich), 3 ÎŒl of sodium acetate 3 M at pH 5.2 (R1181, Thermofisher Scientific), and 1 ÎŒl GlycoBlueTM Coprecipitant (AM9515, Invitrogen), and incubated overnight at â 20 °C. The sample was centrifuged at 12,000 g for 10 min at 4 °C, resuspended in 500 ÎŒl of 80% EtOH, and centrifuged a second time at 12,000 g for 10 min at 4 °C. The pellet was dried for 2 h at RT and resuspended in 20 ÎŒl RNAse-free distilled water (217,004, Qiagen miRNeasy Mini Kit).
Precipitated RNA was converted to cDNA using the SuperScriptTM III Reverse Transcriptase kit, as per the manufacturerâs instructions (18,080,044, Invitrogen). cDNA concentration was measured using the Qubit 2.0 Fluorometer (Thermofisher Scientific), and absolute quantification of cDNA was carried out using SYBR Green I dye (S7563, Thermofisher Scientific) on the Applied Biosystems QuantStudio Absolute Q Digital PCR System. Digital PCR was carried out using the Absolute Q DNA digital PCR Master Mix (A52490, Thermofisher Scientific), with a QuantStudio Absolute Q MAP16 plate kit (A52865, Thermofisher Scientific), as per manufacturerâs instructions. For analysis, absolute cDNA counts were normalised to 1Â ng/ml of cDNA loaded.
Results
Generating and evaluating machine learning models of replicative fibroblast senescence
In an effort to develop a ML model of senescence classification, we cultured adult human dermal fibroblasts (HDFs) to replicative senescence (passage 39) and allowed them to reach a deeply senescent state over a further 3 weeks in culture (DS; P39 + 3). Single-cell RNA sequencing (scRNA-seq) was carried out on early proliferative (EP) and deeply senescent (DS) HDFs cultured in a two-dimensional monolayer (extensive model validation previously reported in Wallis et al. [34]) (Fig. 1a). Using k-nearest neighbours (KNN), seven independent clusters were identified in the EP and DS HDF dataset (Fig. 1b). Overlaying these clusters with the originating cell populations showed distinct clustering of EP and DS fibroblasts, highlighting the discrete mRNA profiles of the two populations, together with inherent heterogeneity within both the EP and DS fibroblast populations (Fig. 1c). Given that the EP and DS states were transcriptionally distinct, we proceeded to build a ML model to distinguish the two conditions.
The scRNA-seq data was split into training (80%) and testing (20%) datasets, for model development and evaluation. Four ML models were explored, namely support vector machine (SVM; Fig. 1d), mixture discriminant analysis (MDA; Fig. 1e), k-nearest neighbours (KNN) and generalised linear model (GLM) (Supplementary Table 1). The MDA confusion matrix was the most accurate and highlighted a marginally higher percentage of true positive DS cells than the SVM model (0.49% higher). Both the MDA and SVM models have an ROC score of 1, and a sensitivity of 0.999, but the MDA model had higher specificity than the SVM model (1 versus 0.99, respectively). KNN and GLM were also tested but had marginally lower evaluation metrics (KNN ROC: 0.998, Sens: 0.978, Spec: 0.992; and GLM ROC: 0.999, Sens: 0.996, Spec: 0.994, respectively). These evaluation metrics are displayed in Additional File 1: Table S1. For this reason, the MDA model was selected for subsequent work. Comparing the known classification of the testing dataset (Fig. 1f), with the MDA model predictions (Fig. 1g) shows a striking prediction accuracy of 99.79%. Although intra-experimental, this testing data is previously âunseenâ, suggesting model overfitting is unlikely and instead, the two classification outcomes are transcriptionally distinct. This work demonstrates that it is possible to build an MDA ML model of senescence classification, which can accurately predict fibroblast deep senescence in a two-dimensional monolayer. Therefore, MDA machine learning models are used throughout this work.
Machine learning models can accurately discriminate between early proliferative (EP) and deeply senescent (DS) human dermal fibroblasts.Schematic illustration of the position of early proliferative (EP) and deeply senescent (DS) human dermal fibroblasts (HDFs) on the Hayflick curve. The samples were grown in a 2D monolayer before single-cell RNA sequencing (scRNA-seq). Created with.Uniform manifold approximation projection (UMAP) of transcriptomic signatures of EP and DS fibroblasts.UMAP inoverlayed with sample type.Support vector machine (SVM) model confusion matrix generated from the testing dataset, highlighting true positives and true negatives (green), and false positives and false negatives (red).Mixture discriminant analysis (MDA) model confusion matrix generated from the testing dataset, highlighting true positives and true negatives (green), and false positives and false negatives (red).20% testing dataset clustered into a UMAP and overlayed with sample type.UMAP inoverlayed based on prediction using the MDA model, with the ScPred package a b c b d e f g f BioRender.com
Testing the models on an external dataset reveals the importance of temporal senescence kinetics
To test the MDA ML modelâs performance, we used an independent publicly available dataset from Chan et al. (GEOâ#GSE175533â), which included scRNA-seq data for increasing population doubling levels (PDLs) of WI-38 fibroblasts [16]. These fibroblasts ranged from PDL25 to PDL50, the latter of which Chan et al. described as being in an early senescent state. The transcriptional differences of the PDL50 senescent cells were apparent when the cells were clustered and plotted onto a UMAP (Fig. 2a), with the majority of the PDL46 and PDL50 cells clustering away from the main bulk of the population. Applying our MDA ML model to the WI-38 fibroblast dataset shows an enrichment for DS predictions in the UMAP locations of the early senescent PDL50 cells (Fig. 2b). However, only 10.08% of the PDL50 cells are predicted to be DS (Fig. 2c). To further investigate this, the PDL50 cells were isolated and clustered independently, producing eight clusters (Fig. 2d). Intriguingly, cluster five was largely enriched for DS-predicted cells, suggesting that this subpopulation of cells was transcriptionally similar to the DS cells in the original HDF model.
We hypothesised that the low percentage of DS prediction in the PDL50 cells could be due to temporal differences in the two experiments. The DS cells used to build the MDA ML model had undergone three weeks of culture after reaching replicative senescence, to establish and deepen their phenotype. Conversely, the PDL50 WI-38 fibroblast cells from Chan et al. were sequenced immediately upon reaching their Hayflick limit (Fig. 2e). To test this hypothesis, we applied Monocle3 trajectory analysis to the PDL50 cells [33]. Cluster five (the cluster enriched for cells which were predicted to be DSâFig. 2f), fell at the end of the predicted trajectory, suggesting that the model was detecting deeply senescent but not early senescent fibroblasts (Fig. 2g) within the PDL50 population.
Cells at PDL50 likely represent a heterogeneous population, where some cells have reached the Hayflick limit earlier and deepened their senescent phenotype. To confirm this, we examined the expression of known cell cycle arrest markers CDKN1A (p21) and CDKN2A (p16/Arf). Alcorta et al. reported an initial increase of p21 during the early stages of cell cycle arrest of human diploid fibroblasts, with a gradual increase of p16 expression as the senescence phenotype deepens [35]. Intriguingly, this was also apparent in the PDL50 cells, with those cells predicted to be DS having lower expression levels of CDKN1A (Fig. 2h) and higher expression levels of CDKN2A mRNA (Fig. 2i) than the remaining PDL50 population. It must be noted that an increase of CDKN2A expression could be due to increased expression of either p16 or Arf alternative reading frames. However, using sequence-specific primers, the DS HDFs yielded a significant increase in p16 mRNA and a significant decrease in Arf compared to their proliferative counterparts with senescence (Additional File 1: Fig. S2). It is therefore likely that there is an increased expression of p16 but not Arf mRNA in the PDL50 DS predicted cells. This increase supports our hypothesis that the MDA ML model is able to detect deeply senescent but not early senescent fibroblasts.
To build a ML model which could encompass early replicative senescent fibroblasts (ES), the PDL50 cells from Chan et al. [16] were incorporated into the model as an additional ES classifier. First, clustering the ES, EP, and DS cells revealed a more successive pattern, with the ES cells bridging the gap between the EP and DS cells (Fig. 3aâb). This was confirmed using Monocle3 trajectory analysis (Fig. 3c). This is perhaps unsurprising, as we have previously identified the ES cells to contain a heterogeneous mix with a small proportion of DS cells. The MDA confusion matrix highlights more than 90.2% true positives for all intra-experimental predictions, with the lowest percentage of true positives being for the âearly senescentâ or ES cells (Fig. 3d), which could again be explained by the heterogeneity of this population.
Applying this model, the WI-38 cells at PDL25 through to PDL50 were able to be classified as either EP, ES, or DS (Fig. 3e; original clustering in Fig. 2a). Unsurprisingly, the PDL50 cells had the highest predicted percentage of ES cells (66.6%). The percentage of predicted PDL50 cells then incrementally decreased with decreasing PDL number, with 15.7% of PDL25 cells being predicted to be ES (Fig. 3f). This suggests that even at the earlier passages, the population of fibroblasts is heterogeneous and that individual cells are on their own trajectories. Finally, for completeness, this ML predictive model for ES, EP, and DS cells was tested on the EP and DS HDF cells alone, identifying a small percentage (10.06%) of the EP cells as early senescent (Fig. 3g). However, within the DS cells, only 0.39% were predicted to be ES, indicating that the DS cells are a more homogenous population. Intriguingly, testing the ML model on two alternative senescence triggers (oncogene-induced senescence (OIS) and paracrine senescence) [10] suggests that the time spent in senescence is perhaps more important than the trigger (Additional File 1: Fig. S3aâc). The ES classifier was able to detect increasing senescence in the OIS and paracrine groups compared to the proliferating cells (27.9% and 32.9% more, respectively), but no DS cells were detected (Additional File 1: Fig. S3c).
The machine learning model can detect deeply senescent but not early senescent fibroblasts.UMAP of scRNA-seq data from fibroblasts from the external dataset (Chan et al. []) at different population doubling levels (PDL).UMAP inoverlayed based on prediction from the MDA model.Barplot representing the percentage of cells which are predicted to be deeply senescent (DS) at each PDL, using the MDA model.UMAP of the PDL50 cells from Chan et al. [].Schematic illustrating position on the Hayflick curve of early proliferative (EP) and deeply senescent (DS) human dermal fibroblasts (HDFs), and PDL50 or âearly senescentâ cells. Created with.UMAP inoverlayed based on prediction from the MDA model.Monocle3 trajectory plot of UMAP in, overlayed with predicted pseudotime trajectory.PDL50 UMAP from, overlayed with expression of CDKN1A.PDL50 UMAP from, overlayed with expression of CDKN2A a b a c d e f d g d h d i d [16] [16] BioRender.com
Building a machine learning model incorporating early senescent (ES) cells highlights high levels of heterogeneity throughout progressive population doublings.UMAP of combined scRNA-seq data from EP and DS HDFs, and PDL50 WI-38 fibroblasts from Chan et al. []UMAP inoverlayed with sample type.Pseudotime trajectory of UMAP in.Confusion matrix of MDA model of EP, ES, and DS cells, highlighting true positives and true negatives (green), and false positives and false negatives (red).UMAP of all WI-38 PDLs from Chan et al. [] (Fig.a), overlayed with prediction from EP/ES/DS 2D model.Barplot showing the percentage of predicted ES cells at each PDL.Barplot showing the percentage of predicted ES cells within the EP and DS HDF populations a b a c b d e f g [16] [16] 2 .
Testing the models using in vivo data
To investigate whether the EP/ES/DS model can detect senescence in vivo, the models were tested on two independent publicly available whole skin scRNA-seq datasets from Tabib et al. [19] (data kindly provided by corresponding author) and SolĂ©-Boldo et al. [20] (#GSE130973â). Clustering the fibroblasts from Tabib et al. [19] revealed five clusters (Fig. 4a), which did not appear to stratify based on the age of the donors (Fig. 4b), or by their predicted senescence state (Fig. 4c). Analysis of the second whole skin scRNA-seq data from SolĂ©-Boldo et al. [20] generated similar findings. Clustering these fibroblasts formed 11 distinct clusters (Fig. 4d), and these did appear to have some age-associated influences (Fig. 4e). Particularly, the 53-year-old donor clustered alone, which could suggest fibroblast abnormality for this individual, or perhaps issues with tissue isolation and processing. Intriguingly, this abnormality was also reflected in the percentage predictions of EP, ES, and DS cells, with the 53-year-old donor having a very high prediction of ES cells (73.9%) (Fig. 4fâg).
Combining the percentage of EP, ES and DS predictions from both the Tabib [19] and SolĂ©-Boldo [20] datasets revealed a remarkably high prevalence of ES and DS predicted cells, regardless of donor age (Fig. 4g). The combination of DS and ES predicted cells makes up a minimum of 44.18% of the fibroblasts (24-year-old donor), and a maximum of 78.02% of the fibroblasts (53-year-old donor). Evidence within the literature suggests that senescent fibroblasts within human skin in vivo make up approximately 8â15% of the fibroblast population [36â39]. However, this literature relies on single markers which are not necessarily specific to senescence, or sufficient to detect all heterogeneous senescent populations, and making these predictions is confounded by a lack of ground truth in vivo [2]. Despite this, the large disparity between the model prediction and literature reports suggests that the model is not accurately able to identify senescent fibroblasts in vivo.
The 2D machine learning models cannot accurately be applied to in vivo datasets.UMAP of scRNA-seq of in vivo fibroblasts from Tabib et al. [].UMAP shown in, overlayed with donor age.UMAP shown in, overlayed with MDA EP/ES/DS model prediction.UMAP of scRNA-seq of in vivo fibroblasts from Solé-Boldo et al. [].UMAP shown in, overlayed with donor age.UMAP shown in, overlayed with MDA EP/ES/DS model prediction.Barplot showing percentage model prediction of EP/ES/DS cells for each donor from both Tabib [] and Solé-Boldo [] datasets inand a b a c a d e d f d g c f [19] [20] [19] [20]
Building in vitro 3D models improves senescent cell detection in vivo
In recent years, the limitations of culturing cells in a two-dimensional monolayer have been widely reported. Most importantly, cells cultured in a monolayer do not replicate the complexities and depth of the extracellular matrix in vivo, which can have wide implications on cell growth and signalling processes [12â15, 40, 41]. Consequently, three-dimensional organotypic models have been developed in an effort to recapitulate in vivo cell behaviour more accurately. For this reason, we built a 3D fibroblast matrigel-collagen dermal gel containing either EP or DS HDFs, which underwent 10X scRNA-seq (Fig. 5a). To determine whether the transcriptome of fibroblasts grown in 3D or 2D was more similar to fibroblasts in vivo, a heatmap was constructed comparing the core matrisome (structural ECM proteins) between these groups (Additional File 1: Fig. S4) [42]. The core matrisome proteins measured are listed in Additional File 2; Supplementary Table S2. Hierarchical clustering reveals two initial branched clusters, where the fibroblasts grown in 2D cluster alone, and the fibroblasts grown in a 3D gel clustered alongside the in vivo fibroblasts from both Tabib et al. [19] and SolĂ©-Boldo et al. [20], suggesting fibroblasts grown in a 3D gel are more similar to those in vivo based on their expression of ECM components.
Clustering of the fibroblasts grown in a 3D gel revealed six distinct groups, and a clear separation of EP and DS cells (Fig. 5bâc), highlighting that the transcriptomic differences between EP and DS cells were maintained in three-dimensional culture. Next, a MDA ML model was built to classify the cells from the dermal gels as EP or DS (called 3D SenPred), and intra-experimental testing revealed over 97.92% true positives (Fig. 5d). Promisingly, applying this ML model to the dataset from Tabib et al. [19] revealed a more physiological level of senescence prediction, with the maximum percentage of senescent fibroblasts being predicted in the 63-year-old (21.34%) (Fig. 5e). This was recapitulated when the 3D SenPred model was applied to the SolĂ©-Boldo [20] dataset, with the highest predicted senescent cell burden being 32.79% in the 69-year-old donor (Fig. 5f).
Mine et al. reported that papillary fibroblasts change with age compared to reticular fibroblasts, including increasing secretion of keratinocyte growth factor, increased age-related contraction of collagen gels, and decreased population doublings and clonogenic capacity with age [43], suggesting that papillary fibroblasts are likely to be the fibroblast subtype to undergo replicative senescence in vivo. Therefore, we investigated the proportion of senescent cells within the four fibroblast subtypes defined by Solé-Boldo et al. [20]: secretory reticular, pro-inflammatory, secretory papillary, and mesenchymal. Promisingly, the secretory papillary fibroblasts from Solé-Boldo et al. [20] had the highest percentage of senescence prediction (40.25%), providing increased confidence that the ML model is detecting senescent cells in vivo (Additional File 1: Fig. S5).
Characterisation of senescence classically relies of the use of a selected panel of markers. Therefore, we were interested to explore how the expression of a panel of 9 commonly used senescence markers (CXCL12, CDKN1A (p21), CDKN2A (p16/Arf), IGF1, IGFBP3, IL-6, MMP3, SERPINE1, TP53) compared on a per cell basis. We started by using the scRNAseq data for the in vitro DS3D and EP3D cells because these have a state of âground truthâ. Using this data, we generated Pearson correlation coefficients for the comparison of normalised gene counts for each of the senescence markers, displaying the resulting correlation coefficients in a heatmap (Additional File: Fig. S6aâb). The highest positive correlation coefficient in the DS cells was when IGFBP3 and CXCL12 were compared (0.39), and this compared to 0.28 in the EP3D cells (Additional File: Fig. S6aâb). 1 1
In parallel, we generated Pearson correlation coefficients for the cells that the 3D SenPred model predicted as DS or EP within the Tabib [19] and SolĂ©-Boldo [20] datasets (Fig. 5g and Additional File 1: Fig. S6C, respectively). Again, the highest correlation in the predicted DS cells was between IGFBP3 and CXCL12 (0.35), compared to 0.22 in the predicted EP cells. The Pearson correlation coefficient between CDKN1A and CDKN2A expression in the DS and EP predicted cells were 0.0046 and â 0.0022, respectively (individual correlation plots provided in Additional File 1: Fig. S6dâe), and between IGFBP3 and TP53 the correlation coefficient was 0.0225 and â 0.0126, respectively (individual correlation plots provided in Additional File 1: Fig. S6fâg) [22]. Further, 49.38% of gene pairs have a Pearson correlation coefficient of zero or less than zero (i.e. a negative correlation) in the DS predicted cells. Overall, this exercise revealed the absence of a strong correlation between pairs of commonly used senescence markers at the single cell level both in vivo and even in vitro where we have a state of ground truth, and emphasises how using preselected markers might only detect a subset of senescent cells. Taken together, this data further highlights the value of using SenPred, a holistic machine learning model built using thousands of genes, to detect the likelihood of a cell being senescent based on more nuanced changes, with the potential for improved identification of heterogeneous senescent cells whilst removing the likelihood of single-marker based false positives or false negatives.
Next, we expanded the in vivo datasets to incorporate a further 12 whole skin scRNA-seq dataset donors recently generated by Ganier et al. [22]. First, the fibroblasts were identified in line with Ganier et al. [22] (Additional File 1: Fig. S7). We then used the 3D SenPred model to predict the percentage of DS fibroblasts. Finally, we combined a total of 23 donor samples from the three independent in vivo datasets: Tabib [19], SolĂ©-Boldo [20], and Ganier [22] to explore how the SenPred output changes relative to donor age (Fig. 5H). We observed that the percentage of DS predicted fibroblasts is significantly positively correlated with age (Pearson = 0.471, p value = 0.023), indicating that SenPred generated from 3D fibroblasts successfully predicts increasing percentages of deeply senescent fibroblast cells with age. Crucially, SenPred generated from 2D fibroblasts does not show any significant trend with age (Pearson = â 0.26, p = 0.23), further confirming the advantage of utilising data generated with cells grown in a 3D environment (Additional File 1: Fig. S8a). Unsurprisingly, the trend is also lost when the model is tested on whole skin (Additional File 1: Fig. S8b) or on fibroblasts isolated from lung tissue (Additional File 1: Fig. S8c) [24], highlighting the modelâs tissue and cell type specificity in vivo.
In summary, the SenPred pipeline allows the generation of an unbiased predictive model of dermal fibroblast senescence. Using scRNA-seq data from cells with a known senescence classification, ML models can be generated and evaluated both intra- and inter-experimentally (Fig. 5i). In the future, these models could be generated in multiple tissues, trigger and cell-type-specific contexts of interest. The models can then be applied to classify senescence within unknown datasets. Importantly, using scRNA-seq data generated from cells grown in 3D allows the detection of senescent cells in vivo.
A ML model built from 3D dermal gels more accurately recapitulates senescence in vivo.Schematic illustration of the position of early proliferative (EP) and deeply senescent (DS) human dermal fibroblasts (HDFs) on the Hayflick curve. The samples were grown in a 3D Matrigel-collagen matrix before single-cell RNA sequencing (scRNA-seq). Created with.UMAP clustering of scRNA-seq of HDFs grown in a 3D matrigel matrix.UMAP in, overlayed with sample type.Confusion matrix of MDA model of EP and DS 3D cells, highlighting true positives and true negatives (green), and false positives and false negatives (red).Fig.a, overlayed with 3D ML model prediction.UMAP in Fig.d, overlayed with 3D ML model prediction.Pearson correlation matrix of normalised gene counts for a selected list of classic senescence markers, in the DS predicted cells from Tabib [] and Solé-Boldo [].Dotplot of the percentage of DS predicted cells using SenPred 3D, compared with donor age, for Tabib [], Solé-Boldo [], and Ganier [] data.Flowchart showing the steps of the SenPred pipeline a b c b d e f g h i BioRender.com 4 4 [19] [20] [19] [20] [22]
Discussion
The term âsenescenceâ encompasses a wide variety of cellular outcomes, which display diverse context-dependent states [1â3]. For this reason, there is no universal marker of senescence, and we believe that characterisation and classification of senescence should focus on identifying specific senescent cell subtypes. It is also important to consider the heterogeneity of these subtypes, particularly the heterogeneity of in vivo aged tissue, where the minority of cells within the tissue are likely to be senescent [36]. The emergence of scRNA-seq datasets has allowed for a greater understanding of this heterogeneity at the transcriptomic level. However, the lack of a single established senescence marker means identifying senescent cells within these datasets is arduous and requires prior knowledge of the specific senescent cell subtype. Recently, the SenNet Consortium was established [44], with the overarching aim of providing an atlas of senescence characterisation and mapping. With the growth of this database, it is important to have a tool which will allow rapid classification of senescence based on the characterisation atlas. Although AI and ML are emerging techniques for the classification of cellular subtypes, particularly within whole tissues or disease states, to date ML has not been used to classify senescence based on scRNA-seq data. Previously published work focussed on morphological classification or using bulk RNA sequencing datasets which perhaps miss the nuanced nature of senescence at the single cell level [7â11].
Within this work, we develop a ML model which can successfully classify deeply senescent fibroblasts in both internal and external in vitro datasets. Intriguingly, the depth of senescence appears more important than the trigger type when applying the model in vitro, as an early senescent classifier can detect oncogene-induced and paracrine senescent cells. Applying the ML model based on scRNA-seq from fibroblasts grown in 2D was unable to successfully determine senescence in fibroblasts from in vivo tissues, predicting an unrealistic senescent cell burden. Sequencing cells which were grown in a 3D environment more closely recapitulated reported senescent cell numbers in vivo and also detected senescence in the expected fibroblast subtypes. This suggests that future efforts to further characterise senescent cell heterogeneity using scRNAseq should consider harvesting cells following 3D rather than 2D culture. However, it is equally important to note that there are numerous environmental differences between cells grown in 3D and those in in vivo [12â15].
Current recommendations are to profile senescent cells using a panel of selected senescence markers. When we asked how these markers correlated in vitro (where the ground truth was known) and in vivo at the single cell level we found that the degree of positive correlation was at best 0.39 in vitro and 0.35 in vivo, with the majority of the marker pairs showing weak or negative correlation. This observation underscores the transcriptional heterogeneity of senescence at the single-cell level and the potential for bias in senescent cell identification based on a small marker panel. It also highlights the advantage of using the SenPred pipeline, which allows the detection of numerous, holistic, and bidirectional changes across 2,000 genes, enabling the detection of heterogeneous senescent cell subgroups which could otherwise be overlooked.
Using currently available scRNAseq datasets, we were able to detect a significantly increasing percentage of predicted fibroblast senescence with age. Importantly, when comparing SenPred with other correlation-based skin ageing markers, SenPred was able to reach a similar and significant positive correlation with a smaller sample size (Pearson = 0.471, p value â 0.023, 23 donors). For example, the work by Demanelis et al. used 286 independent donors to explore the relationship between telomere length and age in sun-unexposed skin, generating a significant negative Pearson correlation of -0.26 [45]. This comparison highlights a potential strength of the SenPred approach, whereby the use of a large number of markers to build a model could increase accuracy, reduce the influence of biological variation and potentially reduce the number of donors required to conduct future studies. Age-related correlation data inevitably contains biological variability. In the future, the ability to carry out physiological assessments of the donors could also strengthen the approach and allow for the comparison of biological versus chronological age. The existence of survivor bias should also be considered, whereby a high senescent cell burden is likely associated with increased mortality.
The evaluation of model performance in vivo represents a clear limitation of this study. At present, the full spectrum of in vivo triggers of senescence and thus potential signatures have yet to be elucidated, and thus the state of âground truthâ is currently unknown. Consequently, there is somewhat of a catch-22 when attempting to truly define senescent cell burden in vivo. The opportunity to use spatial transcriptomics alongside a classic senescence marker, although perhaps reductive based on the nuanced nature of senescence, may allow a more accurate evaluation of model performance in vivo. The advantages of multi-modal AI could be further explored, combining multiple âomicsâ datasets to improve senescence prediction.
Conclusions
In summary, we position SenPred as a proof-of-concept pipeline, with the future aspiration to build a more holistic ML model, whereby the model could detect multiple subtypes of senescence based on multiple triggers and cell types. Presently, it is unclear whether the 3D SenPred model can detect multiple triggers in vivo, but the model does appear to be cell-type and tissue specific. Currently, the datasets to build the model with multiple triggers and cell types are unavailable, but we envision this to change in line with the goals of the SenNet consortium [44]. The emergence of this novel data will allow continuous model testing and improvement, increasing confidence in the predictive power of SenPred. It should be noted, however, that this work has been generated using adult replicative senescent cells, arguably the most experimentally challenging model, and therefore implementation with other triggers should be more straightforward for others to build upon.
A holistic senescence prediction model would have multiple clinical benefits, including predicting individual patient senescent cell burden and trajectory, and the likelihood of senescence-associated age-related diseases. This could enable prediction of patient lifespan, personalised, early interventions, and potential efficacy of therapeutic interventions. For example, the model could be used to test senolytic activity by measuring the clearance of senescent cells. This would support work by Smer-Barreto et al., using ML to discover novel senolytics [46]. The SenPred workflow has potential wider utility, as it could be applied to additional cellular contexts and disease states beyond senescence, simply requiring scRNA sequencing of cells with context-specific ground truth. This would allow the detection of heterogeneous cells within tissues, prediction of disease burden, and application and evaluation of personalised medicines. In conclusion, we believe ML is a valuable tool which can contribute towards the collective goal of the senescence field to characterise and classify senescence and could have wider application across numerous disease states.
Supplementary Information
Acknowledgements
CLB acknowledges funding for BKH from the BBSRC (BB/V509668/1) and Unilever, DM from the BBSRC (BB/N503629/1) and Unilever, RW from Barts Charity (MGU0537) and FM from the BBSRC (BB/T008709/1).
Authorsâ contributions
BKH and CLB developed the initial concept. BKH, AD, RW, LJW, DG and CLB finalised the concept, developing and designed experiments. BKH, AD, DM, RW, FM analysed data. BKH and AD developed SenPred. BKH, DM and FM performed all experiments. BKH and CLB wrote the initial manuscript draft with all authors contributing to revisions of manuscript prior to submission. All authors read and approved the final manuscript.
Data availability
Package dependencies
All packages and versions used in this code are available at the following URL: https://github.com/bethk-h/SenPred_HDFâ [47].
Data Availability
The scRNAseq data generated as part of this work has been uploaded to GEO #GSE282425â [48].
Code Availability
All code for this project is accessible via the following URL: https://github.com/bethk-h/SenPred_HDFâ [47].
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
Authors AD, LJW and DAG are Unilever employees. The remaining authors declare that they have no competing interests.
Footnotes
References
Associated Data
Supplementary Materials
Data Availability Statement
Package dependencies
All packages and versions used in this code are available at the following URL: https://github.com/bethk-h/SenPred_HDFâ [47].
Data Availability
The scRNAseq data generated as part of this work has been uploaded to GEO #GSE282425â [48].
Code Availability
All code for this project is accessible via the following URL: https://github.com/bethk-h/SenPred_HDFâ [47].