What this is
- Lung adenocarcinoma (LUAD) is the most prevalent type of lung cancer, contributing significantly to cancer-related mortality.
- This research investigates the role of () genes in predicting survival outcomes for LUAD patients.
- By analyzing gene expression data, a prognostic model was developed based on -related genes, specifically identifying PTPRC and CD19 as key indicators.
Essence
- A prognostic model based on 106 -related genes predicts survival in LUAD patients, with PTPRC and CD19 identified as significant markers. The model demonstrates robust predictive ability, especially when combined with clinical parameters.
Key takeaways
- The study identified 106 -related genes significantly associated with overall survival in LUAD patients. These genes are primarily involved in immune response and cell adhesion.
- PTPRC and CD19 emerged as key genes with strong connections to immune activity, indicating their potential as prognostic indicators for LUAD.
- The developed prognostic model achieved an area under the curve (AUC) of 0.716 for 3-year survival prediction, highlighting its clinical utility.
Caveats
- The study relies on retrospective data from TCGA and GEO databases, which may introduce biases inherent in such datasets.
- The model's performance may vary across different populations, necessitating further validation in diverse cohorts.
Definitions
- tumor microenvironment (TME): The surrounding cellular environment of a tumor, including immune and stromal cells, which influences tumor behavior and patient outcomes.
AI simplified
Introduction
Lung cancer is the most common cause of cancer-related deaths worldwide (). Non-small cell lung cancer (NSCLC) represents 85% of lung cancers, mainly including lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LSCC) (). Notably, the incidence of LUAD significantly increased and surpassed of LSCC and it constitutes nearly 40% of all lung malignancies (). Although with multiple kinds treatment methods, including surgery, chemotherapy, radiotherapy, and dramatic treatment shift of some targeted therapeutic agents, the prognosis for LUAD patients remains poor worldwide, with 5-year relative survival currently at 18% (;;). Therefore, understanding the mechanism of carcinogenesis and therapeutics of lung cancer is quite important. [Chen et al., 2016] [Losanno & Gridelli, 2016] [Nakamura & Saji, 2014] [Siegel, Miller & Jemal, 2015] [William et al., 2009] [Shaw et al., 2011]
At present, the pathogenesis of LUAD has not been adequately described; meanwhile, there are increasing study support the view of tumor microenvironment critically influence gene expression of tumor tissues, and the clinical outcomes further. The tumor microenvironment (TME) is a complicated mixture, immune and stromal cells are two major types of non-tumor factors. They have been certified to promote the development of diagnostic and prognostic assessment process of lung cancer (;;). Hence, exploring the molecular composition and function of the TME is critical to effectively manage cancer progression and immune response (;). [Cui et al., 2015] [Zhan et al., 2015] [Zhao, Li & Tian, 2018] [Wood et al., 2014] [Li et al., 2017]
Previous studies have solved the problem of the complexity of tumor infiltrating immune cells in LUAD. Most of these studies assessed tumor-infiltrating immune cells by immunohistochemical analysis of a single marker (;;). Fortunately, computational analysis of tumor immune cell interactions is now available by bioinformatics tools. In 2013, Yoshihara firstly reported an algorithm called ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data). The infiltration level of stromal and immune cells was predicted by calculating stromal and immune score based on data from the Cancer Genome Atlas (TCGA) data sets by ESTIMATE (). Subsequent reports applied the innovative algorithm to different cancer, such as prostate cancer (), breast cancer (), and colon cancer (), which further confirmed the effectiveness of the algorithm based on big database. [Mansuet-Lupo et al., 2016] [Driver et al., 2017] [Connolly, Nader & Joshi, 2018] [Yoshihara et al., 2013] [Shah et al., 2017] [Priedigkeit et al., 2017] [Alonso et al., 2017]
We analyzed immune/stromal score of LUAD cohorts from TCGA, which was derived from ESTIMATE algorithm, and extracted a series of TME associated prognostic gene in LUAD. Among those TME associated prognostic gene, PTPRC and CD19 were of particular interest for they are highly interconnected in the PPI network, and closely correlated with immune-related activity by GSEA analysis and CIBERSORT analysis. Combining those prognostic genes, we constructed a prognostic model which could provide a moderate OS prediction for LUAD alone, and provide a robust prediction with clinical parameters.
Materials and Methods
Data acquisition
A total of 576 samples, gene expression RNAseq and clinical information such as pathological stage, survival of LUAD were downloaded from the TCGA database and prepared for the analysis of differential expression. The data set was submitted by University of North Carolina TCGA genome characterization center based on the Illumina HiSeq 2000 RNA Sequencing platform (Oct 13, 2017). We downloaded the estimate, immune and stromal score of 517 LUAD samples from TCGA in ESTIMATE website. Stromal score captures the presence of stroma in tumor tissue, immune score represents the infiltration of immune cells in tumor tissue, and estimate score that infers tumor purity. The estimate score is equal to the immune score plus the stromal score. Fifty-nine samples were deleted for lack of immune and stromal score in ESTIMATE website. Gene expression profiles and LUAD clinical covariates (stage, age, sex) were obtained from GEO database with the accession number, consisting of 442 samples (). After using the Robust Multichip Averaging (RMA) method to normalize the sequence matrix data, the data was used to validate the differential expression gene and test the prognostic model. The data set was submitted by Mervi Heiskanen based on Affymetrix HG-U133A (May 01, 2015) (). A total of 133 non-small cell lung cancer samples were downloaded from GEO database with the accession number, were to compare the prognostic model with other gene expression signature. GSE68465 GSE14814 [Shedden et al., 2008] [Marti et al., 2003]
Distinguishing of differentially expressed genes
The raw data of TCGA was pre-processed by limma algorithm (). The adjusted-values (adj. p) < 0.05 and |Log2 (FC)| > 1 were set as the cut-offs to screen for differentially expressed genes (DEGs). [Ritchie et al., 2015] p
Heatmaps and clustering analysis
ClustVis web tool was used to create heatmaps and clustering (). [Metsalu & Vilo, 2015]
Enrichment analysis of DEGs
DAVID conducted functional enrichment analysis, including biological process, molecular function and cell component to analyze DEGs (), an essential foundation for visualization, annotation, and integrated discovery. KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways was also performed by the database of DAVID. Whole genome was the background using DAVID. False discovery rate (FDR) < 0.05 was used as the cut-off. [Huang, Sherman & Lempicki, 2009]
Screening of survival-related DEGs
KaplanâMeier plots and univariate Cox regression were used to visualize the association between the genes expression and overall survival of patients to explore its prognostic value. Statistical significance was examined using the Log-rank test.< 0.05 were considered statistically significant. p
Integration of PPI network and screening of modules
Search Tool for the Retrieval of Interacting Genes (STRING) online database was used to retrieve predicted PPIs. Only interactions with a combined score >0.4 of all associations obtained in STRING were selected to construct the PPI network using Cytoscape software (). The Molecular Complex Detection (MCODE) plugin in Cytoscape was then utilized to find clusters of the PPI network. Top 8 nodes ranked by degree were identified as hub genes. [Bader & Hogue, 2003]
Gene set enrichment analysis
From Molecular Signatures Database, Hallmark and C7 gene sets v6.2 collections were downloaded as the target sets. GSEA performed using the software gsea-3.0. 517 samples from TCGA were used for GSEA, and only gene sets with nominal (NOM)-value < 0.05 and FDR < 0.25 were considered as significant. p
TICs profile
CIBERSORT computational method was applied for estimating the tumor-infiltrating immune subsets (TICs) abundance profile in all tumor samples, which followed by quality fifiltering that only 517 tumor samples with< 0.05 were selected for the following analysis. p
Construction of the risk assessment model
To reduce the dimensionality and eliminate collinearity, factor analysis was performed on 106 prognostic genes. Then we used the Step-wise multivariate Cox regression analysis to obtain factors correlated with overall survival significantly (SurFactors). By combining the score coefficient of factors weighted by their regression coefficients and standard deviation, the risk index of each gene was calculated as follows: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$${ {\rm \alpha} _i} = \displaystyle{{\sum\limits_{j = 1}^m {{{\rm \beta} _j} \times fa{c_{ij}}} } \over {st{d_i}}}$$\end{document} α i = â j = 1 m ÎČ j c i j Ă f a s t d i
wherewas the risk index of gene,the number of SurFactors,the regression coefficient of factorin multivariate Cox regression analysis,the factor score coefficient of-th factor over geneandthe standard deviation of gene. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${ {\rm \alpha} _i}$\end{document} α i \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$i$\end{document} i \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$m$\end{document} m \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${{\rm \beta} _j}$\end{document} ÎČ j \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$j$\end{document} j \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$fa{c_{ij}}$\end{document} f a c i j \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$j$\end{document} j \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$i$\end{document} i \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$st{d_i}$\end{document} s t d i \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$i$\end{document} i
By combining the expression values of prognostic genes weighted by their risk index, the following risk scores can be established for each patient: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$$Risk\_score = \sum\limits_{i = 1}^n {{{\exp }_i}} * {\alpha _i}$$\end{document} R i s k _ s c o r e = â â i = 1 n exp i α i
wherewas the number of prognostic genes, andthe expression value of gene. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$n$\end{document} n \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${\exp _i}$\end{document} exp i \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$i$\end{document} i
Survival and statistical analysis
We calculated risk score (RS) for every sample from testing data set based on the prognostic model and divided the specimens into two different groups based on the median RS. KaplanâMeier survival curves were drawn and compared the two subgroups via log-rank tests. We divided the samples into two group according to their overall survival, receiver operating characteristic (ROC) curves were drawn by IBM SPSS statistics 22, and area under the curve (AUC) was calculated. Moreover, the univariate and multivariate analyses of survival were conducted to identify the prognostic factors for LUAD patients from TCGA data set. A nomogram and Calibration plots were established based on the TCGA LUAD cohort. The nomogram and calibration plot analysis were conducted by using the R package ârmsâ and ârmda.â All tests were two-tailed, and< 0.5 was considered statistically significant. All statistical analyses were conducted using R software version 3.4.2. p
Results
ESTIMATE algorithm of LUAD
Transcriptional expression profiles and clinical information of 517 LUAD patients were collected from TCGA database. Among them, 277 (53.6%) patients were female, 240 (46.4%) were male. Pathological stage included 277 patients (53.6%) of stage I, 122 (23.6%) stage II, 84 (16.3%) stage III, 26 (5%) stage IV and 8 (1.5%) unknown. Based on ESTIMATE algorithm, the scores of stromal and immune were calculated, ranging from â1355.85 to 3286.67 and â1959.31 to 2098.77, respectively. The Estimate score was significantly associated with Pathological stage (,= 0.0436). The lowest Estimate score was at the most advanced pathological stage, the stage IV. Immune score showed remarkable prognostic potential, correlated with the pathological stage (,= 0.026), while stromal score showed no correlation with the pathological stage (,= 0.145). Fig. 1A Fig. 1B Fig. 1C p p p
To explore whether the potential correlation existing between survival benefits and immune/stromal score, 517 LUAD patients were divided into high and low score groups based on their scores. KaplanâMeier survival curves () showed that immune score significantly correlated with overall survival. Compared to the cases in the low score group, patients with high immune score have longer median overall survival (1725 d vs 1235 d,= 0.0152 in log-rank test). Although there was no statistically significant correlation between the stromal score and overall survival, patients with higher stromal score had longer median overall survival (, 1830 d vs 1293 d,= 0.0599 in log-rank test). Fig. 1D Fig. 1E p p

The relationship between Estimate/immune/stromal score and the prognosis of LUAD samples from TCGA. (A) Estimate score was significantly associated with pathological stage (= 0.0436). (B) Immune score showed predictive potential to pathological stage (= 0.026). (C) Similarly, the lowest stromal score was found in the most progressive clinicopathological stage IV, however, it was not statistically significant (= 0.145). (D) Immune score was significantly correlated with overall survival of LUAD samples (= 0.015). (E) Stromal score was not statistically significant correlated with overall, but the median overall survival of cases with higher stromal scores also showed longer than the patients with lower stromal scores (1830 d vs 1293 d,= 0.0599,= 0.145). p p p p p p
Identification of DEGs
To identify DEGs profiles according to immune/stromal score, we obtained the gene expression array of 517 LUAD patients obtained TCGA database. Based on the comparison of immune score of high group and low group, after analysis with the limma software package algorithm, there were 903 genes up-regulated and 56 genes down-regulated. We draw the heatmap with the top highly variate gene to (). Similarly, 1,007 up-regulated genes and 30 down regulated genes were obtained for high stromal score compared with low score (). It is obvious that up-regulated DEGs, no matter for comparison based on immune score nor stromal score, take the major part in total DEGs. Therefore, we decide to focus on up-regulated DEGs in further analysis. Fig. 2A Fig. 2B

Differential expression genes derived from the comparision of immune/stromal high score group vs low score group. (A and B) Heatmap of top 50 highly variaed gene from immune score/stromal score group. Genes with higher expression are shown in red, lower expression are shown in green, genes with same expression level are in black. (C and D) Venn diagram of common up/down regulated of immuneDEGs and stromal DEGs.
Enrichment analysis of DEGs
To reveal the biological function of the DEGs, GO and KEGG pathway enrichment analysis were performed for the up-regulated DEGs. GO analysis showed that immune up-regulated DEGs were remarkably enriched in immune response by biological processes, receptor activity by molecular functions, and plasma membrane by cellular component, respectively (â). Similarly, stromal up-regulated DEGs mainly enriched in cell adhesion by biological processes, extracellular matrix binding by molecular functions, and extracellular space by cellular component, respectively (â). KEGG pathway were both mainly enriched in cytokine-cytokine receptor interaction and cell adhesion molecules (CAMs) pathway (;). Figs. 3A 3C Figs. S1A S1C Fig. 3D Fig. S1D

GO term and KEGG pathway analysis for immune up-regulated DEGs. Top 10 pathways. False discovery rate (FDR) of GO analysis was acquired from DAVID functional annotation tool.< 0.05. (A) biological process, (B) cellular component, (C) molecular function, (D) KEGG pathway. p
Overall survival analysis of DEGs
To make further efforts to elucidate if up-regulated DEGs could give benefits to the LUAD patient survival, KaplanâMeier survival and univariate Cox proportional hazards regression analyses were performed on the up-regulated DEGs. The cases whose overall survival <30 days were excluded. The results showed that 446 immune related DEGs correlated with patient survival (< 0.05), 387 stromal related DEGs correlated with patient survival (< 0.05). Among all those genes, there were 291 duplicates of immune and stromal genes, a total of 542 genes (prognostic DEGs) associated with overall survival in patients with LUAD (). p p Table S1
Construction of PPI network by prognostic gene
Based on the STRING database, PPI network was obtained by using Cytoscape software to clarify the interaction between prognostic DEGs. The network was constructed by 13 modules, and we choose the top three significant modules ranked by mcode score for further study (). A total of 527 nodes and 4,661 edges were screened from the PPI network. According to the criteria above, eight nodes (PTPRC, ITGAM, LCP2, CTLA4, CD80, ITGAX, CD19, CCR5) were identified as hub genes. Consistently, these hub genes serve crucial roles in maintaining the top three modules. Fig. 4

PPI network of top three modules, ranked by mcode score. (AâC) The top three significant modules ranked by mcode score. The color of a node in the PPI network reflects the number of interacting proteins with the designated protein, and the size of node indicates the log (FC) value of thescore of gene expression. Z
Prognostic gene validate and intersection analysis with PPI network
To test whether these prognostic genes have prognostic significance in other LUAD cases. Microarray expression profile datasetfrom the GEO database was downloaded and the data were subjected to prognostic gene selection. There are 176 genes among 542 prognostic genes cannot find indataset because of platform differences. Of the remaining 366 genes, 106 genes were confirmed involving in LUAD patient survival. By intersection analysis the 106 prognostic genes with hub genes of PPI network, only PTPRC and CD19 were associated with overall survival in patients with LUAD. GSE68465 GSE68465
The correlation of PTPRC and CD19 expression with clinicopathological factors
Based on the study above, PTPRC and CD19 expression level were correlated with overall survival of LUAD patient, and PTPRC and CD19 high-expression group with longer survival. To explore the correlation of PTPRC and CD19 expression with clinical characteristic in LUAD patient, we analysis the PTPRC and CD19 expression level with TNM stage. The result indicated that the PTPRC and CD19 expression were negative correlated with the TNM stage of LUAD patient, and with the TNM stage rising, the expression of PTPRC and CD19 declined (). Fig. 5

The correlation of PTPRC and CD19 with clinicopathological stage and survival of LUAD patients. (A and B) The correlation of PTPRC and CD19 expression with clinicopathological stage. ANOVA served as the statistical significance test. (C and D) Survival analysis for LUAD patients with different PTPRC and CD19 expression. Patients were labeled with high expression or low expression depending on the comparison with the median expression level. Log-rank test served as the statistical significance test.
PTPRC and CD19 could be an indicator in TME status
To further elaborate the role of PTPRC and CD19 in LUAD, 517 samples from TCGA were divined into high/low expression group based on median expression. We implement GSEA in the high and low expression group of PTPRC and CD19. The result display that, in the high expression level whether PTPRC or CD19, the genes were basically enriched in immune relative activities including allograft rejection, complement and inflammatory response. In the low expression level of PTPRC, the genes were basically enriched in glycolysis, and typical tumor pathway including MYC-targets-V1 and MYC-targets-V2. As to the low expression group of CD19, genes were gathered in metabolic pathway. In C7 collection, many genes were concentrated in the high expression group of no matter PTPRC nor CD19, and both low expression group enrich few gene (). Fig. 6
Further, we performed correlation analysis between PTPRC and CD19 and 22 kind of immune infiltration cell. The result reveal that 16 kinds of immune infiltration cell were related with the expression of PTPRC (), 15 kinds associated with CD19 (). Fig. 7 Fig. S2

GSEA for samples with high PTPRC/CD19 expression and low expression. (A) The enriched gene sets in HALLMARK collection by the high PTPRC expression sample. Up-regulated genes on the-axis approaching the origin of the coordinates; by contrast, the down-regulated lay on the-axis. Only gene sets with NOM< 0.05 and FDR< 0.25 were considered signifificant. Several leading gene sets were displayed in the plot. (B) The enriched gene sets in HALLMARK by samples with low PTPRC expression. (C) Enriched gene sets in C7 collection, the immunologic gene sets, by samples of high PTPRC expression. Several leading gene sets are shown in plot. (D) Enriched gene sets in C7 by the low PTPRC expression. (E) The enriched gene sets in HALLMARK collection by the high CD19 expression sample. (F) The enriched gene sets in HALLMARK by samples with low CD19 expression. (G) Enriched gene sets in C7 by the high CD19 expression. y x p q

TIC profifile in tumor samples and correlation of TICs proportion with PTPRC expression. (A) Barplot showing the proportion of 22 kinds of TICs in LUAD tumor samples. Column names of plot were sample IDs. (B) Heatmap showing the correlation between 10 kinds of TICs and numeric in each tiny box indicating thevalue of correlation between two kinds of cells. The shade of each tiny color box represented corresponding correlation value between two cells, and Pearson coeffificient was used for significance test. (CâP) Scatter plot showed the correlation of 16 kinds of TICs proportion with the PTPRC expression (< 0.05). The red line in each plot was fitted linear model indicating the proportion tropism of the immune cell along with PTPRC expression, and Pearson coeffificient was used for the correlation test. p p
Construction and validation of the prognostic model
Factor analysis was used to determine common axes (or dimensions) of patterns and structures, which were measured by a reduced set of 106 predicted genes. The first twelve factors explained about 81.808% of the variation in the 106 prognostic genes. After multivariate Cox regression analysis, five factors correlated with survival were obtained (< 0.05). Through the formula mentioned before, the final prognostic model combined with 106 candidate gene were constructed. Further, to test the predictive ability of the prognostic model, the microarray expression profile datasetwas downloaded and we calculate the risk score for each patient. Based on the threshold risk score, we divided the patients into high- and low-risk group. We choose the log-rank test to identify the differences of OS between subgroups. The results showed that the overall survival of the high-risk patients is shorter than those in the low-risk ones in both TCGA dataset and(< 0.0001,= 0.0002 respectively,and). Then, to estimate whether the prognostic model is predictive of relapse free survival (RFS). We divided the patients into high- and low-risk group compared with the median risk score, log-rank test indicate that the prognostic model we build could predict the RFS of LUAD. ROC curves were also applied to evaluate the sensitivity of the prognostic model, and the ROC curves showed that the AUC value of the prognostic model reached 0.649, 0.617 in TCGA dataset andrespectively. The result indicates a substantially effective performance of the prognostic model for overall survival prediction (and). GSE68465 GSE68465 GSE68465 p p p Figs. 8A 8B Figs. 8C 8D

Validation of the prognostic model. (A) The KaplanâMeier survival analysis of the prognostic model for LUAD samples from TCGA. (B) ROC curve of the prognostic model in LUAD samples from TCGA. (C) The KaplanâMeier survival analysis of the prognostic model for LUAD samples from. (D) ROC curve of the prognostic model in LUAD samples from. (E) The KaplanâMeier survival analysis of the prognostic model with relapse free survival for LUAD samples from TCGA. GSE68465 GSE68465
Comparison with other gene expression signatures
We compared the prognostic model constructed by 106 candidate gene with 9-gene biomarker, as it also use estimate score on TCGA datasets to build the prognostic model ().included all 106 genes of our prognostic model and 9 genes needed to constructed the 9-gene biomarker. Fordata set, the 106-gene prognostic model achived a higher C-index (C-index, 0.56; 95% CI [0.48â0.64];< 0.01) compared with the 9-gene biomarker (C-index, 0.50; 95% CI [0.44â0.56];= 0.7) (). GSE14814 GSE14814 [Chen et al., 2020] Fig. S3 p p
To optimize the model with clinical charateristics
Further, univariate and multivariate Cox regression analysis have been executed between some of clinical pathological parameters and risk score of the prognostic model (). The result showed that the risk assessment model was an independent prognostic factor for prognostic. As the constructed risk assessment model with great prognostic value, we intended to improve the prognostic accuracy by intergrating with LUAD clinicopathological factors. We designed a nomogram to predict the survival of LUAD patient by combing T stage, lymph nodes metastasis, recurrence and risk score (). The AUC of the model was achieved 0.716 for 3 year and 0.699 for 5 year (and).andshow the nomogram calibration plots for predicting the overall survival of 3 years and 5 years of patient. Table 1 Fig. 9A Figs. 9D 9E Figures 9C 9D

Nomogram and Calibration curve. (A) Construction of a nomogram for predicting survival probability at 3, 5, and year of LUAD cases from TCGA data set. Calibration curve for the nomogram when predicting 3 (B) and 5 (C) year OS. (DâE) ROC curve of the optimized model in 3-years and 5 years.
| Parameters | Univariate | Multivariate | ||||
|---|---|---|---|---|---|---|
| HR | 95% CI | -valuep | HR | 95% CI | -valuep | |
| Recurrence (yes vs no) | 2.405 | [1.709â3.385] | <0.001 | 2.449 | [1.738â3.450] | <0.001 |
| Age (â€65y vs >65y) | 1.157 | [0.860â1.558] | 0.336 | |||
| Gender (male vs female) | 1.047 | [0.782â1.400] | 0.759 | |||
| Risk score | 2.216 | [1.635â3.004] | <0.001 | 1.897 | [1.329â2.709] | <0.001 |
| T stage | 2.455 | [1.697â3.553] | <0.001 | 1.827 | [1.827â1.156] | <0.01 |
| N stage | 2.546 | [1.900â3.411] | <0.001 | 2.139 | [1.517â3.017] | <0.001 |
| M stage | 1.027 | [0.747â1.414] | 0.868 | |||
| TNM stage (I-II vs III-IV) | 2.686 | [1.973â3.658] | <0.001 | 1.963 | [1.579â3.141] | 0.155 |
| Number-pack-years-smoked | 1.028 | [0.716â1.475] | 0.881 | |||
Discussion
Lung-cancer-related deaths is quite a large scale in the world which could be the most in patients who suffer cancer. And LUAD, as the most common type of lung cancer, is able to account for more than half of the morbidity and mortality of the lung cancer patient (;). Currently, growing evidence has suggested that TME plays a pivotal role in tumor initiation and progression (;), especially the recent immunocheckpoint inhibitors have noticeable effects on lung adenocarcinoma. However, patient prognosis and disease progression involved with TME related genes in LUAD have not been elucidated clearly. In the present study, 1,910 genes involved in immune response and cell adhesion were identified by comparing different immune/stromal score of LUAD samples from the TCGA database. Besides, we performed survival analysis and revealed that 542 of them were associated with overall survival in LUAD patients. After performing cross validation through, we obtained 106 TME related genes in which prominent correlation is found compared with prognosis situation. Of these survival-associated gene, 59 genes have been reported with the preliminary result. More importantly, among 106 TME related SurGenes, PTPRC and CD19 were highly interconnected nodes among PPI network and correlated with immune activity, exhibiting significant prognostic potential. Meanwhile, to explore the significance of these genes simultaneous changes for LUAD, we constructed and validated a risk assessment model that predicted survival of LUAD based on 106 genes.The prognostic model achieved robust predictive ability by combing with the clinical parameter we filtrated by multivariate Cox regression analysis.Recently, we notice that there are other prognostic model build by immune gene in LUAD (), however, compared with their prognostic model, the prognostic model we build have some advantages. First of all, we constructed the model by factor analysis. Factor analysis is a kind of algorithms in biometrics. It represents a complex array of structure-analyzing procedures used to identify the interrelationships among a large set of observed variables and then, through data reduction, to group a smaller set of these variables into dimensions or factors that have common characteristics. It is a tool to reduce multidimensional data to lower dimensions while retaining most of the information (). Secondly, our prognostic model was constructed by immune gene and stromal gene rather than only included immune gene, considering the effect of stromal gene on tumor microenvironment. Thirdly, before used to constructed the model, these 106 genes have been performed univariate cox regression analysis, and all of these gene were correlated with the prognosis of LUAD. At last, our prognostic model is also predictive not only to overall survival, but also to relapse free survival. GSE68465 [Travis et al., 2011] [Jemal et al., 2011] [Chen et al., 2020] [Liotta & Kohn, 2001] [Chen et al., 2020] [Barber, Faure & Long, 2004]
Among 106 TME related genes to be associated with LUAD, we are especially interested in PTPRC and CD19, as in the PPI network, they are highly interconnected nodes. PTPRC is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitosis, and oncogenic transformation. PTPRC has been shown to be an essential regulator of T- and B-cell antigen receptor signaling. It functions through either direct interaction with components of the antigen receptor complexes, or by activating various Src family kinases required for the antigen receptor signaling. PTPRC also suppresses JAK kinases, and thus functions as a regulator of cytokine receptor signaling. Alternatively spliced transcripts variants of this gene, which encode distinct isoforms, have been reported. There were studies showed the presence of intraepithelial PTPRC + cells from NSCLC or SCLC patients predicts favorable disease-specific survival (;). However, the mechanism of PTPRC in lung cancer has not been elucidated. The GSEA enrichment analysis in our research showed the genes in PTPRC high-expression group were mainly enriched in immune-related activities, such as allograft rejection, complement, and interferon response, and enriched in metabolic pathways in low expression of them, including glycolysis, oxidative phosphorylation, and typical tumor pathways. These result indicate that the expression level of PTPRC might correlate with the status of immune and microenviroment in tumor. Further study analysised by CIBERSORT algorithm showed many tumor infiltrating immune cell were correlated with the expression of PTPRC. All of these result suggest PTPRC participating in tumor microenvironment, and could be an indicator of TNM status. CD19 functions as coreceptor for the B-cell antigen receptor complex (BCR) on B-lymphocytes. Decreases the threshold for activation of downstream signaling pathways and for triggering B-cell responses to antigens (;;). Required for normal B cell differentiation and proliferation in response to antigen challenges (;). Some studies had reported that CD19 might serve as a downstream effector in RET signaling pathway and EGFR-activated signals in NSCLC. Other research have found in subcutaneous tumor of mice injected with LP07 adenocarcinoma cells and then treated with CD19 monoclonal antibodies, compared to non-treated cancer mice, in tumors of monoclonal-treated animals, tumor area and weight and ki-67 were significantly reduced, which implied that CD19 seemed to play a protumoral role in LUAD. There were researches reported the bronchoalveolar lavage fluid subset composition in primary lung neoplasia differs from that in the peripheral blood through an increase in CD19+ B cells compared with the peripheral blood. Similarly, some studies have described CD19+ B cells as the pathogenic populations within lung tumors (Tumor-infiltrating B cells: their role and application in anti-tumor immunity in lung cancer, Bal T lymphocyte subsets are reduced in primary lung neoplasias) (;). However, Our results suggested that the expression of CD19 was decreased in the advancing stages of LUAD patients. According to another study, in non small cell lung cancer, CD19+ B cells decrease by cancer disease (Biomarkers related to immunosenescence: relationships with therapy and survival in lung cancer patients) (). Thus, CD19 might play a double-face role in tumor, in different stages of tumor, either promoting survival or inducing apoptosis. The GSEA results showed that immune-related signaling pathways, such as allograft rejection, complement, and interferon response, were significantly enriched in the CD19 high expression group. In the CD19 low-expression group, metabolic pathways including adipogenesis, cholesterol homeostasis, and typical tumor pathways were enriched. These results implied that CD19 might participate in the status conversion of TME from immune-dominant to metabolic-dominant. Further analysis of TIC supported this view. Accordingly, the downregulation of CD19 along with the advancing stage of LUAD, the conversion of TME from immune-predominant to metabolic-dominant status, and the reduction of antitumor TICs supported that CD19 might play an antitumor role in LUAD. [Kilvaer et al., 2016] [Wang et al., 2013] [De Rie et al., 1989] [Carter & Fearon, 1992] [Van Zelm et al., 2006] [Carter & Fearon, 1992] [De Rie et al., 1989] [Wang et al., 2019] [Cascio et al., 1993] [Saavedra et al., 2016]
Conclusions
In summary, after analysis of immune/stromal score by using the ESTIMATE algorithm in TCGA database, a series of TME related genes was obtained and validated by an independent LUAD cohort. Further, the prognostic model combined with those identified genes was constructed and proved could provide a moderate OS prediction for LUAD. By integrating with clinical characteristics, testing by multivariate Cox regression analysis, the predictive ability have been optimized. Among 106 TME related genes, PTPRC and CD19 are highly interconnected nodes in the PPI network. Meanwhile, bioinformatic evidence suggest the levels of them affected the immune activity of TME. These two prognostic genes might be a candidate target to monitor and treat lung cancer, attributing to their immune nature and prognostic significance, thereby improving the clinical outcomes. Finally, our study might provide a novel insight on the potential correlation between monitoring and manipulating the TME with LUAD prognosis and precision immunotherapies.