Prostate cancer in young men represents a distinct clinical phenotype: gene expression signature to predict early metastases

Aim: Several genomic signatures are available to predict Prostate Cancer (CaP) outcomes based on gene expression in prostate tissue. However, no signature was tailored to predict aggressive CaP in younger men. We attempted to develop a gene signature to predict the development of metastatic CaP in young men. Methods: We measured genome-wide gene expression for 119 tumor and matched benign tissues from prostatectomies of men diagnosed at ≤ 50 years and > 70 years and identified age-related differentially expressed genes (DEGs) for tissue type and Gleason score. Age-related DEGs were selected using the improved Prediction Analysis of Microarray method (iPAM) to construct and validate a classifier to predict metastasis using gene expression data from 1,232 prostatectomies. Accuracy in predicting early metastasis was quantified by the area under the curve (AUC) of receiver operating characteristic (ROC), and abundance of immune cells in the tissue microenvironment was estimated using gene expression data. Results: Thirty-six age-related DEGs were selected for the iPAM classifier. The AUC of five-year survival ROC for the iPAM classifier was 0.87 (95%CI: 0.78–0.94) in young (≤ 55 years), 0.82 (95%CI: 0.76–0.88) in middle-aged (56–70 years), and 0.69 (95%CI: 0.55–0.69) in old (> 70 years) patients. Metastasis-associated immune responses in the tumor microenvironment were more pronounced in young and middle-aged patients than in old ones, potentially explaining the difference in accuracy of prediction among the groups. Conclusion: We developed a genomic classifier with high precision to predict early metastasis for younger CaP patients and identified age-related differences in immune response to metastasis development.


Identification of age-related DEGs
DEGs [absolute fold change > 1.5 and false discovery rate (FDR) < 0.05] were identified from the DASL expression data using a mixed linear model implemented in the limma R Package [2] . In the model, sample type with two levels (tumor and benign), age group with two levels (young and old), and Gleason score with two levels [low (Gleason score 6) versus high (Gleason score of 8-10)] were categorical variables with fixed effects, and patient ID was treated as a random effect. From this model, DEGs were extracted from six comparisons: 1) Gleason-score-6 tumor versus matched benign prostatic tissue in young patient group; 2) Gleason-score-6 tumor versus matched benign prostatic tissue in old patient group; 3) Gleason-score-8 tumor versus matched benign prostatic tissue in young patient group; 4) Gleason-score-8 tumor versus matched benign prostatic tissue in old patient group; 5) Gleason-score-6 tumor versus Gleason-score-8 tumor in young patient group; and 6) Gleason-score-6 tumor versus Gleason-score-8 tumor in old patient group. the Benjamin-Hochberg (BH) method [3] was used to correct for multiple testing. In a previous study [1] , we identified genes expressed differentially between Gleason-score-7 tumor and matched benign tissue in young and old patient groups.
DEGs identified from these eight comparisons were considered as primary candidate genes for developing a genomic classifier to predict metastasis following radical prostatectomy (RP) [ Supplementary Figure 1].
Ingenuity pathway analysis (IPA) of DEGs was used to predict directional biological effects (pathway involvement, cellular function, and disease association) of DEGs.

iPAM classifier development and validation
Gene expression data from primary tumors from RP in five data sets from the Decipher Genomic Resource Information Database (GRID, Decipher Biosciences, San Diego, CA) were used to develop and validate a new genomic classifier. Clinicopathological characteristics of the 1232 patients in the five datasets are shown in Supplementary Table 1; none of the patients had regional or distant metastatic disease at the time of RP.
Gene expression data were generated using the Affymetrix Human Exon 1.0 ST Gene Chips array and normalized as one combined data frame (46,050 genes in rows and 1232 patients in columns). A diagram of our study design is shown in Supplementary Figure 1.
The Mayo Clinic discovery cohort (MC I) [4] of 545 patients was used to develop the genomic classifier. In this data set, 212 patients developed early regional or distant metastasis (confirmed by bone or CT scan) within five years of biochemical relapse (BCR). This cohort was previously used to develop the Decipher classifier based on the random forest method [4] . Gene expression data from the MC I cohort for the age-related DEGs identified from the COH samples were extracted. To further reduce the DEG list, a two-sample t-test was used to only include genes differentially expressed between patients with (n = 212) and without (n = 333) metastasis.
After determining the list, the 545 patients from the MC I cohort were randomly assigned into a training data set (140 with and 222 without metastasis) and a test dataset (72 with metastasis and 111 without metastasis).
In order to select the optimal number of age-related DEGs that were most informative in predicting the development of metastatic prostate cancer (CaP), an improved Prediction Analysis of Microarray (iPAM) method [5][6][7] was applied to the training data to remove DEGs irrelevant to metastasis prediction based on minimizing the 10-fold cross-validated error rate. Specifically, the Adaptive Hierarchically Penalized Nearest Shrunken Centroid algorithm [6] was used to enable different amount of shrinkage for each variable (gene) in the process of variable selection. These iPAM-selected DEGs were assembled into an iPAM classifier by fitting a logistic regression model on the 362 samples in the training set. The iPAM classifier was then used to predict metastasis for the four independent validation data sets from the Decipher Biosciences as predictors in a logistic regression model to predict metastasis. An integrated classifier was also constructed by combining informative DEGs selected by the iPAM method and the six clinical variables as predictors in a logistic regression model.

Estimation of cell-type proportion in tissue microenvironment
xCell [8] was used to estimate the enrichment score of individual cell types (21 lymphoid, 13 myeloid, 14 stromal, 9 stem, and 7 epithelial cell types) for each tissue sample using genome-wide gene expression data (> 9000 specific genes). xCell also generated an immune score representing the overall abundance of immune cells for each sample by summing the enrichment scores from all immune cell types. The DASL gene expression data from 238 COH tissue samples (119 tumor-normal pairs) and Affymetrix gene expression data from 1232 primary tumor samples from the decipher GRID were analyzed separately by the xCell method.
Cell-type proportion in tissue microenvironment estimated by xCell method is a rank-based enrichment score.
Non-parametric analysis of variance (ANOVA) (confidence interval and p-values generated by percentile bootstrap), implemented in the "Rallfun-v35" R codes from Dr. Wilcox [9] , was used to test median differences in immune scores between sample groups classified by factors of sample type (tumor, normal), metastasis status (yes, no), and age group (old, young, middle-age). Figure 1. Bimodal distribution of predicted risk scores for developing metastasis generated by the iPAM classifier Risk scores for 556 samples from the three Decipher GRID validation data sets (MC II, CC, and TJU) with follow-up time were used to generate this histogram. The predicted iPAM risk scores for metastasis showed a bimodal distribution with score range of 0-1 where higher scores represent higher risk of developing metastasis. For the MSKCC data set (the fourth validation data, 131 patients in total and 9 patients with metastasis) with no information on follow-up time, the conventional AUC for iPAM (improved Prediction Analysis of Microarray) classifier calculated based on binary metastasis status was 0.86 with 95% CI of 0.73-0.99. Figure 3. Metastasis-associated differential abundance (enrichment scores) of four specific immune cell types among three age groups.

Supplementary
Plots were generated using the cell type enrichment scores estimated for the 1232 primary tumor samples from the five Decipher GRID data sets. Out of 33 immune cell types, 4 specific cell types, Type 2 T helper cells (A), Conventional dendritic cells (B), CD8+ T cells (C), and CD4+ memory T cells (D), in young and middle age group (≤ 55 yeas) demonstrated significantly (P < 0.05) greater abundance of immune cells in primary tumors from patients with metastasis compared to primary tumor samples from patients without metastasis; but there were no significant differences in the old patient group (>70 years). (Statistical p-values and median differences in abundance of immune cell type between metastasis and without metastasis groups among three age groups are included in the following table).     The pb2gen function in the WRS (Rallfun-v35) package 9 was used to test significance of median difference in immune score between patients with and without metastasis.   *number of patients with and without metastasis, respectively; **genes with red font are immune-related genes; *** Non-parametric ANOVA (p-values generated by percentile bootstrap), implemented in the "Rallfun-v35" R codes (the pb2gen function) from Dr. Wilcox (4), was used to test median differences in gene expression between patients with and without metastasis in each age group.