Prostate cancer (PCa) is a biologically heterogeneous disease. Even within a single prostate gland, tumors can vary dramatically in aggressiveness, and genetic mutations differ between individuals. Treatment decisions hinge on survival predictions guided by factors such as age, overall health, Gleason grade, TNM staging, and prostate-specific antigen (PSA) levels. Current clinical guidelines rely on conventional linear models, including the Cox proportional hazards model and standard survival analysis. However, these traditional approaches struggle with the complex, nonlinear interactions among the many prognostic variables in prostate cancer biology.
The machine learning proposition: Artificial intelligence and machine learning (ML) methods can process large, multidimensional datasets far more efficiently than traditional statistics. In prostate cancer specifically, ML has been applied to drug discovery, gene expression profiling, biomarker panel construction, multi-omics analysis, and digital pathology slide interpretation. This review from Gangnam Severance Hospital and Yonsei University College of Medicine (Seoul, South Korea) surveys the contemporary literature on ML-based algorithms for predicting biochemical recurrence (BCR)-free survival, castration resistance-free survival, complication-free survival, metastasis-free survival, and overall survival (OS).
Scope and growth: The authors note a striking temporal trend in the field. Of the 20 key studies summarized in their review tables, only 3 (15.0%) were published before 2020, while 17 (85.0%) appeared after 2020. This acceleration reflects both the growing availability of large clinical datasets and the maturation of ML frameworks suitable for clinical survival analysis. The review covers ML architectures ranging from artificial neural networks (ANN) and k-nearest neighbor (kNN) to random forests, support vector machines (SVM), convolutional neural networks (CNN), XGBoost, gradient boosting machines, and long short-term memory (LSTM) networks.
Biochemical recurrence (BCR) is defined as an elevation of PSA after definitive therapy such as radical prostatectomy or radiation therapy, signaling disease recurrence. Efforts to predict BCR-free survival using AI began as early as 2001. Tewari et al. developed artificial neural networks (ANN) using data from 1,400 patients who underwent radical prostatectomy. The ANN achieved 76% overall accuracy with 55% sensitivity and 90% specificity, compared to 66% accuracy, 15% sensitivity, and 94% specificity for multivariate regression analysis. The ANN used preoperative variables including age, race, serum PSA, biopsy Gleason score, and biopsy-based staging.
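The accuracy, sensitivity, and specificity figures quoted above all derive from the same confusion matrix. The following is a minimal sketch of that arithmetic; the function name and the counts are hypothetical and chosen only for illustration, and they are not data from Tewari et al.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Sensitivity: share of true recurrences that the model flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of non-recurrences that the model clears.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Hypothetical counts (assumption): 100 recurrences, 100 non-recurrences.
acc, sens, spec = classification_metrics(tp=55, fp=10, tn=90, fn=45)
print(f"accuracy={acc:.3f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

A model can score well on specificity while missing many recurrences, which is why sensitivity is reported alongside overall accuracy.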
Random forest and logistic regression: Wong et al. analyzed 338 patients and compared k-nearest neighbor (AUC = 0.903), random forest (AUC = 0.924), and logistic regression (AUC = 0.940) against traditional Cox regression (AUC = 0.865). All three ML models outperformed the conventional approach for predicting one-year post-prostatectomy BCR. Tan et al. scaled this up to 1,130 patients with a median follow-up of 70 months. Among those, 15.6% developed BCR at a median time of 16 months. For predicting five-year BCR, Naive Bayes (AUC = 0.894) and random forest (AUC = 0.888) outperformed all three conventional nomograms: KATTAN (AUC = 0.799), CAPSURE (AUC = 0.749), and JHH (AUC = 0.750).
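The AUC values compared above can be read as the probability that a randomly chosen patient with BCR receives a higher predicted score than a randomly chosen patient without BCR. The sketch below computes AUC directly from that pairwise definition; the function name, labels, and scores are hypothetical, and this is not the code used in the cited studies.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data (assumption): 1 = BCR within the horizon, 0 = no BCR.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of patients, which is acceptable for illustration; library implementations use a sort-based equivalent.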
Deep learning biomarkers from pathology: Pinckaers et al. developed a CNN-based biomarker using ResNet50-D on tissue microarray hotspots from 685 post-prostatectomy patients at Johns Hopkins, with external validation on 204 patients from NYU Langone. The model yielded an odds ratio of 3.32 (CI 1.63-6.77, p = 0.001) per unit increase in the deep learning score, matched on Gleason sum, age, race, and pathologic stage. Sandeman et al. used two independent CNNs for tissue segmentation and Gleason grading across 750 patients (331 training, 391 validation), achieving 98% sensitivity and 98% specificity for cancer detection, with Grade Group 3-5 cancers showing HR = 5.91 (95% CI 1.96-17.83) for BCR compared to Grade Group 1-2.
Prostate MRI provides rich data on tumor size, extracapsular extension, seminal vesicle invasion, and pelvic lymph node metastasis (PLNM). Radiomics, the extraction of quantitative features from medical images, allows ML models to leverage this imaging data for predicting post-prostatectomy outcomes. Hou et al. developed iBCR-Net, a multimodal integrative deep survival network that combined an MRI radiomics signature (RadS), AI-predicted T3 staging, and AI-predicted PLNM with 17 clinicopathological indicators. Using data from 579 patients (463 training, 116 test), the iBCR-Net achieved 5.16-fold, 12.8-fold, and 2.09-fold improvements in prediction accuracy over the D'Amico score, CAPRA score, and CAPRA post-surgical score, respectively (all p < 0.05, log-rank test).
Deep learning on multiparametric MRI: Lee et al. analyzed 437 patients with a median 61-month follow-up and built a 17-layer convolutional deep learning model that combined prostate MRI radiomics with six clinical parameters. This model achieved a concordance index (C-index) of 0.89 for BCR-free survival prediction. In comparison, Bourbonne et al. used MRI T2 and apparent diffusion coefficient (ADC) maps on 195 high-risk patients (median follow-up 46.3 months), achieving a hazard ratio of 6.8, while Li et al. applied a radiomics approach on 198 patients (median follow-up 35 months) with a C-index of 0.77. The authors consider Lee's model to have outperformed both, although a hazard ratio and a C-index are different metrics and cannot be compared directly.
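The C-index quoted above measures how often a model ranks patients correctly with respect to time to event, while accounting for censoring. The following is a minimal sketch of Harrell's C-index under simplifying assumptions: pairs with tied times are ignored, and the function name and data are hypothetical rather than taken from any cited study.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs where the higher-risk patient fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up in months, event flags (1 = BCR, 0 = censored), risk scores.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.5]
c_index = concordance_index(times, events, risks)
print(f"C-index = {c_index:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the 0.77 and 0.89 figures reported above.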
Future potential with PSMA PET: The authors suggest that incorporating prostate-specific membrane antigen (PSMA) positron emission tomography (PET) radiomics into deep learning algorithms could further improve BCR prediction accuracy. PSMA PET provides molecular-level information about tumor biology that is complementary to the anatomical and functional data from MRI, potentially capturing aggressive tumor phenotypes that standard MRI radiomics might miss.
Androgen deprivation therapy (ADT) is the primary treatment for advanced prostate cancer. Patients who respond well are classified as castration-sensitive (CSPC), but a subset inevitably progresses to castration-resistant prostate cancer (CRPC), experiencing declining ADT response and disease progression. Early detection of CRPC is critical so that clinicians can administer second-line therapies such as androgen receptor axis-targeting agents (ARATs) or chemotherapy to extend overall survival.
Deep learning on biopsy images: Nakata et al. examined 180 metastatic hormone-naive prostate cancer patients who initially received combined androgen blockade. They used VGG16, a CNN architecture, to construct a deep learning algorithm (DLA) from prostate needle biopsy H&E patch images. Multivariate analysis showed that time to CRPC was the factor most strongly associated with overall survival (p < 0.001), even more so than Gleason score of 8 or above. The ratio of hormone-sensitive patches to total patches was significantly different between groups (p = 0.015, median 0.575 vs. 0.708). Among CSPC patients with time to CRPC exceeding 24 months, the 5-year OS was 96.7%, suggesting these patients may not require upfront intensified treatment. The study was limited to a small, single-institution Japanese population.
Multimodal MRI and pathology integration: Zhou et al. built a joint model integrating prostate MRI, H&E biopsy slides, and ML using data from three medical centers (140 patients training, 61 external validation). ResNet-50 emerged as the top performer for predicting CRPC progression, achieving an AUC of 0.887 in the training set and 0.768 in the test set. However, the study had incomplete clinical data for many patients and excluded key prognostic factors, highlighting the challenge of building comprehensive multimodal models in real-world clinical settings.
Lymph node metastasis: Pelvic lymph node dissection (PLND) is recommended for intermediate-risk patients with an estimated life expectancy of more than ten years and for high-risk patients, but it adds operative time and complication risk. Hou et al. developed a random forest-based model (AUC = 0.906, 95% CI 0.856-0.928) using data from 248 patients that outperformed the Memorial Sloan Kettering Cancer Center (MSKCC) nomogram. At a 10% cutoff, the model spared 47.2% of extended PLNDs while missing only 1.7% of lymph node invasions. They also built a separate PLNM-Risk calculator using the AutoGluon platform across 401 patients from two centers, which spared 59.6% of extended PLNDs while missing only 1.7% of pelvic LN metastasis cases.
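The trade-off at a risk cutoff, dissections spared versus nodal disease missed, can be sketched as follows. The denominators are assumptions: the spared fraction is taken over all patients and the missed fraction over all patients with lymph node invasion, which may differ from the definitions used by Hou et al. The function name and data are hypothetical.

```python
def plnd_cutoff_summary(risks, has_lni, cutoff=0.10):
    """Return (spared_fraction, missed_fraction) for a predicted-risk cutoff.

    risks   -- predicted probability of lymph node invasion per patient
    has_lni -- 1 if invasion was confirmed on pathology, else 0
    """
    if len(risks) != len(has_lni) or not risks:
        raise ValueError("risks and has_lni must be non-empty and equal in length")
    # Patients below the cutoff would skip extended PLND.
    below = [lni for r, lni in zip(risks, has_lni) if r < cutoff]
    spared_fraction = len(below) / len(risks)
    total_lni = sum(has_lni)
    # Missed cases: confirmed invasions among the patients who skipped PLND.
    missed_fraction = sum(below) / total_lni if total_lni else 0.0
    return spared_fraction, missed_fraction


# Hypothetical cohort of ten patients (assumption, not study data).
risks = [0.02, 0.05, 0.08, 0.12, 0.30, 0.60, 0.04, 0.09, 0.45, 0.15]
has_lni = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]
spared, missed = plnd_cutoff_summary(risks, has_lni, cutoff=0.10)
print(f"spared={spared:.0%} missed={missed:.0%}")
```

Raising the cutoff spares more dissections at the cost of more missed invasions, so the cutoff is a clinical choice rather than a property of the model.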
SEER-based and PSMA PET approaches: Wang et al. used six ML algorithms on 24,470 intermediate- and high-risk PCa patients from the SEER database. The gradient boosting machine achieved the highest performance (F1 = 0.838, AUC = 0.804). Cysouw et al. conducted a prospective study of 76 patients using radiomic features from [18F]DCFPyL PSMA PET-CT images with random forest ML. The model predicted lymph node invasion (AUC 0.86), nodal or distant metastasis (AUC 0.86), Gleason score (AUC 0.81), and extracapsular extension (AUC 0.76), all outperforming standard PET metrics.
Bone metastasis prediction: Approximately 90% of men who die of PCa metastases have bone involvement. Liu et al. analyzed 207,137 PCa patients from the SEER database, of whom 3.2% developed bone metastasis. The XGBoost model achieved the best performance (AUC = 0.962, accuracy = 0.884, sensitivity = 0.906, specificity = 0.879) and was deployed as a web-based predictor. Zhang et al. integrated MRI radiomics (via PyRadiomics), deep transfer learning features (via ResNet-50), and pathomics features from H&E slides across 211 patients. The best model, using a support vector machine, achieved AUC = 0.93 (95% CI 0.854-1.000).
Treatment discontinuation prediction: Deng et al. used data from 1,600 metastatic CRPC (mCRPC) patients across three phase III clinical trials to predict docetaxel discontinuation due to adverse events. Among five base learners (linear regression, logistic regression, Cox regression, bagging with classification trees, and random forest), random forest achieved the highest median AUC of 0.627. The top predictive features included albumin, sodium, total protein, magnesium, testosterone, neutrophil count, white blood cell count, and phosphorus levels. The algorithm estimated that among 1,000 mCRPC patients, approximately 104 were wrongly assigned to docetaxel chemotherapy, and that the model could have saved about ten of them from inappropriate treatment.
Overall survival in metastatic disease: Saito et al. used random survival forest on 340 metastatic PCa patients receiving ADT. The model stratified patients into three groups: very poor prognosis (pretreatment LDH of 248.5 IU/L or above, with roughly 70% dying within five years), intermediate risk (low LDH but post-treatment ALP of 326.5 IU/L or above, 5-year survival about 70%), and very good prognosis (low LDH and low post-treatment ALP, 5-year survival above 90%). The C-index was 0.85 for both OS and cancer-specific survival. Anderson et al. developed gradient boosting machine models for 438 mCRPC patients with skeletal-related events, achieving AUCs of 0.73 to 0.86 and Brier scores consistently below 0.20 for 1- through 10-year survival predictions.
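The three prognostic groups described above reduce to a two-step threshold rule on pretreatment LDH and post-treatment ALP. The sketch below encodes only that reported rule, using the cutoffs quoted in the text; it is not the random survival forest itself, and the function and constant names are hypothetical.

```python
LDH_CUTOFF = 248.5  # IU/L, pretreatment LDH cutoff quoted in the summary
ALP_CUTOFF = 326.5  # IU/L, post-treatment ALP cutoff quoted in the summary


def risk_group(pre_ldh, post_alp):
    """Assign one of the three prognostic groups from the two lab values."""
    if pre_ldh >= LDH_CUTOFF:
        # High pretreatment LDH defines the group regardless of ALP.
        return "very poor"
    if post_alp >= ALP_CUTOFF:
        return "intermediate"
    return "very good"


# Hypothetical patients (assumption): (pretreatment LDH, post-treatment ALP).
for ldh, alp in [(300.0, 100.0), (200.0, 400.0), (200.0, 100.0)]:
    print(ldh, alp, risk_group(ldh, alp))
```

Checking LDH first mirrors the order in which the groups are described, so a patient with high LDH is never reclassified by a low ALP value.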
PET-CT for cancer-specific survival: Polymeri et al. examined 285 patients who underwent 18F-choline PET-CT for newly diagnosed high-risk PCa. An AI-based model automatically segmented the prostate and produced three volumetric measurements (lesion volume, total lesion uptake, and fraction of abnormal SUV voxels). All three were significantly associated with cancer-specific survival for patients receiving palliative treatment (p = 0.008, 0.02, and 0.005, respectively). These volume-based PET measurements provided better predictive ability than SUVmax alone, offering reproducible whole-tumor characterization that avoids the subjectivity of Gleason grading.
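The three volumetric measurements can be sketched from a list of SUV values for the voxels inside a segmented prostate. The definitions here are assumptions made for illustration: a voxel counts as abnormal when its SUV exceeds a threshold, lesion volume is the abnormal voxel count times the voxel volume, and total lesion uptake is the summed SUV of abnormal voxels times the voxel volume. The threshold, function name, and data are hypothetical and are not taken from Polymeri et al.

```python
def pet_volume_measures(suv_values, voxel_volume_ml, suv_threshold):
    """Return (lesion_volume_ml, total_lesion_uptake, abnormal_fraction)."""
    if not suv_values:
        raise ValueError("no voxels in the segmented prostate")
    # Assumption: a voxel is abnormal when its SUV exceeds the threshold.
    abnormal = [s for s in suv_values if s > suv_threshold]
    lesion_volume_ml = len(abnormal) * voxel_volume_ml
    # Equivalent to mean SUV of abnormal voxels times lesion volume.
    total_lesion_uptake = sum(abnormal) * voxel_volume_ml
    abnormal_fraction = len(abnormal) / len(suv_values)
    return lesion_volume_ml, total_lesion_uptake, abnormal_fraction


# Hypothetical SUVs for eight prostate voxels of 0.5 mL each.
suv = [1.0, 2.0, 3.0, 5.0, 6.0, 1.5, 4.0, 2.5]
volume, uptake, fraction = pet_volume_measures(suv, voxel_volume_ml=0.5, suv_threshold=2.8)
print(volume, uptake, fraction)
```

Unlike SUVmax, which depends on a single voxel, all three quantities summarize the whole segmented region.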
Koo et al. developed the SCaP (Severance Study Group of Prostate Cancer) Survival Calculator using data from 7,267 PCa patients. The calculator predicts 5- and 10-year survival rates for CRPC-free survival, cancer-specific survival, and overall survival according to various initial treatment modalities, including active surveillance, radical prostatectomy, radiation therapy with and without ADT, and ADT alone. The system accepts individual patient data such as age, height, weight, PSA, prostate volume, positive core numbers, maximal core percentage, Gleason score, Charlson Comorbidity Index, performance status, TNM stage, and medical history to automatically compare outcomes by treatment modality.
LSTM outperforms other architectures: When comparing Cox regression, multilayer perceptron (MLP), MLP for N-year survival prediction, and long short-term memory (LSTM) networks, the LSTM model showed the highest C-indices and AUC values. For 10-year CRPC progression prediction, the LSTM achieved a C-index of 0.914 (95% CI 0.890-0.928) and AUC of 0.920 (95% CI 0.899-0.936). LSTM networks are particularly well suited for this task because they process sequential, time-dependent data and can capture temporal patterns in disease progression that static models miss. The limitation was that all patients came from a single Asian ethnic background.
External validation: Lim et al. performed an external validation of the SCaP calculator using 4,415 patients from three institutions. The AUCs for 5-year CRPC-free survival, cancer-specific survival, and OS were 0.962, 0.944, and 0.884, respectively. For 10-year outcomes, AUCs were 0.959, 0.928, and 0.854. These validation results exceeded those of the original development model, supporting the generalizability of the LSTM-based calculator. However, the data included patients treated over a long time span during which treatment modalities and systemic agents had considerably improved, introducing potential temporal confounding.
Data quality and privacy concerns: ML holds immense promise for prostate cancer survival prediction, but data security, patient privacy, algorithmic bias, and the legal and ethical framework governing data use present significant obstacles. There is no strict rule for the optimal number of patients required for model generalizability. Instead, the required sample size depends on model complexity, data variability, feature dimensionality, and the specific prediction task. Many of the reviewed studies used relatively small, single-institution cohorts, and caution is warranted when interpreting results from studies that lack external validation.
Algorithmic bias and ethnic diversity: A recurring limitation across the reviewed studies is the lack of demographic diversity. Several high-performing models were trained exclusively on Asian or single-ethnicity populations. Potential biases and inequalities in algorithms developed using PCa populations characterized by limited heterogeneity and ethnic diversity must be carefully addressed. External validation across diverse countries, ethnicities, and medical centers is essential for building models that can be trusted in routine clinical practice. The authors emphasize that achieving this requires collaboration with health insurance providers and innovative startups to support widespread adoption.
Key strategies for improvement: The review identifies six strategies for building more precise ML-based algorithms: (1) enhancing data preparation quality, (2) selecting algorithms aligned with study objectives, (3) optimizing hyperparameters, (4) refining model architecture, (5) addressing overfitting and underfitting, and (6) improving computational power. Future directions include predicting treatment discontinuation and complications for metastatic patients, incorporating PSMA PET-CT into survival prediction models, and developing AI systems that predict genetic mutation status directly from clinical and imaging data.
Clinical translation: From a practical standpoint, the reviewed ML models address real clinical decision points. LN metastasis prediction can guide decisions about lymphadenectomy extent. Metastasis-free survival models inform adjuvant therapy decisions. Bone metastasis risk calculators enable early preventive measures against skeletal-related events using agents like denosumab and zoledronic acid. BCR prediction from biopsy samples and MRI can help determine optimal treatment timing and modality. Integration into clinical guidelines, however, requires multi-center, multi-ethnic validation at scale.