Prostate cancer screening with prostate-specific antigen (PSA) has contributed to a greater than 50% reduction in prostate cancer mortality. However, PSA screening has also driven a major problem of overdiagnosis and overtreatment of non-aggressive cancers. The clinical focus has therefore shifted toward preferential detection of aggressive disease, using improved imaging, blood- and urine-based biomarkers, and genetic tests. The sheer volume of data generated by these newer diagnostics creates a challenge that AI-based systems are well positioned to address.
Scope of this review: The authors from Stanford University and University College London provide a comprehensive survey of AI models applied across three imaging modalities central to prostate cancer diagnosis: magnetic resonance imaging (MRI), ultrasound, and histopathology images from prostate biopsies. In addition, the review covers AI models for supporting tasks, including prostate gland segmentation, MRI-ultrasound registration for fusion biopsies, and MRI-histopathology registration for generating ground truth cancer labels.
The clinical pipeline: In current practice, MRI is used to detect suspicious lesions, which are then targeted via MRI-ultrasound fusion biopsy. Tissue obtained through biopsy undergoes histopathological analysis for Gleason grading. Urologists use this information to plan treatment, with the primary objective of treating aggressive cancer while avoiding overtreatment of indolent disease. AI has the potential to optimize every step in this workflow, from improving cancer detection rates on MRI and ultrasound to reducing inter-observer variability in Gleason grading among pathologists.
Known limitations of current imaging: Despite the adoption of PI-RADS (Prostate Imaging-Reporting and Data System), MRI still carries a false-positive rate exceeding 35%, misses 12% of aggressive cancers during screening, and has inter-reader agreement kappa values of only 0.46 to 0.78. In men undergoing prostatectomy, 34% of aggressive cancers and 81% of indolent cancers are missed on MRI. These figures underscore the substantial room for AI-based improvement.
The review organizes AI models for prostate MRI into two major task categories. Lesion classification models take a radiologist-outlined region of interest and classify it into categories such as cancer versus benign, clinically significant cancer versus indolent, or specific Gleason grade groups. These models frequently use traditional machine learning (TML) with hand-crafted radiomic features describing texture, shape, and volume, fed into classifiers like random forests, support vector machines, or logistic regression. More recent classification methods employ deep learning (DL) and skip the feature-engineering step entirely.
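For illustration, a minimal sketch of this TML recipe (random placeholder values stand in for real radiomic features, which studies typically extract with dedicated radiomics software):

```python
# Minimal sketch of a traditional machine learning (TML) lesion classifier:
# hand-crafted radiomic features (texture, shape, volume) -> random forest.
# Feature values here are random placeholders, not real radiomics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))    # 200 lesions x 40 radiomic features
y = rng.integers(0, 2, size=200)  # 1 = clinically significant, 0 = indolent

clf = make_pipeline(StandardScaler(),
                    RandomForestClassifier(n_estimators=300, random_state=0))
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"5-fold ROC-AUC: {auc.mean():.2f} +/- {auc.std():.2f}")
```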
Lesion detection models take the full MRI exam as input and automatically detect, localize, and stratify cancer aggressiveness across the entire prostate. These are more clinically impactful because they do not require a radiologist to first outline a suspicious region. Detection models almost universally use deep learning. Cohort sizes range dramatically, from as few as 19 patients to 2,732 patients in the largest study by Saha et al. The review catalogues over 20 lesion classification studies and over 15 lesion detection studies, with detailed tables of input sequences (T2w, ADC, DWI, DCE), algorithm types, label sources, and evaluation metrics.
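As a hedged illustration of the detection setup (a deliberately small encoder-decoder, not any of the catalogued architectures), a network can map stacked multi-sequence MRI channels to a per-pixel cancer-likelihood map:

```python
# Illustrative sketch only: a small fully convolutional network that takes
# co-registered T2w/ADC/DWI slices stacked as channels and outputs a
# per-pixel cancer-likelihood map across the whole gland.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class DetectionNet(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        self.enc1 = block(in_channels, 16)
        self.enc2 = block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = block(32 + 16, 16)
        self.head = nn.Conv2d(16, 1, 1)  # 1-channel logit map

    def forward(self, x):
        e1 = self.enc1(x)                # full resolution
        e2 = self.enc2(self.pool(e1))    # half resolution
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))
        return self.head(d)              # logits; apply sigmoid for probabilities

mri = torch.randn(1, 3, 256, 256)        # one multi-sequence slice
logits = DetectionNet()(mri)
print(logits.shape)                      # torch.Size([1, 1, 256, 256])
```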
Ground truth labeling: A critical variable across these studies is how training labels are derived. Some use radiologist outlines of PI-RADS 3+ lesions with or without pathology confirmation. Others use pathologist outlines mapped from whole-mount histopathology onto pre-operative MRI via registration. A newer approach uses automated Gleason pattern labels generated by deep learning on pathology images, then mapped onto MRI through automated registration. This last method enables selective identification of aggressive versus indolent components even within mixed lesions, something that is not possible with human-annotated MRI labels.
Data and evaluation variability: Direct comparison across published models is essentially impossible due to wide variability in labels, evaluation criteria, dataset sizes, and the lack of publicly shared source code. Most studies used retrospective, single-institution data. Evaluation granularity ranges from patient-level to lesion-level to pixel-level, and metrics include ROC-AUC, PR-AUC, FROC, sensitivity, specificity, F1-score, Dice coefficient, and more. This heterogeneity is one of the key barriers to clinical translation identified by the authors.
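Two of these metrics at different granularities, as a brief sketch (toy masks and scores, not study data):

```python
# Two of the metrics mentioned above, at different evaluation granularities.
import numpy as np
from sklearn.metrics import roc_auc_score

def dice(pred, truth, eps=1e-7):
    """Pixel-level Dice coefficient between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    return (2.0 * (pred & truth).sum() + eps) / (pred.sum() + truth.sum() + eps)

# Pixel level: overlap between a predicted and a ground-truth cancer mask.
pred_mask = np.zeros((64, 64), dtype=int); pred_mask[10:30, 10:30] = 1
true_mask = np.zeros((64, 64), dtype=int); true_mask[15:35, 15:35] = 1
print(f"Dice: {dice(pred_mask, true_mask):.2f}")

# Patient level: one score per patient (e.g. max lesion probability) vs outcome.
patient_scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1]
patient_labels = [1, 0, 1, 0, 1, 0]
print(f"Patient-level ROC-AUC: {roc_auc_score(patient_labels, patient_scores):.2f}")
```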
Grayscale transrectal ultrasound (TRUS) remains the most common modality for guiding prostate biopsy, yet its cancer detection rate is reported to be as low as 40%. Low signal-to-noise ratio and artifacts such as speckle and shadowing prevent clinicians from reliably distinguishing cancerous from non-cancerous tissue. Newer ultrasound-based modalities, including shear-wave elastography, color Doppler ultrasound, contrast-enhanced ultrasound, and high-frequency micro-ultrasound, provide improved resolution and visualization. Micro-ultrasound in particular is showing promise, with sensitivity for clinically significant prostate cancer similar to or higher than MRI, at substantially lower cost.
AI studies on ultrasound: Despite its widespread clinical use, standard grayscale TRUS has been the subject of only one AI study focused specifically on cancer detection. Most AI research in this domain has targeted newer modalities, especially temporal enhanced ultrasound (TeUS). Deep learning models on TeUS data (Sedghi et al., Azizi et al.) used cohorts of 155 to 163 patients with pathology-confirmed biopsy labels, evaluated at the lesion level with metrics including sensitivity, specificity, accuracy, and AUC. Other studies explored radio-frequency time-series data for cancer detection using traditional machine learning, though on very small cohorts (14 to 16 patients).
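A generic sketch of the TeUS idea (the cited models differ in their details): classify the temporal intensity signal recorded at a tissue location over a sequence of ultrasound frames:

```python
# Generic illustration of the TeUS idea (not a reproduction of the cited
# models): classify the temporal intensity signal at a tissue location with
# a small 1D convolutional network over the time axis.
import torch
import torch.nn as nn

class TeUSClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(8, 2),  # benign vs cancer
        )

    def forward(self, x):                  # x: (batch, 1, timesteps)
        return self.net(x)

signal = torch.randn(4, 1, 100)            # four tissue locations, 100 frames
print(TeUSClassifier()(signal).shape)      # torch.Size([4, 2])
```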
Classification versus detection: Most ultrasound AI studies focused on lesion classification, where a physician-outlined region is categorized as benign or cancerous. Only a few addressed the more challenging task of lesion detection, which involves automatically identifying and localizing cancer across the entire ultrasound image. Wildeboer et al. combined TRUS, shear-wave elastography, and dynamic contrast-enhanced ultrasound on 50 patients for pixel-level and lesion-level detection.
The authors conclude that AI-based prostate cancer detection on ultrasound remains in early stages, with most methods evaluated on small, single-institution retrospective cohorts. This area represents a significant research opportunity, particularly given the global prevalence of ultrasound-guided biopsy and the emerging promise of micro-ultrasound imaging.
Gleason grading on histopathology images is the strongest predictor of prostate cancer aggressiveness and recurrence, but it suffers from significant inter- and intra-pathologist variability. While subspecialized genitourinary pathologists achieve high concordance, such expertise is not universally available. The digitization of glass slides into whole-slide images (WSIs) has opened the door for AI-assisted pathology. However, these images are enormous: a single uncompressed whole-mount histopathology slice is 2 to 4 GB, and a full patient dataset can exceed 20 GB. Approximately 470 WSIs contain the same number of pixels as the entire 14-million-image ImageNet dataset.
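These scale figures roughly check out under assumed slide dimensions (the exact size varies by scanner and specimen):

```python
# Back-of-the-envelope check of the scale comparison above; the slide
# dimensions are assumptions for illustration, not figures from the review.
wsi_pixels = 40_000 * 37_000               # one whole-mount WSI, ~1.5e9 pixels
imagenet_pixels = 14_000_000 * 224 * 224   # ImageNet at 224x224 resolution
print(imagenet_pixels / wsi_pixels)        # ~475 WSIs worth of pixels
print(wsi_pixels * 3 / 1e9)                # ~4.4 GB uncompressed at 3 bytes/pixel
```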
Study designs and data: AI models for histopathology are trained with labels derived either from pathology reports (enabling large cohorts, such as 15,187 slides in Campanella et al.) or from pixel-level annotations by experienced pathologists (yielding smaller but more precisely labeled datasets, such as 38 slides in Lucas et al.). Evaluation is performed at the pixel level, region level, or slide level, depending on the granularity of available labels. The review catalogs studies using both biopsy and radical prostatectomy specimens across single and multiple institutions.
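Report-derived labels usually imply a weakly supervised setup. A simplified sketch of the multiple-instance idea behind large-cohort studies such as Campanella et al. (max-pooling over tile scores; the published method is more elaborate):

```python
# Simplified multiple-instance learning sketch: score every tile of a slide
# with a CNN and let the highest-scoring tile stand for the slide, so only a
# report-derived slide-level label is needed for training.
import torch
import torch.nn as nn

tile_encoder = nn.Sequential(              # stand-in for a pretrained backbone
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1),
)

tiles = torch.randn(100, 3, 64, 64)        # tiles from one slide (downsized here)
tile_logits = tile_encoder(tiles)          # one cancer logit per tile
slide_logit = tile_logits.max()            # max over tiles -> slide prediction
loss = nn.functional.binary_cross_entropy_with_logits(
    slide_logit, torch.tensor(1.0))        # slide label from the pathology report
```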
Multi-reader and AI-assisted studies: Several studies compared AI models against panels of pathologists, and evaluated whether AI assistance improves human performance. The results consistently show that AI models perform comparably to expert genitourinary pathologists, increase sensitivity without reducing specificity, and reduce inter- and intra-pathologist variability. Results from the Prostate cANcer graDe Assessment (PANDA) challenge, which attracted 1,290 AI developers from 65 countries and used over 10,000 histopathology images, showed that AI models generalize across different patient populations and achieve strong concordance with expert pathologists on an independent validation set of 2,009 biopsies.
FDA-approved solution: Paige Prostate (Paige AI, New York) became the first FDA-approved AI-based clinical pathology solution for prostate cancer. Independent studies demonstrated that Paige Prostate improved the sensitivity of non-genitourinary specialist pathologists from 74% to 97%, with the most pronounced improvements for smaller and lower-grade cancers (Grade Groups 1, 2, and 3). The authors note that Gleason grading standards continue to evolve, meaning AI models will need mechanisms to update alongside shifting clinical definitions rather than remaining frozen with locked-in parameters.
Prostate gland segmentation on MRI: Accurate prostate gland segmentation on T2-weighted MRI and ultrasound is a prerequisite for targeted MRI-ultrasound fusion biopsies. Manually outlining the prostate is time-consuming and introduces variability. Deep learning models for MRI segmentation have been trained and validated on cohorts of 40 to 250 patients, mostly from single institutions with retrospective data. The best-performing models achieved Dice scores of at least 0.90 on internal datasets and 0.80 on external datasets. One prospective study found that an AI segmentation model was more accurate and 17 times faster than trained radiology technicians. FDA-cleared commercial solutions for prostate gland segmentation are already available, including OnQ Prostate, PROView, Quantib Prostate, and qp-Prostate.
Prostate gland segmentation on ultrasound: AI models for ultrasound segmentation use both traditional machine learning and deep learning approaches. Studies have explored statistical shape models and temporal information to improve segmentation at challenging regions such as the apex and base. However, most studies used small, single-institution cohorts with a single ultrasound manufacturer, limiting evidence of generalizability across devices and clinical sites.
MRI-ultrasound registration: Aligning pre-operative MRI with intra-operative ultrasound is necessary for guiding fusion biopsies, focal therapy, and radiotherapy planning. This is complicated by differences between the two imaging modalities and tissue deformation between imaging sessions. AI approaches use pre-defined anatomical structures, deformable transformations, or learned registration models. The best-performing AI-based methods achieved average target registration errors (TREs) of approximately 2 to 3 mm, though with relatively large variance. Prostate gland segmentation as an intermediate step helps improve registration accuracy. All studies to date used retrospective data from single institutions, and several MRI/ultrasound manufacturers have integrated semi-automated registration tools that still require human input in real time.
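Target registration error is simply the residual distance between corresponding landmarks after the transform is applied; a minimal sketch with made-up coordinates:

```python
# Target registration error (TRE) as typically reported: the distance between
# corresponding anatomical landmarks after the MRI-to-ultrasound transform is
# applied (coordinates below are made-up placeholders, in millimetres).
import numpy as np

mri_landmarks_warped = np.array([[10.0, 22.0, 31.0], [40.0, 18.0, 25.0]])
us_landmarks = np.array([[11.5, 21.0, 33.0], [38.0, 19.5, 24.0]])

tre = np.linalg.norm(mri_landmarks_warped - us_landmarks, axis=1)
print(f"TRE per landmark (mm): {tre.round(2)}, mean: {tre.mean():.2f}")
```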
A critical but often overlooked component of training AI cancer detection models on MRI is the generation of accurate ground truth labels. The gold standard involves mapping pathologist annotations from post-operative whole-mount histopathology images onto pre-operative MRI through a registration process. This MRI-histopathology registration is considered the most accurate labeling strategy for training cancer detection models, because it captures the full spatial extent of tumors, including lesions that were invisible on MRI.
Registration approaches: Three main types exist. Cognitive registration involves researchers mentally projecting cancer labels from histopathology onto MRI slices without quantitative alignment. Manual registration involves spatially aligning the two modalities case by case. Both are labor-intensive and can only handle small datasets. Traditional automated approaches use customized similarity loss functions, fiducial markers, or intermediate ex vivo imaging modalities. Many rely on patient-specific 3D-printed molds derived from pre-operative MRI to maintain slice correspondence. Deep learning registration models, such as those by Shao et al. (152 and 183 patients), greatly speed up the process and in some cases eliminate the need for prostate segmentation at inference.
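The mechanic shared by learned deformable registration, reduced to a sketch (here a displacement field is optimized directly, with MSE as a stand-in similarity loss; real models such as Shao et al.'s train a network to predict the field and use modality-appropriate losses):

```python
# Generic deformable registration sketch (not Shao et al.'s architecture):
# a dense displacement field warps the moving image (e.g. a histopathology
# slice) onto the fixed image (the corresponding MRI slice).
import torch
import torch.nn.functional as F

def warp(moving, displacement):
    """Warp a (1,1,H,W) image by a (1,H,W,2) displacement in [-1,1] grid units."""
    _, _, h, w = moving.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    identity = torch.stack([xs, ys], dim=-1).unsqueeze(0)  # base sampling grid
    return F.grid_sample(moving, identity + displacement, align_corners=True)

fixed = torch.randn(1, 1, 128, 128)        # MRI slice (placeholder data)
moving = torch.randn(1, 1, 128, 128)       # histopathology slice (placeholder)
displacement = torch.zeros(1, 128, 128, 2, requires_grad=True)

opt = torch.optim.Adam([displacement], lr=1e-2)
for _ in range(100):                        # optimize the field directly here;
    opt.zero_grad()                         # a network would predict it instead
    loss = F.mse_loss(warp(moving, displacement), fixed)
    loss.backward()
    opt.step()
```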
Scale and practical constraints: The review catalogs over a dozen registration approaches with cohort sizes ranging from 6 to 183 patients. Traditional automated methods require several minutes per patient, while deep learning approaches run much faster. A recent innovation uses 3D super-resolution of both MRI and histopathology volumes prior to registration to reduce partial volume artifacts. Despite the importance of this registration task, only a few AI cancer detection studies on MRI actually use automated registration to derive their ground truth labels. Most rely on simpler but less accurate labeling strategies.
The authors emphasize that public sharing of source code, trained models, and benchmarking datasets for registration is needed to enable broader adoption and fair comparison of different approaches. Without accurate registration, the quality of MRI-based cancer detection models will remain fundamentally limited.
Limited labeled data: Robust AI models require large, diverse, accurately labeled datasets. For comparison, natural image recognition systems train on approximately 14 million images from ImageNet. Medical AI models for prostate cancer are typically trained on small, single-institution datasets with patient populations that may not represent broader demographics. AI models trained on homogeneous populations or specific scanner types may not generalize to different racial, socio-economic, or ethnic groups, or to different imaging hardware. Privacy concerns around medical data sharing remain a fundamental obstacle. Potential solutions include federated learning (sharing model updates rather than data), self-supervised and weakly supervised learning (training with unlabeled or partially labeled data), and semi-supervised and few-shot learning techniques.
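Federated learning is the most concrete of these: a minimal FedAvg-style sketch, in which institutions share only model parameters and a server averages them:

```python
# Minimal federated averaging (FedAvg) sketch: institutions train locally and
# share only model weights, which the server averages; no patient images ever
# leave an institution.
import copy
import torch
import torch.nn as nn

def federated_average(local_models):
    """Average the parameters of locally trained copies into a global model."""
    global_model = copy.deepcopy(local_models[0])
    state = global_model.state_dict()
    for key in state:
        state[key] = torch.stack(
            [m.state_dict()[key].float() for m in local_models]).mean(dim=0)
    global_model.load_state_dict(state)
    return global_model

# Three hospitals, each with its own locally trained copy of the same network.
hospitals = [nn.Linear(10, 2) for _ in range(3)]
global_model = federated_average(hospitals)
```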
Limited multi-reader studies: Relatively few studies assess how AI models perform alongside clinicians in real workflows. The multi-reader studies that do exist are encouraging. For MRI, AI-assisted radiologists showed improved sensitivity and positive predictive value for both patient-level and lesion-level cancer detection. For histopathology, AI-assisted pathologists outperformed both the standalone AI system and unassisted pathologists, with reduced inter- and intra-observer variability. However, these studies have generally used limited patient populations, and more extensive, multi-center evaluations are needed.
Limited prospective evaluation: The overwhelming majority of AI models for prostate cancer have been developed and evaluated using retrospective data. Only a handful of studies have conducted prospective evaluations. Moving from retrospective validation to prospective clinical trials is essential before deployment, and trials that evaluate AI models for cancer detection on non-invasive imaging still need to be designed and run.
Lack of standard evaluation criteria: The wide variability in evaluation methods and metrics across published studies makes it extremely difficult to compare different AI approaches. Patient-level, lesion-level, and pixel-level evaluations each have different clinical implications, and the field lacks a unified, clinically relevant standard. Long-term outcome data (recurrence, death) could serve as hard clinical endpoints, but prostate cancer is a slow-progressing disease and such data are rarely available. Grand challenges like PANDA (histopathology, 1,290 developers, 65 countries) have helped, but MRI challenges like ProstateX included only 346 patients from a single institution.
Limitations of this review: The authors acknowledge several constraints. First, this is not exhaustive. The volume of publications across different facets of AI in prostate cancer forced selectivity. Second, no direct comparative analysis of methods was possible due to inconsistent datasets and evaluation criteria across studies. Third, the review intentionally describes algorithms at a high level (traditional machine learning versus deep learning) rather than providing architectural details. Fourth, the review does not cover AI systems that use clinical data, genomic data, or newer imaging modalities such as prostate-specific membrane antigen (PSMA) PET scans.
Emerging modalities: Gallium-68 PSMA-11 PET-CT scans have demonstrated significant improvements in prostate cancer detection and treatment planning. Studies show that PSMA PET-CT is more accurate than conventional imaging with CT and bone scanning, adds diagnostic value beyond MRI alone, and may enable better prediction of pre-operative pathological outcomes. However, due to its recent FDA approval, only a few AI models exist for prostate cancer detection using PSMA PET. This represents a growing opportunity for AI research.
Key needs for clinical translation: The review converges on several priorities for the field. Publicly available, anonymized, large-scale, multi-institution imaging datasets are needed for model training and independent validation. Source code and trained models must be shared to enable reproducibility and fair benchmarking. Multi-institution collaborations should drive larger grand challenges for prostate MRI and ultrasound, building on what PANDA achieved for histopathology. Research on best practices for integrating AI predictions into clinical workflows in a seamless way is essential for enabling clinician-AI synergy.
The authors conclude that AI-enabled precision medicine may eventually help reduce disparities and advance health equity in prostate cancer management. However, the gap between academic research and commercial clinical solutions remains wide, and bridging it will require coordinated effort across institutions, regulatory bodies, and the AI research community.