Artificial Intelligence-Augmented Imaging for Early Pancreatic Cancer Detection


Plain-English Explanations
Pages 1-2
Why Pancreatic Cancer Is So Deadly, and Why Early Detection Matters

Pancreatic ductal adenocarcinoma (PDA) is one of the most lethal cancers and is projected to become the second deadliest cancer in the United States by 2030. The incidence-to-mortality ratio was 1.28 in 2024, meaning deaths nearly keep pace with new diagnoses. The core problem is late detection: only 13.6% of cases are found while the tumor is still localized and potentially operable. More than 85% of patients are diagnosed at an unresectable stage, where treatment has historically been limited to palliative care.

Unlike breast or colorectal cancer, PDA has no standardized screening program for sporadic cases, which make up 85-90% of all diagnoses. Hereditary PDA accounts for a small minority and can be monitored through genetic testing and structured surveillance. Sporadic PDA arises without any known familial predisposition, making pre-symptomatic detection extremely difficult. Even modest improvements in early detection could translate into significant reductions in mortality at a national scale.

Recent advances in multimodal therapy have shown that patients with locally advanced (LA) or borderline resectable (BR) PDA who achieve a major pathologic response to neoadjuvant chemotherapy can reach a median overall survival exceeding 60 months. This is a dramatic improvement over historical outcomes. The implication is clear: if PDA is detected earlier, even at the LA or BR stage, modern treatment regimens can substantially extend survival. This shifts the treatment paradigm from purely palliative management to curative-intent interventions.

The paper is a review from the Mayo Clinic Department of Radiology focused specifically on CT-based AI applications. The authors examine how artificial intelligence, particularly deep learning and radiomics, can overcome the limitations of conventional imaging to detect PDA at earlier stages. The review covers AI-driven pancreas segmentation, pre-diagnostic detection models, diagnostic-stage detection tools, systemic biomarker integration, and the challenges that remain before clinical deployment.

TL;DR: PDA kills nearly everyone it affects (incidence-to-mortality ratio of 1.28), with only 13.6% of cases caught early enough for surgery. No screening exists for sporadic cases (85-90% of diagnoses). This Mayo Clinic review focuses on how AI and radiomics applied to CT imaging could shift the detection window earlier, when modern therapies can push median survival past 60 months.
Page 2
Why Standard CT Fails to Catch Early Pancreatic Cancer

Contrast-enhanced computed tomography (CT) is the current diagnostic standard for PDA, but it relies on visualizing macroscopic tumors. The problem is that the pancreas frequently looks morphologically normal during the pre-diagnostic stage of PDA. Retrospective studies show that over 50% of PDA cases exhibit no discernible abnormalities on pre-diagnostic imaging, highlighting CT's fundamental inability to capture disease during its subclinical phase.

Subtle imaging signs that do appear before clinical diagnosis, such as pancreatic duct cutoff or mild ductal dilatation, lack specificity. These findings are typically present 3 to 36 months before clinical diagnosis but also occur commonly in individuals without PDA. This overlap leads to high false-positive rates and low predictive value. Interpretation of these subtle features is also highly subjective, with low inter-reader agreement among radiologists, which further compounds diagnostic variability and the likelihood of missed early-stage tumors.

PDA progresses rapidly, transitioning from subclinical disease to advanced-stage disease within an estimated 12 to 18 months. The parenchymal changes that precede tumor formation often manifest at textural or molecular levels that fall below the resolution of standard imaging. Standard imaging also fails to capture dynamic alterations in pancreatic morphology and function over time, limiting its usefulness for longitudinal screening. Without a fundamental shift in imaging methodology, the clinical window for early detection remains critically narrow.

TL;DR: Over 50% of PDA cases show no visible abnormalities on pre-diagnostic CT. Subtle signs like ductal dilatation appear 3-36 months before diagnosis but also occur in healthy individuals, producing high false-positive rates. Low inter-reader agreement among radiologists worsens the problem. PDA transitions from subclinical to advanced disease in just 12-18 months, leaving a very small detection window.
Pages 2-3
Automated Pancreas Segmentation as the Foundation for Early Detection

Volumetric pancreas segmentation is a critical prerequisite for early PDA detection on imaging. Subtle parenchymal changes often go unnoticed on standard imaging, so precise volumetric analysis is needed to identify pre-diagnostic alterations. However, manual segmentation of the pancreas is extremely labor-intensive and inconsistent. The organ has an irregular shape, variable positioning, and a complex interface with surrounding structures. Even experienced radiologists produce segmentations with high inter-reader and intra-reader variability, making manual approaches neither scalable nor suitable for large-scale screening programs or biomarker discovery studies.

Deep learning models trained on large, multi-institutional datasets have demonstrated near-expert performance in automated pancreas segmentation. These models achieve high Dice Similarity Coefficients (DSC), a metric that quantifies the spatial overlap between AI-generated and ground-truth segmentations on a 0-to-1 scale. The authors highlight an AI model that achieved a DSC of 0.96, indicating excellent consistency with radiologist-derived segmentations across axial, coronal, and sagittal imaging planes. The Concordance Correlation Coefficient (CCC), which measures both precision and accuracy of volumetric agreement, also showed values close to 1 for these models.
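
Both overlap metrics are straightforward to compute. A minimal NumPy sketch of the Dice coefficient and Lin's concordance correlation coefficient, using toy masks and volumes rather than real CT segmentations:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Spatial overlap between two binary masks, on a 0-to-1 scale."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * intersection / denom if denom else 1.0

def concordance_ccc(x: np.ndarray, y: np.ndarray) -> float:
    """Lin's concordance correlation coefficient for paired volume estimates;
    penalizes both imprecision and systematic bias."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)

# Toy example: a 1-voxel disagreement between AI and ground-truth masks.
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True                 # 16 "voxels" of pancreas
pred = truth.copy()
pred[2, 2] = False                     # AI misses one voxel
print(round(dice_coefficient(pred, truth), 3))  # 2*15/(15+16) ≈ 0.968
```

A DSC of 0.96 on real segmentations means the AI and radiologist masks overlap almost completely; a CCC near 1 additionally means the resulting volume measurements agree without systematic over- or under-estimation.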

By providing accurate, reproducible volumetric assessments, AI-driven segmentation enables the extraction of subtle imaging biomarkers, including textural and morphometric changes that precede clinical PDA diagnosis. Automated segmentation minimizes human-induced variability, improves efficiency in radiomics workflows, and enables real-time pancreas analysis during routine clinical practice. This makes it the foundational technology for all the downstream AI detection models discussed in the paper.

TL;DR: Manual pancreas segmentation is too slow and inconsistent for screening. AI models achieve a Dice Similarity Coefficient of 0.96, matching radiologist quality across all imaging planes. Automated segmentation provides the reproducible volumetric data needed to extract subtle pre-diagnostic biomarkers and power downstream AI detection models.
Pages 3-4
REDMOD: Detecting Pancreatic Cancer on CT Scans Months to Years Before Diagnosis

The Radiomics-Based Early Detection Model (REDMOD) is a machine-learning model designed to identify structural and textural alterations in normal-appearing pancreatic tissue on CT scans obtained months to years before a clinical PDA diagnosis. In a case-control study, REDMOD was applied to a dataset of 155 pre-diagnostic CTs (median lead time of 398 days) and 265 CTs from age-matched controls. Radiomic feature extraction generated 88 first-order and gray-level texture metrics, from which 34 features were selected using a LASSO (least absolute shrinkage and selection operator) approach.
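
LASSO selects features by driving the weights of uninformative ones exactly to zero under an L1 penalty. A self-contained sketch of the mechanism, using plain NumPy coordinate descent with soft-thresholding on synthetic data (the study applied LASSO to a classification problem with 88 radiomic features; this toy uses a continuous target and 10 features for simplicity):

```python
import numpy as np

def lasso_select(X: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 200) -> np.ndarray:
    """Coordinate-descent LASSO; returns indices of features with nonzero weight.
    Assumes columns of X are roughly standardized (zero mean, unit variance)."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]        # partial residual excluding feature j
            rho = X[:, j] @ r / n                  # correlation of feature j with residual
            z = (X[:, j] ** 2).sum() / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z   # soft-threshold
    return np.flatnonzero(np.abs(w) > 1e-8)

# Synthetic data: only features 3 and 7 actually carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.standard_normal(200)
print(lasso_select(X, y, lam=0.3))   # only the informative features survive
```

The same shrink-and-threshold behavior is what reduced the 88 extracted radiomic metrics to the 34 retained by REDMOD: features whose association with the outcome does not exceed the penalty are zeroed out.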

The dataset was split into a training subset (292 CTs: 110 pre-diagnostic, 182 controls) and a test subset (128 CTs: 45 pre-diagnostic, 83 controls). On the test subset, REDMOD achieved an AUC of 0.98 (95% CI, 0.94-0.98), sensitivity of 95.5% (85.5-100.0), specificity of 90.3% (84.3-91.5), and accuracy of 92.2% (86.7-93.7) at a median lead time of 386 days (range: 97-1,092 days). Specificity held up beyond the test subset: 92.6% on an internal validation set (n = 176) and 96.2% on an external public NIH dataset (n = 80).
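
Sensitivity, specificity, and accuracy follow directly from confusion-matrix counts. The counts below are hypothetical, back-calculated to be consistent with the reported test-subset composition (45 pre-diagnostic CTs, 83 controls):

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # pre-diagnostic CTs correctly flagged
    specificity = tn / (tn + fp)               # controls correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Hypothetical counts: 43 of 45 cases flagged, 75 of 83 controls cleared.
sens, spec, acc = classification_metrics(tp=43, fp=8, tn=75, fn=2)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

With these counts, accuracy comes out to 118/128 ≈ 92.2%, matching the paper's figure; the per-class rates land within rounding of the reported 95.5% and 90.3%.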

Head-to-head comparison with radiologists: Expert readers assessing the same CTs achieved an AUC of only 0.66 (0.46-0.86), with fair inter-reader agreement (Cohen's kappa = 0.3). Radiologists frequently misclassified indirect findings like focal atrophy or ductal dilatation as indicative of PDA in controls, producing false-positive rates as high as 18%. An ablation study confirmed that textural heterogeneity features from gray-level co-occurrence matrices (GLCM) were the most predictive, consistent with evidence that pancreatic carcinogenesis induces microarchitectural remodeling before a discrete mass forms.

Corroborating studies: Qureshi et al. used a naive Bayes classifier on pre-diagnostic CTs and achieved 86% accuracy in predicting future PDA development. Chen et al. built a multi-institutional radiomic model that outperformed radiologists, especially for tumors smaller than 2 cm. Javed et al. analyzed pancreatic subregions and showed that PDA development is associated with distinct morphologic and textural changes in specific regions of the pancreas, refining localized risk prediction.

TL;DR: REDMOD detects pre-diagnostic PDA on CT with an AUC of 0.98, sensitivity of 95.5%, and specificity of 90.3% at a median lead time of 386 days. Radiologists achieved only an AUC of 0.66 on the same scans (Cohen's kappa = 0.3). GLCM textural features were the strongest predictors. Multiple independent studies corroborate these findings.
Pages 4-5
Body Composition Changes as Early Warning Signs of PDA

AI-driven radiomics models for early PDA detection can extend beyond direct pancreatic imaging by incorporating systemic changes that precede clinical diagnosis. Cancer-induced metabolic alterations, particularly in body composition, have emerged as potential biomarkers. Longitudinal studies indicate that PDA induces significant reductions in visceral adipose tissue (VAT) and subcutaneous adipose tissue (SAT) well before diagnosis. These findings align with evidence that cachexia, a condition characterized by progressive loss of muscle and fat, can manifest even in the pre-clinical stages of pancreatic cancer.

Clinical limitations of this approach: A primary challenge is the requirement for serial CT imaging to track longitudinal changes in metabolic parameters, which is not routinely available outside of specific high-risk cohorts or surveillance programs. Systemic metabolic changes are also not unique to PDA. Conditions such as diabetes, chronic inflammation, and other malignancies can induce similar alterations, leading to high false-positive rates when these features are used in isolation. Cachexia progression also varies among patients, making it difficult to define standardized cutoffs for risk stratification.

The authors suggest that integrating systemic imaging biomarkers with pancreatic-specific radiomic features could improve specificity, but large-scale, multi-institutional prospective validation studies are needed before this combined approach can enter routine clinical practice. The potential value lies in creating a multimodal detection strategy that combines direct pancreatic analysis with broader metabolic profiling.

TL;DR: PDA causes measurable reductions in visceral and subcutaneous fat before clinical diagnosis. However, serial CT is needed to track these changes, and conditions like diabetes and chronic inflammation produce similar metabolic shifts. Combining body composition biomarkers with pancreatic-specific radiomics may improve specificity, but prospective validation is required.
Pages 4-5
AI Systems for Detecting PDA on Diagnostic CT, Including Non-Contrast Scans

Beyond pre-diagnostic detection, a major clinical problem is "missed PDA," where pancreatic lesions are present on imaging but go unidentified due to inadequate pancreatic evaluation, suboptimal contrast timing, or limitations in image quality. AI systems built on convolutional neural networks (CNNs) trained for pancreas segmentation address this gap. One system trained on 696 portal-phase diagnostic CTs with PDAC and 1,080 control images achieved 92% accuracy on an internal test set (1,238 cases: 409 PDAC, 829 controls) and 86% accuracy on an external dataset (194 PDAC, 80 controls). When applied to a simulated high-risk cohort reflecting new-onset diabetes (NOD) patients with END-PAC scores of 3 or higher, it reached 95% accuracy. Notably, although trained only on larger tumors, the model detected PDA on pre-diagnostic CTs acquired 3-36 months before clinical diagnosis at an 84% accuracy rate.

PANDA (Pancreatic Cancer Detection with Artificial Intelligence) demonstrated high accuracy even on non-contrast CTs, an imaging modality previously considered suboptimal for pancreatic cancer detection. Trained on 3,208 patients, PANDA achieved an AUC of 0.986-0.996 across multiple validation cohorts and outperformed radiologists by 34.1% in sensitivity and 6.3% in specificity. It maintained over 90% sensitivity for stage 1 and 2 PDACs. In real-world evaluations involving 20,530 patients, PANDA identified pancreatic malignancies missed by standard radiology reports, demonstrating its potential for opportunistic screening.

Additional models: Liu et al. developed a CNN with excellent sensitivity (0.97-0.99) and specificity (0.99-1.00) on local test sets, with an AUC of 0.92 on a cross-racial US external validation cohort. However, concerns were raised about the external dataset containing a heterogeneous mix of pancreatic pathologies (including neuroendocrine tumors and intraductal papillary mucinous neoplasms), not solely PDA. A hybrid approach combining deep learning with radiomic feature extraction achieved an AUC of 0.99 and 99.2% accuracy on 125 CT scans. Patch-based AI models dividing the pancreas into smaller segments showed over 85% accuracy across Taiwanese and US datasets, with sensitivity exceeding 90% for stage 1 and 2 tumors.

Causality-inspired and non-contrast models: Qu et al. introduced a causality-inspired method for CECT-based diagnosis, achieving an average accuracy of 0.87 across three independent test sets. Qiu et al. proposed a multiresolution-statistical texture analysis architecture for radiomics-based PDA diagnosis on non-contrast CTs, reporting an AUC of 0.79. Li et al. developed a causality-driven graph neural network that achieved stable accuracies of 0.81-0.85 across independent multicenter test cohorts on non-contrast CT, demonstrating the evolving potential of AI even without intravenous contrast.

TL;DR: CNN-based detection systems achieve 86-92% accuracy on diagnostic CT. PANDA reached AUC 0.986-0.996 on non-contrast CT and outperformed radiologists by 34.1% in sensitivity. In 20,530 patients, it caught cancers missed by standard radiology. Hybrid radiomic models hit AUC 0.99. Non-contrast graph neural networks achieve 0.81-0.85 accuracy across multicenter cohorts.
Pages 6-7
Dataset Gaps, Imaging Variability, and the Explainability Problem

Scarcity of pre-diagnostic data: A major barrier is the lack of well-curated, pre-diagnostic imaging datasets for training and validating AI models. Unlike cancers with established screening pathways, PDA has no standardized surveillance for high-risk individuals, limiting access to routine early-stage imaging. This results in a small pool of annotated pre-diagnostic CTs, which restricts model development and generalizability. Variability in imaging acquisition and annotation across institutions further complicates standardization and increases the risk of overfitting.

Public dataset quality: Publicly available datasets often lack histopathologic confirmation, consistent annotations, and imaging quality control. Approximately 25% of these datasets include biliary stents, which introduce bias by associating stents with PDAC and distorting AI predictions. Many studies fail to account for such biases, artificially inflating performance metrics and limiting real-world translation. Mitigating these issues requires standardized dataset curation, bias reduction strategies, and rigorous validation across diverse imaging protocols.

Interinstitutional variability: Differences in scanner technology, contrast timing, resolution, and reconstruction techniques create inconsistencies in AI-driven pancreas segmentation and lesion detection. Multi-institutional collaboration is critical to harmonizing imaging protocols and establishing large-scale registries. Federated learning frameworks can support decentralized model training while preserving patient privacy. Future data collection needs to include standardized, multicentric prospective data registries with harmonized acquisition protocols and structured clinical annotations conforming to FAIR (Findable, Accessible, Interoperable, Reusable) data principles.

Explainability: Black-box models lacking transparency reduce clinician trust and hinder regulatory approval. While techniques like Gradient-weighted Class Activation Mapping (Grad-CAM) have been used in some PDA models to visualize regions of model focus, more advanced and clinically intuitive explainable AI (XAI) methods are needed. Future systems should incorporate heatmaps, uncertainty quantification, and explainable AI methodologies to facilitate clinical integration and build clinician confidence.
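
Grad-CAM itself reduces to a simple weighting rule: each feature-map channel is weighted by the spatial average of the class score's gradient over that channel, and the heatmap is the ReLU of the weighted sum. A NumPy sketch with dummy tensors standing in for a real network's activations and autograd output:

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from a conv layer's activations A_k (K, H, W) and the
    gradients dY/dA_k of the class score w.r.t. those activations (K, H, W)."""
    alphas = gradients.mean(axis=(1, 2))              # global-average-pool gradients
    cam = np.tensordot(alphas, activations, axes=1)   # channel-weighted sum -> (H, W)
    return np.maximum(cam, 0.0)                       # ReLU keeps positive evidence only

# Dummy arrays in place of a CNN's feature maps and backprop gradients;
# in practice both come from a framework's autograd, not random numbers.
rng = np.random.default_rng(1)
A = rng.random((16, 7, 7))        # 16 feature maps at 7x7 resolution
dYdA = rng.random((16, 7, 7))
heatmap = grad_cam(A, dYdA)
print(heatmap.shape)              # (7, 7); upsampled onto the CT slice in practice
```

The resulting low-resolution heatmap is interpolated back onto the input image, which is what lets a radiologist see which pancreatic region drove a model's PDA prediction.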

TL;DR: Key barriers include scarce pre-diagnostic imaging datasets, public datasets with roughly 25% biliary-stent contamination that biases models, and interinstitutional variability in scanner technology and imaging protocols. Black-box opacity remains a major obstacle to clinical adoption. Federated learning, FAIR data principles, and XAI methods that go beyond current Grad-CAM visualizations are proposed solutions.
Pages 7-8
Clinical Trial Design and the Road to Prospective Validation

Validating AI-driven PDA detection requires prospective, multi-institutional trials with risk-stratified cohorts. The authors recommend prioritizing high-risk individuals, such as those with glycemically defined new-onset diabetes (NOD) and END-PAC scores of 3 or higher. Randomization presents ethical challenges because withholding AI-augmented imaging from high-risk individuals could delay diagnosis. A dual-cohort design comparing serial AI-enhanced CT imaging versus standard clinical monitoring is proposed as a pragmatic alternative.

Endpoints: Primary endpoints should include time-to-diagnosis from glycemically defined NOD onset. Secondary endpoints should evaluate stage at detection, AI specificity versus radiologists, the impact of false positives, and overall survival outcomes. Ethical considerations include radiation exposure from serial imaging, the psychological burden of false positives, and management of incidental findings. Serial imaging protocols must balance detection benefits with safety concerns.

Bias mitigation and data harmonization: AI models must be tested across diverse populations and imaging protocols. Standardized contrast-enhanced imaging protocols and multi-site validation will improve reproducibility. Passive electronic medical record (EMR) surveillance of observational cohorts can increase statistical power, enabling comprehensive evaluation of AI's impact on shifting the diagnostic trajectory toward curative intervention. Integration with biobanking efforts will support biomarker discovery and expand AI's role in a broader multimodal PDA detection strategy.

The authors emphasize that clinical adoption requires a multidisciplinary approach involving radiologists, gastroenterologists, oncologists, and AI experts to refine models and ensure they capture clinically relevant imaging markers. Advanced computational strategies including domain generalization, federated learning, and causality-inspired model architectures are highlighted as key directions for mitigating dataset shift and improving real-world performance.

TL;DR: Prospective trials should prioritize NOD patients with END-PAC scores of 3 or higher, using a dual-cohort design (AI-enhanced CT vs. standard monitoring). Primary endpoint: time-to-diagnosis from NOD onset. Federated learning, EMR surveillance, and biobanking integration are proposed to scale validation. Multidisciplinary collaboration across radiology, oncology, and AI is essential.
Citation: Antony A, Mukherjee S, Bhinder K, Murlidhar M, Zarrintan A, Goenka AH. 2025. Available at PMC: PMC12187166. DOI: 10.1159/000546603. Open Access.