Artificial intelligence in pancreatic cancer


Plain-English Explanations
Pages 1-2
Why AI Matters for a Cancer with 11% Survival

Pancreatic cancer (PC) carries the lowest five-year relative survival rate of any major cancer at just 11% in the United States. In 2020, the disease accounted for 495,773 new cases and 466,003 deaths worldwide, representing 2.6% of all new cancer diagnoses but 4.7% of all cancer deaths. The primary driver of this dismal prognosis is late detection. Only about 20% of patients are diagnosed at an early stage, since initial symptoms such as jaundice, fatigue, and indigestion mimic many non-cancer conditions.

Early screening transforms outcomes: A multicenter study demonstrated that patients whose pancreatic cancer was detected through screening had a 5-year survival rate of 73.3% and a median survival of 9.8 years, compared with just 1.5 years for those diagnosed outside of screening programs. These figures underscore the life-or-death importance of early detection and explain why the AI research community has increasingly turned its attention to this disease.

Scope of this review: The authors searched PubMed, Embase, Web of Science, and other databases using the keywords "artificial intelligence," "machine learning," and "pancreatic cancer" for literature published by July 2022. The review covers AI applications across the full pancreatic cancer workflow: medical image analysis (EUS, CT, MRI, PET), pathological examination, biomarker discovery, and prognosis prediction including survival time, recurrence risk, metastasis, and therapy response.

The AI landscape: The paper categorizes relevant algorithms into supervised learning (logistic regression, SVM, random forest, neural networks), unsupervised learning (K-means, PCA), semi-supervised learning, reinforcement learning, ensemble methods (AdaBoost, bagging), and deep learning architectures (CNN, RNN, MLP, GAN). Performance is evaluated primarily by AUC, accuracy, sensitivity, specificity, and dice similarity coefficient (DSC) for segmentation tasks.
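The metrics named above are all derived from simple counts of model decisions. As a minimal illustration (the counts below are made up, not from any study in the review), sensitivity, specificity, and accuracy come from a confusion matrix, while the Dice similarity coefficient measures overlap between a predicted and a ground-truth segmentation:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # true positive rate (recall)
    specificity = tn / (tn + fp)           # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two sets of segmented pixels:
    twice the overlap divided by the total size of both masks."""
    pred, truth = set(pred), set(truth)
    return 2 * len(pred & truth) / (len(pred) + len(truth))

# Illustrative numbers: 95 of 100 cancers caught, 90 of 100 controls cleared
sens, spec, acc = confusion_metrics(tp=95, fp=10, tn=90, fn=5)

# Illustrative masks sharing 3 of their 4 pixels each
overlap = dice_coefficient({1, 2, 3, 4}, {2, 3, 4, 5})
```

Reporting all three classification metrics matters because accuracy alone can hide a poor specificity, as in several of the EUS studies surveyed below.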

TL;DR: Pancreatic cancer has an 11% five-year survival rate, but screening-detected cases achieve 73.3% five-year survival and 9.8-year median survival. This review covers AI applications across imaging (EUS, CT, MRI, PET), pathology, biomarkers, and prognosis prediction from literature through July 2022.
Pages 3-5
AI-Assisted Endoscopic Ultrasound for Pancreatic Lesion Detection

Endoscopic ultrasound (EUS) provides high-resolution images of the pancreas without interference from gas, bone, or subcutaneous fat, and it can guide tissue sampling through fine-needle aspiration (FNA) or fine-needle biopsy (FNB). The review catalogues over a dozen studies applying AI to EUS-based diagnosis, covering differential diagnosis between pancreatic cancer and chronic pancreatitis, detection of pancreatic neuroendocrine tumors (PNET), and classification of intraductal papillary mucinous neoplasm (IPMN).

Differential diagnosis performance: Across the surveyed studies, AI models for differential diagnosis of PC using EUS achieved AUC values of 0.940 to 0.986, accuracy of 80% to 98.26%, sensitivity of 87.59% to 100%, and specificity of 50% to 93.38%. The highest-performing model was by Udristoiu et al., who combined CNN and long short-term memory (LSTM) neural networks on multi-sequence EUS data (grayscale, color Doppler, arterial and venous phase contrast-enhancement, and elastography), achieving an AUC of 0.98 and 98.26% accuracy.
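AUC, the headline metric in these comparisons, has a concrete probabilistic meaning: it is the chance that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A minimal pure-Python sketch (the scores below are illustrative, not from any surveyed model):

```python
def auc(pos_scores, neg_scores):
    """AUC computed as the normalized Mann-Whitney U statistic:
    P(score_pos > score_neg), counting ties as half a win."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Illustrative model scores for cancer (positive) and benign (negative) cases
cancer = [0.9, 0.8, 0.75, 0.6]
benign = [0.7, 0.4, 0.3, 0.2]
```

This rank-based view explains why AUC is threshold-independent: it summarizes the whole score ordering, whereas accuracy, sensitivity, and specificity each depend on where the decision cutoff is placed.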

IPMN and segmentation: Kuwahara et al. used ResNet-50 to distinguish benign from malignant IPMN, reaching an AUC of 0.98, 94.0% accuracy, and 95.7% sensitivity. For pancreas segmentation, Iwasa et al. applied U-Net across 100 patients with different pancreatic conditions, achieving a median intersection-over-union of 0.77 using 4-fold cross-validation. Zhang et al. developed a station classification and pancreas segmentation model for EUS training and quality control, with a DSC of 71.5% for segmentation and 82.4% accuracy for station recognition.
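The two segmentation studies above report overlap under different but interconvertible metrics: Iwasa et al. use intersection-over-union (IoU, the Jaccard index) while Zhang et al. report DSC. For the same pair of masks the two are related by a fixed formula, so the numbers can be put on a common scale; a small sketch:

```python
def iou_to_dsc(iou):
    """DSC = 2*IoU / (1 + IoU) for the same pair of masks."""
    return 2 * iou / (1 + iou)

def dsc_to_iou(dsc):
    """Inverse mapping: IoU = DSC / (2 - DSC)."""
    return dsc / (2 - dsc)

# Iwasa et al.'s median IoU of 0.77 corresponds to a DSC of about 0.87,
# above Zhang et al.'s 71.5% DSC for EUS pancreas segmentation.
dsc_iwasa = iou_to_dsc(0.77)
```

Because DSC is always at least as large as IoU for the same masks, comparing a DSC figure from one paper directly against an IoU figure from another understates the second model unless converted first.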

Clinical context: While EUS provides excellent image quality, it is invasive and operator-dependent, meaning that improper handling can reduce diagnostic accuracy. AI models that standardize interpretation of EUS images could mitigate this variability, but the reliance on skilled endoscopists for image acquisition remains a bottleneck.

TL;DR: AI models on EUS data achieved AUC 0.940-0.986 and accuracy 80-98.26% for PC differential diagnosis. CNN+LSTM on multi-sequence EUS reached the top AUC of 0.98. ResNet-50 classified malignant IPMN at AUC 0.98 and 94.0% accuracy. U-Net pancreas segmentation achieved median IoU of 0.77.
Pages 5-8
CT-Based AI for Detection, Staging, and Differential Diagnosis

CT is the dominant imaging modality for pancreatic cancer due to its wide availability and high spatial resolution. The review identifies the broadest body of AI research in this modality, spanning PC detection, differential diagnosis, pancreas segmentation, tumor grading, and resectability prediction. Across studies focused on PC diagnosis or precursor lesion detection, AI models achieved AUC of 0.79 to 0.999, accuracy of 77.66% to 99.2%, sensitivity of 76.64% to 100%, and specificity of 85.59% to 98.5%.

Highest-accuracy detection: Chu et al. achieved the highest accuracy (99.2%) in a cohort of 190 PDAC patients and 190 healthy controls imaged with 64-MDCT. They used 0.75-mm venous phase slices, manual segmentation, minimum-redundancy maximum-relevance feature selection, and a random forest classifier. All PDAC cases were correctly classified, with only one normal case misclassified, yielding AUC 0.999, sensitivity 100%, and specificity 98.5%. Mukherjee et al. compared KNN, SVM, RF, and XGBoost for PDAC detection, achieving AUC 0.98 and 92.2% accuracy with the best model.
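The minimum-redundancy maximum-relevance (mRMR) step in Chu et al.'s pipeline greedily keeps features that track the label while penalizing features that merely duplicate ones already chosen. A simplified sketch using Pearson correlation as the relevance/redundancy measure (mRMR implementations typically use mutual information, and the feature names and data here are purely illustrative):

```python
def pearson(x, y):
    """Pearson correlation between two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def mrmr_rank(features, labels, k):
    """Greedy mRMR: at each step pick the feature maximizing
    relevance-to-label minus mean redundancy with already-selected features.
    features: dict name -> list of values; labels: list of 0/1."""
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = abs(pearson(features[name], labels))
            redundancy = (
                sum(abs(pearson(features[name], features[s])) for s in selected)
                / len(selected)
            ) if selected else 0.0
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: "texture" tracks the label perfectly, "noise" is uncorrelated
feats = {"texture": [0, 0, 1, 1], "noise": [1, 0, 1, 0]}
picked = mrmr_rank(feats, [0, 0, 1, 1], k=1)
```

The point of the redundancy penalty is to keep the final radiomics signature compact: hundreds of texture features are often highly intercorrelated, and feeding all of them to a random forest invites overfitting on small cohorts.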

Differential diagnosis: Multiple teams applied radiomics-based approaches for distinguishing between pancreatic subtypes. Ren et al. used an RF classifier on 792 radiomics features from CE-CT to differentiate pancreatic adenosquamous carcinoma (PASC) from PDAC (AUC 0.98). Li et al. used LASSO logistic regression to discriminate focal-type autoimmune pancreatitis (AIP) from PDAC (AUC 0.97). Ziegelmayer et al. applied deep CNN for radiomics feature extraction combined with extremely randomized trees for AIP vs. PDAC prediction (AUC 0.90).

Segmentation and other tasks: Pancreas segmentation DSC values ranged from 60.6% to 91%. Panda et al. developed a two-stage 3D CNN based on modified U-Net, achieving the highest mean DSC of 91% on 1,917 portal venous phase CT scans. Zhou et al. built a 4DCT-based tumor positioning method using ResNet and FPN with a DSC of 98%. For grading, Chang et al. generated a LASSO-based radiomics signature for preoperative PDAC grade prediction (AUC 0.961 for training, 0.770 for external validation).

TL;DR: CT-based AI achieved AUC 0.79-0.999 and accuracy up to 99.2% for PC detection. RF classifier on 64-MDCT data reached AUC 0.999 with 100% sensitivity. Pancreas segmentation DSC ranged from 60.6% to 91%. Differential diagnosis models (PASC vs. PDAC, AIP vs. PDAC) achieved AUC 0.90-0.98.
Pages 8-10
MRI Segmentation, PET Radiomics, and Cross-Modality Comparisons

MRI applications: MRI offers superior soft-tissue contrast and is particularly valuable for detecting small non-contour-deforming tumors, evaluating vascular encasement, and identifying liver and peritoneal metastases. However, it is more expensive than CT and susceptible to artifacts from metal implants. The review covers seven studies applying AI to MRI-based PC tasks. For segmentation, Li et al. attempted cross-modality unsupervised domain adaptation across four MRI modalities (T1, T2, DWI, arterial phase), achieving DSC values of 60.43% to 62.08%. Liang et al. trained a CNN with stochastic gradient descent with momentum (SGDM) and reached 71% DSC for PDAC segmentation.

MRI classification and grading: Cui et al. used multivariate logistic regression on extracted MRI features for BD-IPMN grading (AUC 0.903, specificity 94.8%, sensitivity 73.4%). Cheng et al. directly compared MRI and CT for predicting malignant IPMN using LASSO feature selection with LR and SVM classifiers. The MRI-based model significantly outperformed the CT-based model (AUC 0.940 vs. 0.864), with MRI+SVM achieving 86.7% accuracy, 95.7% sensitivity, and 81.1% specificity.

PET/CT radiomics: PET provides functional metabolic information through FDG uptake, though physiological glucose metabolism in inflamed tissues can cause false positives. Li et al. developed a hybrid feedback SVM-RF (HFB-SVM-RF) model on PET/CT pseudo-color images for PC diagnosis, achieving 96.47% accuracy and 97.51% specificity. Liu et al. used SVM on 502 radiomics features from dual-time PET/CT to distinguish PDAC from AIP (AUC 0.967, accuracy 89.91%). Xing et al. applied XGBoost to PET/CT radiomics for PDAC pathological grade prediction (AUC 0.994).

Across all imaging modalities, the consistent finding is that AI can enhance diagnostic accuracy beyond what single biomarkers or individual radiologists achieve. The complementary strengths of each modality, such as CT for spatial resolution and MRI for soft-tissue contrast, suggest that multimodal AI integration is a promising future direction.

TL;DR: MRI-based models outperformed CT for IPMN prediction (AUC 0.940 vs. 0.864). MRI segmentation DSC ranged from 60.43% to 71%. PET/CT AI achieved up to 96.47% accuracy for PC diagnosis and AUC 0.994 for PDAC grading. Cross-modality comparisons favor combining imaging strengths.
Pages 10-12
AI in Digital Pathology, Liquid Biopsy, and Multi-Omics

Digital pathology: AI can analyze H&E-stained and immunofluorescent-stained whole slide images (WSI) as well as FNA/FNB cytology samples. Song et al. built systems for automatic epithelial cell nuclei segmentation and morphological feature extraction, achieving 94.38% accuracy for PDAC diagnosis and 77.03% for PDAC grading using SVM classifiers. Kriegsmann et al. used CNN for automatic tissue category localization on whole slides, achieving 73% balanced accuracy for non-aggregated and 92% for aggregated categories. Niazi et al. proposed a deep learning method on Ki67-stained biopsy images for PNET identification (97.8% sensitivity, 88.8% specificity).

FNA and FNB analysis: Momeni-Boroujeni et al. used K-means clustering to segment cell clusters from FNA slides, then trained a multilayer perceptron neural network to discriminate benign from malignant pancreatic cytology, achieving 100% accuracy. Naito et al. trained CNN on FNB-based slides for PDAC assessment (AUC 0.984). Kurita et al. combined cyst fluid biomarkers, FNA cytological features, and clinical variables in neural networks for differentiating malignant from benign pancreatic cystic lesions (AUC 0.966, accuracy 92.9%).

Biomarker-based approaches: The review covers genomics, transcriptomics, proteomics, exosomes, and multi-omics. Gao et al. combined SELDI-TOF-MS protein peaks with CA19-9 to classify PC patients (AUC 0.971). Alizadeh Savareh et al. identified five circulating miRNAs (miR-663a, miR-1469, miR-92a-2-5p, miR-125b-1-3p, miR-532-5p) using PSO+ANN that achieved 93% accuracy for PC diagnosis. Yu et al. analyzed extracellular vesicle long RNA profiles with SVM to detect PDAC (AUC 0.960) and identify resectable stage I/II cancers (AUC 0.949).

Multi-omics integration: Yang et al. constructed a multi-analyte panel combining extracellular vesicle miRNAs, mRNAs, cfDNA, and CA19-9, achieving AUC 0.95 and 92% accuracy for PDAC diagnosis, with 84% accuracy for disease staging. Zhang et al. reported a laser desorption/ionization mass spectrometry-based liquid biopsy that achieved 100% accuracy for PC detection in an internal validation cohort. These results demonstrate the value of combining multiple data types rather than relying on any single biomarker.

TL;DR: Digital pathology AI achieved up to 94.38% accuracy for PDAC diagnosis and 100% accuracy for benign vs. malignant cytology. Biomarker models using proteomics (AUC 0.971), circulating miRNAs (93% accuracy), and exosomal RNA (AUC 0.960) show strong performance. Multi-omics panels reached AUC 0.95 and 92% accuracy for PDAC diagnosis.
Pages 13-15
Survival, Recurrence, Metastasis, and Treatment Response

Survival prediction: Classical prognostic factors like lymph node status and AJCC stage do not fully account for long-term survival differences in pancreatic cancer. The review covers radiogenomics approaches where ML detects gene expression profiles (p53, PD-L1, FAP, ITGAV) from CT and MRI images, as well as direct radiomics-based survival models. Tang et al. proposed a wavelet-based deep learning method trained on multi-omics data (genomic, epigenomic, and clinical) that outperformed traditional LASSO for prognosis prediction (AUC 0.937 vs. 0.802). Baek et al. used multi-omics data with logistic regression, achieving accuracy of 0.776 and AUC of 0.769 for overall survival.

Recurrence risk: Li et al. collected demographics, biochemical, and pathological variables from multi-institutional PDAC patients and tested six ML algorithms. SVM and KNN models achieved the highest accuracy for predicting 1-year (70.9%) and 2-year (73.4%) recurrence, respectively. Combining radiomics features with clinical data further improved prediction. He et al. used CT radiomics analysis three months post-surgery with multivariable logistic regression, achieving AUC 0.742 for the combined model versus 0.533 for clinical data alone. Li et al. developed intratumoral and peritumoral radiomics-clinical models for recurrence prediction (AUC 0.764 for 1-year, 0.773 for 2-year).

Metastasis prediction: An et al. analyzed preoperative dual-energy CT images using ResNet-18 for lymph node metastasis classification. The optimal 100+150 keV combination yielded AUC 0.87, which improved to AUC 0.92 when integrating clinical features (CT-reported T stage, LN status, glutamyl transpeptidase, and glucose). Multiple studies using CT radiomics with multivariable logistic regression achieved AUCs of 0.75 to 0.912 for lymph node metastasis prediction. Zambirinis et al. performed liver radiomics on preoperative CE-CT to predict early liver metastasis after PDAC resection (AUC 0.76).

Therapy response: AI has been applied to predict chemotherapy, radiotherapy, immunotherapy, and surgical outcomes. For chemotherapy, Kaissis et al. used gradient boosted trees to identify PDAC molecular subtypes (KRT81+ vs. KRT81-) from MRI, finding that KRT81+ patients responded significantly better to gemcitabine-based therapy (HR 2.33). For stereotactic body radiotherapy (SBRT), Simpson et al. used random forest on 0.35T MRI radiomics features for treatment response prediction (AUC 0.81-0.845). For immunotherapy, Bian et al. developed XGBoost models to predict tumor-infiltrating lymphocytes from CT radiomics (AUC 0.79), since TIL status is associated with immunotherapy response.

TL;DR: Multi-omics DL survival prediction outperformed LASSO (AUC 0.937 vs. 0.802). Recurrence models combining radiomics and clinical data reached AUC 0.742-0.773. Lymph node metastasis prediction achieved AUC 0.75-0.92. SBRT response prediction reached AUC 0.81-0.845, and TIL prediction for immunotherapy achieved AUC 0.79.
Pages 15-17
Data Quality, Reproducibility, and Interpretability Challenges

Data accessibility and bias: A quality assessment of public pancreas imaging datasets found that a substantial proportion of CT images were unsuitable for AI due to biliary stents and other artifacts. Minorities are often underrepresented in clinical trials, and insufficient data on diverse populations may produce algorithms that fail to account for patient diversity. Most studies reviewed were single-center and retrospective in design, limiting the generalizability of reported performance metrics.

Reproducibility concerns in radiomics: The review highlights multiple sources of variability that challenge reproducibility: intra-individual test-retest repeatability, image-acquisition technique differences, multi-machine variability, segmentation method differences, radiomics feature definitions, and parameter settings. Many studies reused the same dataset for both model development and validation, lacking external validation entirely. The authors note that concordance correlation coefficients (CCCs) should be used to assess feature repeatability, and only repeatable features should be retained for model construction. Standardizing image acquisition, segmentation, and feature extraction workflows is essential for progress.
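The CCC-based screening step can be sketched concretely. Below, Lin's concordance correlation coefficient is computed for each feature across a test scan and a retest scan, and only features clearing a cutoff survive; the 0.85 threshold and the toy data are illustrative assumptions, not values prescribed by the review:

```python
def ccc(x, y):
    """Lin's concordance correlation coefficient between test and
    retest measurements of the same feature (population variances)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)

def repeatable(features_test, features_retest, threshold=0.85):
    """Keep only features whose test-retest CCC clears the threshold.
    The 0.85 cutoff is a common illustrative choice, not a standard."""
    return [
        name for name in features_test
        if ccc(features_test[name], features_retest[name]) >= threshold
    ]

# Toy test-retest data: "stable" reproduces across scans, "noisy" does not
kept = repeatable(
    {"stable": [1, 2, 3], "noisy": [1, 2, 3]},
    {"stable": [1.1, 2.1, 3.1], "noisy": [3, 1, 2]},
)
```

Unlike plain Pearson correlation, the CCC penalizes systematic shifts between scans (the mean-difference term in the denominator), which is exactly the kind of scanner- or protocol-dependent bias the review warns about.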

Interpretability vs. performance: A fundamental trade-off exists between model performance and explainability. Deep learning models typically achieve the best results but remain the least explainable because they are purely data-driven. The review cites the real-world example of IBM Watson Health's cancer AI algorithm, which was used across hundreds of hospitals but was found to contain operational errors, illustrating the consequences of deploying opaque systems. Three approaches to address DL interpretability are discussed: proxy models (using traditional statistics to explain DL behavior), visualization of internal mechanisms, and internal interpretability where models explain which inputs were most influential.

Lack of prospective validation: Most AI devices approved by the FDA have undergone only retrospective evaluation. The absence of prospective studies creates a risk of unexpected failures during real clinical deployment. The review emphasizes that before computer-aided diagnosis systems can be used in the clinic, they must be validated for both safety and efficacy to avoid patient harm.

TL;DR: Key limitations include poor data quality in public datasets (biliary stent artifacts), underrepresentation of minorities, widespread lack of external validation, radiomics reproducibility problems across institutions, the performance-vs-explainability trade-off in deep learning, and near-total reliance on retrospective study designs.
Pages 17-18
Multimodal Integration, Electronic Health Records, and Collaborative Development

Multimodal and multi-omics fusion: Training AI on a single imaging modality or biomarker type is insufficient for the complexity of pancreatic cancer. The review identifies multimodal feature integration, combining image data with multi-omics information (genomics, transcriptomics, proteomics), as a critical next research direction. Studies that combined CT images with serum tumor markers or radiomics with clinical variables consistently outperformed single-source models, suggesting that the future of pancreatic cancer AI lies in data fusion.
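The simplest version of the fusion described above is "early fusion": concatenate each patient's radiomics vector with their clinical variables and train a single classifier on the joined features. A self-contained sketch using plain gradient-descent logistic regression (all feature names and numbers are toy illustrations, not data from the review):

```python
import math

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Plain full-batch-free logistic regression via per-example
    gradient descent (no regularization); returns weights and bias."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """Predicted probability of the positive class."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 / (1 + math.exp(-z))

# Early fusion: concatenate per-patient radiomics and clinical vectors.
radiomics = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
clinical  = [[0.0], [0.1], [0.9], [1.0]]   # e.g. a normalized serum marker
fused = [r + c for r, c in zip(radiomics, clinical)]
labels = [0, 0, 1, 1]
w, b = train_logistic(fused, labels)
```

In practice the fused model is compared against each single-source model on held-out data; the review's consistent finding is that the combined feature set wins, which motivates the multimodal direction.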

Electronic health records for early screening: EHR-based AI screening represents a potentially transformative approach. Malhotra et al. demonstrated that logistic regression applied to EHRs could indicate cancer risk over a decade before clinical diagnosis. Roch et al. developed a natural language processing system to identify pancreatic cyst keywords in electronic medical records with 99.9% sensitivity and 98.8% specificity. These approaches could enable passive screening of millions of patients without additional tests or costs.
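The core idea of keyword-driven EHR screening can be illustrated with a few lines of pattern matching. This is only a toy sketch: Roch et al.'s actual NLP system, its lexicon, and the record IDs below are not reproduced here, and real systems must handle negation, abbreviations, and report structure:

```python
import re

# Illustrative keyword list, not Roch et al.'s actual lexicon
CYST_PATTERNS = [
    r"pancreatic cyst",
    r"IPMN",
    r"intraductal papillary mucinous neoplasm",
    r"mucinous cystic",
]

def flag_report(text):
    """Return True if a clinical note mentions a pancreatic cyst
    keyword (case-insensitive substring match)."""
    return any(re.search(p, text, re.IGNORECASE) for p in CYST_PATTERNS)

def screen_records(records):
    """Passively screen a batch of EHR notes, returning flagged IDs."""
    return [rid for rid, text in records.items() if flag_report(text)]

# Hypothetical record IDs and note text, for illustration only
flagged = screen_records({
    "r104": "Incidental pancreatic cyst noted in the tail.",
    "r105": "No acute abdominal findings.",
})
```

Because such screening runs over notes that already exist, it adds no imaging or laboratory cost, which is what makes the "passive screening of millions of patients" framing plausible.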

Novel diagnostic systems: Emerging platforms include Raman spectroscopy combined with CNN models for tissue classification (AUC close to 0.99), 3D virtual pancreatography systems for non-invasive lesion classification, and automated liquid biopsy cell enumeration (ALICE) for identifying circulating tumor cell subpopulations. These technologies could supplement or eventually replace some invasive diagnostic procedures.

Collaborative development: The authors emphasize that advances in AI for pancreatic cancer require coordinated effort among clinicians, basic scientists, statisticians, and engineers. As computing costs decrease and biotechnology improves, the path forward depends on building large, diverse, multi-institutional datasets and establishing standardized evaluation frameworks. The 2020 AI and Early Detection of Pancreatic Cancer Virtual Summit and the Alliance of Pancreatic Cancer Consortia meetings have both highlighted AI as a priority for the field.

TL;DR: Future priorities include multimodal data fusion (imaging + multi-omics), EHR-based passive screening (one study detected risk a decade before diagnosis), novel diagnostic platforms (Raman spectroscopy AUC near 0.99, NLP-based cyst detection at 99.9% sensitivity), and multi-disciplinary, multi-institutional collaboration to build standardized datasets.
Citation: Huang B, Huang H, Zhang S, et al. Theranostics, 2022 (open access). Available at: PMC9576619. DOI: 10.7150/thno.77949. License: CC BY.