Diagnostic ability of deep learning in detection of pancreatic tumour


Plain-English Explanations
Pages 1-2
Why Early Detection of Pancreatic Cancer Remains a Critical Challenge

Pancreatic cancer, most commonly pancreatic ductal adenocarcinoma (PDAC), is one of the deadliest malignancies. The 5-year survival rate sits at just 8.7%, and although surgical resection combined with chemotherapy offers the best chance of survival (approximately 31.5% at five years), only 10-20% of patients are candidates for surgery. The remaining 80-90% are diagnosed too late, with regional spread or distant metastases that make curative treatment impossible.

Imaging limitations: Computed tomography (CT) is the primary imaging modality for initial pancreatic cancer evaluation, preferred over ultrasonography, MRI, and endoscopic ultrasonography. CT has a 70-90% sensitivity for detecting pancreatic adenocarcinoma, with thin-section contrast-enhanced dual-phase multidetector CT considered the modality of choice. However, accurate early diagnosis remains challenging. Patients whose tumours are found incidentally during imaging for other conditions have significantly longer survival than those presenting with clinical symptoms, underscoring the value of early detection.

The deep learning proposition: This paper proposes a system that combines a standard Convolutional Neural Network (CNN) with a novel YOLO model-based CNN (YCNN) to predict pancreatic cancer early from CT scans. The YCNN model integrates YOLOv3 for object detection with DarkNet53 as a backbone, along with Feature Pyramid Networks (FPN) and a Dependencies Computation Module. The system was also tested on urinary biomarker data, providing a dual-modality evaluation approach.

The authors argue that most prior deep learning studies for pancreatic cancer performed only binary classification (cancer vs. no cancer) without localizing lesions simultaneously, and only one prior study reported performance on tumours smaller than 2 cm. This paper aims to address those gaps by combining detection and classification in a single framework.

TL;DR: Pancreatic cancer has an 8.7% five-year survival rate, and 80-90% of patients are diagnosed too late for surgery. CT sensitivity is 70-90%. This study proposes a YOLO-based CNN (YCNN) model that combines tumour detection and classification from CT scans, tested alongside urinary biomarker data.
Pages 3-4
The YCNN Architecture: From Segmentation to Classification

The proposed framework operates as a multi-stage pipeline. First, CT images undergo pre-processing and segmentation using Kapur's thresholding optimized by a Sailfish Optimizer (SFO-KT), a metaheuristic algorithm modelled on sailfish hunting behaviour. The SFO identifies optimal threshold values for separating regions of interest from background in CT scans by treating the optimization as a search problem, where sailfish positions represent candidate threshold values and sardine positions represent alternative solutions.
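To make the objective concrete: Kapur's method picks the threshold that maximizes the summed Shannon entropy of the below- and above-threshold histogram regions. The paper searches this space with the Sailfish Optimizer; the minimal sketch below instead solves the single-threshold case by exhaustive search, which is tractable for 256 grey levels (the function name and toy image are illustrative, not the paper's code):

```python
import numpy as np

def kapur_threshold(image, bins=256):
    """Single-threshold Kapur entropy method: choose t maximizing the
    summed Shannon entropy of the below- and above-threshold classes."""
    hist, _ = np.histogram(image, bins=bins, range=(0, bins))
    p = hist / hist.sum()                      # normalized histogram
    best_t, best_h = 0, -np.inf
    for t in range(1, bins - 1):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        p0, p1 = p[:t] / w0, p[t:] / w1        # class-conditional pmfs
        h = -(p0[p0 > 0] * np.log(p0[p0 > 0])).sum() \
            - (p1[p1 > 0] * np.log(p1[p1 > 0])).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t

# Toy bimodal "image": intensities clustered near 50 and near 200
rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(50, 10, 5000), rng.normal(200, 10, 5000)])
img = np.clip(img, 0, 255).astype(np.uint8)
t = kapur_threshold(img)
```

A metaheuristic like SFO becomes useful when multiple thresholds are optimized jointly, where exhaustive search grows combinatorially.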

ResNet50 for pancreas identification: The system uses ResNet50 to identify CT slices that contain the pancreas. Only transverse-plane CT images confirmed to include pancreatic tissue are passed to the subsequent classification stage. Texture characteristics from the segmentation step are fused into the image data, giving the downstream network richer diagnostic information beyond raw pixel values.

Feature Pyramid Network (FPN): Because pancreatic tumours are often small relative to the overall CT image, the authors constructed an Augmented Feature Pyramid Network that works from the bottom up. Starting with standard FPN layers (Q1 through Q4), an enhanced path applies 3x3 convolutions with stride 2, element-wise summation, and additional convolutions to produce refined feature maps (R1 through R4). This bottom-up augmentation improves the transmission of low-level localization information, which is critical for detecting small tumours that might be lost during successive pooling operations.
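The bottom-up augmentation described above can be sketched in PyTorch: each refined map fuses a stride-2 downsampled version of the previous refined map with the next FPN level via element-wise summation, then applies a smoothing convolution. Channel width, layer names, and the exact fusion order are assumptions for illustration, not the paper's verified configuration:

```python
import torch
import torch.nn as nn

class BottomUpAugmentation(nn.Module):
    """Bottom-up path over FPN maps Q1..Q4 (Q1 = highest resolution):
    R1 = Q1; R[i+1] = smooth(downsample(R[i]) + Q[i+1])."""
    def __init__(self, channels=256, levels=4):
        super().__init__()
        self.down = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            for _ in range(levels - 1))
        self.smooth = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1)
            for _ in range(levels - 1))

    def forward(self, q):                        # q: [Q1, Q2, Q3, Q4]
        r = [q[0]]                               # R1 = Q1
        for i in range(len(q) - 1):
            fused = self.down[i](r[-1]) + q[i + 1]   # element-wise sum
            r.append(self.smooth[i](fused))
        return r                                 # [R1, R2, R3, R4]

# Four pyramid levels with halving spatial resolution
q = [torch.randn(1, 256, 64 // 2**i, 64 // 2**i) for i in range(4)]
r = BottomUpAugmentation()(q)
```

The stride-2 convolution halves resolution, so each refined map aligns spatially with the FPN level it is summed with, preserving low-level localization cues as the text describes.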

Self-Adaptive Feature Fusion (SAFF): Rather than relying on a single feature pyramid level, the SAFF module integrates hierarchical feature maps from multiple layers. Combined with a Dependencies Computation Module that uses softmax-weighted attention to capture spatial relationships between a proposal and its surrounding tissues, this design allows the network to consider both local geometric structures and global context when detecting tumours.
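The softmax-weighted attention in the Dependencies Computation Module can be illustrated with a minimal NumPy sketch: a proposal's feature vector is scored against surrounding-region features, and a softmax over the scores weights a context summary. The dot-product similarity and function names are assumptions; the paper's exact formulation may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dependency_weighted_feature(proposal, context):
    """proposal: (d,) feature of one region proposal.
    context: (n, d) features of surrounding regions.
    Softmax over similarity scores produces attention weights that
    combine the context into a single context-aware feature."""
    scores = context @ proposal            # (n,) similarity scores
    weights = softmax(scores)              # attention weights, sum to 1
    return weights @ context               # (d,) weighted context summary

rng = np.random.default_rng(1)
prop = rng.normal(size=8)
ctx = rng.normal(size=(5, 8))
feat = dependency_weighted_feature(prop, ctx)
```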

TL;DR: The pipeline uses Sailfish Optimizer-based segmentation, ResNet50 for pancreas localization, an Augmented Feature Pyramid Network for multi-scale feature extraction, and Self-Adaptive Feature Fusion with a Dependencies Computation Module for context-aware tumour detection.
Pages 5-6
CNN Model Design and YOLOv3-Based Classification

CNN architecture: The base CNN model consists of three convolutional layers, each followed by a max-pooling layer, a ReLU activation layer, and a batch normalization (BN) layer. An average-pooling layer reduces feature dimensionality before a fully connected classification layer. The authors applied a 0.5 dropout rate between the intermediate layers and the fully connected layer to prevent overfitting. Spatial Dropout between convolutional and max-pooling layers was tested but ultimately discarded because it degraded performance.
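The layer ordering described above can be sketched in PyTorch. The channel widths are assumed for illustration (the summary does not give them), and global average pooling stands in for the average-pooling layer before the classifier:

```python
import torch
import torch.nn as nn

class BaseCNN(nn.Module):
    """Sketch of the described base CNN: three blocks of
    conv -> max-pool -> ReLU -> batch norm, then average pooling,
    0.5 dropout, and a fully connected classifier."""
    def __init__(self, num_classes=2):
        super().__init__()
        chans = [1, 16, 32, 64]            # assumed channel widths
        blocks = []
        for cin, cout in zip(chans, chans[1:]):
            blocks += [nn.Conv2d(cin, cout, 3, padding=1),
                       nn.MaxPool2d(2), nn.ReLU(), nn.BatchNorm2d(cout)]
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)   # average pooling to 1x1
        self.drop = nn.Dropout(0.5)           # dropout before the FC layer
        self.fc = nn.Linear(chans[-1], num_classes)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.fc(self.drop(x))

logits = BaseCNN()(torch.randn(4, 1, 224, 224))
```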

Training protocol: The model was trained with a mini-batch size of 32, using cross-entropy loss and the Adam optimizer. Training ran for up to 100 epochs, and the model with the best validation accuracy was selected. A tenfold cross-validation procedure was employed: images were split into 10 groups, with 8 used for training, 1 for validation, and 1 for testing. Each fold rotated as the test set once, and results were averaged across all folds.
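The 8/1/1 fold rotation can be expressed as a small generator. The paper does not state which fold serves as validation in each round, so the "next fold" choice below is an assumption; only the 8-train/1-validation/1-test split per round comes from the text:

```python
def tenfold_splits(n_items, n_folds=10):
    """Rotate folds: each fold serves once as the test set, the next
    fold (assumed) as validation, and the remaining eight as training."""
    folds = [list(range(i, n_items, n_folds)) for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        val = folds[(k + 1) % n_folds]
        train = [i for j in range(n_folds)
                 if j not in (k, (k + 1) % n_folds) for i in folds[j]]
        yield train, val, test

splits = list(tenfold_splits(100))
```

Results are then averaged over the ten rounds, so every image is tested exactly once.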

YOLOv3 integration: The classification stage uses the full YOLOv3 architecture with DarkNet53 as the backbone and a three-layer spatial pyramid as the neck. The BCE (Binary Cross-Entropy) Loss function was used as the target loss function. Because accurate classification mattered more than precise bounding-box detection for early cancer identification, the authors assigned a larger weight to the classification loss relative to the detection loss. The network was initialized randomly to ensure activation function values fell within a reasonable interval for fast convergence.
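The weighted objective can be sketched as a two-term loss with the classification term up-weighted. For simplicity this sketch treats both terms as binary cross-entropy; the 2:1 ratio is an assumed example, as the paper's summary does not state the actual weights:

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def total_loss(cls_p, cls_y, det_p, det_y, w_cls=2.0, w_det=1.0):
    """Combined target loss with classification weighted above detection,
    reflecting the paper's stated priority; weights are illustrative."""
    return w_cls * bce(cls_p, cls_y) + w_det * bce(det_p, det_y)

loss = total_loss(np.array([0.9, 0.2]), np.array([1, 0]),
                  np.array([0.7, 0.4]), np.array([1, 0]))
```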

Transfer learning and fine-tuning: To prevent overfitting on the relatively small dataset, the DarkNet53 backbone was first pre-trained on ImageNet for image recognition and then on the cancer dataset for object classification. A three-layer pyramid detection neck was then added and fine-tuned using the cancer dataset. Images were resized to 224 x 224 pixels, processed in batches of 64 (split into 32 subdivisions due to GPU memory constraints), and trained for 100 epochs with a cosine learning rate schedule starting at 0.01, dropping to 0.001 after a two-epoch warm-up phase.
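The schedule can be written as a small function using only the values the text gives (0.01 peak, 0.001 floor, 100 epochs, two-epoch warm-up). The linear shape of the warm-up is an assumption; the paper only states its length:

```python
import math

def lr_at(epoch, total=100, warmup=2, lr_max=0.01, lr_min=0.001):
    """Cosine learning-rate schedule with a two-epoch warm-up,
    decaying from 0.01 to 0.001 over 100 epochs."""
    if epoch < warmup:
        return lr_max * (epoch + 1) / warmup           # linear warm-up (assumed)
    t = (epoch - warmup) / max(1, total - warmup - 1)  # progress in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

schedule = [lr_at(e) for e in range(100)]
```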

TL;DR: The CNN uses 3 convolutional layers with 0.5 dropout and Adam optimizer across 100 epochs with tenfold cross-validation. YOLOv3 (DarkNet53 backbone) was pre-trained on ImageNet, then fine-tuned on 224x224 pixel cancer images with a cosine learning rate schedule (0.01 to 0.001) and weighted classification loss.
Pages 7-9
CT Image and Urinary Biomarker Datasets

CT image dataset: The image dataset was collected between June 2017 and June 2018. It consists of 3,494 CT images from 222 patients with pathologically proven pancreatic cancer and 3,451 CT images from 190 patients with healthy pancreases. All images had a slice thickness of 5.0 mm. The images were divided into phase-based subsets and evaluated using tenfold cross-validation for binary classification (cancer vs. non-cancer).

Urinary biomarker dataset: The second dataset comprised urinary biomarkers collected from three patient populations: healthy individuals, patients with pancreatic ductal adenocarcinoma, and patients with other malignant pancreatic conditions. The cohorts were matched by age and sex. The dataset included 590 samples with features including creatinine levels and four key biomarkers: LYVE1 (Lymphatic Vessel Endothelial Hyaluronan Receptor 1), REG1B (Regenerating islet-derived protein 1 beta), TFF1 (Trefoil factor 1), and REG1A (Regenerating islet-derived protein 1-alpha).

Biomarker relevance: Each of these biomarkers has established associations with pancreatic cancer. REG1B was found to be significantly elevated in the serum of pancreatic cancer patients compared to healthy controls in a 2014 study published in Pancreas. TFF1 levels were significantly higher in both serum (PLOS ONE, 2017) and urine (Oncotarget, 2018) of pancreatic cancer patients. LYVE1 expression has been found upregulated in multiple tumour types including breast, lung, and pancreatic cancer. REG1A serves as a prognostic indicator, with higher levels correlating to poorer outcomes.

The datasets are publicly available on Kaggle, covering both CT scan images and the urinary biomarker panel. This dual-dataset approach allows the researchers to evaluate whether image-based or biomarker-based detection performs better, and whether they can complement each other in clinical practice.

TL;DR: The CT dataset contains 3,494 cancer images (222 patients) and 3,451 healthy images (190 patients) at 5.0 mm slice thickness. The biomarker dataset includes 590 samples with LYVE1, REG1B, TFF1, and REG1A measurements from three patient cohorts matched by age and sex.
Pages 10-14
YCNN Achieves Near-Perfect Accuracy on Both Datasets

The YCNN model delivered striking results across both datasets. On the urinary biomarker dataset, the model achieved 100% accuracy with precision, recall, and F1-score all at 100% (macro and weighted averages). On the CT image dataset, accuracy reached 99.9%. The confusion matrix confirmed near-perfect classification of both cancer and non-cancer cases. The model also achieved an area under the curve (AUC) of 1.00 and an F1 score of 99.9% on an independent testing dataset.

Comparative model performance: The authors benchmarked YCNN against eight other models. On the CT image dataset, MLP achieved 80.6% accuracy, LSTM reached 88.5%, standard CNN hit 97.5%, VGG19 scored 98.5%, ResNet50 reached 98.6%, Inception achieved 97.8%, DenseNet scored 98.4%, and MobileNet came closest at 99.5%. On the urinary biomarker dataset, accuracy was generally lower: MLP at 75.8%, LSTM at 85.9%, CNN at 96.8%, VGG19 at 98.9%, ResNet50 at 97.5%, Inception at 96.5%, DenseNet at 98.6%, and MobileNet at 98.8%. YCNN outperformed every competitor, reaching 100% on the biomarker dataset and 99.9% on the CT images.

Image vs. biomarker performance: Across all models, the CT image dataset consistently yielded higher accuracy than the urinary biomarker dataset. This suggests that image-based diagnostic tests may have higher predictive power for pancreatic cancer detection compared to urinary biomarkers alone. However, CNN-based architectures (VGG19, ResNet50, DenseNet, MobileNet, YCNN) all performed substantially better than non-convolutional models (MLP, LSTM), confirming the advantage of convolutional feature extraction for this task.

Processing speed: The end-to-end automatic diagnosis takes approximately 16.5 seconds per patient, from initial abdominal CT input to diagnostic output. The algorithm completed its computations in about 11 minutes total without failure in any of the evaluation runs.

TL;DR: YCNN achieved 100% accuracy on the biomarker dataset and 99.9% on the CT image dataset, with AUC of 1.00 and F1 of 99.9%. The closest competitor was MobileNet at 99.5% (images) and 98.8% (biomarkers). End-to-end diagnosis takes 16.5 seconds per patient.
Pages 14-15
Cross-Prediction Patterns Among Urinary Biomarkers

While the overall cancer vs. non-cancer classification on the biomarker dataset was 100% accurate, the individual biomarker prediction confusion matrix revealed notable cross-prediction patterns. Creatinine itself was predicted correctly at up to 100% accuracy, but in some evaluations it was mispredicted as LYVE1 at 34%, REG1B at 26%, and TFF1 at 40%.

Inter-biomarker confusion: LYVE1 was mispredicted as creatinine at 34% and TFF1 at 58%. REG1B was mispredicted as creatinine at 26%, LYVE1 at 54%, and TFF1 at 69%. TFF1 was mispredicted as creatinine at 40% and REG1B at 69%. These cross-prediction rates indicate substantial overlap in the feature distributions of individual biomarkers, which is expected given that multiple biomarkers respond to the same underlying disease process.
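Cross-prediction rates of this kind come from row-normalizing a confusion-count matrix, where rows are true labels and columns are predicted labels. The sketch below shows that normalization on toy counts (not the paper's data):

```python
import numpy as np

labels = ["creatinine", "LYVE1", "REG1B", "TFF1"]

def row_percentages(counts):
    """Convert raw confusion counts (rows = true label, columns =
    predicted label) into row-wise percentages, the form in which
    cross-prediction rates are reported."""
    counts = np.asarray(counts, dtype=float)
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)

# Toy counts for illustration only
toy = [[50, 17, 13, 20],
       [17, 20, 34, 29],
       [13, 27, 25, 35],
       [20, 15, 35, 30]]
pct = row_percentages(toy)
```

Each row then sums to 100%, so off-diagonal entries directly read as misprediction rates.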

Clinical interpretation: These confusion rates do not undermine the diagnostic utility of the system. The primary task is distinguishing cancer patients from healthy controls, not identifying individual biomarkers. The inter-biomarker confusion actually reflects the biological reality that pancreatic cancer causes correlated changes across multiple urinary proteins. The model leverages the combined signal from all biomarkers together, which is why the overall cancer classification remains accurate despite individual biomarker overlap.

TL;DR: Individual biomarker identification showed cross-prediction rates of 26-69% between LYVE1, REG1B, TFF1, and creatinine. This reflects biological signal overlap but does not affect the primary cancer vs. non-cancer classification task, which achieved 100% accuracy using the combined biomarker panel.
Pages 15-16
Key Constraints and Methodological Concerns

Dataset size and diversity: The CT dataset, while substantial at nearly 7,000 images, comes from a limited patient pool (222 cancer patients and 190 healthy controls). The paper does not describe multi-centre data collection or external validation on datasets from different institutions, scanners, or imaging protocols. This raises concerns about generalizability, as deep learning models are known to overfit to institution-specific imaging characteristics.

Perfect accuracy claims: The reported 100% accuracy on both datasets warrants caution. In medical AI research, perfect classification is rare and may indicate overfitting, data leakage between training and test sets, or an insufficiently challenging test distribution. The tenfold cross-validation helps mitigate some of these concerns, but without external validation on completely independent data, the reported performance may not reflect real-world clinical utility.

Limited focus on early-stage and small tumours: Although the paper motivates its work by citing the importance of early detection, the results do not report performance stratified by tumour size or cancer stage. The authors note that most prior studies ignored tumours smaller than 2 cm, but their own study does not explicitly demonstrate superior performance on these early-stage lesions either.

Retrospective design: The study is entirely retrospective, using previously collected CT scans and biomarker samples. Prospective validation, where the model would be tested in a real clinical workflow with consecutive patients, has not been performed. The absence of comparison with radiologist performance in a head-to-head reading study is another gap, as the clinical value of the system ultimately depends on whether it improves upon or complements existing human diagnostic capability.

TL;DR: Key limitations include a limited patient pool (412 total patients), no external validation or multi-centre testing, 100% accuracy claims that may indicate overfitting, no performance breakdown by tumour size or stage, and a purely retrospective design without head-to-head comparison against radiologists.
Page 16
Planned Enhancements and Clinical Potential

Advanced deep learning models: The authors propose enhancing the system by adopting more recent deep learning architectures beyond YOLOv3 and DarkNet53. The rapid pace of development in object detection (YOLOv5, YOLOv8, and transformer-based detectors) could improve both detection sensitivity for small tumours and computational efficiency, potentially reducing the 16.5-second processing time even further.

Dataset expansion and augmentation: Expanding the image dataset and using augmentation techniques to increase intensity and orientation variation would help the model generalize across different scanners and imaging protocols. The authors also suggest using synthetic image generation methods, guided by pathology specialists, to create additional training images of pancreatic cancer. This approach could address the fundamental data scarcity problem, as pancreatic cancer is relatively rare compared to other cancer types.

Clinical deployment: The researchers envision two primary clinical applications. First, the model could be used for large-scale pre-diagnosis during routine physical examinations, screening CT scans for signs of pancreatic abnormalities before a radiologist reviews them. Second, the system could assist diagnosis at resource-limited facilities that lack specialist radiologists. The authors note that the system could be deployed through a straightforward web interface requiring no installation.

Saliency maps for interpretability: The model's capacity to generate saliency maps, which highlight the image regions most influential in the diagnostic decision, is identified as an important feature for clinical adoption. Interpretability remains a critical requirement for clinician trust in AI systems, and saliency maps provide a mechanism for pathologists to verify that the model is focusing on diagnostically relevant anatomy rather than artefacts or irrelevant features.
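A common way to produce such maps is vanilla gradient saliency: the magnitude of the top class score's gradient with respect to each input pixel indicates where the model "looks". This is a generic sketch under that assumption, not necessarily the paper's exact visualization method, and the stand-in model is purely for demonstration:

```python
import torch
import torch.nn as nn

def saliency_map(model, image):
    """Vanilla gradient saliency: backpropagate the top class score
    to the input and take the per-pixel gradient magnitude."""
    image = image.clone().requires_grad_(True)
    score = model(image).max()          # top class score
    score.backward()
    return image.grad.abs().squeeze()   # per-pixel importance

# Tiny stand-in classifier for demonstration
model = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 2))
sal = saliency_map(model, torch.randn(1, 1, 32, 32))
```

Overlaying the resulting map on the CT slice lets a clinician check whether high-importance regions coincide with the suspected lesion rather than artefacts.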

TL;DR: Next steps include upgrading to newer detection architectures, expanding the dataset with synthetic image generation, deploying via a web interface for large-scale screening and resource-limited settings, and leveraging saliency maps to improve clinician trust and model interpretability.
Citation: Dinesh MG, Bacanin N, Askar SS, Abouhawwash M. Diagnostic ability of deep learning in detection of pancreatic tumour. Open access, 2023. Available at: PMC10272117. DOI: 10.1038/s41598-023-36886-8. License: CC BY.