Breast Cancer Detection Using Convolutional Neural Networks: A Deep Learning-Based Approach

PMC (Open Access), 2024

Plain-English Explanations
Pages 1-2
Why Deep Learning Is Transforming Breast Cancer Detection

Breast cancer remains one of the leading causes of mortality among women worldwide, particularly in low- and middle-income countries where limited healthcare access and delayed diagnosis contribute to poor outcomes. Traditional detection methods such as mammography and histopathology-based examinations depend on expert radiologists and pathologists, which introduces subjectivity, high cost, and potential misdiagnoses. The manual interpretation pipeline is resource-intensive and time-consuming, creating a bottleneck that delays critical diagnoses for millions of patients globally.

The promise of CNNs: Convolutional neural networks (CNNs) have emerged as a powerful alternative, demonstrating superior performance in medical image classification and tumor detection by extracting hierarchical features automatically from large-scale datasets. Several studies have achieved an area under the curve (AUC) of up to 0.98 and sensitivity values exceeding 90%, suggesting that AI-driven approaches can match or surpass human expert performance. This review paper examines and compares multiple deep learning architectures, feature extraction techniques, and optimization strategies to determine which approaches are most effective for breast cancer classification.

High-dimensional challenges: One of the major obstacles in cancer detection using deep learning is the high dimensionality of datasets, particularly gene expression profiles. The imbalance in sample sizes between malignant and benign cases further complicates the training process, leading to biased predictions. Researchers have proposed advanced gene selection techniques, such as the Kullback-Leibler (KL) divergence method, which selects genes with higher divergence as model features. This approach has achieved an AUC of 0.99 in lung cancer prediction and shows potential for breast cancer detection as well.
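
The KL-divergence selection idea can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the cited method's code: the histogram-based density estimates, the bin count, the smoothing constant `eps`, and all function names are assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """KL(P || Q) between two discrete distributions (here: histograms)."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def rank_features_by_kl(X, y, n_bins=10):
    """Score each feature by the divergence between its class-conditional
    histograms; features with higher divergence separate the classes better."""
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j]
        bins = np.histogram_bin_edges(col, bins=n_bins)
        h_pos, _ = np.histogram(col[y == 1], bins=bins)
        h_neg, _ = np.histogram(col[y == 0], bins=bins)
        scores.append(kl_divergence(h_pos.astype(float), h_neg.astype(float)))
    return np.argsort(scores)[::-1]  # feature indices, most divergent first

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 100 + [1] * 100)
X = np.column_stack([
    np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)]),
    rng.normal(0, 1, 200),
])
order = rank_features_by_kl(X, y)
print(order[0])  # the discriminative feature should rank first
```

The ranking would then feed the top-k features into the downstream model instead of the full high-dimensional input.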

Scope of this review: The paper provides a comprehensive review of recent advancements in CNN-based breast cancer detection. It evaluates deep learning architectures including standard CNNs, recurrent neural networks (RNNs), and hybrid models, highlighting their strengths, limitations, and applicability in medical image classification. The authors also address emerging techniques such as transfer learning, data augmentation, and ensemble learning that improve robustness and reliability of detection systems.

TL;DR: Breast cancer detection traditionally depends on expert radiologists, introducing subjectivity and delays. CNNs have achieved AUC values up to 0.98 and sensitivity above 90%. This review compares multiple deep learning architectures and optimization strategies to identify the most effective approaches for automated breast cancer classification.
Pages 3-4
The Kaggle Dataset: 569 Samples With 33 Tumor Morphology Features

The study utilizes a dataset obtained from Kaggle consisting of 569 individual instances, each described by 33 features related to tumor morphology. These features include measurements such as radius, texture, perimeter, and area, all of which characterize cell nuclei observed in breast tissue samples. The target variable, called "diagnosis," is a binary classification label where "M" denotes malignant tumors and "B" denotes benign tumors. This dataset provides a well-structured foundation for training and comparing multiple deep learning models on the same classification task.

Preprocessing and cleaning: Before model training, the raw data underwent several preprocessing steps. Redundant columns (such as "Unnamed: 32") were removed, and missing values were handled to maintain dataset integrity. Min-Max scaling was applied to normalize feature values between 0 and 1, ensuring uniform feature distribution and enhancing convergence during model training. The normalization step is critical because features like "area" can have values in the thousands, while "smoothness" typically ranges near 0.1, and these scale differences can confuse neural networks during learning.
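
Min-Max scaling itself is a one-liner per column; the NumPy sketch below (the guard for constant columns is our addition, not from the paper) shows how an "area"-scale value and a "smoothness"-scale value end up in the same [0, 1] range:

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature (column) to the [0, 1] range."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard constant columns
    return (X - x_min) / span

# "area"-like values in the thousands next to "smoothness"-like values near 0.1
X = np.array([[1001.0, 0.118],
              [1326.0, 0.085],
              [ 386.0, 0.142]])
X_scaled = min_max_scale(X)
print(X_scaled.min(axis=0), X_scaled.max(axis=0))  # each column spans [0, 1]
```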

Data splitting and augmentation: The dataset was partitioned into training (70%), validation (15%), and test (15%) subsets using stratified sampling to maintain class balance across all splits. To enhance model generalization and reduce overfitting, data augmentation techniques such as rotation, flipping, and contrast adjustments were employed. These augmentation methods artificially expand the effective training set by creating transformed versions of existing samples, helping the models learn more robust feature representations that generalize better to unseen data.
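
The stratified 70/15/15 split can be sketched as below. The helper is illustrative (the paper does not publish its splitting code); the 357/212 benign/malignant mix is the commonly reported composition consistent with the 569-sample total.

```python
import numpy as np

def stratified_split(y, fracs=(0.70, 0.15, 0.15), seed=0):
    """Return index arrays (train, val, test) with per-class proportions preserved."""
    rng = np.random.default_rng(seed)
    splits = ([], [], [])
    for cls in np.unique(y):
        idx = rng.permutation(np.where(y == cls)[0])
        n_train = int(round(fracs[0] * len(idx)))
        n_val = int(round(fracs[1] * len(idx)))
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])
    return tuple(np.array(s) for s in splits)

# 569 labels: 357 benign (0) and 212 malignant (1)
y = np.array([0] * 357 + [1] * 212)
train, val, test = stratified_split(y)
print(len(train), len(val), len(test))  # roughly 70/15/15 of 569
print(y[train].mean(), y[test].mean())  # near-identical malignant fractions
```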

Training framework: Model training and optimization were conducted using TensorFlow and PyTorch frameworks. Hyperparameter tuning was applied to optimize learning rates, batch sizes, dropout rates, and weight initialization strategies. The categorical cross-entropy loss function (which reduces to binary cross-entropy for this two-class task) was employed, with Adam and RMSprop optimizers used for gradient-based optimization. K-fold cross-validation was implemented to ensure model robustness across different dataset partitions, preventing results from being skewed by a single favorable or unfavorable split.
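
The K-fold procedure can be sketched framework-free. The index generator below is illustrative (the `model.fit` call is a placeholder, not the paper's training loop):

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k folds over n_samples.
    Each sample lands in exactly one validation fold across the k iterations."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

scores = []
for train_idx, val_idx in k_fold_indices(569, k=5):
    # model.fit(X[train_idx], y[train_idx]) ... placeholder: record fold size
    scores.append(len(val_idx))
print(scores)  # five validation folds covering all 569 samples
```

Averaging the per-fold metric (instead of trusting one split) is what keeps a single lucky or unlucky partition from skewing the reported numbers.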

TL;DR: The study uses a Kaggle dataset of 569 instances with 33 tumor morphology features, split 70/15/15 for training, validation, and testing. Data was cleaned, normalized with Min-Max scaling, and augmented with rotation and flipping. Models were trained in TensorFlow and PyTorch with K-fold cross-validation for robust evaluation.
Pages 4-5
Seven Deep Learning Models Compared: From VGG16 to CNN+LSTM Hybrids

The study implemented and compared seven distinct deep learning architectures to determine which approach delivers the best breast cancer classification performance. The three standalone CNN architectures tested were VGG16, ResNet, and EfficientNet. VGG16 achieved 96.1% accuracy with an AUC-ROC of 0.97. ResNet performed better at 97.4% accuracy and 0.98 AUC-ROC, benefiting from its residual connections that allow gradients to flow through very deep networks without degradation. EfficientNet scored 94.8% accuracy and 0.96 AUC-ROC, using a compound scaling method that uniformly scales network width, depth, and resolution.

Sequential models (RNN and MLP): Beyond CNNs, the study tested a recurrent neural network built on Long Short-Term Memory (LSTM) cells and a standalone Multilayer Perceptron (MLP). The LSTM model achieved 89.7% accuracy with 0.91 AUC-ROC by processing sequential dependencies within tumor feature sets, utilizing gated mechanisms to retain long-term dependencies and relevant information. The MLP architecture included multiple hidden layers with dropout regularization and batch normalization, optimized using Adam and Stochastic Gradient Descent (SGD) optimizers, reaching an accuracy of approximately 89.1% with 0.90 AUC-ROC.

Hybrid models: The most compelling results came from hybrid architectures that combine CNN feature extraction with secondary classifiers. The CNN+LSTM hybrid achieved the highest accuracy of 98.2% with an AUC-ROC of 0.99, effectively capturing both spatial and sequential dependencies within tumor feature sets. The CNN+MLP hybrid reached 96.4% accuracy with 0.98 AUC-ROC by leveraging hierarchical CNN feature extraction with deep fully connected layers for refined classification. These hybrid designs demonstrate that pairing CNN-based feature learning with a complementary classification mechanism consistently outperforms any single architecture used alone.

Performance evaluation metrics: All models were evaluated using five key metrics: accuracy, precision, recall, F1-score, and AUC-ROC. The CNN+LSTM hybrid led across all five metrics, achieving 97.6% precision, 98.1% recall, and 97.5% F1-score in addition to its 98.2% accuracy. The consistent superiority of hybrid models across all metrics, not just accuracy, suggests that these architectures produce more balanced and clinically reliable predictions than standalone models.
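
Four of the five metrics follow directly from the confusion matrix; a minimal NumPy sketch (our helper, not the paper's evaluation code) makes the definitions concrete:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall (sensitivity), and F1 from binary predictions."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# 1 = malignant, 0 = benign; one missed malignancy (FN) and one false alarm (FP)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])
print(binary_metrics(y_true, y_pred))  # (0.8, 0.75, 0.75, 0.75)
```

AUC-ROC, the fifth metric, is computed from predicted probabilities rather than hard labels, which is why it is reported separately.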

TL;DR: Seven models were compared. The CNN+LSTM hybrid achieved the best results: 98.2% accuracy, 97.6% precision, 98.1% recall, and 0.99 AUC-ROC. Standalone CNNs (VGG16, ResNet, EfficientNet) ranged from 94.8% to 97.4% accuracy, while LSTM and MLP scored around 89%. Hybrid architectures consistently outperformed single-model approaches.
Pages 5-6
How Transfer Learning, Focal Loss, and Vision Transformers Boost Detection

Transfer learning for medical imaging: Transfer learning has been widely used to improve CNN-based breast cancer detection models. Pre-trained architectures such as VGG16, ResNet, InceptionV3, and DenseNet have been fine-tuned for medical imaging tasks, significantly boosting classification accuracy even when annotated medical images are scarce. These models are initially trained on large-scale general image datasets (like ImageNet), learning rich visual feature representations that can then be adapted to medical imaging with relatively small amounts of domain-specific data. Studies have shown that transfer learning improves cancer classification accuracy, with AUC values reaching 0.98 in independent test sets.
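
The core mechanic of transfer learning, freezing a pre-trained feature extractor and training only a small task-specific head, can be demonstrated without a deep learning framework. The "backbone" below is a stand-in (a fixed random projection with ReLU), not VGG16, and the synthetic data, learning rate, and epoch count are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: a fixed (frozen) nonlinear feature map.
# In practice this would be VGG16/ResNet conv layers trained on ImageNet.
W_frozen = rng.normal(size=(30, 16))
def frozen_features(X):
    return np.maximum(X @ W_frozen, 0.0)  # frozen weights, never updated

# "Fine-tuning" here = training only a small logistic head on frozen features.
def train_head(X, y, lr=0.1, epochs=200):
    F = frozen_features(X)
    w = np.zeros(F.shape[1]); b = 0.0
    for _ in range(epochs):
        z = np.clip(F @ w + b, -30, 30)
        p = 1.0 / (1.0 + np.exp(-z))      # sigmoid
        grad = p - y                       # d(log-loss)/d(logits)
        w -= lr * F.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy "domain-specific" data: two Gaussian classes in a 30-D feature space.
X = np.vstack([rng.normal(0, 1, (100, 30)), rng.normal(0.8, 1, (100, 30))])
y = np.array([0] * 100 + [1] * 100)
w, b = train_head(X, y)
p = 1.0 / (1.0 + np.exp(-np.clip(frozen_features(X) @ w + b, -30, 30)))
print(((p > 0.5) == y).mean())  # training accuracy well above chance
```

Because only the head's parameters are learned, a small labeled medical dataset suffices; the expensive representation learning already happened on the large general-purpose dataset.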

Addressing class imbalance with focal loss: The class imbalance issue in medical image datasets significantly impacts CNN performance. In breast cancer datasets, benign cases often outnumber malignant ones, which can bias models toward predicting the majority class and missing critical malignant diagnoses. To address this, focal loss functions have been introduced to improve model sensitivity to minority class samples. By adjusting the learning process to down-weight easy examples and focus training on hard-to-classify minority cases, focal loss ensures that malignant cases receive adequate attention. Studies employing focal loss and augmentation techniques have reported sensitivity of 86.1% and specificity of 80.1% for breast cancer detection in digital mammograms.
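
The down-weighting of easy examples is visible directly in the focal loss formula, FL(p_t) = -α_t (1 - p_t)^γ log(p_t). A minimal NumPy sketch with the commonly used defaults α = 0.25, γ = 2 (our choices, not reported by the paper):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An "easy" example (confident and correct) vs a "hard" one (wrong side).
easy = focal_loss(np.array([0.95]), np.array([1]))[0]
hard = focal_loss(np.array([0.10]), np.array([1]))[0]
print(hard / easy)  # the hard example dominates by orders of magnitude
```

With γ = 0 the expression collapses to (weighted) cross-entropy; raising γ progressively mutes well-classified majority-class samples so gradients concentrate on the hard malignant cases.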

Generative data augmentation: Beyond traditional augmentation (rotation, flipping), generative adversarial networks (GANs) and synthetic oversampling methods have been employed to create entirely new synthetic malignant samples. GANs learn the underlying distribution of minority-class data and generate realistic synthetic images that expand the training set. This reduces overfitting and enhances the model's ability to generalize across different datasets, addressing one of the key bottlenecks in medical AI where labeled data is expensive and limited.

Vision transformers: The application of vision transformers (ViTs) in breast cancer detection has gained attention as a newer alternative to CNNs. ViTs outperform traditional CNNs in histopathological image classification by modeling long-range dependencies more effectively, achieving accuracy improvements of up to 2% over conventional deep learning models. Unlike CNNs, which process images through local receptive fields, transformers use self-attention mechanisms to capture relationships between distant regions of an image, potentially identifying subtle patterns that convolutional filters might miss.
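
The long-range-dependency claim comes down to the self-attention operation: every patch attends to every other patch in a single layer. A minimal NumPy sketch of scaled dot-product self-attention (single head, random weights; patch count and dimensions are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of patch embeddings.
    Each output row is a weighted mix of ALL positions, so distant image
    regions can influence each other in one layer."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))  # (n_patches, n_patches) attention map
    return A @ V, A

rng = np.random.default_rng(0)
n_patches, d = 16, 8                     # e.g. a 4x4 grid of image patches
X = rng.normal(size=(n_patches, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
print(out.shape)                          # (16, 8): one mixed vector per patch
print(A.sum(axis=1))                      # each attention row sums to 1
```

A CNN needs many stacked layers before two distant patches share a receptive field; here the attention map `A` connects them immediately, which is the mechanism behind the reported gains on histopathological images.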

TL;DR: Transfer learning with pre-trained models (VGG16, ResNet, DenseNet) reaches 0.98 AUC even with limited labeled data. Focal loss combats class imbalance by focusing training on hard minority samples (86.1% sensitivity). GANs generate synthetic training data. Vision transformers improve accuracy by up to 2% over standard CNNs by modeling long-range image dependencies.
Pages 6-7
Beyond Imaging: Combining Genomics, Health Records, and Multi-Cancer Detection

Multimodal learning: One of the most promising directions explored in this review is multimodal learning, which integrates imaging data with genomic information and electronic health records. Rather than relying on a single data source, multimodal approaches combine mammographic images with gene expression profiles to provide a more comprehensive understanding of tumor characteristics. Researchers have employed sentence transformers such as SBERT and SimCSE to extract DNA sequence representations, which, when fed into machine learning models like XGBoost and LightGBM, enhance cancer classification accuracy. These approaches have achieved an accuracy of approximately 75%, which, while lower than imaging-only methods, demonstrates potential for improving cancer detection beyond imaging alone.

Applicability to other cancer types: The CNN-based approaches reviewed in this paper extend well beyond breast cancer. Pre-trained CNNs have achieved high accuracy in lung cancer detection, with KL divergence-based feature selection contributing to AUC scores as high as 0.99. Deep learning models applied to skin cancer detection have demonstrated superior performance in early-stage melanoma classification, leveraging lesion parameters such as symmetry, color, size, and shape for improved differentiation between benign and malignant cases. Research in prostate and colorectal cancer classification has also benefited from CNN-based models, with studies indicating that transfer learning enhances model generalization across different imaging modalities.

Consistent cross-cancer performance: CNN-based models have achieved AUC values ranging from 0.91 to 0.98 across different cancer types, reinforcing the effectiveness of deep learning in oncology diagnostics broadly. This cross-cancer consistency suggests that the core architectural principles and training strategies identified in breast cancer research, including transfer learning, data augmentation, and hybrid model design, form a transferable toolkit that can be adapted to other malignancies with relatively modest domain-specific adjustments.

Conventional methods as a baseline: The review contextualizes these AI advances against conventional diagnostic techniques. Mammography and histopathological analysis, while well-established, require expert interpretation, which makes them resource-intensive, time-consuming, and susceptible to human error. Automated detection methods utilizing deep learning have gained significant traction precisely because they can extract complex patterns and achieve high classification accuracy without requiring a specialist for every scan, potentially democratizing access to high-quality cancer screening.

TL;DR: Multimodal approaches combining imaging with genomics and health records show promise, reaching 75% accuracy with DNA sequence transformers. CNN-based models achieve 0.91 to 0.98 AUC across lung, skin, prostate, and colorectal cancers, demonstrating that breast cancer deep learning techniques transfer effectively to other malignancies.
Pages 7-8
The Black-Box Problem: Explainable AI and Real-World Deployment Barriers

The interpretability gap: Despite the impressive accuracy of deep learning models, their deployment in clinical practice faces a fundamental barrier: interpretability. CNN models often function as black-box systems with limited explainability, making it difficult for clinicians to understand why a model classified a tumor as malignant or benign. In a medical context where a single incorrect prediction can endanger a patient's life, this opacity undermines trust and limits adoption. Clinicians need to understand the reasoning behind AI predictions to integrate them responsibly into diagnostic workflows.

Explainable AI techniques: To address this transparency deficit, researchers have proposed explainable AI (XAI) techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM) and Shapley Additive Explanations (SHAP). Grad-CAM produces visual heatmaps that highlight which regions of an image most influenced the model's classification decision, allowing radiologists to verify that the model is focusing on clinically relevant tissue areas. SHAP, grounded in cooperative game theory, assigns each input feature a contribution score that quantifies its impact on the final prediction. Studies applying these XAI methods have demonstrated improved clinician trust and interpretability, though further research is needed to optimize these techniques for widespread clinical adoption.
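
The Grad-CAM computation itself is short: average the class-score gradients over space to get one importance weight per feature map, take the weighted sum, and keep only positive evidence. The sketch below applies that recipe to hand-made arrays standing in for real conv activations and backpropagated gradients:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from last-conv-layer activations and gradients.

    feature_maps: (K, H, W) activations of the last convolutional layer
    gradients:    (K, H, W) d(class score)/d(activations), from backprop
    """
    alphas = gradients.mean(axis=(1, 2))              # one weight per map
    cam = np.tensordot(alphas, feature_maps, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU: positive influence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam

# Toy case: map 0 carries the class evidence, map 1 is irrelevant.
fmap = np.zeros((2, 4, 4))
fmap[0, 1:3, 1:3] = 1.0          # activation blob in the image center
grads = np.stack([np.full((4, 4), 0.5), np.zeros((4, 4))])
heat = grad_cam(fmap, grads)
print(heat)  # hot center, cold border
```

Upsampled and overlaid on the mammogram, this heatmap is what lets a radiologist check whether the model focused on the lesion or on an artifact.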

Generalization across imaging platforms: Another critical challenge is generalization across different imaging modalities. CNNs trained on one dataset may struggle when applied to images from different scanners, institutions, or patient populations. Domain adaptation techniques have been explored to improve model robustness across heterogeneous mammography platforms, allowing deep learning models to maintain high accuracy even when trained on diverse datasets. Domain adaptation and lightweight CNNs have demonstrated improvements in generalization by 1-3% across different datasets, further validating their potential for deployment in varied clinical environments.

Lightweight architectures for low-resource settings: Lightweight CNN architectures have been proposed to address computational constraints in low-resource healthcare settings. These compact models reduce the reliance on high-performance computing resources while maintaining diagnostic accuracy, facilitating the deployment of AI-driven cancer detection systems in developing countries and rural hospitals where expensive GPU hardware may not be available. This is particularly important given that breast cancer mortality is disproportionately high in low- and middle-income countries where healthcare infrastructure is most limited.

TL;DR: Deep learning models face three deployment barriers: black-box opacity (addressed by Grad-CAM and SHAP), poor cross-platform generalization (addressed by domain adaptation, yielding 1-3% improvement), and high computational cost (addressed by lightweight architectures for low-resource settings). All three must be solved for real-world clinical adoption.
Pages 8-10
Federated Learning, Transformer Models, and the Path to Clinical Integration

Key findings summarized: The study demonstrates that CNN-based models, when integrated with advanced optimization techniques such as transfer learning, data augmentation, and feature selection, significantly enhance classification accuracy and generalization across diverse datasets. The hybrid CNN+LSTM model achieved the highest accuracy at 98.2% with 0.99 AUC-ROC, while standalone architectures ranged from 89% (MLP) to 97.4% (ResNet). These results strongly support the potential of deep learning, particularly hybrid architectures, in revolutionizing automated breast cancer diagnosis by offering a high degree of accuracy and efficiency.

Federated learning for privacy-preserving AI: The future of CNN-based breast cancer detection lies in integrating advanced AI techniques, with federated learning being among the most promising. Federated learning enables collaborative model training across multiple medical institutions without sharing patient data, addressing both privacy concerns and regulatory requirements like HIPAA. Recent research suggests that federated learning approaches can achieve accuracy improvements of up to 2% while preserving data privacy, making them a viable path toward building more robust models trained on institutionally diverse datasets without compromising patient confidentiality.

Transformer-based models and explainable AI: The authors recommend that future research should explore transformer-based models, which have already shown 2% accuracy improvements over conventional CNNs in histopathological classification. Combined with explainable AI techniques, transformers could provide both superior pattern recognition and transparent decision-making. The expansion of publicly available annotated datasets and the development of interpretable AI models will further enhance the reliability and acceptance of deep learning in clinical practice. The observed improvements in predictive accuracy reaching up to 98% reinforce the feasibility of deep learning as a dependable tool for real-world clinical applications.

Clinical integration outlook: The integration of AI in clinical workflows has the potential to significantly reduce human error, enhance diagnostic consistency, and expand access to high-quality cancer screening in both developed and resource-limited healthcare settings. By assisting radiologists and oncologists in making faster, more reliable diagnostic decisions, these AI-driven models can ultimately improve early detection rates and patient outcomes. However, realizing this potential requires addressing remaining challenges around dataset diversity, model interpretability, and regulatory approval before widespread deployment becomes feasible.

TL;DR: Hybrid CNN+LSTM models reach 98.2% accuracy with 0.99 AUC-ROC for breast cancer detection. Future priorities include federated learning (up to 2% accuracy gain while preserving privacy), transformer architectures, expanded public datasets, and explainable AI. Clinical integration promises to reduce diagnostic errors and improve early detection across all healthcare settings.
Citation: Nasir F, Rahman S, Nasir N. Breast Cancer Detection Using Convolutional Neural Networks: A Deep Learning-Based Approach. Open access. Available at: PMC12049196. License: CC BY.