Skin Lesion Analysis for Melanoma Detection Using the Novel Deep Learning Model Fuzzy GC-SCNN


Plain-English Explanations
Pages 1-2
Why Automated Melanoma Detection Matters and What This Paper Proposes

Melanoma is the most dangerous form of skin cancer, arising from melanocytes, the melanin-producing cells in the skin. Globally, approximately 5.4 million new skin cancer cases are reported each year, and in the United States alone the annual case count rose from 95,360 in 2017 to 207,390 in 2021. Early detection is critical for reducing mortality, but accurate diagnosis depends heavily on dermatoscopic training and clinical experience. Manual inspection using conventional criteria such as asymmetry, border, color, and diameter (the ABCD criteria) is time-consuming and subject to inter-observer variability.

Existing machine learning approaches: Prior work applied gradient boosting, support vector machines (SVM), K-Nearest Neighbor (KNN), and Quadtree methods to classify skin lesion features extracted from techniques like the grey level co-occurrence matrix (GLCM). Deep learning models, particularly convolutional neural networks (CNNs), have shown they can match or even outperform dermatologists in lesion segmentation. A stacked CNN model with an improved loss function previously achieved 94.8-98.4% classification accuracy, but struggled with inhomogeneous features and fuzzy lesion boundaries.

The proposed approach: This paper introduces fuzzy-based GrabCut-stacked convolutional neural networks (Fuzzy GC-SCNN), a hybrid pipeline that combines fuzzy logic image preprocessing, GrabCut segmentation, stacked CNN feature extraction (using Inception-V3, Xception, and VGG-19), and an enhanced SVM classifier. The goal is to simultaneously improve classification accuracy and reduce processing time compared to existing single-model approaches.

TL;DR: Skin cancer cases in the US more than doubled from 95,360 (2017) to 207,390 (2021). This paper proposes Fuzzy GC-SCNN, a hybrid deep learning pipeline combining fuzzy preprocessing, GrabCut segmentation, stacked CNNs (Inception-V3, Xception, VGG-19), and enhanced SVM classification to automate melanoma detection from dermoscopy images.
Pages 3-4
Training and Testing Datasets: HAM10000, ISIC 2018, ISIC 2019, and PH2

The authors evaluated their model across multiple publicly available dermoscopy datasets to demonstrate generalizability. The primary benchmark was HAM10000 (Human Against Machine with 10,000 training images), which contains 10,015 dermatoscopic images of pigmented skin lesions categorized into seven classes: Actinic Keratoses and Intraepithelial Carcinoma (AKIEC), Basal Cell Carcinoma (BCC), Benign Keratosis-like Lesions (BKL), Dermatofibroma (DF), Melanoma (MEL), Melanocytic Nevi (NV), and Vascular Lesions (VASC). The dataset was split 80:20 for training and testing.

ISIC datasets: The ISIC 2018 archive includes 10,015 training images and 1,512 test images across seven lesion categories. ISIC 2019 is substantially larger with 25,531 training images and 8,238 test images divided into nine categories, adding squamous cell carcinoma and an unknown class. Both ISIC datasets are biased towards melanocytic lesions and disregard non-melanocytic lesions, which the authors acknowledge as a limitation.

PH2 dataset: This smaller dataset was designed specifically for melanoma diagnosis. A notable caveat is that many of these datasets contain clinical photographs rather than true dermoscopic images, creating a mismatch between training data and real-world dermatoscopy workflows. The original image sizes ranged widely from 540 x 576 to 2016 x 3024, so all images were resized to a uniform 256 x 256 resolution before processing.
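The resizing and 80:20 split can be sketched as follows. This is an illustrative assumption, not the authors' code: the paper states only the ratio and target resolution, so the shuffling strategy, seed, and function name `split_80_20` are hypothetical.

```python
import random

def split_80_20(image_paths, seed=0):
    """Shuffle file paths and split them 80:20 into train/test sets.

    Sketch under assumptions: the paper specifies only the 80:20 ratio,
    so the shuffle and seed here are illustrative. In the full pipeline,
    each image would also be resized to 256 x 256 before processing.
    """
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(0.8 * len(paths))
    return paths[:cut], paths[cut:]

# HAM10000 has 10,015 images -> 8,012 train / 2,003 test
train, test = split_80_20([f"img_{i}.jpg" for i in range(10015)])
```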

TL;DR: The model was evaluated on HAM10000 (10,015 images, 7 lesion classes), ISIC 2018 (10,015 train / 1,512 test), ISIC 2019 (25,531 train / 8,238 test, 9 classes), and PH2. Images were resized to 256 x 256 from originals as large as 2016 x 3024. An 80:20 train/test split was used.
Pages 3-5
Fuzzy Logic Preprocessing and GrabCut Segmentation

The first stage of the pipeline is fuzzy logic-based image preprocessing. Raw dermoscopy images contain artifacts like skin hair, uneven lighting, and blurry lesion boundaries. The authors map pixel intensities into a fuzzy domain using a logarithmic membership function, then apply trigonometric transformations to enhance contrast across each color channel (R, G, B) independently. This fuzzy enhancement sharpens lesion boundaries that would otherwise be ambiguous. After enhancement, morphological operators (erosion and dilation) remove hair artifacts from the skin surface. Defuzzification converts the enhanced fuzzy-domain image back into standard pixel values.
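The fuzzify → enhance → defuzzify flow can be sketched per channel. The specific membership and trigonometric functions below (log1p fuzzification, sine-squared intensification) are assumptions for illustration; the paper's exact formulas are not reproduced here, and the morphological hair removal is only noted in comments.

```python
import math

def fuzzy_enhance_channel(pixels, max_val=255):
    """Contrast-enhance one color channel in the fuzzy domain (sketch).

    Assumed transforms, not the authors' exact ones: a logarithmic
    membership function maps intensities into [0, 1], a sine-based
    intensification pushes memberships toward 0 or 1 (sharpening
    ambiguous lesion boundaries), and defuzzification maps back to
    0..255. The full pipeline runs this independently on R, G, and B,
    then applies morphological erosion/dilation to remove hair artifacts.
    """
    enhanced = []
    for p in pixels:
        mu = math.log1p(p) / math.log1p(max_val)   # fuzzification
        mu = math.sin(math.pi * mu / 2.0) ** 2      # contrast intensification
        enhanced.append(round(mu * max_val))        # defuzzification
    return enhanced
```

Note that the extreme intensities are fixed points (0 stays 0, 255 stays 255), while mid-range values are redistributed, which is what sharpens boundary contrast.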

GrabCut segmentation: After preprocessing, the GrabCut (GC) algorithm segments the lesion from the background. GrabCut is a semi-automated technique that classifies image pixels into three regions: definite background, definite foreground, and uncertain pixels. It models these regions using Gaussian Mixture Models (GMMs), estimating weight, mean, and covariance matrices for each component. The segmentation energy function combines a probability distribution term (how likely each pixel belongs to foreground or background) and a regularization term (enforcing spatial coherence among neighboring pixels). This approach isolates the lesion region more reliably than simple thresholding methods.
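The two energy terms can be illustrated with a toy one-dimensional version. Real GrabCut models RGB pixels with full-covariance Gaussian mixtures and minimizes the energy via graph cuts, which this sketch omits; the single-component "GMMs" and all parameter values below are made up for demonstration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def data_term(intensity, gmm):
    """Probability-distribution term: negative log-likelihood of a pixel
    under a mixture given as (weight, mean, variance) components."""
    likelihood = sum(w * gaussian_pdf(intensity, m, v) for w, m, v in gmm)
    return -math.log(likelihood + 1e-300)

def smoothness_term(a, b, gamma=50.0, beta=1e-3):
    """Regularization term: cost of separating neighbouring pixels,
    large when their intensities are similar (spatial coherence)."""
    return gamma * math.exp(-beta * (a - b) ** 2)

# Toy models: dark lesion foreground vs. brighter skin background
foreground_gmm = [(1.0, 60.0, 200.0)]
background_gmm = [(1.0, 200.0, 400.0)]
```

A dark pixel (e.g. intensity 65) costs far less to assign to the foreground model than the background model, while the smoothness term discourages cutting between similar neighbours, which is why GrabCut handles uncertain boundary pixels better than simple thresholding.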

Why fuzzy preprocessing matters: Existing literature on skin lesion detection often skips robust image processing and fails to address the uncertainty inherent in lesion boundary detection. Fuzzy logic directly models this uncertainty, producing cleaner segmentation inputs and ultimately reducing false positives and false negatives downstream in the classification step.

TL;DR: Fuzzy logic maps pixel intensities to a fuzzy domain for contrast enhancement and boundary sharpening across R, G, B channels. Morphological operators remove hair artifacts. GrabCut segmentation then uses Gaussian Mixture Models to separate lesion foreground from background, handling uncertain boundary pixels explicitly.
Pages 5-6
Stacked CNN Architecture: Inception-V3, Xception, and VGG-19

After GrabCut segmentation, the stacked CNN (SCNN) module extracts discriminative features from the segmented lesion images. Rather than relying on a single CNN backbone, the authors stack three pre-trained deep learning architectures: Inception-V3, Xception, and VGG-19. Each of these networks was originally trained on ImageNet and then fine-tuned for dermoscopy images. Inception-V3 uses factorized convolutions and auxiliary classifiers for efficient multi-scale feature extraction. Xception extends this idea with depthwise separable convolutions that decouple spatial and channel-wise feature learning. VGG-19 uses a straightforward 19-layer architecture of 3x3 convolutions for deep feature hierarchies.
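The efficiency argument behind Xception's depthwise separable convolutions is easy to quantify. For a standard 3x3 convolution mapping 64 channels to 128 (example sizes chosen for illustration, not taken from the paper), replacing it with a depthwise 3x3 plus a 1x1 pointwise convolution cuts the weight count by roughly 8x:

```python
def conv2d_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out

def separable_conv2d_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise
    convolution that mixes channels (Xception's building block)."""
    return k * k * c_in + c_in * c_out

standard = conv2d_params(3, 64, 128)              # 73,728 weights
separable = separable_conv2d_params(3, 64, 128)   # 8,768 weights
```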

Stacking mechanism: In the first module, all three pre-trained models independently process each segmented image, producing three separate prediction vectors (P1, P2, P3). These are concatenated into a single combined feature vector. The second module then trains six sub-models during the CNN training phase, and the stacked ensemble is assembled from these sub-models. This concatenated feature vector is passed to the SVM classifier rather than using a standard SoftMax output layer, which allows the SVM to learn optimal decision boundaries over the richer combined feature space.
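The concatenation step itself is simple. With hypothetical 7-class probability vectors from the three backbones (the values below are invented for illustration), the combined feature vector handed to the SVM has 21 dimensions:

```python
# Hypothetical per-class probabilities (AKIEC, BCC, BKL, DF, MEL, NV, VASC)
# produced by the three fine-tuned backbones for one segmented image.
p1_inception = [0.01, 0.02, 0.03, 0.01, 0.85, 0.07, 0.01]
p2_xception  = [0.02, 0.01, 0.05, 0.02, 0.78, 0.11, 0.01]
p3_vgg19     = [0.01, 0.03, 0.04, 0.01, 0.82, 0.08, 0.01]

# Concatenate P1, P2, P3 into the combined feature vector for the SVM,
# in place of a per-model SoftMax decision.
combined_features = p1_inception + p2_xception + p3_vgg19
```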

Rationale for stacking: Single-model approaches have limited discriminative power, especially when lesion images share similar visual textures. By stacking models that capture features at different scales and abstraction levels, the ensemble compensates for individual model weaknesses. The authors specifically note that prior studies using only one model (even deep models) typically plateaued around 85-90% accuracy on multi-class lesion datasets.

TL;DR: Three pre-trained CNNs (Inception-V3, Xception, VGG-19) independently extract features from segmented images. Their prediction vectors are concatenated into a combined feature vector and fed to an SVM classifier, creating an ensemble that outperforms single-model approaches that typically cap around 85-90% accuracy.
Pages 6-7
Enhanced SVM Classifier and Hyperparameter Tuning

The final classification stage uses an enhanced SVM rather than a conventional SoftMax layer. The SVM calculates feature scores via linear mapping on the concatenated feature vectors from the stacked CNN, then computes a loss value. The key innovation is an improved loss function that calculates a weighted score for each pixel in the segmented lesion image, reducing overfitting and minimizing the number of active neurons. This enhanced loss function also reduces the computational load on the segmented images fed into the SVM, directly cutting processing time.

Hyperparameter search: The authors performed a manual grid search over four optimizers (RMSProp, Adam, AdaGrad, Adadelta), two batch sizes (32, 64), two learning rates (0.0001, 0.001), two weight decay values (0.0001, 0.001), two dense-layer configurations (4, 5 layers), and two epoch counts (50, 100). The paper reports 64 tested configurations, although a full factorial grid over these options would contain 4 x 2^5 = 128 combinations. The best performance was achieved with the Adam optimizer, batch size 32, 4 dense layers, learning rate 0.0001, weight decay 0.0001, and 50 epochs, yielding the lowest loss of 6.26 with a processing time of only 3 milliseconds.
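Enumerating the listed search space with `itertools.product` shows the full factorial grid contains 4 x 2^5 = 128 combinations (the paper reports 64 tested configurations, so the search may have been partial or the grid described differently in the original):

```python
from itertools import product

search_space = {
    "optimizer":     ["RMSProp", "Adam", "AdaGrad", "Adadelta"],
    "batch_size":    [32, 64],
    "learning_rate": [0.0001, 0.001],
    "weight_decay":  [0.0001, 0.001],
    "dense_layers":  [4, 5],
    "epochs":        [50, 100],
}

# Every combination of the listed options, one dict per configuration.
configs = [dict(zip(search_space, values))
           for values in product(*search_space.values())]

# Best-performing configuration as reported in the paper.
best = {"optimizer": "Adam", "batch_size": 32, "learning_rate": 0.0001,
        "weight_decay": 0.0001, "dense_layers": 4, "epochs": 50}
```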

Optimizer comparison: Adam produced the best overall combination of low loss (6.26-7.27) and fast processing (3-6 ms). RMSProp was competitive on loss (6.25-7.79), while AdaGrad (6.25-8.18) and Adadelta (6.28-8.19) trailed behind. The larger batch size (64) generally increased both loss and processing time across all optimizers, and the 5-dense-layer configurations tended to produce higher loss values than the 4-dense-layer configurations.

TL;DR: An enhanced SVM with improved loss function replaces standard SoftMax for final classification. Grid search over 64 hyperparameter combinations found the best setup: Adam optimizer, batch size 32, 4 dense layers, learning rate 0.0001, weight decay 0.0001, 50 epochs, achieving loss of 6.26 at 3 ms processing time.
Pages 8-10
Classification Performance Across All Datasets

HAM10000 results: The Fuzzy GC-SCNN achieved an overall classification accuracy of 99.75% with 100% sensitivity and 100% specificity. Per-class breakdown from the confusion matrix showed AKIEC at 99.21%, BCC at 99.34%, BKL at 100%, DF at 98.44%, MEL at 99.83%, NV at 99.78%, and VASC at 100%. Among prior methods on HAM10000, DCN transfer learning reported the highest accuracy at 94.92% but only 80.36% sensitivity; DilatInceptV3 reached 90.10% accuracy with 87% sensitivity and 89.43% specificity; and MobileNet managed just 83.1% accuracy with 89% sensitivity.
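For reference, the reported scores follow the standard confusion-matrix definitions (computed per class, one-vs-rest): 100% sensitivity and specificity mean zero false negatives and zero false positives on the test set. The counts below are illustrative, not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metric definitions behind the reported scores."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall: fraction of true lesions caught
    specificity = tn / (tn + fp)   # fraction of non-lesions correctly rejected
    return accuracy, sensitivity, specificity
```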

ISIC 2018 results: The proposed model achieved 99.78% accuracy, 100% sensitivity, and 100% specificity on the ISIC 2018 test set. The previous best was Gessert et al. at 98.70% accuracy (80.9% sensitivity, 98.4% specificity), followed by Ailin et al. at 98.20%. Other approaches ranged from 85.80% (Huang et al.) to 93.81% (Gan et al.). The proposed model outperformed the previous best by approximately 1 percentage point in accuracy while dramatically improving sensitivity from 80.9% to 100%.

ISIC 2019 results: On the larger and more challenging ISIC 2019 dataset, the model achieved 99.51% accuracy, 100% sensitivity, and 100% specificity. The nearest competitor was Molina et al. at 97% accuracy (69.04% sensitivity, 95.92% specificity). Other approaches ranged from 90% (Iqbal et al.) to 94% (Ahmed et al. and Kaseem et al.). The proposed model outperformed the previous state-of-the-art by approximately 2.5 percentage points in accuracy.

Processing efficiency: The overall prediction time for lesion detection was measured at 2.513 milliseconds per image. The enhanced loss function reduced processing time by 25-35 milliseconds compared to standard SVM implementations, while simultaneously increasing accuracy by 2-5% over single-model baselines. This speed makes the model viable for real-time clinical screening applications.

TL;DR: HAM10000: 99.75% accuracy, 100% sensitivity/specificity (vs. prior best 94.92%). ISIC 2018: 99.78% accuracy (vs. prior best 98.70%). ISIC 2019: 99.51% accuracy (vs. prior best 97%). Prediction time: 2.513 ms per image. Processing time reduced by 25-35 ms and accuracy improved by 2-5% over baselines.
Pages 10-11
Scope Constraints and Methodological Caveats

Limited lesion coverage: The model was trained and evaluated on only seven lesion categories from HAM10000 (nine for ISIC 2019). Minute and rare lesion types were not included in the evaluation. The authors explicitly acknowledge that their solution produced the best possible accuracy but focused on a limited set of lesions while neglecting smaller and less common lesion subtypes that dermatologists encounter in practice.

Dataset biases: Both PH2 and ISIC datasets are biased towards melanocytic lesions and disregard non-melanocytic lesions. Furthermore, many available training images are clinical photographs rather than true dermoscopic images, creating a distribution mismatch between training data and the images a dermoscopy-based system would encounter in real clinical settings. This domain gap could reduce model performance when deployed in actual practice.

Validation design: The study used a single 80:20 train/test split without cross-validation, which limits confidence in the reported metrics. There was no external validation on an independent clinical cohort. The reported 100% sensitivity and 100% specificity values, while impressive, raise concerns about potential overfitting given the absence of held-out or multi-center validation data. The evaluation was conducted on a single machine (Intel Core i5 3.4 GHz) using Python with Anaconda IDE, and performance on different hardware or in production environments was not assessed.

No clinical integration testing: The study did not compare the model's performance against practicing dermatologists on the same test sets, nor did it evaluate the system in a prospective clinical workflow. The processing time of 2.513 ms per image is promising for real-time use, but no end-to-end latency measurements including image acquisition and display were reported.

TL;DR: Key limitations include coverage of only 7-9 lesion types (missing rare subtypes), melanocytic bias in datasets, clinical-vs-dermoscopic image mismatch, a single 80:20 split without cross-validation, no external or multi-center validation, and no head-to-head comparison with practicing dermatologists.
Pages 11-12
Expanding Lesion Detection and Improving Feature Extraction

Latent factor analysis for feature extraction: The authors propose incorporating latent factor analysis into future versions of the model to improve detection of negligible and minute lesions that were excluded from the current study. Latent factor models can uncover hidden patterns in high-dimensional image data that standard CNN feature extractors might miss, potentially enabling detection of early-stage or atypical lesions that do not conform to standard dermoscopic patterns.

Expanded lesion taxonomy: Future work aims to incorporate more lesion types beyond the current seven to nine categories. This is clinically important because dermatologists routinely encounter dozens of distinct skin lesion subtypes, and a production-grade system would need to handle at least the most common differential diagnoses. Adding more classes will also require larger and more diverse training datasets to maintain high per-class accuracy.

Noise reduction through architecture design: The authors suggest that incorporating noise reduction directly into the neural network architecture (rather than relying solely on preprocessing) could enhance model significance. This could involve attention mechanisms that focus on diagnostically relevant image regions, or adversarial training strategies that make the model robust to image quality variations encountered across different clinical settings and imaging devices.

The broader context from existing literature also points toward transfer learning with larger pre-trained models (e.g., ResNet-101, EfficientNet) and multi-modal approaches that incorporate patient metadata (age, sex, lesion location) alongside image data. Several competing studies already report improved performance when patient clinical information supplements image-only classification, suggesting this is a natural extension for the Fuzzy GC-SCNN pipeline.

TL;DR: Future work includes latent factor analysis for detecting minute lesions, expanding beyond 7-9 lesion categories, embedding noise reduction into the network architecture, and potentially integrating patient metadata. These improvements aim to bridge the gap between benchmark accuracy and real-world clinical deployment.
Citation: Bhimavarapu U, Battineni G. Skin Lesion Analysis for Melanoma Detection Using the Novel Deep Learning Model Fuzzy GC-SCNN. Healthcare. 2022;10(5):962. DOI: 10.3390/healthcare10050962. Available at PMC9141659. Open access under a CC BY license.