This study proposes an Internet of Medical Things (IoMT) framework for the automated detection and classification of leukemia using deep learning. The core idea is to connect IoT-enabled microscopes to a cloud-based system where blood smear images are uploaded and processed by deep learning models, with diagnostic results returned to clinicians in real time. The authors frame this as a solution for fast, safe, and accurate early-stage leukemia diagnosis, noting that manual examination of blood smears by hematologists is both time-consuming and subject to human error.
Clinical context: Leukemia is a white blood cell (WBC) cancer affecting bone marrow and blood. It is categorized into four subtypes based on progression rate and cell lineage: acute lymphocytic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), and chronic myeloid leukemia (CML). Accurate subtype identification is critical because treatment protocols differ substantially across these four categories. The authors highlight that most prior CAD systems focused on only one or two subtypes, leaving a gap in comprehensive multi-class classification.
IoMT architecture: In the proposed framework, an IoT-enabled microscope captures blood smear images and uploads them to a "leukemia cloud." The cloud runs either a ResNet-34 or DenseNet-121 model to classify the image as healthy, ALL, AML, CLL, or CML. Results are then displayed on the clinician's computer for treatment decisions. The authors also argue this framework is useful during pandemics such as COVID-19, where patients with chronic conditions may be unable to visit hospitals in person, enabling remote diagnosis without breaking quarantine protocols.
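The round trip the framework describes (capture, cloud inference, result display) can be sketched in a few lines. This is purely illustrative; the function and field names below are hypothetical, and the paper describes the architecture conceptually without publishing an implementation:

```python
import json

# Output classes from the paper; everything else is a hypothetical
# stand-in for the described microscope -> cloud -> clinician round trip.
CLASSES = ["healthy", "ALL", "AML", "CLL", "CML"]

def classify_on_cloud(image_bytes, model):
    """'Leukemia cloud' side: run the CNN on an uploaded smear image
    and package the diagnosis for return to the clinic."""
    scores = model(image_bytes)  # e.g. a softmax over the 5 classes
    label = CLASSES[max(range(len(CLASSES)), key=scores.__getitem__)]
    return json.dumps({"diagnosis": label, "scores": scores})

def clinician_view(response_json):
    """Clinician side: unpack the result displayed for treatment decisions."""
    return json.loads(response_json)["diagnosis"]
```

In a real deployment the `model` callable would wrap the trained ResNet-34 or DenseNet-121, and transport would go over an authenticated channel rather than a bare JSON string.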
Key contribution: Unlike previous studies that classified only healthy-vs-leukemia or detected a single subtype (usually ALL), this work addresses all four leukemia subtypes simultaneously using advanced CNN architectures with transfer learning. The two models tested, ResNet-34 and DenseNet-121, achieved overall average accuracies of 99.56% and 99.91%, respectively, on combined ALL-IDB and ASH image bank datasets.
The paper provides a detailed breakdown of the four leukemia subtypes that the model must distinguish. Acute Lymphocytic Leukemia (ALL) is most common in children and involves rapid overproduction of immature WBCs in the bone marrow. Its symptoms mimic flu (exhaustion, weakness, joint pain), making early detection difficult. ALL is further sub-classified into L1, L2, and L3 types under the French-American-British (FAB) system.
Acute Myeloid Leukemia (AML) is the most common acute leukemia type and occurs when bone marrow produces abnormal blasts and immature WBCs, sometimes also producing abnormal RBCs and platelets. AML has eight distinct FAB subtypes. Chronic Lymphocytic Leukemia (CLL) progresses slowly and is most common in adults, with symptoms including weight loss, fever, night sweats, and recurrent infections. Chronic Myeloid Leukemia (CML) also grows slowly but can transform into an acute, fast-growing phase, progressing through chronic, accelerated, and blast stages.
The distinction between these subtypes is not merely academic. Treatment strategies vary dramatically: ALL in children may respond well to chemotherapy, while CML often requires targeted therapy such as tyrosine kinase inhibitors. A misclassification could lead to inappropriate treatment. Traditional diagnosis involves optical microscope examination of peripheral blood smears by trained hematologists, which is subjective, labor-intensive, and bottlenecked by specialist availability. The authors argue that automated CAD systems using deep learning can reduce this dependency on manual expertise.
The authors survey a substantial body of prior work on CAD-based leukemia detection. Traditional machine learning methods dominate the literature, with classifiers including Random Forest (94.3% accuracy), KNN with Naive Bayes (92.8% on 60 sample images), PCA-based ABC-BPNN (98.72%), and various SVM-based pipelines. Feature extraction in these studies relied on hand-crafted methods such as Discrete Orthogonal Stockwell Transform (DOST), Principal Component Analysis (PCA), and Linear Discriminant Analysis (LDA).
ALL-IDB dataset studies: Several studies focused exclusively on ALL detection using the ALL-IDB dataset. K-medoids achieved 98.60%, DOST with PCA and LDA reached 99.66%, Generative Adversarial Optimization (GAO) attained 93.84%, Genetic Algorithm with ANN scored 97.07%, and a Chronological Sine Cosine Algorithm (SCA)-based deep CNN obtained 98.70%. Using ALL-IDB with SVM-based methods, results ranged from 89.81% to 99.00% depending on the approach.
ASH image bank studies: On the ASH image bank, Rawat et al. used Genetic Algorithm with multilayer perceptron (MLP) for ALL subtype identification (97.1% accuracy) and Gaussian radial basis kernel for AML subtypes (98.5% accuracy), reaching 99.50% for combined healthy/ALL/AML classification. A CNN-based convnet approach on ASH data achieved only 81.74% for subtype classification.
Critical gap: The authors identify that nearly all prior work focused on binary classification (healthy vs. leukemia) or detection of a single subtype, typically ALL. Very few studies attempted to classify all four subtypes simultaneously. The study by Ahmed et al. (2019) used a simple CNN for subtype classification but achieved only 81.74% accuracy. This paper positions itself as an advancement over that work, using deeper architectures (ResNet-34 and DenseNet-121) with transfer learning to tackle the full four-subtype classification problem.
The study uses two publicly available datasets. The ASH (American Society of Hematology) image bank is a freely accessible online repository containing annotated cell images across various hematological conditions, from which the authors selected all available annotated blood cell images with leukemia subtypes. The ALL-IDB dataset provides annotated microscopic blood cell images developed specifically for segmentation, evaluation, and classification, though it contains only healthy and ALL samples. Combining both datasets provides coverage of all four subtypes plus healthy controls.
Sample distribution before augmentation: The raw dataset was extremely small: ALL had 181 images (from ALL-IDB), AML had 55 images, CLL had 38 images, CML had 57 images, and healthy had 187 images (from ALL-IDB). These numbers are far too small for training deep CNN models, making data augmentation essential.
Data augmentation: The authors applied rotation, height shift, width shift, horizontal flip, zoom, and shearing transformations to expand the dataset. After augmentation, the ASH image bank grew to 3,277 samples and ALL-IDB to 2,359 samples. The per-class distribution after augmentation became: ALL with 1,079 images, AML with 1,194 images, CLL with 840 images, CML with 1,243 images, and healthy with 1,280 images. This represents roughly a 6x to 22x expansion depending on the subtype, with CLL seeing the largest relative increase (from 38 to 840 images).
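The listed transforms are standard fare in image-augmentation libraries (Keras' ImageDataGenerator, for instance, supports all six). As a library-free sketch under the assumption of simple pixel-grid operations, flip and shift alone look like this; a 22-fold expansion of the 38 CLL originals would yield 836 variants, in line with the reported 840:

```python
import numpy as np

def augment(img, rng, max_shift=4):
    """Apply two of the paper's listed transforms to one (H, W, C) image:
    a random horizontal flip and a small height/width shift. Rotation,
    zoom, and shear are omitted in this minimal sketch."""
    out = img
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                              # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, (int(dy), int(dx)), axis=(0, 1))    # height/width shift
    return out

rng = np.random.default_rng(0)
# Placeholder arrays standing in for the 38 original CLL smear images.
originals = [np.zeros((128, 128, 3), dtype=np.uint8) for _ in range(38)]
augmented = [augment(im, rng) for im in originals for _ in range(22)]  # 836 variants
```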
The augmentation strategy addresses a common challenge in medical imaging: the scarcity of labeled data, particularly for rarer conditions. However, it is worth noting that augmentation from only 38 original CLL images means the model is learning from a very narrow base of real morphological variation for that subtype, which could limit generalizability to clinical deployment.
ResNet-34 is a 34-layer deep residual network that mitigates the vanishing gradient problem through skip connections. In traditional deep networks, performance degrades beyond a certain depth because gradients become too small to update early layers effectively. ResNet addresses this by letting gradients bypass layers through identity shortcut connections, expressed mathematically as y = F(x) + x, where F(x) is the residual mapping learned by the stacked layers and x is the input carried through unchanged. ResNet-34 uses four groups of residual blocks with 3x3 convolutions and feature maps of sizes 64, 128, 256, and 512, followed by average pooling and a softmax classifier for the five output classes (healthy, ALL, AML, CLL, CML).
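The identity shortcut can be sketched with plain matrices standing in for ResNet's 3x3 convolutions (an illustrative simplification, not the paper's model). The key property: if the learned residual F is zero, the block passes its (non-negative) input through unchanged, so stacking many blocks cannot degrade the signal the way plain stacked layers can:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = F(x) + x, with F = two weight layers and a ReLU between them.
    Dense layers stand in for ResNet-34's 3x3 convolutions."""
    fx = relu(x @ w1) @ w2   # F(x): the learned residual
    return relu(fx + x)      # identity shortcut: add the input back

# With all-zero weights, F(x) == 0 and the block is the identity on
# non-negative inputs -- gradients flow straight through the shortcut.
x = np.abs(np.random.default_rng(1).normal(size=(2, 64)))
zeros = np.zeros((64, 64))
y = residual_block(x, zeros, zeros)  # equals x
```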
DenseNet-121 takes a different approach to feature propagation. Instead of skip connections between non-adjacent layers, DenseNet connects every layer to all subsequent layers within a dense block. Each layer receives the concatenated feature maps from all preceding layers, making training more efficient and reducing parameter count compared to ResNet. The architecture uses dense blocks composed of Batch Normalization (BN), ReLU activation, and 3x3 convolution operations, with transition layers between blocks. DenseNet-121 achieved best-in-class results on CIFAR-10 and ImageNet benchmarks, suggesting strong general-purpose image classification capability.
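The concatenation pattern can be sketched the same way, with a single ReLU matmul standing in for each BN-ReLU-conv layer (an illustration, not DenseNet's actual implementation). With DenseNet-121's growth rate of 32, the channel count grows linearly across a block:

```python
import numpy as np

def dense_block(x, layers):
    """Each layer sees the concatenation of all earlier feature maps and
    contributes `growth` new channels; its output is appended to the
    running feature map rather than replacing it."""
    feats = x
    for w in layers:                          # w: (channels_so_far, growth)
        new = np.maximum(feats @ w, 0.0)      # BN-ReLU-conv collapsed to one op
        feats = np.concatenate([feats, new], axis=-1)
    return feats

rng = np.random.default_rng(0)
c0, growth, n_layers = 64, 32, 6
layers = [rng.normal(size=(c0 + i * growth, growth)) * 0.01
          for i in range(n_layers)]
out = dense_block(rng.normal(size=(1, c0)), layers)
# channels grow linearly: c0 + n_layers * growth = 64 + 6 * 32 = 256
```

Because each layer only adds `growth` channels instead of producing a full-width feature map, dense blocks need fewer parameters per layer than residual blocks of comparable depth, which is the efficiency argument made above.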
Transfer learning approach: Both models were used as pre-trained networks (trained on ImageNet) and fine-tuned on the leukemia dataset using supervised learning. This transfer learning strategy leverages features learned from millions of natural images and adapts them to the medical imaging domain. The authors implemented both models in Python using the fastai deep learning library, and all experiments were conducted on Google Colab. The choice of pre-trained models is particularly important given the small original dataset size, as training a 34-layer or 121-layer network from scratch on just 518 images would be impractical.
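The division of labor in fine-tuning, a frozen pretrained feature extractor plus a newly trained classification head, can be illustrated without any deep learning library. Everything below (the random "backbone," the toy data) is a hypothetical stand-in for intuition only, not the paper's fastai pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES = 5                                  # healthy, ALL, AML, CLL, CML

# Frozen "pretrained backbone": a fixed random projection stands in for
# the ImageNet-trained convolutional layers. It is never updated.
W_backbone = rng.normal(size=(64, 512)) * 0.1
def backbone(x):
    return np.maximum(x @ W_backbone, 0.0)

W_head = np.zeros((512, N_CLASSES))            # the only trained weights

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fine_tune_step(x, y, lr=0.1):
    """One full-batch cross-entropy gradient step on the head only."""
    global W_head
    f = backbone(x)                            # frozen features
    g = softmax(f @ W_head)
    g[np.arange(len(y)), y] -= 1.0             # dL/dlogits = p - onehot
    W_head -= lr * f.T @ g / len(y)

# Toy "images": class identity is encoded in one input feature.
y = np.repeat(np.arange(N_CLASSES), 20)
x = rng.normal(size=(100, 64)) * 0.1
x[np.arange(100), y] += 2.0
for _ in range(300):
    fine_tune_step(x, y)
acc = (softmax(backbone(x) @ W_head).argmax(axis=1) == y).mean()
```

In the real pipeline, fastai additionally unfreezes and trains the backbone at a lower learning rate after the head converges; the point here is only that the head can learn a new 5-class task on top of features it did not train.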
ResNet-34 performance: The confusion matrix for ResNet-34 shows near-perfect classification. ALL achieved 100% accuracy with precision, recall, and F1 score all at 1.0. AML reached 99.65% accuracy (precision 1.0, recall 0.99, F1 0.99), with 2 AML samples misclassified as CLL and 1 as CML (consistent with AML's precision of 1.0 and CML's recall of 1.0, which rule out errors in the other direction). CLL scored 99.73% accuracy (precision, recall, and F1 all at 0.99), and CML reached 99.73% accuracy (precision 0.99, recall 1.0, F1 0.99). Healthy samples were classified with 100% accuracy across all metrics. The total number of test samples was 1,126 (225 ALL, 228 AML, 175 CLL, 248 CML, 250 healthy), with only 5 total misclassifications.
DenseNet-121 performance: DenseNet-121 achieved even stronger results. ALL, CML, and healthy classes all reached 100% accuracy with perfect precision, recall, and F1 scores of 1.0. AML achieved 99.91% accuracy with precision, recall, and F1 all at 1.0. CLL reached 99.91% accuracy (precision 1.0, recall 0.99, F1 1.0), with only 1 of the 156 CLL test samples misclassified as AML. The DenseNet-121 test set contained 1,126 samples (211 ALL, 240 AML, 156 CLL, 275 CML, 245 healthy), with just 1 total misclassification.
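All the per-class metrics quoted above derive mechanically from the confusion matrix. The helper below computes them for any such matrix; the example matrix is one arrangement consistent with ResNet-34's reported per-class test counts and 5 total errors, not the paper's published matrix:

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, recall, F1 from a confusion matrix where
    cm[i, j] counts true-class-i samples predicted as class j."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # column sums: all predicted-as-class
    recall = tp / cm.sum(axis=1)      # row sums: all true-class samples
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Rows/cols ordered healthy, ALL, AML, CLL, CML. Off-diagonal placement
# is illustrative, chosen to match the reported rounded metrics.
cm = np.array([
    [250,   0,   0,   0,   0],   # healthy: all correct
    [  0, 225,   0,   0,   0],   # ALL: all correct
    [  0,   0, 225,   2,   1],   # AML: 3 errors (to CLL, CML)
    [  0,   0,   0, 173,   2],   # CLL: 2 errors (to CML)
    [  0,   0,   0,   0, 248],   # CML: all correct
])
prec, rec, f1 = per_class_metrics(cm)
# e.g. AML precision 225/225 = 1.0, AML recall 225/228 ~ 0.99
```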
Training convergence: Both models showed training and validation loss approaching 0 over approximately 700 batches. DenseNet-121 exhibited faster and more stable convergence, with training and validation losses closer to 0 compared to ResNet-34. This suggests DenseNet-121's dense connectivity pattern, which reuses features across all layers, provides a more efficient learning path for this particular classification task.
Comparison with prior methods: On the same datasets, DenseNet-121 (99.91%) outperformed both ResNet-34 (99.56%) and the prior approaches, including GA with SVM (99.50%) and the baseline CNN (81.74%). A broader comparison table in the paper covers 15+ prior methods, with the best previous result being 99.66% (DOST/PCA/LDA) for ALL-only detection. The proposed models achieve comparable or higher accuracy while addressing the harder problem of full four-subtype classification.
Extremely small original dataset: The most significant limitation is the tiny number of original images before augmentation. The CLL class had only 38 images, AML had 55, and CML had 57. While data augmentation expanded these to 840, 1,194, and 1,243 samples respectively, augmented images are geometric transformations of the originals and do not introduce new morphological patterns. A model trained on augmented versions of just 38 CLL images may learn the specific visual characteristics of those 38 cases rather than the generalizable features of CLL as a disease.
No external validation: The study evaluates performance only on a held-out test set from the same two datasets. There is no external validation on images from different institutions, scanners, or staining protocols. In clinical practice, blood smear images vary substantially based on preparation technique, microscope type, lighting conditions, and staining quality. The reported 99.91% accuracy may not translate to real-world deployment without multi-center validation.
Dataset mixing concerns: The ALL and healthy samples come from ALL-IDB, while AML, CLL, and CML samples come from the ASH image bank. These two datasets likely differ in imaging conditions, magnification levels, and staining protocols. The model could potentially learn to distinguish between datasets rather than between leukemia subtypes, a well-known confounding factor in medical imaging studies. The paper does not address whether cross-dataset batch effects were evaluated or mitigated.
No comparison with clinical workflow: While the IoMT framework concept is described, no actual IoMT deployment or latency benchmarks are provided. The paper does not report inference time per image, cloud upload/download latency, or comparison of the automated system's turnaround time against standard hematologist review. The COVID-19 remote diagnosis use case, while conceptually appealing, remains purely theoretical without pilot testing.
The authors identify several directions for future work. First, they suggest extending the dataset by adding new blood image samples and exploring additional augmentation techniques to improve model robustness. Given that the current dataset starts from only 518 original images across five classes, this is arguably the most critical next step. Techniques such as Generative Adversarial Networks (GANs) for synthetic blood smear image generation could complement traditional augmentation by producing genuinely novel morphological variations rather than geometric transformations of existing images.
Second, the authors propose equipping the IoMT framework with the ability to diagnose subcategories within each leukemia type. For example, ALL has L1, L2, and L3 FAB subtypes, and AML has eight distinct FAB subtypes. Expanding from 5-class to potentially 15+ class classification would require substantially more training data per class and likely more sophisticated architectures, but it would bring the system closer to the granularity needed for clinical treatment planning.
Third, the proposed models could be adapted to detect other blood abnormalities beyond leukemia, such as anemia, thrombocytopenia, or other hematological conditions. This would transform the IoMT platform from a leukemia-specific tool into a broader hematological screening system. To make this clinically viable, future work should include prospective clinical trials comparing the AI system's diagnostic accuracy against board-certified hematologists, multi-center validation across diverse hospital settings, and integration with electronic health records for seamless clinical workflow.
From a technical standpoint, exploring newer architectures such as Vision Transformers (ViT), EfficientNet, or attention-based mechanisms could potentially improve performance while reducing computational overhead for cloud deployment. Additionally, explainability tools like Grad-CAM could help clinicians understand which cellular features the model uses for classification, building trust in the automated system and potentially revealing previously unrecognized morphological markers.