Leukemia is the most common childhood cancer and one of the leading causes of cancer-related death globally. The World Health Organization reported that approximately 19 million patients were diagnosed with cancer in 2020, with 10 million deaths. Leukemia occurs when the bone marrow produces abnormal white blood cells that cannot perform their protective functions, leading to immune compromise and organ damage. The two most prevalent forms are acute lymphocytic leukemia (ALL), which primarily affects children between ages 3 and 7, and acute myelogenous leukemia (AML), which is more common in adults.
Clinical detection challenges: Traditional leukemia diagnosis requires a blood smear analyzed under a microscope by experienced hematologists. This morphological process is time-consuming, subjective, and dependent on specialist availability. The distinction between ALL and AML cells requires careful visual assessment of cell shape, size, and staining patterns. Additional diagnostic procedures, including complete blood count (CBC) tests, spinal taps, bone marrow biopsies, and imaging tests such as CTs and MRIs, may be needed to confirm a diagnosis.
The proposed solution: This paper introduces a deep learning-based method to detect and classify both ALL and AML from microscopic blood smear images. The approach combines image segmentation with an 8-layer convolutional neural network (CNN) called AlexNet, paired with a support vector machine (SVM) classifier. The system not only identifies the type of leukemia but also determines its severity and generates a recommendation message for patients. Testing on the C-NMC_Leukemia dataset from Kaggle (approximately 15,000 images totaling nearly 10 GB) yielded over 98% accuracy.
Blood consists of three main components: red blood cells (which carry oxygen), white blood cells (which defend against infections from viruses and bacteria), and platelets (which support clotting). In leukemia, the bone marrow overproduces dysfunctional white blood cells that crowd out healthy cells and compromise the body's defenses. Hematologists focus specifically on white blood cells during analysis because numerous infections and malignancies are distinguished by abnormalities in these cells.
Acute vs. chronic leukemia: The four main types are ALL, AML, chronic lymphocytic leukemia (CLL), and chronic myelogenous leukemia (CML). Acute forms are the most dangerous because they spread rapidly and produce severe symptoms. In acute leukemia, most cells lose their ability to function, and quick detection is critically important. Chronic leukemia grows more slowly, with normal cells still performing their duties while some remain immature. This creates a longer window for detection but becomes increasingly threatening over time.
Symptoms and diagnostic triggers: Leukemia symptoms include frequent infections, unplanned weight loss, weakness, fever, bone pain, vomiting, and night sweats. Physicians may suspect leukemia from routine blood test results, but confirming the diagnosis typically requires physical exams, CBC tests, and potentially bone marrow biopsies. Since ALL is most common in children ages 3 to 7, and nearly two-thirds of diagnosed cases occur before age 6, early and accurate detection tools are especially valuable in pediatric care.
The paper focuses on ALL and AML specifically because these are the most prevalent acute forms. Treatments for leukemia include chemotherapy, radiation, surgery, and biological therapy, but treatment selection depends heavily on accurate classification of the leukemia type and stage.
The authors surveyed eight prior approaches to contextualize their contribution. Mondal et al. (2021) used a weighted ensemble of deep CNNs to detect ALL from microscopic images, reaching approximately 86% accuracy and 89% F1-score. Oliveira and Dantas (2021) modified standard neural network construction to classify malignant leukocytes, achieving around 93% F1-score across three test configurations with metrics including accuracy, precision, sensitivity, specificity, and F1-score.
AlexNet-based approaches: Shaheen et al. (2021) applied AlexNet specifically to AML detection, reaching 89% accuracy and approximately 88% precision on a dataset of 4,000 blood smears. Their comparison between AlexNet and LeNet showed AlexNet performed slightly better. However, their system could only identify AML, not ALL. Sashank et al. (2021) used AlexNet combined with machine learning models (SVM, KNN, XGBoost, and decision trees) on the ALL-IDB2 dataset containing 760 lymphocyte images (570 training, 190 testing), claiming 100% classification accuracy, though on a much smaller dataset.
Broader detection systems: Claro et al. (2020) presented a CNN architecture to differentiate ALL, AML, and healthy blood slides across 16 datasets containing 2,415 images, achieving 97% accuracy and precision. Dasariraju et al. (2020) used a random forest algorithm for AML detection via immature leukocyte analysis, obtaining approximately 93% detection accuracy and 94% classification accuracy but only 65% precision. Loey et al. (2020) used transfer learning with pretrained AlexNet on approximately 3,000 images, claiming 100% accuracy with their fine-tuned approach.
Recent advances: Hamza et al. (2022) used fuzzy c-means segmentation and competitive swarm optimization with NetB0 for ALL detection, achieving 96% accuracy, 95.7% precision, and 96.5% recall. Abir et al. (2022) used transfer learning to detect ALL with approximately 98.3% accuracy across four model types. A key limitation across most of these studies is that they detected only one type of leukemia (typically ALL or AML alone), while the method proposed in this paper handles both.
The proposed algorithm follows a multi-stage pipeline. First, a blood smear image (227 × 227 pixels) is read and displayed. In preprocessing, the foreground is separated from the background, and the image is converted to grayscale. RGB values are extracted from the original image, and a mapping between foreground values and RGB values is performed during image segmentation to increase contrast. This segmentation step is critical for isolating individual blood cells from the background noise of the smear.
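A minimal sketch of this preprocessing stage in Python with NumPy (the paper's implementation is in MATLAB, so the function names, luma weights, and the fixed threshold below are illustrative assumptions, not the authors' code):

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an RGB image (H x W x 3, values 0-255) to grayscale
    using the common ITU-R BT.601 luma weights."""
    return (rgb[..., 0] * 0.299
            + rgb[..., 1] * 0.587
            + rgb[..., 2] * 0.114)

def foreground_mask(gray, thresh=128.0):
    """Separate foreground from background: stained cells appear darker
    than the bright smear background, so mark pixels below the threshold.
    The fixed threshold here is a placeholder assumption."""
    return gray < thresh

# Toy 227 x 227 "smear": bright background with one darker square "cell"
img = np.full((227, 227, 3), 230, dtype=np.uint8)
img[100:130, 100:130] = (90, 40, 120)   # hypothetical stained region
gray = to_grayscale(img)
mask = foreground_mask(gray)
```

In practice the threshold would be derived from the image itself (for example with Otsu's method, which the paper uses later in the pipeline) rather than fixed.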
Cell detection and isolation: The algorithm identifies dark cells in the grayscale image and estimates the radius of each detected blood cell. Red lines are drawn around dark cells, and green rectangles are placed around detected white blood cells. The system counts both white and red blood cells. A threshold is calculated for each detected infected cell using Otsu's method, which minimizes the intensity variance within the white and black regions of the grayscale image (equivalently, maximizing the separation between them). The grayscale image is then converted to a binary image using this threshold to locate potential areas of infected cells.
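Otsu's method itself fits in a few lines of NumPy; this is a generic textbook implementation of the technique, not the paper's MATLAB code:

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the gray level that maximizes between-class variance
    (equivalently, minimizes intra-class variance) over a 256-bin histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = hist.sum()
    cum = np.cumsum(hist)                        # pixel count below each cut
    cum_mean = np.cumsum(hist * np.arange(256))  # cumulative intensity mass
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        n0 = cum[t - 1]
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue
        mu0 = cum_mean[t - 1] / n0
        mu1 = (cum_mean[-1] - cum_mean[t - 1]) / n1
        between = (n0 / total) * (n1 / total) * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Bimodal toy image: dark "cell" pixels at 50, bright background at 200
gray = np.concatenate([np.full(400, 50), np.full(600, 200)]).reshape(20, 50)
t = otsu_threshold(gray)
binary = gray >= t          # binarize: background True, cell region False
```

On such a cleanly bimodal image the chosen threshold lands between the two modes, so the binary image cleanly separates cell pixels from background.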
Feature extraction with AlexNet: The 8-layer AlexNet CNN extracts features from the isolated infected cells. The extracted features include mean squared error (MSE), histogram of oriented gradients (HOG), and local binary pattern (LBP), along with additional CNN-derived features. AlexNet was chosen because it provides a balance between computational efficiency and feature extraction quality for medical image classification tasks. The algorithm continuously learns from its results, refining its feature extraction with each iteration.
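As one concrete example of these descriptors, a basic 3×3 local binary pattern (LBP) can be computed as below. This is a standard textbook version of LBP, not the paper's exact feature code, and the neighbor/bit ordering is an arbitrary choice:

```python
import numpy as np

def lbp_codes(gray):
    """3x3 local binary pattern: each interior pixel gets an 8-bit code,
    one bit per neighbor, set when that neighbor >= the center pixel."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # 8 neighbors, clockwise from top-left (bit order is an arbitrary choice)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (neigh >= center).astype(np.int32) << bit
    return code

flat = np.full((5, 5), 7)   # uniform region: every neighbor equals the center
codes = lbp_codes(flat)
```

A histogram of these per-pixel codes over a cell region is what would typically be fed to the classifier as the LBP texture feature.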
Classification with SVM: After feature extraction, a support vector machine (SVM) classifier determines whether each infected blood cell is ALL or AML. The algorithm then calculates the percentage of leukemia present, determines the severity status, and displays a message with recommendations to patients. The Harris-Stephens corner detection method is used to determine the precise location and boundaries of infected cells before features are passed to AlexNet.
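The SVM stage can be illustrated with a tiny from-scratch linear SVM trained by stochastic subgradient descent (Pegasos-style). The paper does not specify a kernel, solver, or hyperparameters, so everything below — the toy feature vectors, the +1/-1 class encoding, and the learning-rate settings — is an illustrative assumption:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Minimize hinge loss + L2 penalty by stochastic subgradient steps.
    Labels y must be +1/-1 (e.g. a hypothetical +1 = AML, -1 = ALL)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:           # margin violated: push
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                    # margin ok: only shrink
                w = (1 - lr * lam) * w
    return w, b

# Toy linearly separable "feature vectors" for the two classes
X = np.array([[2.0, 2.0], [3.0, 2.5], [-2.0, -2.0], [-3.0, -2.5]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```

In the paper's pipeline the rows of `X` would be the AlexNet/HOG/LBP feature vectors of the isolated infected cells rather than hand-made points.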
The study used the C-NMC_Leukemia dataset from Kaggle, which consists of approximately 15,000 images totaling nearly 10 GB. This dataset contains microscopic blood smear images labeled as either healthy or infected with ALL or AML. The dataset was split 70/15/15: approximately 10,500 images for training, 2,250 for validation, and the remainder for testing (the results section reports a test set of 2,500 images).
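A sketch of this split protocol (the random seed and the shuffling itself are assumptions; the paper does not state how the split was randomized):

```python
import numpy as np

def split_indices(n, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle sample indices and cut them into train/val/test portions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train, val, test = split_indices(15_000)
```

Note that a strict 70/15/15 split of 15,000 images yields 10,500/2,250/2,250, so the three portions are disjoint and exhaust the dataset.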
Training process: During the training stage, blood samples are categorized as either healthy or infected. For every input or sequence of inputs, the algorithm extracts features from both healthy and infected samples. These features are deeply analyzed so the model can accurately determine and classify ALL and AML. The training leverages MATLAB as both the programming platform and simulation tool. The model's self-learning capability means it regularly updates its parameters based on obtained results.
Visual outputs: The paper demonstrates the algorithm's intermediate outputs through several figure examples. White blood cells are enclosed by green rectangles and red blood cells by red rectangles in the processed images. The upper-left output shows the original image, the upper-right shows detected white cells, and the bottom-left shows detected red blood cells. These intermediate visualizations are used in subsequent deep learning and classification phases, providing transparency into the detection pipeline.
The proposed algorithm was evaluated on 2,500 testing images and achieved strong results across all performance metrics. The model produced 1,981 true positives (correctly identified leukemia samples), only 14 false positives (incorrectly flagged as leukemia), 494 true negatives (correctly identified healthy samples), and just 11 false negatives (missed leukemia cases). This translates to 99.30% precision, 99.45% recall, and 99% overall accuracy.
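These figures can be reproduced directly from the reported confusion counts with the standard metric definitions:

```python
def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy

# Confusion counts reported in the paper's evaluation
precision, recall, accuracy = metrics(tp=1981, fp=14, tn=494, fn=11)
# precision ~ 0.9930, recall ~ 0.9945, accuracy = 0.9900
```

The arithmetic confirms the reported values: 1981/1995 ≈ 99.30% precision, 1981/1992 ≈ 99.45% recall, and 2475/2500 = 99% accuracy.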
Confusion matrix results: The confusion matrix on the testing dataset reveals strong per-class performance. For ALL classification, 953 samples (98.55%) were correctly identified while 14 (1.45%) were misclassified. For AML classification, 1,002 samples (98.92%) were correctly identified while 11 (1.08%) were misclassified. The slightly higher accuracy for AML detection may reflect differences in the morphological distinctiveness of AML cells compared to ALL cells in the training data.
AML detection example: The paper demonstrates AML detection through a step-by-step visual pipeline. White blood cells of the AML type are first detected and surrounded by green rectangles. The original image is then processed to remove noise, and the final output highlights detected AML cancerous cells in white color. A similar demonstration is provided for ALL detection, where the system identifies infected white blood cells and generates a status message for the patient.
Clinical messaging: Beyond classification, the algorithm determines the percentage of leukemia present in the sample and assigns a severity status. It then generates a message recommending next steps for patients. This feature adds practical clinical value by translating the raw detection output into actionable information that can support initial patient communication.
The authors conducted a comprehensive comparison against seven published methods, evaluating feature extraction approach, classifier type, accuracy, precision, and recall. The proposed CNN (AlexNet) + SVM method is listed with 99.30% accuracy, 99.45% precision, and 99% recall, outperforming or matching all competitors on at least two of the three metrics. (These labels appear swapped relative to the confusion-matrix-derived values, which work out to 99% accuracy, 99.30% precision, and 99.45% recall.)
Head-to-head comparisons: Mondal et al. (2021) used CNN-based feature extraction with a CNN classifier, reaching only 86.2% accuracy, 88.7% precision, and 88.8% recall. Oliveira and Dantas (2021) achieved 91.49% accuracy with a CNN approach. Shaheen et al. (2021), who also used AlexNet, are listed with 98.58% accuracy in this comparison (versus the 89% cited in the related-work survey above) but only 87.4% precision and 88.9% recall, showing that the addition of SVM as a classifier (rather than using AlexNet end-to-end) significantly improved precision. Sashank et al. (2021) used SVM, KNN, and decision trees, achieving 95.05% accuracy and 95.25% precision.
Closest competitors: Claro et al. (2020) reached 97.18% accuracy and 97.23% precision using a pure CNN approach across 16 datasets. Dasariraju et al. (2020) used random forest with 92.99% accuracy but only 91.23% precision. The closest competitor was Loey et al. (2020), who used CNN with a fully connected (FC) classifier to achieve 99.04% accuracy and 99.64% precision, slightly exceeding the proposed method's precision but falling short on accuracy (99.04% vs. 99.30%) and recall (98.44% vs. 99%).
The key differentiator for the proposed method is the combination of AlexNet for feature extraction and SVM for classification. While pure CNN classifiers can achieve high accuracy, the SVM classifier provides more robust decision boundaries for separating ALL from AML features. Importantly, the proposed approach also detects both ALL and AML, whereas many competitors could only detect one type.
Single-dataset evaluation: The entire study relies on one dataset (C-NMC_Leukemia from Kaggle). While this dataset is substantial at 15,000 images and nearly 10 GB, testing on a single source introduces the risk of overfitting to dataset-specific characteristics. Blood smear images from different hospitals, staining protocols, and microscope manufacturers can vary significantly in color balance, resolution, and cell appearance. Without cross-dataset validation, the 99% accuracy figure may not generalize to clinical settings that use different imaging equipment or sample preparation techniques.
Limited leukemia subtypes: The algorithm only addresses ALL and AML, the two most common acute forms. Chronic lymphocytic leukemia (CLL) and chronic myelogenous leukemia (CML) are not covered, nor are rarer subtypes. A comprehensive clinical tool would need to classify across all four major types and potentially identify subtypes within each category. The authors acknowledge this scope limitation but do not provide a roadmap for expansion.
Retrospective design and lack of clinical validation: The study is entirely computational, with no prospective clinical trial or real-world deployment data. The algorithm was tested on pre-labeled images in a controlled MATLAB environment, which does not account for the variability and noise present in live clinical workflows. Integration with laboratory information systems, turnaround time in real-world settings, and agreement with expert hematologist diagnoses on prospective cases remain untested.
Methodological gaps: The paper does not report key metrics such as F1-score, AUC-ROC curves, or confidence intervals, which are standard in modern machine learning evaluation. The claim of 99% accuracy, while promising, lacks the statistical rigor needed to establish clinical utility. Additionally, the computational requirements (processing time per image, hardware specifications) are not discussed, making it difficult to assess feasibility for real-time clinical deployment. The use of MATLAB as the sole development platform may also limit accessibility and scalability compared to frameworks like TensorFlow or PyTorch.
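For instance, the F1-score the paper omits follows directly from the precision and recall implied by its own confusion counts:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# From the reported confusion counts: precision = 1981/1995, recall = 1981/1992
f1 = f1_score(1981 / 1995, 1981 / 1992)
# f1 ~ 0.9937
```

Reporting this alongside confidence intervals (e.g. from bootstrap resampling of the test set) would have cost the authors little and substantially strengthened the statistical claims.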