An AI-based automatic leukemia classification system utilizing dimensional Archimedes optimization algorithm


Plain-English Explanations
Pages 1-2
What Is Leukemia and Why Does Automated Classification Matter?

Leukemia is a type of blood cancer characterized by the abnormal and uncontrolled proliferation of white blood cells (WBCs). It originates in cells that would normally develop into various blood cell types and primarily affects the blood and bone marrow. The disease diminishes the bone marrow's capacity to produce platelets and red blood cells, and the resulting abnormal cells can damage organs such as the kidneys, liver, and spleen. Leukemia is classified into four primary categories: Acute Lymphocytic Leukemia (ALL), the most common form in those aged 0 to 39; Acute Myelogenous Leukemia (AML), the prevailing acute type in adults over 65; Chronic Lymphocytic Leukemia (CLL); and Chronic Myelogenous Leukemia (CML).

Diagnosing leukemia early is critical for patient survival, but challenging because symptoms such as lymph node enlargement, pallor, fever, and weight loss overlap with many other conditions. The gold standard for diagnosis involves collecting and analyzing bone marrow samples, and the most common initial screen is microscopic examination of peripheral blood smears (PBS). However, this manual process is expensive, time-consuming, and heavily dependent on the expertise of specialist hematologists. Variability in human interpretation introduces inconsistency and diagnostic error.

Machine learning (ML) methods offer a way to automate the diagnosis and classification of leukemia from blood smear images, making it faster, less expensive, and more reproducible. This paper introduces a new AI-based system called the Leukemia Classification System (LCS) that combines image preprocessing, segmentation, feature extraction, a novel feature selection algorithm, and ensemble classification to automatically distinguish between benign cells and three malignant ALL subtypes from PBS images.

TL;DR: Leukemia is classified into four main types (ALL, AML, CLL, CML), and early detection is vital but difficult due to symptom overlap with other diseases. Manual microscopic diagnosis of blood smears is costly and specialist-dependent. This paper proposes LCS, an AI system that automates leukemia classification from peripheral blood smear images.
Pages 3-6
The Archimedes Optimization Algorithm and Dimensional Learning Strategy

The core innovation of this paper lies in two algorithmic foundations that power the feature selection stage. The first is the Archimedes Optimization Algorithm (AOA), a physics-inspired metaheuristic proposed by Hashim et al. in 2020. AOA encodes each candidate solution as an "object" with three attributes: volume (V), density (D), and acceleration (acc). A population of N objects is initialized randomly in a D-dimensional search space. In each iteration, the algorithm evaluates every object with a fitness function, identifies the best-performing object, and then updates the volume and density of all objects based on their relationship to that best object. A Transfer Factor (TF) parameter shifts the search from exploration (broad search) to exploitation (local refinement) as iterations progress.

Limitations of standard AOA: While AOA offers the benefits of few parameters and straightforward implementation, it suffers from limited search diversity and a tendency to become trapped in local optima. This can cause a "two steps forward, one step back" phenomenon where the algorithm oscillates rather than converging smoothly toward the best solution.

To address these shortcomings, the authors integrate a Dimensional Learning Strategy (DLS), originally proposed by Xu et al. for Particle Swarm Optimization. DLS creates a learning exemplar for each particle by allowing every dimension of a particle's personal best position to learn from the corresponding dimension of the population's global best. Rather than each particle learning only from its own history, DLS transfers valuable positional information across the population. This preserves important information in each dimension while incorporating collective knowledge, improving both convergence precision and speed.
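As a rough sketch (not the paper's exact update equations), one common greedy formulation of DLS builds the learning exemplar by testing each dimension of the global best against the personal best and keeping only the swaps that improve fitness:

```python
import numpy as np

def dls_exemplar(pbest, gbest, fitness):
    """Dimensional Learning Strategy (greedy sketch): each dimension of the
    personal best may adopt the global best's value if that lowers fitness."""
    exemplar = np.asarray(pbest, dtype=float).copy()
    for d in range(len(exemplar)):
        trial = exemplar.copy()
        trial[d] = gbest[d]
        if fitness(trial) < fitness(exemplar):  # minimization problem
            exemplar = trial
    return exemplar
```

Because each dimension is tested independently, a good coordinate from the global best can be transferred without discarding the dimensions where the personal best is already strong, which is exactly the "preserve per-dimension information" property the authors exploit.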

The combination of AOA and DLS produces the Dimensional Archimedes Optimization Algorithm (DAOA). In DAOA, object positions are updated using modified equations that incorporate the DLS exemplar position rather than the raw personal best. The resulting positions are converted to binary vectors (using a sigmoid transfer function) so each bit indicates whether a specific feature is selected (1) or discarded (0). The time complexity of DAOA is O(T(D + N) + (C x N)), where T is the number of iterations, D is the problem dimension, N is the population size, and C is the cost of the objective function.

TL;DR: DAOA merges the Archimedes Optimization Algorithm (a physics-based metaheuristic using volume, density, and acceleration) with Dimensional Learning Strategy (which transfers global-best information dimension-by-dimension to each particle). This fixes AOA's tendency to get stuck in local optima and improves convergence speed and precision for feature selection.
Pages 7-8
Prior Approaches to Leukemia Classification and Their Gaps

The paper reviews several recent approaches to automated leukemia classification from microscopic images. Inception v3 + XGBoost (Ramaneswaran et al.): This hybrid model uses Inception v3 for feature extraction and XGBoost for classification, achieving a weighted F1 score of 0.986. However, it was limited by a small dataset. CNN-ECA (Ullah et al.): A VGG16 model with Efficient Channel Attention modules reached 91.1% precision but suffered from a high false positive rate. MobileNetV2 + ResNet18 (Das and Meher): A probability-weighted fusion of these two architectures achieved 99.39% accuracy on ALLIDB1 and 97.18% on ALLIDB2, but suffered from high time complexity.

ALNett (Jawahar et al.): A custom CNN using depth-wise convolution with varying dilation rates achieved 91.13% classification accuracy and an F1 score of 0.96, outperforming VGG16, ResNet-50, GoogLeNet, and AlexNet, though the system was noted for high complexity. DeepLeukNet (Saeed et al.): A CNN-based model using data augmentation and qualitative analysis of activation layers; it too was limited by time complexity. DNN (Mallick et al.): A five-layer deep neural network for gene expression data from 72 leukemia patients achieved 98.2% precision, 96.59% sensitivity, and 97.9% specificity. IFM (Alzahrani et al.): A UNET-based framework for segmentation and classification reached 97.82% average accuracy and 98.64% F-score across four datasets.

The authors identify four key gaps in the existing literature. First, dataset limitations: most studies rely on small, publicly available datasets (ALLIDB1, ALLIDB2) that lack diversity in annotated images. Second, model interpretability: deep learning models act as "black boxes," which undermines clinical trust. Third, generalization across populations: models trained on narrow demographics may fail in diverse real-world clinical settings. Fourth, computational efficiency: many deep models demand significant compute resources, making deployment in resource-constrained clinics difficult.

TL;DR: Prior leukemia classifiers range from 91% to 99% accuracy but face recurring issues: small datasets, high computational cost, black-box interpretability, and poor cross-population generalization. The best prior models include Inception v3 + XGBoost (F1 = 0.986) and MobileNetV2 + ResNet18 (99.39% on ALLIDB1).
Pages 9-11
From Raw Blood Smear Images to Discriminative Features

The proposed LCS pipeline begins with an Image Processing Stage (IPS) that prepares raw blood smear images for analysis. Each RGB image is first resized to 500 x 500 pixels for computational efficiency. Image intensity is then adjusted to a normalized range of 0.5 to 0.9 for enhancement. A Gaussian filter is applied to reduce instrument noise from the camera and correct pixel values, producing uniform lighting across the image. Finally, median and Wiener filters are applied sequentially for smoothing and further noise removal. These steps are essential because the original images often contain noise from irregularities introduced during the blood staining process.
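The IPS steps can be sketched with SciPy on a grayscale image (a simplified stand-in for the paper's pipeline; the filter parameters sigma=1 and the 3x3 kernel sizes are illustrative assumptions, not values from the paper):

```python
import numpy as np
from scipy import ndimage, signal

def preprocess(img, size=500):
    """IPS sketch: resize -> rescale intensity to [0.5, 0.9] ->
    Gaussian -> median -> Wiener filtering (grayscale input)."""
    img = ndimage.zoom(img.astype(float),
                       (size / img.shape[0], size / img.shape[1]), order=1)
    lo, hi = img.min(), img.max()
    img = 0.5 + 0.4 * (img - lo) / (hi - lo + 1e-12)  # normalize to [0.5, 0.9]
    img = ndimage.gaussian_filter(img, sigma=1)       # camera/instrument noise
    img = ndimage.median_filter(img, size=3)          # smoothing
    img = signal.wiener(img, mysize=3)                # residual noise removal
    return img
```

Applying the three filters in this order mirrors the sequence described above: Gaussian first for sensor noise, then median and Wiener for staining artifacts.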

The Image Segmentation Stage (ISS) isolates white blood cells from red blood cells and background particles. The K-means clustering algorithm is used for initial segmentation because it runs faster than alternatives like Otsu thresholding and fuzzy C-means. For cases where nucleated WBCs (lymphocytes and monocytes) overlap, marker-controlled watershed segmentation with erosion and dilation is applied. A border cleaning method removes incomplete cell structures at image edges, since partial cells would introduce large errors in feature extraction.
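A minimal intensity-based K-means sketch (using SciPy's kmeans2 as a stand-in; the paper clusters in color space and follows with marker-controlled watershed, which is omitted here). Taking the darkest cluster as the stained nuclei is an illustrative assumption:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def segment_wbc(img, k=3, seed=0):
    """K-means over pixel intensities; the darkest cluster is treated as
    the stained WBC nuclei (a simplification of the paper's ISS)."""
    pixels = img.reshape(-1, 1).astype(float)
    centroids, labels = kmeans2(pixels, k, seed=seed, minit="++")
    wbc_cluster = centroids.ravel().argmin()  # darkest = densest stain
    return (labels == wbc_cluster).reshape(img.shape)
```

The speed advantage the authors cite comes from K-means needing only k centroid updates per pass, versus the per-pixel membership computations of fuzzy C-means.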

In the Feature Extraction Stage (FES), two categories of features are computed from segmented images. Texture features: 22 Gray Level Co-occurrence Matrix (GLCM) features are extracted at four angles (0, 45, 90, and 135 degrees) with a distance of d = 1. These include contrast, homogeneity, correlation, entropy, energy, cluster shade, cluster prominence, autocorrelation, maximum probability, and sum of squares (variance). Morphological features: These capture cell geometry as used by hematologists, including nucleus area, cytoplasm area, nucleus and cell perimeter, number of distinct nucleus parts, mean and variability of nucleus and cytoplasm boundaries, cytoplasm-to-nucleus area ratio, and roundness (computed as RC = Pe^2 / (4 * pi * Ar), where Pe is the cell perimeter and Ar its area). In total, 34 features are extracted per image.
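The roundness feature, for instance, can be computed directly from a binary cell mask. This sketch uses a crude boundary-pixel count as the perimeter estimate, not the paper's exact measurement:

```python
import numpy as np
from scipy import ndimage

def roundness(mask):
    """RC = Pe^2 / (4 * pi * Ar) for a binary cell mask.
    RC is near 1 for a circular cell and grows as the shape elongates."""
    mask = mask.astype(bool)
    area = mask.sum()
    eroded = ndimage.binary_erosion(mask)
    perimeter = (mask & ~eroded).sum()  # boundary pixel count (rough estimate)
    return perimeter ** 2 / (4 * np.pi * area)
```

Elongated or lobed nuclei score visibly higher than round ones, which is why hematologists (and this pipeline) use the measure to discriminate cell types.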

TL;DR: Images are resized to 500x500, enhanced, and filtered (Gaussian, median, Wiener). K-means and watershed segmentation isolate WBCs. A total of 34 features are extracted per image: 22 GLCM texture features (at 4 angles) and 12 morphological features (area, perimeter, roundness, cytoplasm-to-nucleus ratio, etc.).
Pages 11-13
How DAOA Selects the Most Informative Features

With 34 raw features extracted from each image, the Feature Selection Stage (FSS) identifies the subset that maximizes classification performance while removing redundant or noisy features. Redundant features complicate training, increase computation time, and can degrade classifier accuracy through overfitting. The authors frame feature selection as a discrete (binary) optimization problem: each candidate solution is a binary vector of length 34, where 1 means a feature is selected and 0 means it is discarded.

The proposed DAOA uses a Naive Bayes (NB) classifier as the base evaluator within its fitness function. The fitness of each candidate solution is calculated as: F(Pi) = x * (1 - accuracy(Pi)) + y * (M/N), where accuracy(Pi) is the NB classification accuracy for that feature subset, M is the number of selected features, N is the total feature count (34), and x = 0.99 and y = 0.01 are weighting constants. This formulation rewards both high accuracy and small feature subsets, with much stronger emphasis on accuracy (weight 0.99) than parsimony (weight 0.01).
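The fitness function translates directly into code. Here `accuracy_fn` is a placeholder for the Naive Bayes evaluation the paper performs on each feature subset:

```python
import numpy as np

def fitness(mask, accuracy_fn, x=0.99, y=0.01):
    """F(Pi) = x * (1 - accuracy(Pi)) + y * (M / N); lower is better.
    mask: binary vector over the N features; accuracy_fn scores the
    classifier (Naive Bayes in the paper) on the selected subset."""
    n = len(mask)
    m = int(np.sum(mask))
    if m == 0:
        return float("inf")  # empty subset: nothing to classify with
    return x * (1.0 - accuracy_fn(mask)) + y * (m / n)
```

With x = 0.99, a 1-point accuracy drop costs as much as selecting roughly 34 extra features would under y = 0.01, which is why the search overwhelmingly favors accuracy and only uses parsimony as a tie-breaker.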

The DAOA process initializes a population of candidate solutions, evaluates each with the fitness function, and identifies the global best. DLS then constructs a learning exemplar for each object by blending its personal best with the global best on a dimension-by-dimension basis. The updated continuous positions are converted to binary using a sigmoid transfer function: Sig(Pij) = 1 / (1 + e^(-Pij)), and a random threshold determines whether each feature is selected. This process repeats for a maximum of 30 iterations.
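The sigmoid conversion from continuous positions to a binary feature mask is a few lines (a sketch of the stated transfer rule; the threshold draw is uniform per dimension):

```python
import numpy as np

def binarize(positions, rng):
    """Sigmoid transfer: Sig(p) = 1 / (1 + e^-p); a feature is selected
    when a uniform random draw falls below its sigmoid value."""
    sig = 1.0 / (1.0 + np.exp(-positions))
    return (rng.random(positions.shape) < sig).astype(int)
```

Strongly positive positions are almost always selected and strongly negative ones almost never, while positions near zero stay stochastic, which keeps some exploration alive in the binary search space.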

The final selected features from DAOA include 7 GLCM texture features (contrast, homogeneity, correlation, entropy, energy, cluster shade, cluster prominence) and 7 morphological features (area, eccentricity, elongation, solidity, circularity, perimeter, and roundness), reducing the original 34 features to 14 informative ones.

TL;DR: DAOA reduces the feature set from 34 to 14 (7 GLCM texture + 7 morphological) using a fitness function weighted 0.99 for NB classification accuracy and 0.01 for feature parsimony. The binary optimization runs for 30 iterations with sigmoid-based conversion, and DLS ensures convergence does not stall in local optima.
Pages 14-16
Ensemble Classifier and the ALL Dataset

Once DAOA selects the most informative features, the Classification Stage (CS) uses an Ensemble Classifier (EC) that combines four machine learning algorithms: Random Forest (RF), Decision Tree (DT), Gradient Boosting (GB), and Adaptive Boosting (AdaBoost). Each classifier independently predicts the class for a given sample, and the final decision is made by maximum voting, where the class predicted by the majority of classifiers wins. The rationale for ensemble methods is that combining multiple weak learners reduces the misclassification rate compared to any single classifier.
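The maximum-voting rule can be sketched independently of the four base learners (a hedged illustration; ties here break toward the lower class index, which the paper does not specify):

```python
import numpy as np

def majority_vote(predictions):
    """Hard (maximum) voting: predictions has shape
    (n_classifiers, n_samples); each sample receives the class
    predicted by the most classifiers."""
    preds = np.asarray(predictions)
    n_classes = preds.max() + 1
    # count votes per class for every sample
    votes = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return votes.argmax(axis=0)
```

With four voters (RF, DT, GB, AdaBoost), any class backed by at least three of them always wins, and a 2-2 split falls back to the tie-break.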

The system was evaluated on a publicly available ALL dataset from Taleqani Hospital in Tehran, consisting of 3,256 peripheral blood smear (PBS) images from 89 patients suspected of having ALL. Images were captured with a Zeiss camera at 100x magnification and stored as JPG files. Cell subtypes were verified using flow cytometry. The dataset is divided into four classes: benign hematogones (504 samples), early pre-B ALL (985 samples), pre-B ALL (963 samples), and pro-B ALL (804 samples). The data was split 70/30, yielding 2,279 training images and 977 testing images.

The authors assessed class imbalance using a distribution ratio (DR). The largest imbalance was between the benign class (504 samples) and early pre-B ALL (985 samples), with DR = 1.95. Other ratios ranged from 1.02 to 1.9. Since no DR indicated significant imbalance, the authors did not apply oversampling techniques such as SMOTE, but noted that class weights could be applied using CW = N / (4 * Nn), where N is the total sample count and Nn the count for class n. They also discussed strategies for handling imbalance in future work, including generative adversarial networks (GANs) and data augmentation.
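The imbalance check and the class-weight fallback amount to simple arithmetic over the four class counts:

```python
import numpy as np

# class counts from the Taleqani Hospital ALL dataset
counts = np.array([504, 985, 963, 804])  # benign, early pre-B, pre-B, pro-B
N = counts.sum()                          # 3256 images in total

dr_max = counts.max() / counts.min()      # largest distribution ratio (~1.95)
cw = N / (4 * counts)                     # CW = N / (4 * Nn), one weight per class
```

The weights are simply inverse class frequencies scaled by the number of classes, so the benign class (504 samples) would receive the largest weight if weighting were enabled.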

TL;DR: The ensemble classifier combines RF, DT, GB, and AdaBoost via maximum voting. The dataset contains 3,256 PBS images from 89 patients across 4 classes (504 benign, 985 early pre-B ALL, 963 pre-B ALL, 804 pro-B ALL). Training/testing split was 70/30 (2,279/977 images). Maximum class imbalance ratio was 1.95.
Pages 17-20
Performance Results and Component-Level Analysis

DAOA vs. other feature selection methods: When compared against eight established metaheuristic feature selection algorithms using NB as the base classifier, DAOA outperformed all competitors. Specifically, DAOA achieved 97.8% accuracy, 87% precision, 80.1% recall/sensitivity, and 81.2% F-measure. The comparison set included Harris Hawks Optimization (HHO), Grey Wolf Optimizer (GWO), Whale Optimizer Algorithm (WOA), Salp Swarm Algorithm (SSA), Sine Cosine Algorithm (SCA), Ant Colony Optimization (ACO), Arithmetic Optimization Algorithm, and the standard Archimedes Optimization Algorithm.

Full LCS performance: The complete system (DAOA for feature selection + ensemble classifier) was compared against seven recent leukemia classification methods from the literature. The proposed LCS achieved 99.2% accuracy, 90.5% precision, 89.9% specificity, 89% sensitivity, 89.7% F-measure, 98% Dice Similarity Coefficient (DSC), and 96.5% Jaccard Index (JI). The error rate was minimized to just 0.8%. These results surpassed all competing methods including Inception v3 + XGBoost, CNN-ECA, ALNett, DeepLeukNet, DNN, and IFM.

Ablation studies systematically evaluated the contribution of each component. Using all 34 features without feature selection (Model 1) yielded 96% accuracy. Adding PSO-based feature selection (Model 2) improved it to 96.4%. Using only texture features with DAOA (Model 3) gave 94% accuracy, while morphological features alone with DAOA (Model 4) gave 93.6%. Replacing the ensemble classifier with single classifiers yielded 96.5% (RF), 97% (DT), and 96.1% (GB). These results confirm that both DAOA feature selection and ensemble classification are essential: DAOA contributes a 3.2% accuracy gain over no feature selection, and the ensemble adds 2.2% over the best single classifier (DT).

The time complexity analysis showed that each pipeline stage has manageable computational cost. IPS runs at O(H x W) for small filter kernels. ISS with K-means and watershed runs at O(N x K). The DAOA feature selection stage runs at O(T(D + N) + (C x N)). The ensemble training complexity is O(T x N x F x log(N)), and testing is O(T x log(N) + M), where T is the number of trees, N is training samples, F is features, and M is test samples.

TL;DR: The full LCS achieves 99.2% accuracy, 90.5% precision, 89% sensitivity, 98% DSC, and 96.5% JI with only 0.8% error. DAOA alone (with NB) reaches 97.8% accuracy, outperforming 8 other metaheuristic feature selectors. Ablation studies show DAOA adds 3.2% accuracy over no feature selection, and the ensemble adds 2.2% over the best single classifier.
Pages 20-21
Constraints of the Current System and Paths Forward

Data dependency: The proposed LCS relies on the availability of high-quality, labeled training data. The dataset used (3,256 images from a single hospital in Tehran) may not capture the full variability of leukemia presentations across different demographics, ethnicities, and geographic regions. The effectiveness of the model may decrease when applied to rare blood cell types or populations not represented in the training data. Furthermore, while the class imbalance in this dataset was moderate (maximum DR of 1.95), more severely imbalanced datasets could significantly impact model reliability.

Scope limitations: The system was evaluated only on ALL subtypes (early pre-B, pre-B, pro-B) and benign hematogones. It does not address AML, CLL, or CML, which limits its clinical applicability as a general leukemia classifier. The evaluation was performed on a single dataset without external validation on independent cohorts from different institutions, staining protocols, or imaging equipment. Additionally, the paper does not report confidence intervals or statistical significance tests for the reported metrics, making it difficult to assess the robustness of the performance differences.

Future directions: The authors propose several avenues for improvement. First, incorporating multi-modal data such as blood cell morphology, immunophenotyping data, and clinical metadata (patient history) could improve diagnostic accuracy and provide more comprehensive insights. Second, addressing dataset imbalance through generative adversarial networks (GANs) and advanced data augmentation strategies would help the model handle skewed class distributions. Third, developing multi-modal data fusion techniques could enable more robust blood cell classification across different leukemia types.

Broader challenges: The paper's related work section identifies interpretability as a major barrier to clinical adoption of deep learning models. While the proposed LCS uses traditional ML classifiers (which are more interpretable than deep neural networks), the feature selection process via DAOA is still a black-box optimization. Real-time deployment in resource-constrained settings such as rural hospitals or smaller clinics remains a challenge, and future work on lightweight architectures, model pruning, and compression could improve practical applicability.

TL;DR: Key limitations include single-center data (3,256 images from one hospital), coverage of only ALL subtypes (not AML/CLL/CML), no external validation, and no confidence intervals. Future work should incorporate multi-modal data (immunophenotyping, clinical metadata), use GANs for data augmentation, and optimize for deployment in resource-constrained clinical settings.
Citation: Shaban WM. Open access, 2025. Available at: PMC12084586. DOI: 10.1038/s41598-025-98400-6. License: CC BY.