Artificial Intelligence-Based Management of Adult Chronic Myeloid Leukemia: Where Are We and Where Are We Going?

Plain-English Explanations
Pages 1-2
CML Biology and Why AI Could Transform Its Management

Chronic myeloid leukemia (CML) is a clonal myeloproliferative disease that affects roughly 1 in 10,000 people per year. It is defined by a reciprocal translocation between chromosomes 9 and 22, known as t(9;22), which produces the Philadelphia chromosome (Ph+). This translocation creates the BCR::ABL1 fusion gene encoding a chimeric tyrosine kinase, most commonly the p210 oncoprotein, whose constitutively high kinase activity drives the disease's pathogenesis and progression. CML progresses through three stages: chronic phase (CP), accelerated phase (AP), and blastic phase (BP).

Tyrosine kinase inhibitors (TKIs) targeting the BCR::ABL1 p210 protein have transformed CML from a frequently fatal diagnosis into a manageable condition. TKIs induce a major or deep molecular response (DMR) in 80-90% of patients, measured as the reduction in BCR::ABL1 transcript levels by reverse transcription-quantitative PCR (RT-qPCR). However, 10-20% of patients remain refractory or resistant to TKIs. In addition, molecular relapse occurs in approximately 50% of patients who enter TKI discontinuation programs, which points to leukemic stem cells persisting in the bone marrow niche even when BCR::ABL1 is undetectable by standard assays.
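As a rough illustration of how an RT-qPCR result maps onto a molecular response category, the sketch below converts a BCR::ABL1 level on the International Scale (IS, in percent) into a log reduction from the standardized 100% baseline and a response label. The cut-offs (MR3.0 at 0.1%, MR4 at 0.01%, MR4.5 at 0.0032%, MR5 at 0.001%) are the commonly cited IS thresholds, not values taken from the review, and the function names are hypothetical.

```python
import math

# Commonly cited International Scale (IS) cut-offs, in percent BCR::ABL1.
# Assumption: these thresholds come from general CML practice, not from the review.
MR_THRESHOLDS = [
    ("MR5", 0.001),
    ("MR4.5", 0.0032),
    ("MR4", 0.01),
    ("MR3.0", 0.1),
]


def log_reduction(bcr_abl1_is_percent: float) -> float:
    """Log10 reduction relative to the standardized 100% IS baseline."""
    if bcr_abl1_is_percent <= 0:
        raise ValueError("level must be positive; undetectable results need separate handling")
    return -math.log10(bcr_abl1_is_percent / 100.0)


def molecular_response(bcr_abl1_is_percent: float) -> str:
    """Return the deepest response category whose cut-off is met."""
    for label, cutoff in MR_THRESHOLDS:  # ordered from deepest to shallowest
        if bcr_abl1_is_percent <= cutoff:
            return label
    return "no major molecular response"


if __name__ == "__main__":
    for level in (1.0, 0.1, 0.005):
        print(level, round(log_reduction(level), 2), molecular_response(level))
```

A level of 0.1% IS corresponds to a 3-log reduction, which is why the category is called MR3.0.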

The role for AI: Managing CML involves tracking multiple biochemical, biomolecular, imaging, and clinical parameters simultaneously. The authors argue that AI, and machine learning (ML) in particular, can integrate these multiparametric datasets to improve diagnosis, prognosis, and personalized treatment. This scoping review covers publications retrieved from PubMed with the search terms "chronic myeloid leukemia" and "artificial intelligence." No date restriction was imposed; the 2003 to 2023 span simply reflects when the retrieved papers were published. The growing publication count on PubMed (from 1,022 AI papers in 2017 to 17,637 in 2022 across all of medicine) signals rapidly expanding interest in these techniques, hematologic malignancies included.

TL;DR: CML affects 1 in 10,000 people annually and is driven by the BCR::ABL1 fusion gene. TKIs achieve deep molecular response in 80-90% of patients, but 10-20% are resistant and ~50% relapse after discontinuation. This review surveys AI applications in CML from 2003 to 2023 across diagnosis, prognosis, and treatment optimization.
Pages 3-5
Review Scope and the CML Diagnostic and Prognostic Workflow

The authors conducted a scoping review of PubMed literature using the combined search terms "chronic myeloid leukemia" and "artificial intelligence." The 2003 to 2023 time frame was not imposed as a filter; it reflects the actual production of published literature in this niche. The review identified 24 distinct studies, which are catalogued in a summary table organized by study, number of cases, training/testing methodology (cross-validation, independent testing sets, or pre-trained models), ML techniques used, and clinical aims (diagnostic classifier, prognostic classifier, survival analysis, drug efficacy, etc.).

Current CML diagnostic workflow: Diagnosis begins with a bone marrow (BM) aspirate for morphology evaluation, supported by a core biopsy for fibrosis assessment and blast detection. Cytogenetics on BM cells identifies the Philadelphia chromosome, followed by qualitative RT-PCR on peripheral blood to determine BCR::ABL1 transcript type. If molecular assays detect BCR::ABL1 but cytogenetics cannot identify the Ph chromosome, fluorescence in situ hybridization (FISH) is required. Physical examination of spleen and liver, standard biochemistry, and an electrocardiogram complete the workup.

Prognostic scoring: Four established scoring systems guide CML prognosis. The Sokal score uses age, spleen size, platelet count, and blast count. The Euro score adds basophil and eosinophil counts. The EUTOS score, developed after TKI introduction, considers only basophil counts and spleen size. The EUTOS long-term survival (ELTS) score classifies patients into 3 risk categories based on probability of CML-related death. The authors note that prognostication remains imprecise, particularly after second-generation TKIs and treatment-free remission (TFR) became clinical goals.
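To make the scoring systems above more concrete, here is a minimal sketch of two of them. The Sokal coefficients and the EUTOS formula (7 x basophils + 4 x spleen size, high risk above 87) are the commonly cited published values, not numbers given in the review, so they should be checked against the original publications before any real use. Spleen size is assumed to be in cm below the costal margin, platelets in 10^9/L, and blasts and basophils in percent of peripheral blood cells; all function names are hypothetical.

```python
import math


def sokal_score(age_years, spleen_cm, platelets_10e9_per_l, blasts_percent):
    """Sokal relative risk from the commonly cited coefficients (assumption)."""
    exponent = (
        0.0116 * (age_years - 43.4)
        + 0.0345 * (spleen_cm - 7.51)
        + 0.188 * ((platelets_10e9_per_l / 700.0) ** 2 - 0.563)
        + 0.0887 * (blasts_percent - 2.10)
    )
    return math.exp(exponent)


def sokal_risk(score):
    """Commonly cited Sokal risk bands: <0.8 low, 0.8-1.2 intermediate, >1.2 high."""
    if score < 0.8:
        return "low"
    if score <= 1.2:
        return "intermediate"
    return "high"


def eutos_score(basophils_percent, spleen_cm):
    """EUTOS score: only basophils and spleen size enter the formula."""
    return 7 * basophils_percent + 4 * spleen_cm


def eutos_risk(score):
    """Commonly cited EUTOS cut-off: above 87 is high risk."""
    return "high" if score > 87 else "low"


if __name__ == "__main__":
    s = sokal_score(age_years=50, spleen_cm=4, platelets_10e9_per_l=350, blasts_percent=1)
    print(round(s, 2), sokal_risk(s))
    e = eutos_score(basophils_percent=3, spleen_cm=5)
    print(e, eutos_risk(e))
```

Both scores are fixed formulas over a handful of covariates, which is the contrast the review draws with ML models that can take in many more variables.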

Across the 24 reviewed studies, the most commonly used AI techniques were artificial neural networks (ANNs), support vector machines (SVMs), decision trees (DTs, including gradient boosting variants), and random forests (RF). The studies spanned cohort sizes from 28 to 2,575 subjects. Most used single techniques in isolation (16 out of 24 studies), while only 7 studies combined multiple techniques, and just 1 study tested all four major technique families.

TL;DR: This scoping review identified 24 studies from PubMed (2003-2023). CML diagnosis relies on BM aspirate, cytogenetics, RT-PCR, and FISH. Four prognostic scores exist (Sokal, Euro, EUTOS, ELTS), but none are precise enough for the TFR era. Most studies (16/24) used a single AI technique; cohort sizes ranged from 28 to 2,575.
Pages 5-7
Convolutional Neural Networks and Image Analysis for CML Cell Classification

The DiffMaster system (2003): The earliest AI application in CML was reported by Swolin and colleagues in 2003. The DiffMaster Octavia system combined an automated microscope, camera, motorized stage, and software using artificial neural networks for pre-classification of blood cells. Agreement between the automated system and manual microscopy was 91%, regardless of whether the sample was normal or abnormal. Sensitivity for blast cell identification was slightly higher with DiffMaster than with manual review. This pioneering work showed that a decision support system, paired with a qualified morphologist, could generate high-quality leukocyte differential count reports.

CNN-based leukemia subtyping: In 2019, Ahmed and colleagues applied a convolutional neural network (CNN) to public repository databases (ALL-IDB and ASH Image Bank) for classifying four leukemia subtypes: ALL, AML, CLL, and CML. The CNN achieved 88.25% accuracy for leukemia vs. healthy classification and 81.74% accuracy for multi-class subtype classification, outperforming other ML algorithms available at the time. A year later, Bibi et al. tested an Internet of Medical Things (IoMT) framework on 1,122 samples using DenseNet-121 and ResNet-34, achieving 99.56% and 99.91% accuracy respectively for the same four-subtype classification task. Their cloud-connected IoMT system enabled real-time coordination for leukemia diagnosis.

Transfer learning approaches: Huang et al. (2020) applied three different CNN frameworks to 104 BM smears, including 18 CML cases. Using transfer learning to refine pre-trained models, they reached 95% prediction accuracy in the CML subset. Zhang and colleagues developed a conditional generative adversarial network (cGAN) model for segmenting megakaryocytes from myeloid cells in bone marrow biopsies. Tested on images from 58 CML cases and 31 healthy subjects, the cGAN outperformed 7 other deep learning models in segmentation performance. Separately, Dese et al. tested an ML-based optical image processing system on 250 blood smears, reporting accuracy of 97.69%, sensitivity of 97.86%, and specificity of 100% on test datasets for leukemia type classification.
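The accuracy, sensitivity, and specificity figures quoted in these studies all derive from a confusion matrix. The sketch below shows the standard definitions for a binary classifier, with positive predictive value included because it appears later in the review. The function name and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positives correctly / incorrectly classified
    tn/fp: negatives correctly / incorrectly classified
    """
    def ratio(numerator, denominator):
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=90, fp=20, tn=80, fn=10))
```

Accuracy alone can mislead when classes are imbalanced, which is one reason the studies report sensitivity and specificity alongside it.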

TL;DR: AI imaging for CML evolved from 91% agreement (DiffMaster, 2003) to near-perfect accuracy: DenseNet-121 at 99.56% and ResNet-34 at 99.91% for leukemia subtyping (2020). Transfer learning on CNNs achieved 95% accuracy for CML on BM smears, and a cGAN model outperformed 7 rival architectures on 58 CML + 31 control cases.
Pages 7-9
Predicting Diagnosis, Prognosis, and Treatment Response from Non-Imaging Data

Early ANN work: The first non-imaging AI application in CML was reported by Dey et al. in 2011. They applied a commercial artificial neural network (ANN) program to 40 CML cases who had progressed to accelerated or blastic phase, dividing them by time to progression (within 18 months vs. 30 months). Using clinical, hematologic, and morphometric data, the ANN correctly classified patients into early and late progression groups, demonstrating that even off-the-shelf software could predict disease trajectory.

Flow cytometry and SVM: Ni and colleagues (2013) used a support vector machine (SVM) with the LIBLINEAR solver to improve flow cytometry for identifying malignant neutrophils in CML. Trained on 18 CML cases and tested on 67 newly diagnosed patients, the model differentiated pathologic from normal neutrophils with both specificity and sensitivity exceeding 95%.

Multi-algorithm benchmarking: Shanbehzadeh et al. tested 8 ML algorithms on data from 837 CML patients, including XGBoost, multilayer perceptron, k-nearest neighbors, and two SVM variants. The SVM with a radial basis function (RBF) kernel performed best on the selected features, with 85.7% accuracy, 85% specificity, and 86% sensitivity. Accuracy dropped to 69.7% when all features were used, underscoring the importance of feature selection, here performed with minimum redundancy maximum relevance (mRMR).
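The idea behind mRMR can be sketched in a few lines: greedily pick the feature that is most relevant to the label while being least redundant with the features already chosen. The version below is a simplified variant that uses absolute Pearson correlation for both relevance and redundancy; the original mRMR formulation is based on mutual information, and nothing here reproduces the pipeline used by Shanbehzadeh et al. Function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mrmr_select(features, target, k):
    """Greedy selection maximizing relevance minus mean redundancy.

    features: dict mapping feature name -> list of values
    target:   list of labels (numeric)
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected, remaining = [], list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: f2 is a noisy copy of f1, f3 carries weaker but different signal.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    X = {
        "f1": [1, 2, 1, 2, 5, 6, 5, 6],
        "f2": [2, 2, 1, 3, 5, 6, 4, 6],
        "f3": [0, 1, 0, 0, 1, 0, 1, 1],
    }
    print(mrmr_select(X, y, k=2))
```

In the toy data the near-duplicate feature is skipped in favor of a less correlated one, which is the behavior that lets a smaller feature set outperform the full one.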

Early CML detection from blood counts: Hauser et al. retrospectively studied 1,623 patients with BCR::ABL1 testing and at least 6 consecutive years of blood cell counts. Using XGBoost (decision tree-based) and LASSO (logistic regression-based) models, they found that minimum basophil percentage was predictive up to 1 year before the diagnostic BCR::ABL1 test. Model performance was best at the time of testing but remained similar at 6 months and 1 year prior, and declined for data acquired more than 2 years before diagnosis. Haider et al. similarly developed a radial basis function network (RBFN) on 1,577 hematologic malignancy patients and found that classification accuracy was highest for CML: 90.1% in the training set and 97.5% in the testing set.

Treatment response prediction: Banjar et al. built classification and regression tree (CART) models to predict which CML patients would not achieve MR3.0 within 24 months on first-line imatinib. Six models were developed, all achieving positive predictive values of 73-96%, compared to 67% for conventional scores (Sokal, Euro, EUTOS). The highest specificity among the ML models was only 35%, so their strength lies in flagging likely non-responders with high positive predictive value rather than in correctly clearing patients who will respond. Sasaki et al. (University of Tokyo and Houston) developed LEAP (LEukemia Artificial Intelligence Program), an extreme gradient boosting decision tree method. Trained on 504 patients and tested on 126, using 101 variables collected at diagnosis, LEAP recommended treatment options (imatinib, dasatinib, nilotinib, or ponatinib) that were associated with better survival probability than non-recommended treatments.

TL;DR: SVM achieved >95% sensitivity/specificity for CML neutrophil detection via flow cytometry (67 patients). SVM-RBF was best among 8 algorithms on 837 patients (85.7% accuracy with feature selection). Blood count data predicted CML up to 1 year before diagnosis (1,623 patients). CART models reached 73-96% positive predictive value for imatinib response, and LEAP (504 training patients) matched treatment to better survival outcomes.
Pages 9-11
Personalized TKI Therapy, Drug Resistance, and Novel Compound Discovery

Mathematical therapy optimization: Padhi and Kothari (2007) were the first to apply mathematical models and bioinformatics to optimize TKI therapy. Their approach combined optimal dynamic inversion with model-following neuro-adaptive control design to create an automatic drug administration scheme. While results on simulated nominal patients were encouraging, the authors acknowledged that real patients would differ, so they perturbed all model parameters and randomly selected numerical values for realistic testing. The technique proved general enough to be applicable to other nonlinear control design problems.

Drug efficacy prediction from gene expression: Borisov et al. (2018) pioneered transferring gene-expression characteristics from cell lines to predict clinical drug efficacy. Using 28 imatinib-treated CML samples (16 responders, 12 non-responders), they tested SVM, binary trees, and random forests (RF) as predictor-classifiers. RF was unsuitable for data transfer in this context, but SVM and binary trees with optimized parameters successfully separated responders from non-responders. In a related study, Yen et al. applied RF and Bayesian ML algorithms combined with survival analysis to data from 58 patients in the ENEST clinical trial, identifying differentially expressed microRNAs as predictive biomarkers of nilotinib response.

Drug resistance and new molecule design: Liu et al. combined single-cell mass spectrometry focused on cell metabolism with RF, ANN, and penalized logistic regression, finding ANN superior for predicting drug resistance on single-cell metabolomic datasets (though limited to cell line data). Melge et al. used supervised ML models to design a novel dual-action compound incorporating ponatinib's BCR::ABL1 targeting, demonstrating growth inhibition in both TKI-sensitive and TKI-resistant cell lines. The SUSPECT-ABL web tool was created through in silico saturation mutagenesis to predict ABL1 resistance mutations and their effects on ligand affinity, and has been made freely available to the research community.

AI-driven de novo drug design: Naveed et al. combined multiple AI tools, including AlphaFold for 3D protein structure prediction, DeepSite for binding pocket analysis, and ProTox-II for toxicity prediction, to create 3 de novo therapeutic molecules targeting the BCR::ABL1 chimeric protein. The most promising candidate (AIGT) showed a binding affinity of -7.486 kcal/mol when docked with BCR::ABL1, along with potential hepatoprotective properties. Additionally, Jie Su et al. used AI to develop new molecules against the T315I resistance mutation, confirming cell cycle arrest, autophagy, apoptosis, and inhibition of BCR::ABL1 phosphorylation in vitro.
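To give a feel for the scale of a binding affinity expressed in kcal/mol, the sketch below applies the standard thermodynamic relation ΔG = RT ln(Kd) to turn a binding free energy into an approximate dissociation constant. This is an assumption-laden illustration: a docking score is only a computational estimate of ΔG, the temperature of 298.15 K is chosen here for convenience, and neither the conversion nor the function name comes from the review or from Naveed et al.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def dissociation_constant(delta_g_kcal_per_mol, temperature_k=298.15):
    """Approximate Kd (mol/L) from a binding free energy via dG = RT ln(Kd).

    Assumption: the input is treated as a true binding free energy, which a
    docking score only approximates.
    """
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


if __name__ == "__main__":
    kd = dissociation_constant(-7.486)
    print(f"Kd is roughly {kd * 1e6:.1f} micromolar under these assumptions")
```

Under these assumptions, a score near -7.5 kcal/mol corresponds to a Kd in the low micromolar range; more negative values indicate tighter predicted binding.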

TL;DR: AI therapy optimization spans from mathematical TKI dosing models (2007) to de novo drug design using AlphaFold and DeepSite. SVM and binary trees separated imatinib responders from non-responders (28 samples). ANN outperformed RF and logistic regression for resistance prediction. Of three AI-designed molecules, the best candidate (AIGT) docked to BCR::ABL1 with a binding affinity of -7.486 kcal/mol, and the freely available SUSPECT-ABL tool predicts resistance mutations in silico.
Pages 11-12
Text Mining for Adverse Events and Ontology-Based Reasoning for CML

TKI adverse event prediction: TKI side effects are frequent enough to drive therapy switches and treatment discontinuation. In 2022, a novel cross-domain text mining approach was applied to 2,575 clinical abstracts about CML-TKI therapy. The method combined a knowledge graph, link prediction, and hub node network analysis to forecast adverse events (AEs), including under-reported and preclinical ones. The system mined over 30 million biomedical papers in PubMed and used bag-of-words cluster analysis to connect AEs to specific TKI medication classes. Using unsupervised rank aggregation, three physiology-based surveillance tiers were created: tier 1 for regular surveillance, tier 2 for rare surveillance, and tier 3 for symptom-based surveillance. This cross-domain NLP and ML approach enabled exploratory analyses that would not be possible with conventional methods.

Traditional medicine targets: Li and colleagues applied network pharmacology to identify candidate targets in Qingdai, a traditional Chinese medicine used for CML. They built three visual networks (compound-target, target-pathway, and target-target) and validated results through molecular docking simulation. Seven components in Qingdai were selected and 32 candidate targets identified, all playing roles in CML progression. While further clinical validation is needed, this approach demonstrates how AI-based tools can predict new molecular interactions for combination therapy strategies.

Formal ontologies and automated reasoning: Beyond machine learning, the review highlights formal ontologies as an emerging AI discipline for CML. Querying BioPortal with "Chronic Myeloid Leukemia" retrieves 31 ontologies, from generalist ones like SNOMED to specialized resources such as the Biological Pathway Taxonomy and the Ontology of Drug Adverse Events. These ontologies, represented in languages like OWL and parsed by semantic reasoners (JENA, PROVA, FLORA-2), enable automated logical inference. For example, cascading deductions from basic patient data could flag dangerous drug interactions. Ontologies can be validated by human experts, supporting trustworthiness and explainability, and can be combined with ML approaches to generate synergies.
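The kind of cascading deduction described above can be sketched with a tiny forward-chaining loop: rules fire whenever all their premises are known, and each new conclusion may enable further rules. This is a toy stand-in for what an OWL reasoner does, not the API of JENA, PROVA, or FLORA-2. The facts and rules are entirely hypothetical placeholders (drug_A, drug_B, enzyme_X) and encode no real pharmacology.

```python
def forward_chain(facts, rules):
    """Derive every fact reachable from the initial facts.

    rules: iterable of (premises, conclusion), where premises is a tuple of facts.
    """
    known = set(facts)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules, for illustration only.
EXAMPLE_RULES = [
    (("takes_drug_A",), "inhibits_enzyme_X"),
    (("takes_drug_B",), "metabolized_by_enzyme_X"),
    (("inhibits_enzyme_X", "metabolized_by_enzyme_X"), "raised_exposure_to_drug_B"),
    (("raised_exposure_to_drug_B",), "flag_interaction_review"),
]

if __name__ == "__main__":
    derived = forward_chain({"takes_drug_A", "takes_drug_B"}, EXAMPLE_RULES)
    print("flag_interaction_review" in derived)
```

Because every derived fact traces back to explicit rules, a human expert can audit the chain step by step, which is the explainability advantage the review attributes to ontology-based reasoning.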

TL;DR: Text mining of 2,575 CML-TKI abstracts and 30+ million PubMed papers identified and ranked known and novel adverse events across three surveillance tiers. Network pharmacology found 32 Qingdai targets relevant to CML. BioPortal lists 31 CML-related ontologies that enable automated reasoning and could complement ML-based tools.
Pages 13-15
Eight Critical Barriers to Clinical AI Adoption in CML

Black box problem: Most ML models, particularly ANNs and random forests, cannot transparently show which features drive their predictions. While measures such as Gini importance (the mean decrease in Gini impurity) can estimate covariate importance, their mathematical meaning remains opaque to most clinicians. Explainable AI (XAI) is growing as a field, but current efforts focus on post-hoc analysis rather than on explaining how models work internally. The rapid release of new techniques makes true explainability increasingly difficult to achieve.
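For readers curious what sits behind Gini-based importance scores of the kind mentioned above, the sketch below computes the Gini impurity of a set of labels and the impurity decrease produced by one split. Tree ensembles typically average such decreases over all splits that use a feature; the exact weighting varies by library, so this is a simplified illustration with hypothetical function names rather than any specific implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease achieved by splitting parent into left and right."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


if __name__ == "__main__":
    parent = [0, 0, 1, 1]
    print(gini_decrease(parent, [0, 0], [1, 1]))  # perfectly separating split
    print(gini_decrease(parent, [0, 1], [0, 1]))  # uninformative split
```

The number says how much a feature helped the model separate classes, but not why, which is the gap between feature importance and genuine explanation.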

Lack of standards ("Wild West"): The authors identify significant heterogeneity in experimental approaches, from feature selection to the critical validation/testing step. There is no consistent "computational pipeline" spanning data collection to performance assessment. Standardization initiatives like TRIPOD and IBSI exist but are often unknown or ignored. This makes it hard for non-experts to evaluate the quality of reported experiments.

A priori vs. a posteriori validation: Unlike classical statistics, where experimental settings are defined before data collection, ML takes a more empirical, a posteriori approach. This creates a tangible risk of overfitting and requires more critical interpretation of results.
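One common guard against overfitting is to fix the evaluation protocol before looking at results, for example with k-fold cross-validation, where every sample is held out exactly once. The sketch below generates the train/test index splits using only the standard library. It is a generic illustration, not the protocol of any study in the review, and the function name and default seed are arbitrary choices.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Indices are shuffled once with a fixed seed so the split is reproducible.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(n_samples=10, k=5):
        print(sorted(test), len(train))
```

Any feature selection or tuning should be repeated inside each training fold; doing it once on the full dataset leaks information from the test folds and inflates reported performance.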

Publish-or-perish culture: The authors present a striking statistic: querying PubMed with "Radiomics" in 2019 retrieved 132 reviews and systematic reviews, but only 36 clinical trials and randomized controlled trials across the entire 2015-2019 period. This imbalance suggests that the marriage of publication pressure and AI enthusiasm leads many researchers to write about AI rather than conduct original AI research, risking community over-excitement and subsequent disillusionment (echoing the historical "AI Winter").

Reproducibility: In image analysis especially, results may depend on the specific hardware, reconstruction algorithms, and environmental conditions used for image acquisition. An AI model trained on images from one scanner version may interpret improved signal quality from a newer device as noise, creating an "obsolescence problem" that would require introducing lifecycle concepts for each predictor.

Bias: AI systems trained predominantly on data from wealthy Western countries risk suboptimal treatment of underrepresented populations and a widening global health gap.

Ethics: As AI performance approaches or exceeds human-level in specific subdomains, the boundary between a decision support system (DSS) and a decision maker (DM) becomes blurred, raising questions about clinician roles, liability, and the need for "v2.0 physicians" who can guide AI evolution.

TL;DR: Key barriers include the ML black box problem, lack of pipeline standards (TRIPOD/IBSI often ignored), overfitting risk from a posteriori validation, reproducibility challenges across devices, geographic data bias favoring Western countries, and a review-to-original-research imbalance (132 reviews vs. 36 trials for "Radiomics" on PubMed in 2019).
Pages 15-16
What Needs to Happen Next for AI in CML

Decision support systems for prognosis: The authors see the development of AI-based decision support systems as a top priority for CML. AI excels at manipulating large numbers of covariates and can overcome some limitations of classical statistics under conditions of strong non-linearity. The ability to work with high-dimensional feature spaces aligns directly with the goals of personalized medicine, because it allows incorporating a very detailed clinical profile when evaluating each patient. However, results from current studies are not easily reproducible, and the gap between classical statistical analysis and ML remains both technically significant and culturally challenging for clinicians who have limited time for AI training.

Large Language Models: The review identifies LLMs as a promising but largely unexplored avenue for capturing non-numeric patient traits. Patients have a human dimension that is difficult to express in numbers but is a significant source of inspiration for clinical decisions. While LLMs could theoretically help encode some of these qualitative aspects, their application in CML medicine is still in its earliest stages.

Integration with electronic health records: The authors envision AI agents being integrated into data treatment ecosystems where patient data is extracted from hospital electronic health records, and models can be continuously updated with new evidence and newly discovered biomarkers. This would support the rapid pace of biomarker discovery, testing, and implementation that characterizes modern oncology. The technological nature of AI agents positions them well for autonomous data loading and high-performing model generation without requiring a causal hypothesis a priori.

Multidisciplinarity and new professional roles: The review concludes that the challenges facing AI in CML, including institutional data quality requirements, robust software infrastructure, data harmonization via common data models or ontologies, and patient privacy, will require creating new professional figures. Greater multidisciplinarity, improved data shareability across multiple centers, and cost reduction are identified as the three pillars for transforming AI in CML from "the land of promises" into clinical reality.

TL;DR: Future priorities include AI-driven decision support for personalized CML prognosis, integration with electronic health records for continuous model updating, exploration of LLMs for capturing non-numeric patient traits, and multicenter data harmonization. New interdisciplinary professional roles will be needed to bridge the gap between AI research and clinical practice.
Citation: Bernardi S, Vallati M, Gatta R. Open Access, 2024. Available at: PMC10930728. DOI: 10.3390/cancers16050848. License: CC BY.