The Use of Artificial Intelligence in Predicting Chemotherapy-Induced Toxicities in Metastatic Colorectal Cancer: A Data-Driven Approach for Personalized Oncology

PMC (Open Access)

Plain-English Explanations
Pages 1-2
Why Predicting Chemotherapy Toxicity Matters in Colorectal Cancer

The clinical problem: Colorectal cancer (CRC) is one of the most common cancers worldwide, and chemotherapy remains a cornerstone of treatment across multiple stages. Over 21% of stage II patients, 60% of stage III patients, and nearly all stage IV patients with adequate performance status receive chemotherapy. With the introduction of anti-EGFR and anti-VEGF monoclonal antibodies, median overall survival for metastatic CRC has exceeded 30 months. However, these treatments carry significant toxicity profiles, including rash, diarrhea, hypertension, hypothyroidism, proteinuria, and hepatotoxicity, all of which can disrupt treatment schedules and compromise outcomes.

The rationale for AI: Artificial intelligence has already shown promise across CRC care, from analyzing histological images for microenvironment biomarkers to predicting liver metastases in early-stage (T1) disease using machine learning. The FDA has proposed a regulatory framework for deploying AI-based technology as medical devices, signaling institutional readiness. The logical next step is to apply these same data-driven approaches to predict which patients are most likely to experience chemotherapy-related side effects before treatment begins.

Prior prediction tools: Existing tools like "ColonPrediscores" (designed for elderly patients) incorporate factors such as polychemotherapy, hypoalbuminemia, C-reactive protein, ECOG performance status, metastatic disease burden, age, alkaline phosphatase, and sex. These were considered independent predictors and informed the development of the current model. The study builds on this foundation by employing machine learning to capture more complex, nonlinear relationships among patient features.

Study hypothesis: The authors hypothesized that patients' biological characteristics, genetic backgrounds, and clinical profiles play a significant role in individual susceptibility to adverse chemotherapy outcomes. Their goal was to integrate this understanding into a predictive framework that could help clinicians anticipate toxicity after the very first cycle of treatment, enabling proactive management rather than reactive intervention.

TL;DR: Chemotherapy toxicity is a major concern for the large proportion of CRC patients who receive systemic treatment. This study aimed to build a machine learning model using 95 patient features to predict which individuals are most likely to experience adverse effects after their first chemotherapy cycle.
Pages 3-4
Patient Population, Data Processing, and the Random Forest Approach

Cohort and setting: The study enrolled 74 colorectal cancer patients treated consecutively between January 2018 and December 2019 at the Regional Institute of Oncology in Iasi, Romania. Inclusion required confirmed colon or rectal adenocarcinoma (TNM stages II through IV), ECOG performance status of 0 to 2, at least one chemotherapy cycle administered, and available tissue for genetic testing. Patients with prior chemotherapy, active infections (hepatitis B, C, HIV, tuberculosis), or inability to monitor adverse events were excluded.

Feature engineering: The dataset contained 95 initial features representing each patient's health state before their first chemotherapy round. These were split into categorical and numerical types, with categorical features transformed through one-hot encoding, resulting in 140 total features. Toxicity was defined broadly: any patient experiencing at least one grade 1 or higher adverse event (per NCI-CTCAE v5.0) was classified as having toxicity. Of the 74 patients, 57 (77.0%) experienced at least one form of toxicity.
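The encoding step described above can be illustrated with a toy example. This is a sketch only: the paper does not name its tooling, so pandas is an assumption, and the column names below are invented stand-ins for the 95 real features.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the pre-treatment dataset;
# "ecog" plays the role of a categorical feature, "wbc" a numerical one.
df = pd.DataFrame({
    "ecog": ["0", "1", "2", "1"],   # categorical -> one-hot encoded
    "wbc":  [6.2, 8.9, 5.4, 7.1],   # numerical -> passed through unchanged
})

# One-hot encoding expands each categorical column into one binary
# column per category, which is how 95 features grew to 140.
encoded = pd.get_dummies(df, columns=["ecog"])
print(sorted(encoded.columns))
# ['ecog_0', 'ecog_1', 'ecog_2', 'wbc']
```

Each categorical value becomes its own 0/1 column, while numerical features keep their original scale.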

Why Random Forest: Given the high dimensionality (140 features for only 74 patients), the authors chose Random Forest (RF) as the modeling approach. RF builds an ensemble of 100 decision trees, with each split considering a random subset of features (up to the square root of the total). This strategy mitigates the "curse of dimensionality" and reduces overfitting. The class weight was set to "balanced" to account for the imbalance between toxic (77%) and non-toxic (23%) outcomes, and the maximum tree depth was capped at 2 to maintain interpretability.
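In scikit-learn terms (an assumption — the paper does not state which library was used), the configuration described above would look roughly like this; only the four stated hyperparameters come from the paper, the rest are library defaults.

```python
from sklearn.ensemble import RandomForestClassifier

# Sketch of the paper's stated Random Forest configuration.
model = RandomForestClassifier(
    n_estimators=100,         # ensemble of 100 decision trees
    max_features="sqrt",      # each split considers up to sqrt(n_features)
    max_depth=2,              # shallow trees for interpretability
    class_weight="balanced",  # offset the 77% toxic / 23% non-toxic imbalance
    random_state=0,           # illustrative; not specified in the paper
)
```

Capping `max_depth` at 2 means each tree asks at most two questions per prediction path, which keeps individual trees human-readable.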

Treatment regimens: The most commonly used first-line regimen was CAPOX (capecitabine plus oxaliplatin, 43.3%), followed by FOLFOX (30.8%) and capecitabine monotherapy (12.5%). Biological agents were administered based on mutational status: bevacizumab (52.9%), cetuximab (15.4%), and panitumumab (5.8%). Immunotherapy was not a standard first-line option at the time of the study, so MSI/MMR status testing was not mandatory but was included when available.

TL;DR: A cohort of 74 CRC patients with 95 pre-treatment features (expanded to 140 after encoding) was analyzed using Random Forest models. Toxicity was present in 77% of patients. The RF approach was chosen specifically to handle the high feature-to-patient ratio while maintaining interpretable results.
Pages 5-6
What Types of Toxicity Were Observed and How the Body Responded

Hematologic changes: Following chemotherapy, significant reductions were observed in white blood cells, neutrophil count, the neutrophil-to-lymphocyte ratio, and platelets. Despite these measurable declines, overt hematologic toxicities were relatively uncommon in this cohort: anemia occurred in only 2.70% of patients, neutropenia in 4.05%, and no cases of clinically significant platelet decrease were documented. This suggests that while blood counts shifted in response to treatment, most patients did not cross the threshold into severe hematologic toxicity.

Non-hematologic toxicities dominated: The most frequently recorded adverse events were liver toxicity (43.20%), fatigue (33.70%), and neurologic toxicity (24.30%). Digestive toxicity affected 13.50% of patients, while insomnia (4.05%), allergic reactions (2.70%), and other miscellaneous side effects (6.80%) were less common. Cardiac toxicity was not observed in any patient. These findings align with the known side-effect profiles of oxaliplatin-based regimens (neurotoxicity) and fluoropyrimidine-based regimens (liver and gastrointestinal effects).

Laboratory shifts after treatment: Concurrent with these clinical toxicities, there was a significant elevation in gamma-glutamyl transferase (GGT), creatinine, and lactate dehydrogenase (LDH) levels after chemotherapy. These biochemical changes reflect hepatic stress and altered metabolic function, underscoring the systemic impact of cytotoxic treatment even in patients who may not report overt symptoms. The distribution of these parameter changes was visualized by the authors across hematology and chemistry groups before and after chemotherapy.

Clinical implications: The predominance of liver toxicity and fatigue, combined with the significant post-treatment elevations in liver enzymes and LDH, highlights the importance of baseline hepatic function assessment before initiating chemotherapy. These patterns also informed which features the machine learning model would identify as most predictive, as many of the top-ranked numerical variables (discussed in subsequent sections) directly correspond to these observed toxicity patterns.

TL;DR: Liver toxicity (43.2%), fatigue (33.7%), and neurologic toxicity (24.3%) were the most common adverse events. Post-chemotherapy blood work showed significant rises in GGT, creatinine, and LDH. Hematologic toxicities were relatively rare despite measurable drops in blood counts.
Page 7
Validation Results: How Well the Random Forest Predicted Toxicity

Cross-validation approach: The authors ran 10 independent experiments, each time shuffling the dataset and splitting it into training and test sets using k-fold cross-validation. This repeated shuffling approach helps ensure that the model's performance is not an artifact of a single lucky data split. The confusion matrix revealed the expected class imbalance (77% toxic vs. 23% non-toxic), but the model demonstrated particularly strong prediction for the "no toxicity" class, which is clinically valuable for identifying patients who may safely tolerate standard-dose chemotherapy.

ROC curve performance: The receiver operating characteristic (ROC) curve showed an area under the curve (AUC) ranging from 0.91 to 0.97 on the training set across the 10 experiments. An AUC in this range indicates that the model learned meaningful relationships between patient features and toxicity outcomes, rather than simply memorizing the data. The authors interpreted this as justification for the model's potential applicability to real-world data prediction, though they acknowledged the need for external validation on an independent cohort.
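The repeated shuffled k-fold procedure can be sketched as follows. The data here are synthetic stand-ins (74 "patients" with the paper's 77/23 outcome split), and the fold count and scikit-learn usage are assumptions, since the paper reports only the 10 shuffled experiments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic cohort: 74 samples, ~23% in the minority ("no toxicity") class.
X, y = make_classification(n_samples=74, n_features=20,
                           weights=[0.23], random_state=0)

aucs = []
for seed in range(10):  # 10 independent shuffled experiments, as in the paper
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in cv.split(X, y):
        model = RandomForestClassifier(n_estimators=100, max_depth=2,
                                       max_features="sqrt",
                                       class_weight="balanced",
                                       random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))

print(f"mean AUC across runs: {np.mean(aucs):.2f}")
```

Repeating the shuffle-and-split cycle yields a distribution of AUC values rather than a single number, which is how a range like 0.91 to 0.97 arises.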

Comparison to other models: The RF model's AUC of 0.91 to 0.97 compares favorably to other ML approaches in the chemotherapy toxicity prediction literature. For example, Li Chao et al. used XGBoost to predict fluoropyrimidine-induced cardiotoxicity in CRC and achieved a score of only 0.607. Another study using logistic regression, decision trees, and neural networks on EHR data to predict adverse drug reactions from FOLFOX and FOLFIRI found logistic regression to be most effective. The authors note that algorithm effectiveness can vary depending on the types of toxicities predicted, dataset characteristics, and specific treatment regimens.

Balanced class weighting: Setting the class weight to "balanced" was a deliberate design choice to prevent the model from simply predicting "toxicity" for every patient (which would yield 77% accuracy but no clinical utility). By penalizing misclassification of the minority class more heavily, the model was forced to identify genuine distinguishing features for both outcomes. The maximum tree depth of 2 further constrained complexity, favoring interpretability and reducing the risk of overfitting on this relatively small dataset.
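The "balanced" weighting scheme has a simple closed form — weight = n_samples / (n_classes × class_count) — reproduced here in plain Python for this cohort's 57 toxic / 17 non-toxic split (the formula matches scikit-learn's convention, which is an assumption about the paper's tooling).

```python
from collections import Counter

# The cohort's outcome labels: 57 patients with toxicity, 17 without.
y = ["toxicity"] * 57 + ["no_toxicity"] * 17

counts = Counter(y)
n_samples, n_classes = len(y), len(counts)

# "Balanced" weight per class: n_samples / (n_classes * count_of_class).
weights = {cls: n_samples / (n_classes * c) for cls, c in counts.items()}
print(weights)
# no_toxicity ~2.18 vs toxicity ~0.65: a misclassified minority-class
# patient costs roughly 3.4x more during training
```

This is exactly the mechanism that stops the model from trivially predicting "toxicity" for everyone.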

TL;DR: The Random Forest model achieved an AUC of 0.91 to 0.97 across 10 cross-validation experiments, outperforming comparable ML studies in chemotherapy toxicity prediction. Balanced class weighting and shallow tree depth (max 2) helped maintain both accuracy and interpretability despite the small cohort.
Pages 7-10
Which Patient Features Most Strongly Predicted Chemotherapy Toxicity

Top categorical variables: The model identified curable disease status (importance score: 0.081), treatment setting such as adjuvant vs. metastatic (0.061), mucinous adenocarcinoma histology (0.044), T4a tumor stage (0.039), oligometastatic disease (0.032), absence of biologic treatment (0.031), M1c metastatic spread to the peritoneum (0.030), smoking history (0.030), and KRAS mutations (0.028) as the most influential categorical predictors. Treatment setting matters because metastatic disease is treated with palliative intent, where dose reductions occur more frequently, while adjuvant settings often aim for full-dose intensity.

Top numerical variables: White blood cell count (WBC) was the single most important numerical predictor (score: 0.128), followed by lactate dehydrogenase (LDH, 0.102), alkaline phosphatase (ALP, 0.090), aspartate aminotransferase (ASAT, 0.070), absolute neutrophil count (ANC, 0.069), gamma-glutamyl transferase (GGT, 0.062), platelets (0.058), creatinine (0.049), dose reduction (0.045), and blood urea nitrogen (BUN, 0.044). Many of these reflect baseline hepatic function, renal function, and bone marrow reserve, all of which are known to influence drug metabolism and clearance.
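Importance scores like those listed above fall out of a fitted Random Forest directly. The sketch below shows the mechanics on synthetic data, assuming scikit-learn; the five feature names are illustrative labels, not the paper's actual columns or scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative feature labels mirroring the paper's top numerical variables.
names = ["wbc", "ldh", "alp", "asat", "anc"]
X, y = make_classification(n_samples=74, n_features=5, random_state=1)

model = RandomForestClassifier(n_estimators=100, max_depth=2,
                               class_weight="balanced", random_state=1)
model.fit(X, y)

# feature_importances_ sums to 1; ranking it yields tables like the paper's.
ranking = sorted(zip(names, model.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances of this kind measure how much each feature's splits reduce classification error across the ensemble.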

Clinical validation: These findings align closely with established risk assessment tools. The CRASH score (Chemotherapy Risk Assessment Scale for High-Age Patients) similarly identified lymphocytes, ASAT, LDH, and creatinine clearance as predictors of toxicity in elderly patients. Hyman et al.'s nomogram for serious drug-related toxicity in phase I trials also flagged WBC, creatinine clearance, albumin, and ASAT as significant predictors. The convergence of these independently developed models strengthens confidence in the biological relevance of the identified features.

LDH as a standout predictor: LDH emerged as particularly noteworthy because it serves a dual role: it is both a known prognostic marker for tumor burden (higher LDH correlates with poorer overall survival in metastatic CRC, per meta-analysis data) and a predictor of treatment toxicity. The authors found LDH especially valuable for anticipating neurological adverse events. Elevated ALP, the third-ranked numerical feature, is an indicator of hepatobiliary damage frequently elevated in CRC patients with liver metastases. Maisano et al. previously showed that patients with high ALP receiving FOLFOX-4 had a shorter time to progression and different toxicity profiles.

Mucinous histology and genetic features: Mucinous adenocarcinoma, defined by at least 50% mucinous tumor volume, tends to be proximal, associated with inflammatory responses, and more frequently harbors KRAS and BRAF mutations. These tumors often carry a negative prognosis, leading clinicians to administer full-dose regimens, which correlates with higher adverse event rates. KRAS mutations were also flagged as important, consistent with their known role in tumor progression and treatment resistance, though their direct impact on toxicity (as opposed to efficacy) requires further investigation.

TL;DR: WBC (score: 0.128), LDH (0.102), and ALP (0.090) were the top numerical predictors. Curable disease status (0.081), treatment setting (0.061), and mucinous histology (0.044) led the categorical variables. These findings align with established toxicity prediction tools like the CRASH score.
Pages 9-12
How This Model Compares to Other AI Approaches for Chemotherapy Toxicity

Neutropenia prediction models: Wiberg et al. developed an ML-based model to assess neutropenia risk at the start of each chemotherapy cycle, while Cho et al. used ML to improve febrile neutropenia prediction in breast cancer patients. Cuplov and Andre applied ML to forecast hematological toxicity in rhabdomyosarcoma. These studies collectively demonstrate that machine learning can predict specific toxicity subtypes across different cancer types, but each used different algorithms and focused on narrower toxicity endpoints than the current study's approach of predicting any type of adverse event.

Algorithm selection matters: The effectiveness of any ML algorithm varies significantly by context. Li Chao et al. found XGBoost best for predicting fluoropyrimidine-induced cardiotoxicity (score: 0.607), while another study using EHR data from FOLFOX and FOLFIRI patients found logistic regression outperformed decision trees and neural networks. A separate investigation of irinotecan toxicity (leukopenia, neutropenia, diarrhea) identified Random Forest as optimal for predicting leukopenia. This consistency in RF performance across different settings and toxicity types supports the authors' choice of algorithm.

Smoking as a contested variable: The model flagged smoking history as a significant categorical predictor, but the literature is mixed. Jassem et al. found preclinical evidence that nicotine may impair chemotherapy efficacy and alter drug clearance, potentially requiring dosage adjustments. However, a meta-analysis by Bergman et al. analyzing 9 studies with 3,307 patients found no statistically significant difference in chemotherapy-induced toxicity between smokers and non-smokers (pooled OR: 0.92, 95% CI: 0.53 to 1.60). Meanwhile, Peppone et al. showed that smokers experienced significantly higher total symptom burdens during treatment among 947 cancer patients.

The broader AI landscape: Wang et al. developed a risk prediction nomogram specifically for fluoropyrimidine-induced cardiotoxicity. Deenen et al. explored the relationship between dihydropyrimidine dehydrogenase (DPD) gene polymorphisms and capecitabine toxicity, showing that pharmacogenomics can predict drug-specific adverse events. Dercle et al. used Random Forest models to identify baseline parameters predicting resistance to anti-PD-L1 immunotherapy, finding elevated LDH as an independent predictor of limited overall survival. This convergence of LDH as important across prediction tasks reinforces its role as a key biomarker.

TL;DR: Multiple ML studies across cancer types confirm the viability of toxicity prediction, though optimal algorithms vary by context. Random Forest has shown consistent strength for broad toxicity prediction. The study's top predictors (LDH, WBC, ALP) align with independently developed models, reinforcing their biological significance.
Pages 12-13
Study Constraints and the Path Toward Clinical Implementation

Small sample size: The most significant limitation is the small cohort of only 74 patients with 140 engineered features, creating a high-dimensional problem where overfitting is a real risk. Although the authors mitigated this with balanced class weighting, shallow tree depth, and cross-validation, the AUC of 0.91 to 0.97 was measured on training data, not on an independent external validation set. The authors explicitly acknowledged that a new validation cohort would be needed to assess the model's true predictive potential. Without external validation, these performance figures should be interpreted as promising but preliminary.

Toxicity aggregation: The study analyzed all types of toxicity together as a single binary outcome (any toxicity vs. no toxicity). While this simplifies the prediction problem, it obscures clinically important distinctions. A model that could separately predict liver toxicity, neurotoxicity, or hematologic events would provide more actionable guidance for clinicians. The authors recognized this limitation and noted that per-type toxicity analysis would offer a better understanding of individual patient risk profiles.

Real-world data considerations: The patient data were collected from routine clinical practice at a single Romanian institution rather than from randomized controlled trials. While this reflects real-world clinical conditions, it also introduces potential selection biases and limits generalizability to other populations, healthcare systems, and treatment protocols. The study also did not investigate potential synergies between chemotherapy and biologic treatments in relation to side effects, focusing instead on patient-level characteristics regardless of specific drug class.

Strengths and future work: A notable strength is that all predictor variables (blood tests, clinical measurements, patient-reported parameters) are routinely and consistently available across clinical settings, making the model practical to deploy without requiring specialized assays. The authors plan to explore the relationship between biologic treatment-induced cutaneous toxicity (particularly from anti-EGFR therapy) and overall outcomes as a future research direction. Expanding to include different therapeutic regimens, different cancer types, and multi-institutional cohorts would further strengthen the evidence base for AI-driven toxicity prediction in oncology.

The broader vision: This study represents a proof of concept illustrating the potential for AI to analyze patients' biological and tumoral characteristics to anticipate future adverse events. If toxicity can be predicted before treatment begins, clinicians can proactively adjust doses, intensify monitoring, or choose alternative regimens for high-risk patients, thereby reducing treatment delays and improving outcomes. The machine learning model is positioned as a supportive tool, not a replacement for clinical judgment, assisting in identifying patients who may require closer follow-up due to elevated toxicity risk.

TL;DR: The main limitations are the small cohort (74 patients), lack of external validation, and aggregation of all toxicity types into a single outcome. However, the model uses routinely available clinical variables and achieved promising AUC (0.91 to 0.97). Future work should include larger multi-institutional cohorts, per-toxicity-type analysis, and independent validation.
Citation: Froicu EM, Oniciuc OM, Afrăsânie VA, et al. Diagnostics, 2024 (Open Access). PMC: PMC11431340. DOI: 10.3390/diagnostics14182074. License: CC BY.