Modernizing Colorectal Cancer Care With AI

Plain-English Explanations

Background

Page 1

Why CRC Needs Better Detection and How AI Fits In

Colorectal cancer (CRC) accounts for roughly 10% of all cancer diagnoses and deaths globally, making it the third most common malignancy and the second leading cause of cancer-related mortality. A screening colonoscopy reduces the 10-year CRC risk by 18% (risk ratio 0.82), as demonstrated in the Bretthauer et al. trial published in the New England Journal of Medicine. Despite the established value of colonoscopy, a meta-analysis of over 15,000 procedures revealed an adenoma miss rate of 26% and a serrated polyp miss rate of 27%, illustrating the gap between current practice and ideal detection.
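
The 18% figure and the risk ratio of 0.82 are two views of the same number: the relative risk reduction is one minus the risk ratio. A minimal sketch of that arithmetic follows; the function name is illustrative and does not come from the paper.

```python
def relative_risk_reduction(risk_ratio: float) -> float:
    """Return the relative risk reduction implied by a risk ratio."""
    return 1.0 - risk_ratio


# Risk ratio quoted in the text for screening colonoscopy.
rrr = relative_risk_reduction(0.82)
print(f"Relative risk reduction: {rrr:.0%}")
```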

Post-colonoscopy CRC, which occurs in approximately 0.6% to 4% of individuals after screening or surveillance, is predominantly driven by missed neoplasia during the initial exam. This underperformance has created a compelling rationale for deploying artificial intelligence to augment human detection capabilities, particularly for subtle, flat, or diminutive lesions that are prone to being overlooked by endoscopists.

Meta-analysis data show that AI-assisted colonoscopy meaningfully improves adenoma detection rates alongside increases in adenomas per colonoscopy and detection of non-advanced lesions, which are key precursors that, if removed, prevent future cancer. The authors position AI as a tool spanning the entire CRC pathway: from real-time polyp detection in colonoscopy, through digital pathology and molecular triage, to quantitative imaging with radiomics for staging and treatment response prediction.

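TL;DR: CRC is a top global cancer killer, and colonoscopy misses a substantial share of polyps (26% of adenomas). AI-assisted colonoscopy improves adenoma detection, and the authors position AI as a tool across the whole care pathway, from real-time polyp detection through pathology and molecular triage to imaging-based staging and response prediction.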

Methodology

Pages 1-2

Review Design: Narrative Synthesis Across Three AI Pillars

This is a narrative, thematically organized review rather than a formal systematic review with quantitative pooling. The authors searched PubMed/MEDLINE, Embase, Web of Science, Scopus, and the Cochrane Library using search strings combining CRC-related terms with keywords such as "computer-aided detection/CADe," "radiomics," "deep learning," "digital pathology," "microsatellite instability," and "H&E." Targeted hand-searches also covered major society statements from organizations like the European Society of Gastrointestinal Endoscopy (ESGE) and the American Society for Gastrointestinal Endoscopy (ASGE).

Reference lists of key articles and recent meta-analyses were snowballed to capture additional high-yield studies. The review is organized around three pillars: (i) real-time endoscopic computer-aided detection (CADe) during colonoscopy, (ii) radiomics and AI-enhanced CT/MRI, and (iii) digital pathology including microsatellite instability (MSI) inference from hematoxylin and eosin (H&E) stained slides. Blood-based and omics-driven AI models are included as supportive evidence where they clarify pathways of care.

The authors applied pragmatic selection principles: they preferred higher-level evidence such as systematic reviews, meta-analyses, randomized or prospective multicenter trials, and large registries. Single case reports and very small single-arm series were excluded unless they were uniquely illustrative. Priority was given to consensus-backed work with transparent methods (public code/data, prespecified endpoints, calibration reporting). Recency (approximately 2020 to 2025) and clinical salience (adenoma detection rate, AUC, external validation) served as tiebreakers.

Because this is a narrative review without formal risk-of-bias assessment tools such as QUADAS-2 or PROBAST, the strength of evidence for individual claims varies. The authors acknowledged this scope explicitly, stating their aim was to map decisive trends and typical effect sizes rather than to enumerate every publication.

TL;DR: The review searched five major databases and society statements, organized findings around three AI pillars (CADe, radiomics, digital pathology), and prioritized high-level evidence from 2020 to 2025. No formal bias assessment was performed, as this is a narrative synthesis.

Computer-Aided Detection

Pages 2-3

CADe Systems: Quantified Gains in Polyp and Adenoma Detection

Pooled evidence from 44 RCTs: A major meta-analysis comparing standard colonoscopy with AI-driven CADe across screening, surveillance, and diagnostic settings found that CADe increased the adenoma detection rate from 36.7% to 44.7% (RR = 1.21; 95% CI = 1.15 to 1.28) and raised adenomas per colonoscopy from 0.78 to 0.98 (incidence rate difference = 0.22; 95% CI = 0.16 to 0.28). In tandem colonoscopy designs, CADe halved the adenoma miss rate from 35.3% to 16.1% (RR = 0.47; 95% CI = 0.36 to 0.60).
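
As a rough check on these figures, the crude ratios can be recomputed from the quoted rates. The sketch below uses only the percentages given above. The crude ratios land close to, but not exactly on, the pooled risk ratios (1.21 and 0.47), most likely because a meta-analysis weights per-study estimates rather than dividing overall rates. The function name is illustrative.

```python
def crude_ratio(control_rate: float, cade_rate: float) -> float:
    """Ratio of the CADe-arm rate to the control-arm rate."""
    return cade_rate / control_rate


# Rates quoted in the text, in percent.
adr_control, adr_cade = 36.7, 44.7   # adenoma detection rate
amr_control, amr_cade = 35.3, 16.1   # adenoma miss rate (tandem designs)

adr_ratio = crude_ratio(adr_control, adr_cade)
amr_ratio = crude_ratio(amr_control, amr_cade)
adr_gain_points = adr_cade - adr_control

print(f"ADR crude ratio: {adr_ratio:.2f}")
print(f"ADR absolute gain: {adr_gain_points:.1f} percentage points")
print(f"AMR crude ratio: {amr_ratio:.2f}")
```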

Platform-specific performance: The review reports results across four named systems. YOLO-based architectures increased adenoma detection rates from 22% to 29% (RR = 1.36; 95% CI = 1.14 to 1.62). GI Genius (Medtronic) improved rates from 50% to 55% (RR = 1.16; 95% CI = 1.00 to 1.34). Fujifilm CAD EYE raised detection from 43% to 53% (RR = 1.21; 95% CI = 1.10 to 1.34). EndoScreener, built on the SegNet architecture, improved rates from 26% to 31% (RR = 1.22; 95% CI = 1.11 to 1.35).

Tandem trial confirmation: A separate meta-analysis of six tandem RCTs involving 1,718 patients confirmed that CADe versus standard white-light colonoscopy reduced the adenoma miss rate by 54% and the polyp miss rate by 56%, with low heterogeneity (I-squared = 18%). Sensitivity analyses confirmed benefits across screening and surveillance indications, supporting the use of CADe for accurate CRC screening.

Convolutional neural networks (CNNs) enable frame-by-frame inference at clinical frame rates, allowing real-time localization of diminutive and flat lesions without disrupting the endoscopist's workflow. Early prospective work by Urban et al. demonstrated real-time polyp identification on commodity hardware with 96% accuracy, results that foreshadowed the subsequent wave of randomized trials and regulatory clearances for commercial CADe platforms.
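
To make "clinical frame rates" concrete, the sketch below works out how much time a model has per frame before it falls behind the video feed. Both the frame rate and the model latency are assumptions chosen for illustration; the review does not state either value.

```python
def frame_budget_ms(frames_per_second: float) -> float:
    """Time available to process one frame without falling behind the video."""
    return 1000.0 / frames_per_second


def keeps_up(inference_ms: float, frames_per_second: float) -> bool:
    """True if per-frame inference fits within one frame interval."""
    return inference_ms <= frame_budget_ms(frames_per_second)


ASSUMED_FPS = 30.0            # assumption: not stated in the review
ASSUMED_INFERENCE_MS = 20.0   # assumption: hypothetical model latency

budget = frame_budget_ms(ASSUMED_FPS)
print(f"Per-frame budget at {ASSUMED_FPS:.0f} fps: {budget:.1f} ms")
print("Keeps up:", keeps_up(ASSUMED_INFERENCE_MS, ASSUMED_FPS))
```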

TL;DR: Across 44 RCTs, CADe raised the adenoma detection rate by about 8 percentage points and cut the adenoma miss rate by roughly half. All four named platforms (GI Genius, Fujifilm CAD EYE, EndoScreener, YOLO-based systems) showed higher detection rates, although the GI Genius confidence interval reaches 1.00 at its lower bound, making that result borderline.

Digital Pathology

Pages 3-4

Inferring Microsatellite Instability From H&E Slides With Deep Learning

Clinical significance of MSI: Microsatellite instability, caused by defects in the mismatch-repair pathway, is found in roughly 5% to 20% of CRC tumors. Its prevalence is stage-dependent, exceeding 20% in stage II CRC but dropping below 5% in later stages. MSI status carries direct therapeutic implications because MSI-high tumors respond well to immune checkpoint inhibitors, making rapid and accurate MSI determination essential for treatment planning.

Deep-learning model performance: Kather et al. developed the first automated, end-to-end deep-learning model for MSI detection in 2019, achieving an AUC of 0.84 on the TCGA cohort. Subsequent studies using newer methodologies have reported AUC values ranging from 0.78 to 0.98. These models work by analyzing whole-slide images (WSIs) of H&E-stained tissue, capturing morphologic signatures such as tumor-infiltrating lymphocytes and mucinous architecture that correlate with mismatch-repair deficiency.
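
For readers unfamiliar with how a slide-level AUC arises, here is a minimal sketch. It assumes the slide is split into tiles, each tile receives an MSI probability, and the tile scores are averaged into one slide score. Tiling with mean pooling is an assumption for illustration, not the method of any cited model, and all scores below are made up. The AUC is computed as the probability that an MSI slide outscores a non-MSI slide.

```python
from statistics import mean
from typing import Sequence


def slide_score(tile_probabilities: Sequence[float]) -> float:
    """Aggregate per-tile MSI probabilities into one slide-level score (mean pooling)."""
    return mean(tile_probabilities)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the chance a positive case outscores a negative one (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical tile-level outputs for four slides (not real data).
slides = {
    "slide_a": [0.9, 0.8, 0.7],  # labeled MSI
    "slide_b": [0.6, 0.4, 0.5],  # labeled MSI
    "slide_c": [0.2, 0.3, 0.1],  # labeled non-MSI
    "slide_d": [0.5, 0.6, 0.7],  # labeled non-MSI
}
labels = [1, 1, 0, 0]

scores = [slide_score(tiles) for tiles in slides.values()]
slide_auc = auc(scores, labels)
print(f"Slide-level AUC on toy data: {slide_auc:.2f}")
```

In this toy set, three of the four MSI versus non-MSI pairs are ranked correctly, which is what an AUC of 0.75 means.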

Clinical deployment milestone: In 2022, MSIntuit (developed by Owkin, Paris/New York) became the first deep-learning biomarker detector to receive regulatory approval for routine clinical use in Europe. This tool functions as a rapid, low-cost pre-screen that enriches the pool of cases sent for formal molecular testing, streamlining reflex workflows and prioritizing confirmatory assays. The aim is to accelerate diagnosis and reduce unnecessary tests while maintaining a high negative predictive value.
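
Why a pre-screen needs a high negative predictive value can be shown with a short calculation. The sensitivity and specificity below are hypothetical and are not MSIntuit's reported performance; the 15% prevalence is simply a value inside the 5% to 20% range quoted earlier.

```python
def negative_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a case called negative by the pre-screen is truly negative."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


def fraction_ruled_out(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of all cases the pre-screen calls negative and could spare confirmatory testing."""
    return specificity * (1.0 - prevalence) + (1.0 - sensitivity) * prevalence


# Hypothetical operating point (assumptions, not from the review).
SENSITIVITY, SPECIFICITY, PREVALENCE = 0.95, 0.50, 0.15

npv = negative_predictive_value(SENSITIVITY, SPECIFICITY, PREVALENCE)
ruled_out = fraction_ruled_out(SENSITIVITY, SPECIFICITY, PREVALENCE)
print(f"NPV: {npv:.1%}, cases ruled out: {ruled_out:.1%}")
```

Under these assumed numbers, even a modest specificity lets the pre-screen set aside a large share of cases, while the high sensitivity keeps missed MSI cases rare among those set aside.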

Persistent challenges: Despite these advances, the authors note significant obstacles: scarce expertly labeled datasets, histologic variability requiring large numbers of examples per pattern, gigapixel WSIs that force patching and risk downsampling losses, reliance on weak single-task models, substantial computational and storage demands, susceptibility to adversarial perturbations and slide artifacts, and black-box decision-making that limits interpretability and trust. Full automation is considered neither realistic nor wise; the clinician remains the ultimate evaluator.

TL;DR: Deep-learning models can predict MSI status from routine H&E slides with AUCs of 0.78 to 0.98. MSIntuit became the first such tool approved for clinical use in Europe in 2022, serving as a pre-screen to prioritize molecular testing.

Radiomics & Imaging AI

Pages 4-5

AI-Enhanced CT and MRI: From Tumor Grading to Treatment Response

Radiomic feature extraction: Radiomics converts medical images into high-dimensional quantitative data, capturing tumor heterogeneity invisible to the naked eye. In CRC, CT-derived radiomic signatures have demonstrated the ability to differentiate high-grade from low-grade tumors (AUC = 0.7 to 0.9), separate stage I-II from stage III-IV disease (AUC = 0.8), and predict MSI status. A combined clinical-radiomics model applied to preoperative CT improved MSI prediction over either component alone (AUC = 0.8), demonstrating that radiomics should complement standard clinicopathologic assessment rather than replace it.
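
To illustrate what "converting images into quantitative data" means, the sketch below computes three simple first-order features from the pixel intensities inside a region of interest. Real radiomics pipelines extract many more features (shape, texture, and others) from segmented volumes; the intensity lists, bin width, and function name here are illustrative assumptions.

```python
import math
from collections import Counter
from statistics import mean, pvariance
from typing import Dict, Sequence


def first_order_features(intensities: Sequence[float], bin_width: float = 10) -> Dict[str, float]:
    """Compute mean, variance, and histogram entropy for the intensities in a region."""
    # Group intensities into fixed-width bins to form a histogram.
    bins = Counter(int(value // bin_width) for value in intensities)
    total = len(intensities)
    # Shannon entropy of the histogram: higher values mean more varied intensities.
    entropy = sum(-(count / total) * math.log2(count / total) for count in bins.values())
    return {
        "mean": mean(intensities),
        "variance": pvariance(intensities),
        "entropy": entropy,
    }


# Made-up intensity samples for two regions (not real scan data).
homogeneous_region = [40, 41, 42, 43, 44, 45, 46, 47]
heterogeneous_region = [5, 18, 33, 47, 52, 68, 74, 91]

homogeneous = first_order_features(homogeneous_region)
heterogeneous = first_order_features(heterogeneous_region)
print("Homogeneous:", homogeneous)
print("Heterogeneous:", heterogeneous)
```

The heterogeneous region scores higher on both variance and entropy, which is the kind of signal a radiomic signature can feed into a classifier.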

Lymph-node staging: A meta-analysis of CNN-based models for detecting nodal metastases on preoperative imaging reported an AUROC of approximately 0.92, substantially outperforming radiologists whose performance was around 0.68. This gap suggests that AI-assisted nodal staging could meaningfully improve surgical planning and treatment decisions in CRC, though continued refinement and external validation are required.

Tumor segmentation and treatment response: Deep-learning segmentation models provide highly accurate, automated delineation of colorectal tumors on MRI and CT, reducing interobserver variability. These precise tumor contours improve the consistency of volumetric measurements, sharpen local staging, and optimize radiotherapy target definition. For treatment response prediction, radiomics and deep-learning models derived from pretreatment MRI can identify rectal cancer patients likely to achieve a pathological complete response after neoadjuvant chemoradiation. Multimodal radiomic signatures fusing MRI and PET features have shown strong preoperative accuracy.

Post-treatment surveillance: In localized colon cancer, baseline CT radiomics models stratify relapse risk with a five-year relapse AUC of approximately 0.74. The authors highlight that multi-modal integration, combining radiologic features with genomic, pathological, and clinical data, represents the next frontier. These radiogenomic and multi-omics models promise to refine prognostication and treatment selection throughout CRC management.

TL;DR: Radiomics extracts quantitative features from CT/MRI to grade tumors (AUC 0.7 to 0.9), stage disease, and predict treatment response. CNN-based lymph-node detection (AUROC ~0.92) substantially outperforms radiologist assessment (~0.68).

Blood-Based & Genomic AI

Page 5

Noninvasive Screening: AI Applied to Blood Tests and Genetic Markers

Blood fluorescence spectroscopy: Soares et al. trained a support vector machine (SVM) model on blood fluorescence data that distinguished CRC from normal samples with 87% sensitivity and 95% specificity, while identifying nonmalignant findings with 60% sensitivity and 79% specificity. This approach illustrates how standard blood-based assays can be augmented with machine-learning classification to improve noninvasive screening accuracy.

ColonFlag and circulating tumor cells: ColonFlag, a machine-learning tool leveraging demographics and complete blood count data, was evaluated in 17,676 individuals. A positive ColonFlag score doubled the odds of advanced precancerous lesions at 95% specificity, enabling targeted intensification of colonoscopy screening. The CellMax (CMx) platform, which enriches epithelial circulating tumor cells, demonstrated 100% experimental specificity and 80% clinical sensitivity in a 47-subject cohort, supporting its feasibility as a liquid biopsy approach.
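
"Doubled the odds at 95% specificity" can be unpacked with a two-by-two table. The counts below are hypothetical, chosen only so that the specificity is 95% and the odds ratio is 2; they are not the ColonFlag study's data.

```python
def odds_ratio(flagged_with: int, flagged_without: int,
               unflagged_with: int, unflagged_without: int) -> float:
    """Odds of a lesion among flagged people divided by the odds among unflagged people."""
    return (flagged_with / flagged_without) / (unflagged_with / unflagged_without)


def specificity(flagged_without: int, unflagged_without: int) -> float:
    """Share of lesion-free people whom the score correctly leaves unflagged."""
    return unflagged_without / (flagged_without + unflagged_without)


# Hypothetical counts (assumptions for illustration only).
FLAGGED_WITH, FLAGGED_WITHOUT = 10, 50
UNFLAGGED_WITH, UNFLAGGED_WITHOUT = 95, 950

toy_odds_ratio = odds_ratio(FLAGGED_WITH, FLAGGED_WITHOUT, UNFLAGGED_WITH, UNFLAGGED_WITHOUT)
toy_specificity = specificity(FLAGGED_WITHOUT, UNFLAGGED_WITHOUT)
print(f"Odds ratio: {toy_odds_ratio:.1f}, specificity: {toy_specificity:.0%}")
```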

Gene expression and mutation profiling: Hu et al. compared three models using gene expression profiles from UICC stage II cases and found that S-Kohonen networks classified relapse versus no relapse with 91% accuracy, outperforming back-propagation neural networks (66%) and SVM (70%). Xu et al. used an SVM pipeline to identify differentially expressed genes and validated a 15-gene panel that stratified high-risk patients and predicted prognosis. Additional work by Zhang et al. applied a counter-propagation artificial neural network to near-infrared assays for detecting BRAF V600E, achieving 100% sensitivity, 87.5% specificity, and 93.8% overall accuracy.
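
Sensitivity, specificity, and accuracy all derive from the same four counts. The sketch below uses one hypothetical set of counts that happens to reproduce the BRAF V600E percentages quoted above; the study's actual sample counts are not given here, so these numbers are an assumption.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),           # share of true positives detected
        "specificity": tn / (tn + fp),           # share of true negatives cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts (assumption): 8 mutant samples, 8 wild-type samples.
metrics = classification_metrics(tp=8, fn=0, tn=7, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```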

Epigenetic biomarkers: Kel et al. introduced a "walking pathway" strategy to discover methylated DNA biomarkers and used AI to interrogate cancer-specific enhancers in CRC. These diverse approaches collectively demonstrate that AI-driven blood and molecular analyses can extend noninvasive screening beyond traditional methods, potentially reaching populations who decline or lack access to colonoscopy.

TL;DR: In one study, machine learning applied to blood fluorescence data reached 87% sensitivity and 95% specificity for CRC detection. Tools like ColonFlag (evaluated in 17,676 individuals) and gene expression classifiers (up to 91% accuracy) show promise for noninvasive screening and risk stratification.

Limitations

Pages 5-6

Barriers to Clinical Adoption: Data, Deskilling, and the Black Box

Data bottlenecks: AI models require large, varied datasets for thorough validation and successful integration into standard healthcare workflows. Training frequently requires hundreds or thousands of annotated images, which demand considerable manual labeling by physicians. Only a few publicly available labeled datasets exist for CRC AI development. Crowdsourcing annotation can be faster and cheaper but introduces label noise, while active learning strategies can reduce the burden but remain underused.
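
One common form of active learning is uncertainty sampling: physicians are asked to label only the images the current model is least sure about. The sketch below illustrates that idea under stated assumptions; the frame names, probabilities, and function name are hypothetical, and the review does not specify which active learning strategy it has in mind.

```python
from typing import Dict, List


def select_for_annotation(predicted_probabilities: Dict[str, float], budget: int) -> List[str]:
    """Pick the `budget` images whose predicted polyp probability is closest to 0.5."""
    ranked = sorted(
        predicted_probabilities,
        key=lambda name: abs(predicted_probabilities[name] - 0.5),
    )
    return ranked[:budget]


# Hypothetical model outputs on unlabeled frames (not real data).
unlabeled = {
    "frame_001": 0.97,
    "frame_002": 0.52,
    "frame_003": 0.08,
    "frame_004": 0.41,
    "frame_005": 0.88,
}

to_label = select_for_annotation(unlabeled, budget=2)
print("Send to physician for labeling:", to_label)
```

Frames the model already classifies confidently are skipped, so annotation time goes to the cases most likely to improve the model.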

Endoscopist deskilling: A particularly concerning finding is the risk of deskilling. After AI adoption, adenoma detection rates during non-AI colonoscopies fell from 28.4% to 22.1%, representing a 6.3 percentage-point absolute reduction (22.2% relative reduction). This suggests that over-reliance on automation may erode the diagnostic skills of endoscopists when AI assistance is unavailable, raising questions about maintaining competence in settings where AI tools may not always be accessible.
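
The absolute and relative reductions quoted above follow directly from the two detection rates. A minimal sketch of that arithmetic, using only the figures in the text (the function name is illustrative):

```python
from typing import Tuple


def absolute_and_relative_drop(before: float, after: float) -> Tuple[float, float]:
    """Return the drop in percentage points and the drop as a fraction of the starting rate."""
    absolute = before - after
    relative = absolute / before
    return absolute, relative


# Adenoma detection rates in non-AI colonoscopies, in percent, as quoted in the text.
abs_drop, rel_drop = absolute_and_relative_drop(28.4, 22.1)
print(f"Absolute drop: {abs_drop:.1f} percentage points")
print(f"Relative drop: {rel_drop:.1%}")
```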

Interpretability and regulation: The "black-box" nature of deep-learning models limits interpretability and clinical trust and complicates regulatory approval. Data privacy concerns add another layer of complexity. The authors emphasize that these issues together slow the path from promising research results to deployed clinical tools.

In digital pathology specifically, obstacles include pervasive histologic variability, gigapixel whole-slide images that force patching and risk information loss from downsampling, reliance on weak single-task models, substantial computational and storage demands, and susceptibility to adversarial perturbations and common slide artifacts. The authors stress that full automation is neither realistic nor wise, and that adoption depends on usability, clear return on investment, and demonstrated real-world performance.

TL;DR: Key barriers include scarce labeled datasets, endoscopist deskilling (adenoma detection in non-AI colonoscopies fell 6.3 percentage points after AI adoption), black-box models that resist clinical interpretation, and the impracticality of full automation in pathology.

Future Directions

Pages 6-7

The Road Ahead: Multicenter Trials, Biopsy Optimization, and Lynch Syndrome

Timeline and trajectory: The authors project that routine AI adoption in CRC care is plausible within the next 10 to 15 years, driven by advances in computing power, data availability, model architectures, and evolving regulatory and validation frameworks. Early evaluations across radiology, dermatology, ophthalmology, and pathology consistently report promising performance, suggesting meaningful potential to enhance diagnostic accuracy and workflow efficiency.

Pathology-specific priorities: Future work must move beyond TCGA-centric training by curating large, ethnically diverse, expert-reviewed WSI datasets and mandating robust external, multi-institution validation. Because many stage IV CRC cases yield only endoscopic biopsies rather than surgical resections, models require optimization and validation on small tissue samples to guide immunotherapy selection. A critical unmet need is differentiating Lynch syndrome from sporadic MSI-high tumors, a distinction with profound implications for patient management and family screening.

Trial design requirements: The authors call for multicenter, adequately powered trials with long-term follow-up of at least 10 years to assess patient-important outcomes after CADe-assisted colonoscopy. These outcomes should include CRC incidence, stage at diagnosis, post-colonoscopy CRC rates, and disease-specific mortality, alongside cost-effectiveness analyses, real-world implementation data, and subgroup effects stratified by lesion morphology and operator experience.

Governance and deployment: Standardized reporting, explainability audits, and benchmarking against established pathology quality standards are necessary to ensure trustworthy adoption. The authors envision AI embedded within multidisciplinary workflows that individualize surveillance intervals, neoadjuvant strategies, surgical planning, and adjuvant therapy. Continuous lifecycle performance monitoring and implementation frameworks that ensure usability, equity, and cost-effectiveness are positioned as essential for AI to evolve from promising research tools into dependable clinical infrastructure.

TL;DR: Routine AI adoption in CRC is projected within 10 to 15 years. Priorities include multicenter trials with 10+ year follow-up, validation on biopsy specimens, Lynch syndrome differentiation, ethnically diverse datasets, and governance frameworks ensuring explainability and equity.

Modernizing Colorectal Cancer Care With Artificial Intelligence: Real-Time Detection, Radiomics, and Digital Pathology
