Colorectal cancer (CRC) is the most common malignant tumor of the digestive system and the fourth leading cause of cancer death worldwide. In 2012, approximately 1.36 million new CRC cases were diagnosed globally, giving CRC the third-highest incidence among malignant tumors overall (third in men and second in women), and roughly 690,000 CRC deaths occurred that same year. Estimates for 2015 projected 777,987 new cases and 352,589 deaths in developed countries alone. Five-year survival varies considerably by country, ranging from 4.3% to 5.3% for men and 2.7% to 4.9% for women, underscoring the significant unmet need.
Colonoscopy remains the most commonly used clinical screening method, but it carries substantial limitations: poor patient compliance, lack of family history information, inconvenience of real-time monitoring, high expense, and risk of complications. These barriers have driven researchers to seek more effective strategies for early detection, recurrence monitoring, and tracking disease progression.
The AI opportunity: Artificial intelligence, encompassing technologies such as robotics, speech recognition, natural language processing, image recognition, and machine learning, has two main branches in medicine: a virtual branch (medical imaging, clinical decision support, drug development) and a physical branch (surgical and nursing robots). The authors note that approximately 70% of information required for clinical decisions comes from laboratory testing, making AI-driven automation and interpretation a natural fit for improving CRC care across screening, diagnosis, treatment, and prognosis.
Scope of this review: The paper summarizes research progress across all major domains where AI intersects CRC management: colonoscopy-based detection, pathological biopsy analysis, blood-based diagnostics, genetic testing, non-coding RNA biomarkers, surgical planning, chemotherapy optimization, personalized therapy, and prognosis prediction. The authors project that while many applications are not yet fully practical, they are very likely to be realized within 10 to 15 years.
Early detection methods: Automated polyp detection traces back to Fernandez-Esparrach et al., who designed an energy-map-based system that achieved 70.4% sensitivity and 72.4% specificity across 31 polyp types. This approach was subsequently refined through deep learning. By 2017, Zhang et al. had developed an algorithm for automatically classifying polyps as hyperplastic or adenomatous, while Takemura et al. combined narrow-band imaging (NBI) with support vector machine (SVM) technology to distinguish neoplastic from non-neoplastic polyps with 97.8% accuracy.
CNN-based breakthroughs: Gregor et al. trained a convolutional neural network (CNN) on 8,641 marked images from over 2,000 colonoscopies, achieving a cross-validation accuracy of 96.4% and an AUC of 0.991. Kominami et al. demonstrated real-time computer-aided diagnosis (CAD) for small adenomatous polyps. Mori et al. combined NBI with staining image technology for real-time recognition of small neoplastic polyps, reaching a pathologic prediction rate of 98.1%. Wang et al. showed that real-time image recognition systems significantly increase the adenoma detection rate (ADR).
Advanced systems: Akbari et al. applied CNN-based polyp segmentation with improved image patching, achieving 99.3% sensitivity, 74.8% specificity, and 97.7% accuracy. EndoBRAIN, an AI endoscopic diagnosis system that analyzes cell nuclei, crypt structures, and microvessels, distinguished neoplastic from non-neoplastic lesions with 96.9% sensitivity, 94.3% specificity, and 96.0% accuracy, significantly outperforming 30 endoscopists in a retrospective comparison by Kudo et al.
Capsule endoscopy expansion: Blanes-Vidal et al. extended AI to capsule endoscopy, developing a CNN for autonomous detection and localization of colon polyps that achieved 96.4% accuracy, 97.1% sensitivity, and 93.3% specificity. For nonpolyposis colon cancer, CAD systems can diagnose irregular, discontinuous crypt structures. Takeda et al. trained an endocytoscopy CAD system on 5,843 images of 375 lesions that achieved 89.4% sensitivity, 98.9% specificity, and 94.1% accuracy for invasive CRC detection.
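The sensitivity, specificity, and accuracy figures quoted for these detection systems all derive from the same four confusion-matrix counts. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical and are not taken from any of the cited studies.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)   # share of true lesions that were flagged
    specificity = tn / (tn + fp)   # share of non-lesions that were correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only.
sens, spec, acc = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}, accuracy={acc:.3f}")
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, it can look strong even when one of the two is modest, which is one reason to read the three figures together.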
The diagnostic challenge: Pathological biopsy is essential for CRC diagnosis and grading, but results typically rest on subjective assessment shaped by each pathologist's experience and knowledge, which inevitably produces significant inter-observer variability. AI technology can automatically classify and diagnose biopsy samples, improving accuracy while reducing time and costs.
CRC detection and grading: Rathore et al. developed a colorectal cancer detection (CCD) system based on the SVM radial basis function algorithm, achieving 95.40% cancer detection accuracy and 93.47% grading accuracy. Building on this, the same team proposed the HFS-CC technique, which classifies biopsy images using geometric features, morphology, and texture, reaching 98.07% test accuracy across 176 subjects. Yang et al. designed a color histogram and least squares SVM approach that achieved 96.78% tumor classification accuracy.
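The radial basis function (RBF) kernel named above scores the similarity of two feature vectors as exp(-gamma * ||x - y||^2). The sketch below shows only that kernel, not the CCD system itself; the function name, the gamma value, and the feature vectors are assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical two-dimensional feature vectors.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical vectors give 1.0
far = rbf_kernel([0.0, 0.0], [1.0, 1.0])    # squared distance 2, so exp(-1)
print(same, far)
```

The kernel value falls from 1 toward 0 as two samples move apart in feature space, and gamma controls how quickly that happens.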
Nuclei classification: Korsuk et al. designed a spatially constrained CNN (SC-CNN) combined with a neighboring ensemble predictor (NEP) to detect and classify nuclei in H&E-stained colon cancer specimens. Tested on 100 specimens, this joint approach yielded an average F1 score of 0.802 and 78.1% accuracy. For immunohistochemistry, Abdelsamea et al. developed TuPaQ to segment CRC tumor epithelium with 84% sensitivity and 95% specificity, and the mean tumor area it measured closely matched manual annotation (r = 0.956, P < 0.001).
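The F1 score reported for nucleus detection is the harmonic mean of precision and recall. A minimal sketch of the calculation follows, with hypothetical counts that are not drawn from the cited study.

```python
def f1_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no true positives, so precision and recall are both zero
    precision = tp / (tp + fp)   # share of detections that were real nuclei
    recall = tp / (tp + fn)      # share of real nuclei that were detected
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
f1 = f1_score(tp=80, fp=20, fn=20)
print(f"F1 = {f1:.3f}")
```

Unlike accuracy, F1 ignores true negatives, which suits detection tasks where most of the image contains no object of interest.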
Gland segmentation and slide annotation: Eycke et al. proposed a deep-learning method for automatically annotating slide images from colorectal tissue samples, capable of segmenting glandular epithelium in both H&E-stained and IHC sections. Graham et al. improved on standard CNNs with MILD-Net, a fully convolutional network that compensates for the information lost through max-pooling by reintroducing the original image at multiple points within the network. These advances collectively move pathology closer to reproducible, AI-assisted diagnostics.
Blood fluorescence spectroscopy: Soares et al. trained an SVM on blood fluorescence data that distinguished CRC from normal tissue with 87% sensitivity and 95% specificity, while identifying nonmalignant findings with 60% sensitivity and 79% specificity. This demonstrates how standard blood-based assays can be augmented with machine-learning classification to improve screening accuracy without requiring invasive procedures.
ColonFlag and circulating tumor cells: ColonFlag, a machine learning algorithm using basic demographics and complete blood counts, was evaluated in 17,676 subjects. A positive ColonFlag score doubled the odds of advanced precancerous lesions (odds ratio 2.0) at 95% specificity, enabling targeted intensification of colonoscopy screening for high-risk individuals. The CellMax (CMx) platform, which enriches epithelial circulating tumor cells, demonstrated 100% experimental specificity and 80% clinical sensitivity in a 47-subject cohort.
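The odds ratio quoted for ColonFlag compares the odds of a lesion among flagged subjects with the odds among unflagged subjects. The following sketch shows the calculation on a 2x2 table; the counts are invented so that the ratio comes out at 2.0 and do not reproduce the 17,676-subject cohort.

```python
def odds_ratio(flagged_with, flagged_without, unflagged_with, unflagged_without):
    """Odds ratio of having a lesion for flagged versus unflagged subjects."""
    if 0 in (flagged_without, unflagged_with, unflagged_without):
        raise ValueError("odds ratio is undefined when these cells are zero")
    odds_flagged = flagged_with / flagged_without
    odds_unflagged = unflagged_with / unflagged_without
    return odds_flagged / odds_unflagged


# Hypothetical 2x2 table: lesion present or absent, by flag status.
ratio = odds_ratio(flagged_with=20, flagged_without=80,
                   unflagged_with=100, unflagged_without=800)
print(f"odds ratio = {ratio:.1f}")
```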
Gene expression and mutation profiling: Hu et al. compared three classifiers using gene expression profiles from UICC stage II patients and found that the S-Kohonen network classified relapse vs. no-relapse with 91% accuracy, outperforming a back-propagation neural network (66%) and an SVM (70%). Xu et al. used an SVM pipeline to identify differentially expressed genes and validated a 15-gene panel that stratified high-risk patients. Zhang et al. applied a counter-propagation ANN to near-infrared assays for detecting the BRAF V600E mutation, achieving 100% sensitivity, 87.5% specificity, and 93.8% overall accuracy.
Cell-free DNA and epigenetics: Wan et al. designed an AI program to improve plasma cfDNA extraction sensitivity for CRC patients. For a cohort heavily weighted toward early-stage cancer (80% stage I/II), they achieved a mean AUC of 0.92 with 85% sensitivity. Kel et al. developed a "walking pathway" strategy to discover methylated DNA biomarkers using AI to interrogate cancer-specific enhancers. These approaches collectively extend noninvasive screening beyond traditional methods, potentially reaching populations who decline or lack access to colonoscopy.
Why ncRNAs matter: Despite completion of the human genome project, many mechanisms of tumorigenesis remain unexplained by gene sequences alone. Non-coding RNAs (ncRNAs) have emerged as promising biomarkers for tumor diagnosis and treatment. However, analyzing ncRNA mechanisms involves massive data volumes and complex computations, making AI technology a natural bridge between ncRNA research and clinical application.
miRNA-based tumor prediction: Chang et al. measured miRNA expression profiles in 20 pairs of stage II CRC tissues and corresponding normal tissues, then designed an ANN algorithm tested on 102 samples. They identified three miRNAs (miR-139-5p, miR-31, and miR-17-92) capable of predicting tumor status. Amirkhah et al. refined this approach with CRCmiRTar, a naive Bayes classification model that not only predicts miRNAs but also reveals interactions between miRNAs and target messenger RNAs, achieving an AUC of 0.956 with 93% sensitivity and 86.1% specificity.
Advanced prediction models: Xuan et al. proposed CNNDMP, a dual-CNN prediction method that explores deep features of miRNA similarities and disease similarities. Case studies on breast cancer, CRC, and lung cancer demonstrated its capability for detecting potential disease-associated miRNAs. Afshar et al. screened four CRC-specific miRNAs from a database and achieved an AUC of 1.0 using an ANN to classify 371 patients and 150 controls. Clinical validation on 297 patients from eight Spanish medical centers showed SVM classification with 85% sensitivity and 90% specificity.
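The AUC values reported for these miRNA classifiers can be read as the probability that a randomly chosen patient receives a higher score than a randomly chosen control. A minimal sketch of that pairwise definition follows; the function name and the scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the share of case/control pairs ranked correctly (ties count half)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores for illustration only.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
auc = auc_from_scores(cases, controls)
print(f"AUC = {auc:.3f}")
```

Under this definition an AUC of 1.0 means every case scored above every control, while 0.5 corresponds to chance-level ranking.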
Small-sample solutions: ShrinkBayes was introduced to address the problem of excessive degrees of freedom when sample sizes are insufficient, and it improved predictive accuracy in studies with small samples or complex designs. The authors note that AI has significantly accelerated the study of ncRNA mechanisms in tumor processes, producing new methods for screening CRC molecular markers that complement conventional approaches.
Preoperative staging with AI: Ding et al. applied the Faster R-CNN algorithm to plain (non-contrast) MRI scans of pelvic lymph nodes in 414 rectal cancer patients. Compared to conventional MRI evaluation, the AI-based N staging was closer to pathological criteria, offering greater clinical value for preoperative assessment. AI can also evaluate extramural vascular invasion (EMVI), so that patients with positive EMVI can receive neoadjuvant chemoradiotherapy before surgery, which significantly reduces local recurrence.
Reducing unnecessary surgeries: Ichimasa et al. designed an AI model for preoperative prediction of lymph node metastasis (LNM) after endoscopic resection of T1 CRC. Analyzing 45 clinicopathological factors, their model achieved 100% sensitivity, 66% specificity, and 69% accuracy. Critically, the AI model identified more unnecessary surgeries than existing guidelines from the United States, Japan, and Europe, potentially sparing patients from unneeded procedures.
Robotic surgery: The Da Vinci robot represents a major milestone in CRC surgery. A retrospective study of 71 patients undergoing rectal low anterior resection found that robot-assisted surgery had lower conversion and complication rates than traditional surgery. Another study of 61 patients demonstrated a less pronounced inflammatory response with robotic surgery than with open surgery. Researchers have also noted that robotic surgery has a shorter learning curve, meaning surgeons need fewer training cases to reach proficiency.
Chemotherapy and drug development: The NamiRobot system delivers drugs to cancer cells by sensing reduced oxygen levels caused by tumor proliferation. Ferrari et al. developed an AI model based on MRI texture analysis that predicted pathological complete response (pCR) and non-response following neoadjuvant chemotherapy with AUC values of 0.86 and 0.83, respectively. Oyaga-Iriarte et al. built an SVM model to predict irinotecan toxicity (leukopenia, neutropenia, and diarrhea) with accuracies of 76%, 75%, and 91%, respectively, enabling early dose adjustment.
Watson for Oncology: The WFO system, developed by IBM and Memorial Sloan Kettering Cancer Center, demonstrated approximately 90% concordance with human expert recommendations in clinical trials. In a South Korean study of 61 CRC samples, the concordance rate between WFO and a multidisciplinary team was 46.4%, rising to 88.4% when "for consideration" recommendations were included.
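The two concordance figures differ only in which WFO recommendation tiers count as agreement. The sketch below assumes each case is tagged with the WFO tier that the team's chosen treatment fell into; the tier labels, the function name, and the ten example cases are hypothetical and do not reproduce the South Korean data.

```python
def concordance_rates(wfo_tiers):
    """Return (strict, broad) concordance over a list of per-case WFO tiers.

    strict counts only cases where the team's choice was in the 'recommended' tier;
    broad also counts cases where it was in the 'for consideration' tier.
    """
    if not wfo_tiers:
        raise ValueError("need at least one case")
    n = len(wfo_tiers)
    strict = sum(tier == "recommended" for tier in wfo_tiers) / n
    broad = sum(tier in ("recommended", "for consideration") for tier in wfo_tiers) / n
    return strict, broad


# Hypothetical cohort of ten cases.
tiers = (["recommended"] * 5
         + ["for consideration"] * 4
         + ["not recommended"] * 1)
strict, broad = concordance_rates(tiers)
print(f"strict={strict:.1%}, broad={broad:.1%}")
```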
Survival and recurrence modeling: Grundner et al. used genetic markers to train models predicting overall survival (OS), disease-free survival (DFS), and recurrence rates. Peng et al. developed a prognostic ANN scoring system for stage IIA CRC that predicts 10-year OS and DFS based on clinical data. Kather et al. demonstrated that AI can assess independent prognostic factors (OS, CRC-specific OS, and recurrence-free OS) from pathological images alone with 94% accuracy, a remarkable result given the complexity of histological interpretation.
Tumor-stroma ratio classification: Geesink et al. used semi-automatic deep learning to classify tumor-stroma ratios (TSR) of CRC pathological specimens with 94.6% accuracy. TSR is an independent prognostic factor: tumors are assigned "stroma-high" or "stroma-low" classifications that help guide treatment decisions. Skrede et al. constructed 10 CNNs trained on over 12 million image tiles from 920 patients to search for novel CRC prognostic biomarkers.
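The stroma-high versus stroma-low assignment can be illustrated as a threshold on the stromal share of classified tissue. The sketch below is an assumption-laden illustration rather than the method of Geesink et al.: the tile labels, the function name, and the 50% cutoff are all hypothetical, since the source does not state the cutoff used.

```python
def tumor_stroma_class(tile_labels, cutoff=0.5):
    """Classify a specimen from per-tile labels; tiles other than tumor/stroma are ignored."""
    tumor = tile_labels.count("tumor")
    stroma = tile_labels.count("stroma")
    if tumor + stroma == 0:
        raise ValueError("no tumor or stroma tiles to evaluate")
    stroma_fraction = stroma / (tumor + stroma)
    # Assumed convention: a stromal share above the cutoff is 'stroma-high'.
    label = "stroma-high" if stroma_fraction > cutoff else "stroma-low"
    return stroma_fraction, label


# Hypothetical tile labels from a single specimen.
tiles = ["tumor"] * 3 + ["stroma"] * 6 + ["background"]
fraction, label = tumor_stroma_class(tiles)
print(f"stroma fraction={fraction:.2f}, class={label}")
```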
Metastasis prediction: Saghapour et al. combined logistic regression with ANN systems to create a mixed prediction model achieving 100% sensitivity and 95.8% specificity for late-stage CRC metastasis prediction. Zhi et al. used SVM models to screen differentially expressed genes of metastatic CRC, identifying 40 characteristic genes across five integrated databases. For lymph node metastasis, Takamatsu et al. extracted features from cytokeratin immunohistochemical images, achieving 80.0% sensitivity, 94.5% specificity, and an AUC of 0.938.
Lymph node and immune cell analysis: Lu et al. built a Faster R-CNN system for LNM diagnosis using nearly 80,000 training epochs, achieving MRI diagnosis in 20 seconds (30 times faster than average radiologist time) with an AUC of 0.912. Eyraud et al. performed computer-aided analysis of whole-slide digital images to assess cell infiltration, while Reichling et al. used digital tumor parameters to automatically quantify lymphocyte density and infiltration surface area in stage III CRC patients, linking immune microenvironment features to prognosis.
Lack of reliable guidelines: AI diagnosis currently lacks reliable guidelines and gold standards. Pathologists frequently provide inconsistent judgments on the same pathological section, particularly for early lesions. When AI systems diagnose pathological sections, they focus solely on external input criteria and neglect other patient information, which could lead to overdiagnosis. The absence of standardized benchmarks makes it difficult to compare systems or establish clinical trust.
Image signal limitations: A lack of stratification of image signal strength limits accurate tumor diagnosis. Cancer presents many immune landscapes, meaning imaging signals must be differentiated in more subtle ways to provide accurate guidance for immunotherapy. Current AI systems struggle to capture these nuanced distinctions, particularly when training data does not adequately represent the full spectrum of tumor heterogeneity.
Cost and complexity: Developing AI systems is expensive and technically demanding. Training deep learning networks requires large numbers of training and verification samples to achieve adequate accuracy. Even improved algorithms designed for small sample sizes inevitably suffer reduced accuracy. Training processes demand powerful computing configurations and long training periods, while machine maintenance adds ongoing costs. The complexity of AI training methods means non-professionals can only conduct auxiliary diagnostics using pre-built functions, making it difficult to update databases and algorithms for novel cases.
Privacy and data concerns: Internet-connected AI systems face significant challenges regarding user screening and privacy protection. The growing number of heterogeneous data sources and the richness of user data greatly increase the risk that anonymized data can be re-identified. The authors note that no suitable technical solution currently exists for preserving privacy while meeting the growing need of data-driven science for access to large genomic and phenotypic datasets.
Looking forward: Despite these obstacles, the authors remain optimistic about general AI application prospects in medicine. They highlight three emerging directions: quality-monitoring AI systems that can oversee colonoscopy results across multiple institutions simultaneously, AI-integrated mobile applications like ColorApp that share CRC information with community educators and clinicians, and personalized virtual health assistants that measure vital signs in real time and integrate services and data for more streamlined care pathways.