Endometrial cancer (EC) is the most common gynecological malignancy in developed countries, and its incidence is rising rapidly. Between 1990 and 2021, the number of cases among women aged 55 and older more than doubled, rising from roughly 141,000 to over 360,000. By 2020, more than 417,000 women were diagnosed worldwide, representing a 132% increase over three decades. Cases are projected to climb another 40% between 2020 and 2040, driven by aging populations, rising obesity rates, and increasing prevalence of diabetes.
Scope of the review: This paper is a non-systematic literature review that searched PubMed and Google Scholar for publications up to February 2025. The search terms spanned artificial intelligence, neural networks, diagnostics, histology, imaging (MRI, ultrasound, CT), multi-omics, and preoperative diagnostics. After title and abstract screening followed by full-text evaluation, the authors selected 32 articles through consensus. The final set included original research and review articles covering AI applications in three domains: histopathology, medical imaging, and multi-omics analysis.
Current diagnostic landscape: Standard diagnostic tools for EC include transvaginal ultrasound, histopathological biopsy, and MRI. Each plays a distinct role, but each has limitations. MRI offers excellent soft-tissue contrast and avoids ionizing radiation, yet it struggles with sensitivity in early-stage disease. Conventional contrast agents used to improve MRI clarity carry toxicity risks and are cleared poorly by the kidneys. Histopathology remains the gold standard, but manual evaluation of tissue samples is slow and can miss subtle nuclear and architectural features.
The AI opportunity: AI techniques, including deep learning and machine learning, can analyze complex datasets and detect patterns that human observers may overlook. The authors organized AI applications into three categories: histopathology (analyzing tissue slides and hysteroscopic images), imaging (enhancing ultrasound, CT, and MRI interpretation), and multi-omics (combining genomic, proteomic, metabolomic, and epigenomic data). Across all three areas, AI models have shown the ability to match or exceed expert-level accuracy.
Hysteroscopy is a procedure where a small camera is inserted into the uterus to visually inspect the endometrial lining. Two studies in this review used AI to automatically classify what the camera sees, reducing the subjectivity that comes with a physician's visual assessment alone.
VGGNet-16 for lesion classification (Zhang et al.): A convolutional neural network based on the VGGNet-16 architecture was trained on 6,478 hysteroscopic images from 454 patients covering various endometrial lesion types. When tested on 250 images, the model achieved 80.8% overall accuracy across multiple lesion categories. For the clinically critical task of distinguishing benign lesions from premalignant or malignant ones, it reached 90.8% accuracy, 83.0% sensitivity, and 96.0% specificity. Notably, the CNN outperformed gynecologists on the same lesion-classification tasks.
Multi-model ensemble with continuity analysis (Takahashi et al.): A different approach combined three deep neural network models and introduced a "continuity analysis" method to improve accuracy. Studying 177 patients with conditions ranging from normal endometrium to EC, the standard single-model approach achieved roughly 80% accuracy. When the three models were combined and continuity analysis was applied, accuracy rose to 90.29%, with 91.66% sensitivity and 89.36% specificity. Continuity analysis works by considering consecutive frames from the hysteroscopic video rather than isolated snapshots, mimicking how a clinician reviews the entire visual field.
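The published details of the continuity analysis method are not reproduced here, but its core idea, smoothing per-frame predictions by looking at neighboring video frames rather than isolated snapshots, can be sketched with a simple sliding-window majority vote. The function name, window size, and labels below are illustrative assumptions, not the study's implementation.

```python
from collections import Counter

def continuity_vote(frame_preds, window=5):
    """Smooth per-frame labels with a sliding majority vote.

    frame_preds: per-frame class labels (e.g. from an ensemble of CNNs
    applied to consecutive hysteroscopic video frames).
    window: odd number of consecutive frames considered per vote.
    """
    half = window // 2
    smoothed = []
    for i in range(len(frame_preds)):
        lo, hi = max(0, i - half), min(len(frame_preds), i + half + 1)
        votes = Counter(frame_preds[lo:hi])
        smoothed.append(votes.most_common(1)[0][0])
    return smoothed

# A lone spurious "malignant" frame is overruled by its neighbors:
preds = ["benign", "benign", "malignant", "benign", "benign"]
print(continuity_vote(preds, window=3))
```

The intuition matches how a clinician reviews hysteroscopic video: a single anomalous frame matters less than a finding that persists across the visual field.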
Clinical implications: Both studies suggest that AI-assisted hysteroscopy could help clinicians make faster, more consistent diagnoses during the procedure itself. The VGGNet-16 model's ability to exceed gynecologist performance and the ensemble model's high sensitivity for detecting EC point toward a future where real-time AI feedback guides clinical decisions during hysteroscopic examinations.
Histopathology is the microscopic examination of tissue samples, and it remains the definitive method for diagnosing EC. However, manual analysis by pathologists is time-consuming and can be subjective. Three studies in this review applied deep learning to digitized tissue slides and cytology specimens to automate and improve this process.
Whole-slide image triage (Fell et al.): A CNN was trained on 2,909 whole-slide images (WSIs) from endometrial biopsies, each annotated with "malignant" and "other/benign" regions. The model generated heatmaps highlighting suspicious areas and classified entire slides as malignant, benign, or insufficient. It achieved 90% overall accuracy and 97% accuracy specifically for malignant slides. The goal was not to replace pathologists but to prioritize which slides they review first, potentially speeding up cancer diagnosis in busy laboratories.
Cytology screening with U-Net and DenseNet201 (Li et al.): Addressing the global shortage of cytopathologists, this study developed an AI pipeline for screening endometrial cell clumps (ECCs) from cytology slides. Using Li Brush sampling and liquid-based cytology, 113 patient samples (42 malignant, 71 benign) were collected, yielding 15,913 whole-slide images. A U-Net segmentation network isolated 39,000 ECC patches, which were then classified by a DenseNet201 model. The system achieved 93.5% accuracy, 92.2% specificity, and 92.0% sensitivity. When benchmarked against VGG16, InceptionV3, ResNet, and SVM, DenseNet201 outperformed all competitors and matched expert pathologist labels perfectly on the verification set.
HIENet: CNN with attention mechanisms (Sun et al.): The HIENet computer-aided diagnosis system was trained on over 3,500 H&E-stained histopathological images to classify four tissue types: normal endometrium, endometrial polyp, endometrial hyperplasia, and endometrial adenocarcinoma. In ten-fold cross-validation, it achieved 76.91% accuracy for the four-class problem and an AUC of 0.9579 for binary malignancy classification (81.04% sensitivity, 94.78% specificity). External validation confirmed 84.5% accuracy with an AUC of 0.9829 and perfect specificity of 100%. HIENet outperformed three expert pathologists and several other CNN classifiers, and its attention mechanism provided interpretable visualizations showing which tissue features drove each prediction.
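AUC (area under the ROC curve) is the headline metric for HIENet and for most models in this review. It equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, which can be computed directly via the Mann-Whitney U statistic. The sketch below is a generic illustration of the metric, not HIENet's code.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the probability that
    a randomly chosen positive case scores higher than a randomly
    chosen negative case (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A model that ranks every malignant case above every benign one has AUC 1.0
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.4, 0.1]))  # → 1.0
```

An AUC of 0.9579, as reported for HIENet's binary task, therefore means the model ranks a malignant sample above a benign one roughly 96% of the time, regardless of any particular decision threshold.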
Multi-omics refers to the simultaneous analysis of multiple biological information layers, including the genome, transcriptome, proteome, metabolome, and epigenome. In EC research, combining these datasets can reveal molecular markers and therapeutic targets that no single data type would uncover alone. Several studies in this review explored how AI and multi-omics approaches can improve EC diagnosis and subtyping.
Proteogenomic analysis (Dou et al.): This study performed an in-depth molecular analysis of 138 EC tumors and 20 normal tissues using ten different omics platforms, including whole-genome sequencing (WGS), whole-exome sequencing (WES), RNA sequencing, and various proteomics methods. AI deep learning models demonstrated strong capability in predicting EC subtypes and key mutations directly from H&E-stained histopathology images, supporting computational pathology for rapid diagnostics. The researchers also identified actionable molecular alterations, such as PIK3R1 in-frame insertions linked to elevated AKT phosphorylation and sensitivity to AKT inhibitors, pointing toward new targeted therapy options.
Panoptes multi-resolution CNN (Hong et al.): The Panoptes model is a customized multi-resolution deep convolutional neural network designed to predict histological subtypes, molecular subtypes (corresponding to TCGA classifications such as POLE ultramutated, MSI hypermutated, copy-number low, and copy-number high), and 18 common gene mutations from digitized H&E-stained slides. Trained on 496 slides from 456 patients, the model outperformed traditional methods and demonstrated strong generalization on independent datasets. This approach could potentially replace costly and time-consuming genomic sequencing for routine molecular subtyping.
Metabolomics and proteomics (Yi et al.): Analyzing endometrial tissue, urine, and intrauterine brushing samples from 44 EC patients and 43 controls, this study identified significant metabolic alterations related to amino acid and nucleotide metabolism. Key changes observed in tissue were verified in urine and brushing samples, suggesting that non-invasive or minimally invasive specimen types could serve as diagnostic alternatives. This multi-omics approach highlights the potential for biomarker-based early detection without requiring surgical tissue sampling.
Two of the most clinically promising studies in this review focused on making EC diagnosis less invasive and more accessible. One used machine learning on body fluid proteins, while the other used multimodal deep learning on routine clinical data to predict cancer recurrence.
Five-protein signature in cervico-vaginal fluid (Njoku et al.): This UK-based study analyzed proteomic data from 53 symptomatic postmenopausal women with EC and 65 without EC. Using machine learning to examine protein biomarkers in cervico-vaginal fluid and blood plasma, the researchers identified a five-protein signature that detected EC with an AUC of 0.95, 91% sensitivity, and 86% specificity. Critically, the signature maintained high performance even for early-stage (stage I) cancers, achieving an AUC of 0.92. The proteins are naturally shed by endometrial tumors into the lower genital tract, meaning they can be collected through simple, non-invasive fluid sampling rather than surgical biopsy.
HECTOR for recurrence prediction (Volinsky-Fremond et al.): Published in Nature Medicine in 2024, the HECTOR model is a multimodal deep learning system that predicts distant recurrence in EC using only H&E-stained whole-slide images and tumor stage. Trained on over 2,000 patients across eight cohorts, HECTOR achieved C-indices of 0.789, 0.828, and 0.815 across internal and external test sets. It stratified patients into three risk groups with strikingly different 10-year recurrence-free survival rates: 97.0% (low risk), 77.7% (intermediate risk), and 58.1% (high risk). The model's architecture combines self-supervised learning for extracting morphological patterns with multimodal integration of staging data and image-based molecular classifications.
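HECTOR's performance is reported as a C-index (concordance index), the standard metric for time-to-event models: among comparable patient pairs, the fraction where the patient who recurred earlier was assigned the higher risk score. A minimal pairwise implementation of Harrell's C-index, purely to illustrate the metric (not HECTOR's evaluation code), looks like this:

```python
def c_index(times, events, risks):
    """Harrell's concordance index. A pair (i, j) is comparable when
    patient i had an observed event strictly before patient j's time;
    it is concordant when i also got the higher risk score.
    Ties in risk count half."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Three toy patients: earlier recurrence carries the higher predicted risk
times = [2.0, 5.0, 8.0]   # years to recurrence / censoring
events = [1, 1, 0]        # 1 = distant recurrence observed, 0 = censored
risks = [0.9, 0.6, 0.2]   # model's predicted risk
print(c_index(times, events, risks))  # → 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so HECTOR's reported values around 0.8 indicate strong discrimination between patients who do and do not go on to recur.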
Why these matter together: Both studies address a central challenge in EC care: the gold-standard molecular profiling that guides treatment decisions is expensive, slow, and often unavailable in routine clinical settings. The cervico-vaginal fluid test offers a screening alternative that avoids invasive biopsy entirely, while HECTOR eliminates the need for complex molecular profiling by extracting equivalent prognostic information from standard pathology slides that every hospital already produces. Together, they represent a path toward more equitable, accessible cancer diagnostics.
Ultrasound is often the first imaging study performed when EC is suspected, particularly transvaginal ultrasound. The review covers two key studies showing how AI can extract more diagnostic information from ultrasound images than conventional interpretation alone.
Radiomics-based classification (Capasso et al., Mayo Clinic): This study retrospectively analyzed data from 302 patients who underwent ultrasound and endometrial testing between 2016 and 2022. Physicians manually segmented ultrasound images, and radiomic features (texture, shape, and intensity characteristics) were extracted from the segmented regions. Multiple machine-learning classifiers were trained and evaluated on these features. The top-performing classifier achieved an AUC-ROC of 0.90 on validation data and 0.88 on test data, with sensitivity of 0.87 and specificity of 0.86. This approach transforms standard ultrasound images into quantitative data that AI can use to distinguish cancerous from non-cancerous tissue.
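To make "radiomic features" concrete: first-order radiomics reduces a segmented image region to summary statistics of its pixel intensities, which then feed a conventional classifier. The sketch below computes three such first-order features in plain Python; it is a simplified stand-in, since real pipelines (e.g. PyRadiomics) also extract shape descriptors and texture matrices, and the pixel values here are invented.

```python
import math
from collections import Counter

def first_order_features(region):
    """First-order radiomic features from a segmented region's pixel
    intensities: mean, variance, and Shannon entropy of the
    intensity histogram."""
    n = len(region)
    mean = sum(region) / n
    var = sum((x - mean) ** 2 for x in region) / n
    counts = Counter(region)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": var, "entropy": entropy}

# Toy intensities from a manually segmented ultrasound region
pixels = [10, 12, 12, 50, 52, 52, 52, 90]
print(first_order_features(pixels))
```

Features like these, computed over hundreds of regions, are the quantitative inputs on which the study's machine-learning classifiers were trained.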
Systematic review of AI in gynecological ultrasound (Moro et al.): A broader systematic review examined 50 studies on AI in ultrasound for gynecological oncology, with 5 studies specifically addressing EC. Across the reviewed research, 70.3% of evaluated AI models focused on distinguishing benign from malignant lesions, with several achieving AUCs up to 0.99. For EC specifically, AI models demonstrated AUCs in the range of 0.90 to 0.92 for predicting malignancy, and also showed strong performance for risk stratification and predicting myometrial invasion depth. An additional 10.8% of studies predicted tumor histology (benign vs. borderline vs. malignant) with AUCs up to 0.97.
Combined imaging with VGG-16 AdaBoost (Wang et al.): This study evaluated combining transvaginal ultrasound, magnetic resonance diffusion-weighted imaging (MRDWI), and multilayer spiral CT for diagnosing early-stage EC across 100 patients. A deep VGG-16 AdaBoost hybrid classifier was applied to images preprocessed with adaptive Wiener filtering and segmented with fuzzy clustering. The combined imaging group showed significantly better diagnostic accuracy, specificity, sensitivity, and inter-rater agreement (kappa coefficient) compared to conventional Doppler ultrasound alone.
MRI is widely used to assess the extent of EC, including how deeply the tumor has invaded the uterine wall (myometrial invasion) and whether it has spread to nearby lymph nodes. Several studies in this review applied deep learning to MRI images for these tasks, with varying levels of success.
CNN vs. radiologists (Urushibara et al.): This study compared convolutional neural networks with radiologists for diagnosing EC on MRI. Using data from 204 cancer patients and 184 patients with non-cancerous lesions (MRI performed between 2015 and 2020), the CNNs achieved AUCs between 0.88 and 0.95 across single and combined image sequences. This performance was on par with experienced radiologists. Adding diverse MRI sequence types to the training data provided slight improvements, suggesting that multi-sequence training can boost model performance.
YOLOv3 for myometrial invasion depth (Chen et al.): Assessing whether EC has invaded more or less than 50% of the myometrium is a critical staging decision (FIGO staging). This study used the YOLOv3 object detection algorithm to locate lesion areas on T2-weighted MR images from 530 patients, followed by a classification model to determine invasion depth. The model achieved 84.78% accuracy overall, with 86.67% precision on coronal images and 77.14% on sagittal images. When combined with radiologist interpretation, accuracy improved to 86.2%, sensitivity rose to 77.8%, and the negative predictive value reached 96.3%, meaning that a "shallow invasion" prediction was correct 96.3% of the time.
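The metrics quoted above (accuracy, sensitivity, negative predictive value) all derive from a 2x2 confusion matrix. The helper below shows the standard definitions; the counts in the usage example are illustrative only, not the study's raw data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from a 2x2 confusion matrix.
    NPV answers: when the model predicts negative (here, "shallow
    invasion"), how often is that prediction correct?"""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Illustrative counts only (hypothetical, not taken from Chen et al.)
m = diagnostic_metrics(tp=35, fp=10, tn=130, fn=5)
print(f"NPV = {m['npv']:.3f}")  # → NPV = 0.963
```

A high NPV like the study's 96.3% is clinically valuable because it lets radiologists rule out deep invasion with confidence when the model predicts shallow invasion.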
ResNet and shallow CNN for staging (Tao et al.): Three network architectures (a shallow CNN, ResNet, and an optimized network) were compared for analyzing MRI images from 80 EC patients. The study found that 90% of patients (72 of 80) were correctly identified as stage I EC based on MRI findings of endometrial thickening and uneven enhancement.
Predictive modeling for lymph node metastasis (Bourgioti et al.): Using MRI data from 105 patients, a predictive algorithm identified tumor size as an independent predictor of both pelvic lymph node metastasis and myometrial invasion, achieving 78% sensitivity, 92.7% specificity, and a 90.5% positive predictive value.
Limitations of texture analysis (Bereby-Kahane et al.): Not all MRI-AI approaches succeed. Using TexRAD software on 1.5T MRI data from 73 patients, texture analysis showed poor performance for predicting high-grade tumors (AUC = 0.64) and lymphovascular space invasion (AUC = 0.59). While tumor volume and short axis measurements were significantly higher in high-grade cases, the texture features alone lacked the discriminative power needed for reliable clinical predictions. This result underscores that AI is not a guaranteed improvement and that model selection, feature engineering, and data quality all matter.
Despite the impressive performance numbers reported throughout this review, the authors emphasize several important limitations that must be addressed before AI diagnostic tools can be widely adopted in clinical practice for EC.
Data and generalizability concerns: Many of the AI models reviewed were trained and validated on retrospective, single-institution datasets. This raises serious questions about how well they would perform across different patient populations, hospitals, imaging equipment, and tissue preparation protocols. A model trained on H&E-stained slides from one laboratory may behave differently when applied to slides prepared with slightly different staining techniques elsewhere. The lack of large, multi-center prospective validation studies remains a critical gap in the field.
The "black box" problem: Many deep learning models, including the CNNs and multi-resolution networks reviewed here, operate as "black boxes" where the internal decision-making process is opaque. Clinicians are understandably reluctant to trust a diagnosis they cannot understand or verify, especially for high-stakes decisions like cancer staging. While some models (like HIENet with its attention mechanisms) offer interpretable visualizations, most lack this transparency. Making AI models explainable and auditable is essential for clinical trust and regulatory approval.
Infrastructure and implementation barriers: Deploying AI in clinical workflows requires substantial digital infrastructure, including high-quality image scanners, robust data storage, and computational resources. It also demands clinician training on how to use and interpret AI outputs, as well as standardized workflows for integrating AI predictions into existing diagnostic pathways. These requirements create significant barriers for resource-limited settings, which are often the areas that would benefit most from AI-augmented diagnostics. The cost and complexity of implementation must be weighed against the demonstrated improvements in accuracy and efficiency.
Overall conclusions from the review: Across histopathology, imaging, and multi-omics, AI has demonstrated strong potential to enhance EC diagnostic accuracy, reduce subjectivity, and speed up decision-making. AI tools offer scalable, non-invasive solutions that can complement or approximate the performance of more complex molecular techniques. The evidence supports AI's role in enabling earlier diagnosis, improved risk stratification (as shown by HECTOR), and more personalized treatment strategies. However, rigorous prospective validation, transparent model development, and thoughtful implementation strategies are needed before these tools can become part of routine EC care.