Digital Pathology and Artificial Intelligence in Renal Cell Carcinoma Focusing on Feature Extraction: A Literature Review

Frontiers in Oncology, 2025

Plain-English Explanations
Page 1
Renal Cell Carcinoma, Digital Pathology, and the Interpretability Problem

Renal cell carcinoma (RCC) is a diverse group of malignant tumors originating from renal tubular epithelial cells, with rising incidence worldwide. The fifth edition of the WHO classification (2022) recognizes three major subtypes: clear cell RCC (ccRCC, 70-80% of cases), papillary RCC (pRCC), and chromophobe RCC (chRCC), which together account for over 90% of all RCC diagnoses. Grading of ccRCC has shifted from the older Fuhrman classification to the WHO/ISUP system, and the fifth edition also introduced molecularly defined categories such as TFE3/TFEB-rearranged RCC and ALK-rearranged RCC, underscoring the growing role of molecular profiling in RCC diagnosis and prognosis.

Digital pathology (DP) converts traditional glass slides into high-resolution digital formats and, when integrated with AI, enables automated image analysis and diagnostic support. Whole-slide imaging (WSI) captures entire slides at high resolution, and a randomized controlled trial confirmed diagnostic equivalence between glass slides and WSIs across six pathologists. Pathomics is an emerging field that uses computational techniques for high-throughput feature extraction and quantification from pathology images to develop predictive models and uncover potential biomarkers.

The interpretability challenge: One of the central bottlenecks in applying AI to WSIs is the "black box" nature of deep learning models. Features learned automatically during training are often high-dimensional and abstract, with no clear biological meaning, which makes it difficult for clinicians to understand the relationship between the extracted features and the prediction task. Researchers have responded by redesigning feature extraction steps and prediction models to improve transparency, for instance through techniques like the Structural Priors Guided Network (SPG-Net), which incorporates prior structural knowledge into segmentation.

This review specifically addresses a gap in the literature: while previous reviews summarized AI and DP applications in RCC diagnosis and prediction, none focused on the extracted features themselves. The authors aimed to provide a comprehensive overview of feature extraction techniques that can improve model interpretability and mitigate the black box problem in RCC research.

TL;DR: RCC comprises three major subtypes (ccRCC at 70-80%, pRCC, and chRCC, totaling over 90% of cases). This review uniquely focuses on feature extraction methods in AI-based RCC analysis, addressing the interpretability gap where previous reviews overlooked which specific features drive model predictions.
Pages 2-3
Search Strategy, Screening Process, and Text Mining Analysis

The authors searched PubMed and Web of Science in January 2024 using the query: "(renal cell carcinoma) AND ((artificial intelligence) OR (machine learning) OR (deep learning))." They supplemented this with manual searches for additional relevant articles. Inclusion criteria limited results to English-language, peer-reviewed publications from 2017 to January 2024, chosen because advances in deep learning and CNNs began meaningfully impacting digital pathology around that time. Non-peer-reviewed articles, case reports, comments, and conference summaries were excluded.

Screening and selection: Duplicates were removed in EndNote (version 20), followed by manual screening of titles and abstracts. Literature screening and evaluation followed the PRISMA 2020 checklist. Of the records retrieved, 1,032 were excluded, leaving 28 full-text articles for inclusion in the review. The PRISMA flow diagram details the selection process.

Text mining: The authors used two software programs, SATI (sationline.cn) and Voyant (voyanttools.org), to analyze the full records and cited references. Of the 28 included records, 26 were in the Web of Science Core Collection and were included in the text mining analysis, which covered keywords, authors, institutions, and citation patterns, providing objective bibliometric statistics on the breadth of research activity in this field.

While the text mining component provided quantitative bibliometric data rather than qualitative scientific evaluation of the literature, it gave the authors a useful lens for identifying research trends, key contributors, and institutional networks working on AI-driven RCC pathology.

TL;DR: Searched PubMed and Web of Science (2017 to January 2024) following PRISMA 2020 guidelines, screening down from over 1,000 records to 28 included studies. Text mining with SATI and Voyant tools analyzed 26 records for bibliometric trends in AI-based RCC research.
Pages 3-4
Supervised, Weakly Supervised, and Unsupervised Learning Approaches in RCC

Supervised learning uses fully labeled data and is the most common approach in RCC studies. CNNs are the dominant architecture, consisting of multiple convolutional filter layers that automatically extract image features. However, supervised methods require either manual pixel-level annotation of gigapixel WSIs or large slide-level labeled datasets, which is labor-intensive and error-prone. Chen et al. proposed Supervised Multimodal Fusion, combining CNNs with graph CNNs for cell graph features, Graph Convolutional Networks (GCNs) for survival prediction, and gating-based attention mechanisms with Kronecker product fusion of genomic and histology data.
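The Kronecker-product fusion described for Chen et al. can be illustrated in miniature. The sketch below is a hypothetical numpy-only toy (function name and dimensions are my own, not from the paper): appending a constant 1 to each modality embedding before taking the Kronecker product keeps the unimodal terms alongside all pairwise histology-genomic interaction terms.

```python
import numpy as np

def kronecker_fusion(h_histology, h_genomic):
    """Fuse two modality embeddings via their Kronecker (outer) product.

    Appending a constant 1 to each embedding retains the unimodal terms
    alongside every pairwise interaction term, as in bilinear/Kronecker
    fusion schemes. Illustrative sketch only.
    """
    h1 = np.append(h_histology, 1.0)   # shape (d1 + 1,)
    h2 = np.append(h_genomic, 1.0)     # shape (d2 + 1,)
    return np.kron(h1, h2)             # shape ((d1 + 1) * (d2 + 1),)

fused = kronecker_fusion(np.array([0.2, 0.5]), np.array([1.0, -1.0, 0.3]))
print(fused.shape)  # (12,)
```

In practice a gating network downweights each embedding before fusion; the fused vector then feeds the survival model.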

Weakly supervised learning uses image-level (WSI-level) labels rather than pixel-level annotations, offering coarser but more scalable labeling. Graph Neural Networks (GNNs) with attention and integrated gradients (IG) have been used for model interpretation in this setting. Lee et al. proposed the TEA graph method, a GNN-based approach that represents WSIs through super patches and analyzes spatial interactions of histopathological features. The CLAM method (clustering-constrained attention-based multiple instance learning) uses attention-based learning to identify diagnostically valuable subregions while applying instance-level clustering to refine the feature space, eliminating the need for pixel-level annotations or ROI extraction.
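The attention-pooling idea behind CLAM-style multiple instance learning can be sketched without a deep learning framework. Below is a minimal numpy version of gated attention pooling (toy random weights; in a real model V, U, and w are learned): each patch embedding receives a weight, the slide embedding is their weighted average, and the weights double as a map of diagnostically relevant subregions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gated_attention_pool(patches, V, U, w):
    """Gated attention MIL pooling over a bag of patch embeddings.

    patches: (n_patches, d) embeddings from one slide (the "bag").
    Returns the attention-weighted slide embedding and the per-patch
    attention weights (a softmax, so they sum to 1).
    """
    gate = np.tanh(patches @ V) * (1.0 / (1.0 + np.exp(-(patches @ U))))
    scores = gate @ w                      # (n_patches,) raw attention scores
    a = np.exp(scores - scores.max())
    a /= a.sum()                           # softmax attention weights
    return a @ patches, a                  # (d,), (n_patches,)

d, hidden = 8, 4
patches = rng.normal(size=(16, d))
V, U, w = rng.normal(size=(d, hidden)), rng.normal(size=(d, hidden)), rng.normal(size=hidden)
slide_emb, attn = gated_attention_pool(patches, V, U, w)
```

Only a slide-level label is needed to train such a model: the attention weights are learned implicitly, which is what removes the pixel-level annotation requirement.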

Unsupervised and self-supervised learning: Faust et al. used an unsupervised pretrained CNN as a feature extractor, generating groupings through dimensionality reduction and clustering. Chen et al. introduced a self-supervised image search using a Vector Quantized Variational Autoencoder (VQVAE) trained on a large dataset. DiPalma et al. proposed Resolution-Based Distillation, which transfers knowledge from a teacher model to a student model trained at lower resolution while minimizing performance loss.
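The teacher-student transfer in Resolution-Based Distillation rests on a standard soft-target distillation loss. The sketch below (my own minimal numpy version, not the paper's code) shows only that loss: the KL divergence between temperature-softened teacher and student output distributions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T (higher T = softer)."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence KL(teacher || student) on temperature-softened outputs,
    the soft-target term of knowledge distillation. In Resolution-Based
    Distillation the teacher sees high-resolution patches and the student
    low-resolution ones; this sketch covers only the loss, not training."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))

loss_same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])  # 0: student matches teacher
loss_diff = distillation_loss([-1.0, 0.5, 2.0], [2.0, 0.5, -1.0])  # > 0: mismatch penalized
```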

Both weakly supervised and unsupervised methods face greater interpretability challenges compared to supervised approaches with comprehensive annotations. However, there is growing interest in these methods because they can leverage rich unlabeled or partially labeled data, significantly reducing the annotation burden that currently limits the scalability of AI in digital pathology.

TL;DR: The review catalogues three learning paradigms in RCC: supervised (CNNs with full labels), weakly supervised (CLAM and GNN approaches such as the TEA graph, using slide-level labels), and unsupervised/self-supervised (pretrained feature extractors with clustering, VQVAE image search, Resolution-Based Distillation). Supervised methods achieve better interpretability but require costly pixel-level annotation.
Pages 5-6
AI-Driven Classification of RCC Subtypes Achieves AUC Up to 0.98

Supervised CNN architectures dominate RCC subtype classification. The standard pipeline begins with pathologists manually annotating regions of interest (ROIs) at the pixel level in WSIs, followed by segmenting these into patches for classification with data augmentation to address small dataset sizes. Across the reviewed studies, deep learning models consistently achieved strong performance in distinguishing RCC subtypes.
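The tiling and augmentation steps of this standard pipeline can be sketched in a few lines. This is a generic illustration, not any reviewed study's code (function names and the 256-pixel patch size are my own assumptions): an annotated ROI is cut into non-overlapping patches, and simple label-preserving transforms multiply each patch into several training views.

```python
import numpy as np

def tile_region(region, patch=256, stride=256):
    """Split an annotated ROI (H, W, 3 array) into non-overlapping patches,
    discarding partial tiles at the border -- a common WSI preprocessing step."""
    H, W = region.shape[:2]
    return [region[r:r + patch, c:c + patch]
            for r in range(0, H - patch + 1, stride)
            for c in range(0, W - patch + 1, stride)]

def augment(patch_img):
    """Label-preserving augmentations (flips and 90-degree rotations)
    used to enlarge small pathology datasets."""
    return [patch_img, np.fliplr(patch_img), np.flipud(patch_img),
            np.rot90(patch_img), np.rot90(patch_img, 2), np.rot90(patch_img, 3)]

roi = np.zeros((512, 700, 3), dtype=np.uint8)  # toy stand-in for an annotated ROI
patches = tile_region(roi)
print(len(patches), len(augment(patches[0])))  # prints: 4 6
```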

Key results from individual studies: Fenstermaker et al. achieved 0.979 accuracy classifying ccRCC, pRCC, and chRCC using a CNN. Abdeltawab et al. built a pyramidal model with three CNNs, reaching 0.957 accuracy for classifying fat, renal parenchyma, ccRCC, and clear cell papillary RCC. Zhu et al. used ResNet-18 to classify five categories (ccRCC, pRCC, chRCC, renal oncocytoma, and normal tissue) and achieved an AUC of 0.98 with a mean F1-score of 0.92, using Gradient-weighted Class Activation Mapping (Grad-CAM) for interpretability. Tabibu et al. employed pretrained ResNet-18 and ResNet-34 with DAG-SVM to handle data imbalance, reaching 0.941 accuracy.

Transfer learning approaches: Marostica et al. integrated VGG16, InceptionV3, and ResNet50 with multiomics and clinical data, achieving an AUC of 0.953 for the best model in classifying benign regions, chRCC, ccRCC, and pRCC. Yasukochi et al. used a deep convolutional neural network (DCNN) trained via transfer learning to predict clear versus eosinophilic phenotypes, achieving an AUC of 0.929 on the independent validation set. Chen et al. extracted 346 quantitative image features (shapes, sizes, textures, pixel intensity distributions, and proximity relations) and achieved an AUC of 0.970 in the test cohort and 0.814 in the external validation cohort.
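Since almost every result in this section is reported as an AUC, it is worth recalling what that number measures. A minimal rank-based implementation (my own sketch, equivalent to the trapezoidal ROC AUC):

```python
import numpy as np

def auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a randomly
    chosen positive case is scored above a randomly chosen negative case,
    with ties counting one half."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

score = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(score)  # 0.75: 3 of 4 positive-negative pairs are ranked correctly
```

An AUC of 0.97, as in Chen et al.'s test cohort, means the model ranks a positive case above a negative one 97% of the time.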

A notable observation across these studies is that many achieved high classification accuracy but did not explicitly describe the specific features driving their models. Studies that did provide feature descriptions, such as extracting tumor morphology, cell nucleus characteristics, and cytoplasm features, offered greater interpretability and clinical relevance.

TL;DR: AI models achieved high AUCs for RCC subtype classification: 0.98 (ResNet-18), 0.970 (346 quantitative image features), 0.953 (VGG16/InceptionV3/ResNet50 fusion), and 0.929 (DCNN for eosinophilic phenotype). Accuracy ranged from 0.941 to 0.979 across studies.
Pages 6-7
Automated Grading of ccRCC Using Nuclear Features: AUC Up to 0.96

The WHO/ISUP grading system for ccRCC relies on morphological characteristics of the nucleus, specifically the presence of nucleoli at different magnifications. Automated grading pipelines built on machine learning follow a structured sequence: identifying ROIs, segmenting nuclei, computing numerical descriptors of nuclei features, selecting features, and classifying into grade categories. Most studies used fivefold or tenfold cross-validation and image augmentation strategies to mitigate the impact of small datasets.

Machine learning approaches: Kruk et al. extracted numerical descriptors of nuclei including texture, morphometry, color, and histogram descriptions, using ensemble SVM classifiers with multiple feature selection methods (Fisher selection, genetic algorithm, random forest, correlation feature selection, and fast correlation-based filter). They achieved 0.904 accuracy for Fuhrman grading. Tian et al. extracted 72 nuclei 2D histological features (9 morphological, 15 intensity-based, and 48 texture-based features) and reached an AUC of 0.96 with 0.89 accuracy for classifying ccRCC as low or high grade. Holdbrook et al. focused on the location of prominent nucleoli using histogram of polar gradient (HPG), enhanced HPG, exclusive component analysis features, and raw pixel intensity values, achieving correlation with an existing multigene assay-based scoring system (R = 0.59).
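Morphometric nuclear descriptors like those used by Kruk et al. and Tian et al. are simple to compute from a segmented nucleus mask. The sketch below (my own illustration, not the papers' code) computes one classic descriptor, circularity = 4πA/P², on synthetic masks; the pixel-edge perimeter estimate is crude, so absolute values are rough, but relative comparisons hold.

```python
import numpy as np

def circularity(mask):
    """Circularity 4*pi*A / P^2 of a binary nucleus mask (closer to 1 for
    rounder shapes). Perimeter is approximated by counting exposed pixel
    edges, so values are only approximate."""
    mask = mask.astype(bool)
    area = mask.sum()
    padded = np.pad(mask, 1)
    perim = ((padded[1:, :] != padded[:-1, :]).sum()
             + (padded[:, 1:] != padded[:, :-1]).sum())
    return 4 * np.pi * area / perim ** 2

# synthetic round vs. elongated "nuclei"
yy, xx = np.mgrid[:64, :64]
disk = (yy - 32) ** 2 + (xx - 32) ** 2 <= 20 ** 2
ellipse = ((yy - 32) / 8.0) ** 2 + ((xx - 32) / 30.0) ** 2 <= 1.0
c_disk, c_ell = circularity(disk), circularity(ellipse)
```

A real pipeline would compute dozens of such descriptors per nucleus (as in Tian et al.'s 72 features) and feed them to a classifier after feature selection.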

Deep learning approaches: Fenstermaker et al. achieved 0.984 accuracy for Fuhrman grading using a CNN, though specific features were not mentioned. Chanchal et al. proposed RCCGNet with a novel shared channel residual (SCR) block that shares information between different layers and strengthens local semantic features, achieving an F1-score of 0.8906 and accuracy of 0.9014 with visualization of nuclear morphology, nucleolar prominence, and nuclear membrane irregularities using Grad-CAM. Khoshdeli et al. compared a shallow CNN with GoogLeNet, achieving F1-scores of 0.92 and 0.99 respectively for classifying six tissue categories.

The grading studies consistently relied on nuclear features as the primary discriminative signal, which aligns directly with the clinical WHO/ISUP grading criteria. This alignment between extracted features and established pathological standards is a strength for interpretability, as pathologists can directly verify the relevance of the features the model uses.

TL;DR: Automated ccRCC grading achieved AUC 0.96, accuracy up to 0.984 (CNN) and 0.904 (ML ensembles). Key features include 72 nuclei descriptors (morphological, intensity, and texture), nuclear morphology visualization via Grad-CAM, and GoogLeNet reaching F1-score of 0.99 for tissue classification.
Pages 7-8
Predicting Molecular Aberrations from Histology: AUC Up to 0.89

Molecular-driven categorizations of RCC have become increasingly important with the Fifth WHO classification, but traditional morphological diagnosis from microscopic pathology sections often fails to capture tumor molecular heterogeneity. Gene expression analysis remains costly and technically challenging for routine clinical use. AI offers an alternative by using deep learning algorithms to infer molecular tumor subtypes from conventional histology, exploring correlations between morphological features and molecular characteristics in tissue images.

EMT subtype classification: Chen et al. developed an epithelial-mesenchymal transition (EMT) gene signature to classify ccRCC into epithelial (Epi) and mesenchymal (Mes) subtypes using transfer learning on InceptionV3 with SGD optimization. They identified distinct morphological patterns: the Epi subtype showed looser arrangement with big cell gaps, absent or inconspicuous nucleoli, and pink granular eosinophilic cytoplasm, while the Mes subtype was densely packed with arborizing vasculature, large multinucleate cells with empty cytoplasm, and abundant immune infiltration. The model achieved an AUC of 0.84, accuracy of 0.749, specificity of 0.722, and sensitivity of 0.753, though the sample size was relatively small and training used a single dataset.

Gene mutation prediction: Acosta et al. developed a deep learning model using VGG19 to identify mutations in BAP1, PBRM1, and SETD2 in ccRCC, extracting 36 nuclear features quantifying nuclear size, shape, color, texture, and Haralick texture features. The model achieved AUC values of 0.77 to 0.84 for BAP1 prediction. Cheng et al. designed 52 differential image features related to nucleus size and roundness to distinguish TFE3 Xp11.2 translocation-associated RCC from other types, achieving an AUC of 0.886. Marostica et al. used a multimodal DCNN integrating WSIs, multiomics, and clinical data to predict genomic aberrations (KRAS, CAN, WT1, EGFR, VHL, and others), achieving area under the precision-recall curve (AUPR) greater than 0.7.
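The Haralick texture features used by Acosta et al. are all derived from a gray-level co-occurrence matrix (GLCM). A minimal numpy sketch (my own, assuming the image is already quantized to integer gray levels) of the GLCM and two Haralick statistics:

```python
import numpy as np

def glcm(levels_img, n_levels=8, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one pixel offset, normalized to a
    joint probability table -- the basis of Haralick texture features.
    Assumes integer gray levels in [0, n_levels)."""
    P = np.zeros((n_levels, n_levels))
    H, W = levels_img.shape
    for y in range(H - dy):
        for x in range(W - dx):
            P[levels_img[y, x], levels_img[y + dy, x + dx]] += 1
    return P / P.sum()

def haralick_contrast(P):
    i, j = np.indices(P.shape)
    return float((P * (i - j) ** 2).sum())      # local intensity variation

def haralick_homogeneity(P):
    i, j = np.indices(P.shape)
    return float((P / (1 + np.abs(i - j))).sum())  # closeness to the diagonal

flat = np.full((16, 16), 3)                                  # uniform texture
noisy = np.random.default_rng(1).integers(0, 8, (16, 16))    # rough texture
c_flat, c_noisy = haralick_contrast(glcm(flat)), haralick_contrast(glcm(noisy))
```

A uniform nucleus crop yields zero contrast and homogeneity 1; coarse chromatin texture pushes contrast up, which is the signal such mutation classifiers exploit.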

This field remains in its early stages, and more data and in-depth studies are needed to validate the accuracy and reliability of AI's association with RCC molecular characteristics. The ability to predict molecular subtypes directly from H&E-stained slides could significantly reduce the need for expensive molecular testing, but current models have not yet been widely validated in multicenter external cohorts.

TL;DR: AI predicted molecular aberrations from histology with AUC up to 0.89: BAP1 mutations (AUC 0.77-0.84, VGG19), TFE3 translocation (AUC 0.886, 52 nucleus features), EMT subtypes (AUC 0.84, InceptionV3), and multimodal genomic aberration prediction (AUPR > 0.7). This could reduce reliance on expensive molecular testing, but multicenter validation is lacking.
Pages 8-9
Survival and Recurrence Prediction: AUC Over 0.78 with Multimodal Integration

Prognostic prediction in RCC typically follows a three-step workflow: extracting features from WSIs, applying machine learning classifiers for risk scoring, and predicting survival outcomes using regression models such as Cox regression, Lasso-regularized Cox regression, and logistic regression. The reviewed studies demonstrate that combining pathomics features with clinical and genomic data consistently outperforms single-modality approaches.
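Risk scores from such Cox-style models are usually evaluated with Harrell's concordance index, the survival analogue of AUC. A minimal implementation (my own sketch, handling right-censoring in the standard way):

```python
import numpy as np

def c_index(times, events, risk):
    """Harrell's concordance index: over comparable pairs (the earlier time is
    an observed event, event == 1), the fraction where the higher predicted
    risk belongs to the earlier failure. 0.5 = random, 1.0 = perfect ranking."""
    times, events, risk = map(np.asarray, (times, events, risk))
    conc = ties = total = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:  # comparable pair
                total += 1
                if risk[i] > risk[j]:
                    conc += 1
                elif risk[i] == risk[j]:
                    ties += 1
    return (conc + 0.5 * ties) / total

# toy cohort: higher predicted risk should correspond to earlier failure
ci = c_index(times=[2, 4, 6, 8], events=[1, 1, 0, 1], risk=[0.9, 0.7, 0.4, 0.1])
print(ci)  # 1.0: every comparable pair is ranked correctly
```

Censored patients (event = 0) only contribute as the later member of a pair, since their true failure time is unknown.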

Overall survival (OS) prediction: Wessels et al. established both univariate and multivariate logistic regression models based on CNN predictions combined with clinicopathological parameters, achieving AUROC of 0.88 in the validation group. Their CNN prediction model showed a hazard ratio of 3.69 (95% CI: 2.60-5.23, P < 0.01) in univariable Cox regression, with features focused on nucleus and nucleolus morphology accompanied by inflammatory reactions. Marostica et al. used CNNs coupled with multitask logistic regression and Grad-CAM visualization, distinguishing longer-term from shorter-term survivors (log-rank test P = 0.02). Gui et al. predicted ccRCC OS with AUC values of 0.787, 0.780, and 0.823 at 3, 5, and 7 years respectively.

Disease-free survival (DFS) and recurrence: Chen et al. developed a machine learning-based pathomics signature (MLPS) using QuPath digital pathology software for cell and nuclear segmentation. They extracted 43 pathological features, primarily nucleus parameters (circularity, min caliper) and intensity parameters (Hematoxylin OD mean, Eosin OD), achieving AUC values of 0.895, 0.90, 0.885, and 0.859 at 1, 3, 5, and 10 years of DFS prediction. Gui et al. combined WSI scores with six single nucleotide polymorphisms (SNPs) and the Leibovich score to construct a multimodal recurrence scoring system that outperformed single-modality scores for predicting recurrence-free interval.
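Intensity features such as "Hematoxylin OD mean" are computed on optical density rather than raw pixel values. A minimal sketch of the Beer-Lambert OD transform (my own illustration; a full pipeline like QuPath's would first deconvolve the hematoxylin and eosin stain channels, which is omitted here):

```python
import numpy as np

def optical_density(rgb, background=255.0):
    """Beer-Lambert optical density, OD = -log10(I / I0), where I0 is the
    bright (unstained) background intensity. Darker staining -> higher OD.
    Raw-RGB sketch only; stain deconvolution is omitted."""
    I = np.clip(rgb.astype(float), 1.0, background)  # avoid log(0)
    return -np.log10(I / background)

pale = np.full((4, 4, 3), 240.0)   # near-white pixel: OD close to 0
dark = np.full((4, 4, 3), 60.0)    # strongly stained pixel: much higher OD
od_pale, od_dark = optical_density(pale).mean(), optical_density(dark).mean()
```

Averaging OD over a segmented nucleus gives exactly the kind of "OD mean" intensity parameter used in the MLPS signature.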

Topological and microenvironment features: Cheng et al. extracted nucleus features, topological features from the renal tumor microenvironment, and eigengenes from functional genomics data to predict ccRCC prognosis. Their Lasso-Cox risk index was an independent prognostic factor (P = 2.31e-4, hazard ratio = 2.26, log-rank test P = 0.014). In a separate study on pRCC, the same team used topological features including histogram of co-occurrence of nucleus patterns and bag of edge histogram (BOEH) features, achieving an AUC of 0.78. They found that topological features were superior to traditional clinical features and cell morphology in predicting patient outcomes.

TL;DR: Prognosis prediction achieved AUC over 0.78 across studies: AUROC 0.88 for OS (CNN, hazard ratio 3.69), DFS AUC 0.859-0.90 at 1-10 years (MLPS with 43 features), and ccRCC OS AUC 0.780-0.823 at 3-7 years. Multimodal integration consistently beat single-modality models.
Pages 9-10
Computational Constraints, Validation Gaps, and the Path Toward Multimodal AI

Computing and data limitations: A fundamental tension exists between limited computing resources and the demands of robust AI models for large datasets and high processing power. Improving computing infrastructure and upgrading scanning equipment remain ongoing challenges for computer engineering. While large databases have partially addressed the problem of insufficient data at single research centers, extracting meaningful information from huge databases remains difficult, and the curse of dimensionality in high-throughput sequencing data leads to serious overfitting problems.

Validation and quality concerns: Data quality across databases cannot be fully guaranteed, and the lack of multicenter external validation remains a critical weakness affecting research reliability. Most studies reviewed were trained and tested on data from single institutions or used publicly available datasets like TCGA without independent external validation. The review itself has limitations: the authors only included English-language articles from PubMed and Web of Science, potentially missing relevant work published in other languages or indexed in other databases. The total number of collected studies (28) was relatively small.

Clinical integration challenges: Beyond technical hurdles, applying AI to clinical pathology involves ethical considerations, workflow optimization, acceptance by pathologists, and clear delineation of responsibilities when AI contributes to clinical decisions. Current research on RCC primarily focuses on the three major histological subtypes, while effectively identifying rarer molecular aberrations and accurately predicting prognosis remain key challenges. Given the rarity of cases with specific molecular subtypes, multicenter collaborative studies are particularly necessary.

Future directions: The authors highlight integration of multimodal data from pathology, radiomics, genomics, and proteomics as a promising frontier. Model fusion techniques combining pathomics, radiomics, genomics, and clinical data through early, late, or hybrid fusion strategies have already shown improved performance. The application of advanced AI technologies such as Large Language Models is expected to contribute significantly. The authors' own follow-up plan is to develop systematic research to find more unified and scientifically rigorous feature extraction methods to break through the interpretability bottleneck.
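The early/late fusion strategies mentioned above differ simply in where modalities are combined. A toy numpy sketch (my own illustration; modality names and dimensions are made up): early fusion concatenates feature vectors before a single model sees them, late fusion combines the outputs of per-modality models.

```python
import numpy as np

def early_fusion(features):
    """Early fusion: concatenate per-modality feature vectors into one joint
    input for a single downstream model."""
    return np.concatenate(list(features.values()))

def late_fusion(scores, weights=None):
    """Late fusion: combine per-modality model outputs -- here, a weighted
    average of risk scores (equal weights by default)."""
    s = np.array(list(scores.values()), dtype=float)
    w = np.ones(len(s)) / len(s) if weights is None else np.asarray(weights, float)
    return float(w @ s)

modalities = {"pathomics": np.array([0.2, 0.7]), "radiomics": np.array([0.1]),
              "genomics": np.array([0.5, 0.3, 0.9])}
fused_vec = early_fusion(modalities)                                  # length-6 joint vector
risk = late_fusion({"pathomics": 0.8, "radiomics": 0.6, "genomics": 0.7})
```

Hybrid fusion mixes the two, e.g. fusing intermediate embeddings (as in the Kronecker-product scheme discussed earlier) rather than raw features or final scores.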

TL;DR: Key limitations include lack of multicenter external validation, small dataset sizes, overfitting from high-dimensional data, and only 28 studies meeting inclusion criteria. Future directions point toward multimodal fusion of pathomics, radiomics, genomics, and proteomics data, plus Large Language Model integration and standardized feature extraction methods.
Citation: Li MY, Pan Y, Lv Y, Ma H, Sun PL, Gao HW. Digital pathology and artificial intelligence in renal cell carcinoma focusing on feature extraction: a literature review. Front Oncol. 2025. PMC11802434. DOI: 10.3389/fonc.2025.1516264. Open access under a CC BY license.