This 2020 review from the World Journal of Gastroenterology examines how artificial intelligence, specifically machine learning (ML) and deep learning (DL), is being applied across the full clinical pathway of hepatocellular carcinoma (HCC). HCC is the most common primary liver cancer, and the American Cancer Society estimated 42,810 new cases of liver and intrahepatic bile duct cancer in the United States for 2020, with 30,160 deaths. The sheer volume of imaging, clinical, and histological data generated across HCC diagnosis and treatment creates a natural use case for AI-driven analysis.
Key AI concepts: The authors distinguish between ML and DL. ML is a branch of AI that learns from a pre-labeled dataset, building algorithms to recognize patterns and produce predictive models. Techniques such as support vector machines (SVM), artificial neural networks (ANNs), and classification and regression trees all fall under this umbrella. DL is a more advanced subset of ML that uses multi-layered neural network architectures, most notably convolutional neural networks (CNNs), which have proven especially effective for analyzing radiological images.
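To make the CNN idea concrete, the following is a minimal sketch of the sliding-window operation that a single convolutional layer applies to an image. The function name `conv2d`, the toy 4x4 image, and the hand-picked kernel are illustrative assumptions and do not come from the review; in a real CNN the kernel weights are learned from training data rather than chosen by hand.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the kernel
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 "image" with a vertical edge between columns 1 and 2 (made-up data)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge kernel; a trained CNN would learn such weights
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is nonzero only where the kernel straddles the edge, which is the sense in which stacked convolutional layers pick out local image patterns.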
HCC has a unique clinical characteristic that makes it especially amenable to AI: it is one of the few solid tumors that can be diagnosed radiologically, without histological confirmation, when imaging in a cirrhotic patient shows arterial-phase hyperenhancement followed by washout in the portal or late phases. This means that image analysis, precisely the domain where CNNs excel, sits at the center of HCC diagnosis. The review covers AI applications in ultrasound, CT, MRI, PET, histopathology, treatment response prediction, and survival estimation.
The authors note that most existing studies are retrospective and suffer from database bias, and that prospective, multicenter validation is needed before AI tools can be integrated into routine clinical practice. They also raise cost-effectiveness, regulatory approval, and ethical considerations as barriers to real-world deployment.
Abdominal ultrasound is the frontline screening tool for HCC, recommended by clinical practice guidelines for regular surveillance of patients with hepatic cirrhosis. However, ultrasound interpretation is operator-dependent and subject to significant interobserver variability. Several groups have applied AI to improve diagnostic yield from ultrasound imaging.
Liver disease staging: Bharti et al. proposed an ANN model to classify four stages of liver disease from ultrasound images: normal liver, chronic liver disease, cirrhosis, and HCC. The model achieved a classification accuracy of 96.6%. Liu et al. took a different approach, designing an algorithm focused on the morphology of the liver capsule to detect cirrhosis, even in early stages before conventional findings (nodular liver outline, enlarged porta, splenomegaly) become obvious. Their model achieved an area under the curve (AUC) of 0.968 for cirrhosis detection.
Lesion characterization: Schmauch et al. designed a DL system to detect and classify space-occupying liver lesions as benign or malignant. After supervised training on a database of 367 images paired with radiological reports, the algorithm detected lesions with a mean area under the receiver operating characteristic curve (ROC AUC) of 0.93 and characterized them with an ROC AUC of 0.916. This system, if validated, could substantially augment the diagnostic capability of standard B-mode ultrasound.
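The ROC-based figures reported for Schmauch et al., like the AUC values quoted throughout this summary, can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: sequence of 0/1 values (1 = positive, e.g. malignant)
    scores: model outputs, where higher means more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores: 8 of the 9 positive/negative pairs
# are ranked correctly, so the AUC is 8/9.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the values in this summary should be read.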
Contrast-enhanced ultrasound (C-US): Guo et al. demonstrated that DL applied to liver lesion behavior observed during three C-US phases (arterial, portal, and late) improved the accuracy, sensitivity, and specificity of lesion characterization beyond conventional visual assessment. This multi-phase analysis approach mirrors how radiologists evaluate dynamic imaging, but with the potential for more consistent and quantitative results.
When ultrasound identifies a suspicious liver lesion, dynamic contrast-enhanced CT or MRI is the next step for precise characterization. Liver nodules that show classic HCC features (arterial hyperenhancement, portal/late-phase washout) in a cirrhotic patient can be diagnosed without biopsy. But many nodules exhibit indeterminate behavior, requiring either biopsy or close follow-up. AI aims to reduce this diagnostic ambiguity.
Indeterminate nodules on CT: Mokrane et al. retrospectively analyzed 178 cirrhotic patients with liver nodules that the Liver Reporting and Data System (LI-RADS) criteria could not definitively classify, necessitating biopsy. Of those biopsied, 77% proved malignant. A DL model that classified nodules as HCC or non-HCC achieved an AUC of 0.70. Although this performance is modest, it suggests a role for AI in triaging indeterminate lesions. Yasaka et al. trained an ANN on over 55,000 image sets to classify liver masses on contrast-enhanced CT into five categories: classic HCC, other malignancies (cholangiocarcinoma, hepatocholangiocarcinoma, metastasis), indeterminate/dysplastic nodules, hemangiomas, and cysts. The system achieved high accuracy, particularly for distinguishing malignant from benign lesions.
Tumor recurrence and segmentation: Vivanti et al. described an automated detection method for tumor recurrence on follow-up CT, based on initial tumor appearance, CT behavior, and tumor load quantification, achieving an accuracy of 86% for identifying true recurrences. Li et al. proposed a CNN for liver tumor segmentation on CT images, achieving 82.67% ± 1.43% accuracy, outperforming traditional segmentation techniques and supporting more precise treatment planning.
MRI applications: Hamm et al. developed and validated a DL system based on CNN for classifying MRI liver lesions, reporting 92% accuracy, 92% sensitivity, 98% specificity, and an average computation time of just 5.6 milliseconds. Jansen et al. built an automated classification system incorporating MRI sequences and patient risk factors, cataloguing lesions as adenoma, cyst, hemangioma, HCC, or metastasis. Their sensitivity/specificity values were: adenoma 0.80/0.78, cyst 0.93/0.93, hemangioma 0.84/0.82, HCC 0.73/0.56, and metastasis 0.62/0.77. Zhang et al. also reported promising results training a CNN on MRI in 20 patients for liver tissue classification.
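In a multi-class setting such as the five lesion types above, per-class sensitivity and specificity are usually computed one-vs-rest: the class of interest is treated as positive and all other classes as negative. The sketch below illustrates that calculation under this assumption; the function name `sensitivity_specificity` and the toy labels are hypothetical and are not taken from any of the cited studies.

```python
def sensitivity_specificity(true_labels, predicted_labels, target):
    """One-vs-rest sensitivity and specificity for a single class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == target:
            if pred == target:
                tp += 1  # target lesion correctly identified
            else:
                fn += 1  # target lesion missed
        else:
            if pred == target:
                fp += 1  # other lesion wrongly called the target class
            else:
                tn += 1  # other lesion correctly not called the target class
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up toy labels for eight lesions
truth = ["HCC", "HCC", "HCC", "HCC", "cyst", "cyst", "metastasis", "metastasis"]
pred = ["HCC", "HCC", "HCC", "metastasis", "cyst", "cyst", "HCC", "metastasis"]
hcc_result = sensitivity_specificity(truth, pred, "HCC")
print(hcc_result)  # (0.75, 0.75)
```

Read this way, a specificity of 0.56 for HCC means that a little under half of the non-HCC lesions were labeled as HCC.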
PET imaging: Preis et al. evaluated the yield of 18F-FDG PET/CT (fluorine-18 fluorodeoxyglucose positron emission tomography/computed tomography) using a neural network to analyze liver uptake of 18F combined with patient demographics and laboratory data. The model achieved high sensitivity and specificity for detecting liver malignancy that was not identified visually by radiologists. While this study primarily targeted metastatic liver disease, where 18F-FDG PET/CT has greater clinical utility than for primary HCC, it demonstrated that AI could serve as a complementary tool for radiologists interpreting PET scans.
Histopathological classification: The histological differentiation of liver tumors is critical for treatment planning and prognosis, but can be challenging even for expert pathologists. Kiani et al. prospectively evaluated whether a DL assistant improved pathologists' ability to distinguish HCC from cholangiocarcinoma. The study assessed 11 pathologists and found that the AI tool did not change their mean diagnostic accuracy. This is a notable negative result, suggesting that simply overlaying AI predictions onto existing pathologist workflows does not automatically improve performance.
By contrast, Liao et al. demonstrated that a deep CNN trained on histopathological images could perform automated diagnosis of HCC, distinguishing healthy tissue from tumor tissue and identifying certain biological predictors from the images. This approach focuses on full automation rather than decision support, which may represent a more effective paradigm for AI in digital pathology. The divergent outcomes between the Kiani and Liao studies highlight an important distinction: AI as a standalone diagnostic engine versus AI as an overlay on human judgment, with each approach suited to different clinical scenarios.
Early tumor recurrence after surgical resection of HCC is associated with poor prognosis, making preoperative risk stratification essential. AI-based models have been developed to predict two key outcomes: the presence of vascular microinvasion (VMI) before surgery, and post-resection survival. VMI is an independent predictive factor for recurrence, but standard radiological techniques cannot directly diagnose it preoperatively.
VMI prediction with radiomics: Multiple groups have built radiomic signatures to predict VMI status. Xu et al. achieved an AUC of 0.90 for VMI prediction using contrast-enhanced CT radiomics in 495 patients. Ma et al. reported an AUC of 0.73 (157 patients) using a similar CT-based approach. Zhou et al. analyzed contrast-enhanced MRI in 46 patients, achieving an AUC of 0.918, sensitivity of 92%, and specificity of 66%. Dong et al. took a different route, using grayscale ultrasound-based radiomic algorithms to predict VMI in 322 patients, achieving an AUC of 0.73 with a sensitivity of 91.9%. This ultrasound-based approach is notable because it avoids the ionizing radiation of CT and is less costly than CT- or MRI-based methods.
Recurrence prediction: Ji et al. created predictive models for recurrence after surgical resection using radiomic analysis of contrast-enhanced CT images from 470 patients across multiple institutions, achieving a C-index of 0.633 to 0.699. When clinical data was incorporated alongside imaging features, the model supported personalized risk stratification for individual HCC management.
Survival prediction: Saillard et al. developed a predictive model of survival after resection using DL on digitized histological slides from 194 patients, attaining a C-index of 0.78. Schoenberg et al. conducted a prospective study of 180 patients and built a predictive model analyzing 26 preoperative routine clinical variables, also obtaining a predictive value of 0.78. These two studies reach a similar performance level through very different data inputs (histology vs. clinical variables). This hints that a value of approximately 0.78 may be close to what current approaches can achieve, although two studies are a thin basis for calling it a ceiling.
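The C-index (concordance index) cited for these recurrence and survival models measures how often the model ranks a pair of patients in the correct order of risk. The following is a minimal sketch of Harrell's C-index for right-censored data; the function name `concordance_index`, the simplified handling of tied follow-up times (such pairs are skipped), and the toy data are assumptions made for illustration only.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: follow-up time per patient
    events: 1 if the event (death or recurrence) was observed, 0 if censored
    risk_scores: model output, where higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up toy cohort: 3 of the 4 comparable pairs are ordered correctly
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.1, 0.2]
c_index = concordance_index(times, events, risk)
print(c_index)  # 0.75
```

As with AUC, 0.5 corresponds to random ordering and 1.0 to perfect ordering, so values between roughly 0.63 and 0.78 indicate moderate discrimination.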
Transcatheter arterial chemoembolization (TACE) is the standard treatment for intermediate-stage (BCLC stage B) HCC. Selecting patients who will actually benefit from TACE is critical for avoiding unnecessary procedures and their associated side effects. AI models have been developed to predict TACE response using various imaging modalities and, in some cases, genomic data.
CT-based prediction: Morshid et al. built a fully automated ML algorithm combining quantitative CT image features with pretreatment clinical data, achieving a prediction accuracy of 74.2% when BCLC stage and image features were used together, compared to lower accuracy from BCLC staging alone. Peng et al. validated a residual CNN to predict TACE response using CT images from 789 patients across three hospitals, achieving an accuracy of 84.3% and an AUC of 0.97 for predicting complete response. This is one of the largest cohorts and strongest results among the studies reviewed.
Contrast-enhanced ultrasound and MRI: Liu et al. constructed a DL radiomics-based model using quantitative analysis of C-US cine recordings from 130 patients, achieving an AUC of 0.93 (95% CI: 0.80-0.98) for predicting TACE response. Abajian et al. studied 36 patients who underwent MRI before TACE, developing a predictive model with 78% accuracy, 62.5% sensitivity, and 82% specificity. Additionally, Mahringer-Kunz et al. built an ANN using the parameters from three conventional prediction scores (ART, ABCR, and SNACOR) to predict one-year survival after TACE in 282 patients, achieving an AUC of 0.77, 78% sensitivity, and 81% specificity, outperforming the individual conventional scores.
Genomic approaches and RFA: Ziv et al. explored genetic mutation analysis using SVM to predict tumor response after TACE, though this was a small retrospective study of only 17 patients with a prediction accuracy of 70%. For radiofrequency ablation (RFA), Liang et al. built a predictive model of HCC recurrence based on SVM in 83 patients, achieving an AUC of 0.69, sensitivity of 67%, and specificity of 86%. Notably, this was one of the few prospective studies in the entire review.
Beyond treatment-specific outcomes, AI has been applied to predict overall survival in HCC patients independent of any particular therapy. Dong et al. leveraged emerging evidence on the relationship between abnormalities in DNA methylation and HCC to build a survival prediction model. Using SVM to analyze DNA methylation data from 377 HCC samples, they constructed three risk categories to predict overall survival and achieved a mean 10-fold cross-validation score of 0.95.
This result is notable for two reasons. First, the performance metric (0.95 cross-validation score) is exceptionally high compared to other recurrence and survival prediction models in the review, which reported C-indices between roughly 0.63 and 0.78, although a cross-validation score and a C-index are not directly comparable. Second, the approach moves beyond imaging-based features into molecular and epigenetic data, suggesting that genomic-level information may carry stronger prognostic signals for HCC than radiological or clinical variables alone.
However, the cross-validation score should be interpreted cautiously. A 10-fold cross-validation within a single dataset does not provide the same level of confidence as external validation on an independent cohort. Over-fitting is a real concern, particularly with high-dimensional methylation data where the number of features can vastly exceed the number of samples. No external validation was reported, so this result requires independent confirmation before it can be considered clinically actionable.
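To see why cross-validation remains an internal check, it helps to look at how the folds are formed. The sketch below splits 377 sample indices into 10 folds so that each sample is held out exactly once; the function name `k_fold_indices` and the fixed shuffle seed are illustrative assumptions, and this is not a reconstruction of the pipeline used by Dong et al.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx


folds = list(k_fold_indices(377, k=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)  # [38, 38, 38, 38, 38, 38, 38, 37, 37, 37]
```

Every held-out fold is drawn from the same 377 samples, so the averaged score reflects performance on data that shares the cohort's collection and processing characteristics. Any step fitted on the full dataset before splitting, such as feature selection, would additionally leak information into the held-out folds.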
Retrospective bias: Nearly all studies reviewed were retrospective in design, and prospective designs were rare exceptions (for example, Liang et al., RFA recurrence prediction, 83 patients). Retrospective studies carry inherent selection bias, and the databases used may not represent the diversity of real-world patient populations. The authors specifically call out the risk that biased datasets can affect the accuracy and interpretability of AI models, limiting their acceptance in clinical practice.
Small sample sizes and single-center design: Many studies relied on small cohorts, sometimes as few as 17 to 46 patients. Single-center data compounds the generalizability problem, as imaging protocols, patient demographics, and disease prevalence can differ substantially between institutions. The strongest study in the review (Peng et al., TACE prediction with CNN) used 789 patients across three hospitals, which illustrates the scale and multi-center design needed for credible AI model development.
Unresolved clinical questions: The review identifies several areas where AI could have high impact but remains understudied. These include the characterization of indeterminate hepatic lesions (where the best DL model achieved only AUC 0.70), the differential diagnosis between HCC and cholangiocarcinoma (where AI assistance did not improve pathologist accuracy), and the analysis of HCC behavior in cirrhotic versus non-cirrhotic patients. The differentiation of primary liver tumors from metastatic lesions, and prediction of response to percutaneous therapies, also represent open challenges.
Toward clinical integration: The authors emphasize that larger comparative studies are needed, specifically trials that measure the performance of medical professionals with AI support against professionals without it. Beyond accuracy, cost-effectiveness analysis, regulatory pathway development, and ethical frameworks for AI-assisted decision-making must be addressed. Health-care professionals also need formal training to understand both the strengths and limitations of AI before it is incorporated into daily liver cancer management.
The conclusion is balanced: AI represents one of the most relevant advances in medicine, with clear utility for processing and analyzing the enormous volume of HCC-related data. But AI is here to support human intelligence, not replace it, and medical protocols must remain rigorously transparent. The gap between promising retrospective results and validated clinical tools remains substantial.