Quantitative analysis of artificial intelligence on liver cancer: A bibliometric analysis


Plain-English Explanations
Pages 1-2
Mapping Two Decades of AI Research in Liver Cancer

This bibliometric analysis set out to quantitatively map the entire landscape of artificial intelligence research applied to liver cancer, covering publications from 2003 through early 2022. The authors searched the Web of Science Core Collection (WoSCC) database and, after systematic keyword searches and manual screening, collected 1,724 papers for analysis, including 1,547 original articles and 177 reviews. The goal was to identify research progress, hotspots, and emerging trends in AI for liver cancer by examining publication patterns, country and institutional contributions, author networks, journal distributions, and keyword clusters.

Clinical context: Liver cancer ranks 7th in global incidence and 4th in mortality. The 5-year survival rate ranges from only 5% to 30%, making it a serious global health challenge. Accurate screening of early-stage and high-risk patients and rational treatment decisions for advanced disease both remain critical unmet needs. AI, built on both traditional machine learning (support vector machines, random forests) and modern deep learning (convolutional neural networks), has been increasingly applied to address these gaps since the earliest studies appeared in 2003.

Historical arc: The field began in 2003, when Hussain et al. constructed a predictive system using Fisher's linear classifier to predict early recurrence in hepatocellular carcinoma (HCC) patients. Early work focused on simple gene and molecular data analyses. As medical imaging became standardized, AI research shifted toward extracting high-throughput features from CT, ultrasound, and MRI scans to build intelligent diagnostic models. Deep learning based on CNNs has been the primary driver of progress since approximately 2012.

The authors used three specialized bibliometric tools: VOSviewer (version 1.6.18) for cooperation network visualization, CiteSpace (version 6.1.R1) for dual-map journal analysis and citation burst detection, and the online SRplot platform for in-depth keyword analysis. Five researchers independently participated in searching, downloading, and analyzing publications to ensure accuracy and reproducibility.

TL;DR: A bibliometric study analyzed 1,724 papers (1,547 articles, 177 reviews) on AI in liver cancer from the WoSCC database, spanning 2003 to 2022. Liver cancer's 5-30% five-year survival rate drives urgency. The field began in 2003, accelerated sharply from 2017, and is dominated by CNN-based deep learning on imaging data.
Pages 2-3
Search Strategy, Screening Criteria, and Analytical Tools

The authors used the Web of Science Core Collection as their sole data source and constructed a detailed Boolean search query combining liver cancer terms (liver, hepatic, cancer, tumor, carcinoma, HCC) with AI-related terms (artificial intelligence, deep learning, convolutional neural networks, machine learning, computer-aided, Bayesian networks, supervised learning, unsupervised clustering, ensemble learning, and others). The retrieval was carried out on January 18, 2022. This initial search yielded 2,111 papers.
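The query construction described above can be sketched as follows. This is a minimal illustration, not the authors' actual search string: the summary lists only some of the AI terms ("and others"), the way the terms were grouped is not stated, and the use of the Web of Science topic field tag `TS=` is an assumption. The helper names `or_group` and `build_query` are hypothetical.

```python
# Term lists copied from the summary; the real query contained more AI terms.
disease_terms = ["liver", "hepatic", "cancer", "tumor", "carcinoma", "HCC"]
ai_terms = [
    "artificial intelligence", "deep learning", "convolutional neural networks",
    "machine learning", "computer-aided", "Bayesian networks",
    "supervised learning", "unsupervised clustering", "ensemble learning",
]

def or_group(terms):
    """Join terms with OR, quoting phrases that contain a space or hyphen."""
    quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def build_query(groups, field="TS"):
    """AND together one OR-group per concept (assumed grouping)."""
    return " AND ".join(f"{field}={or_group(g)}" for g in groups)

query = build_query([disease_terms, ai_terms])
print(query)
```

Keeping each concept in its own OR-group makes it easy to add or remove synonyms without touching the rest of the query.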

Manual screening process: Papers were restricted to those written in English, focused on liver cancer, and involving AI technologies. All 2,111 papers were categorized as relevant, uncertain, or excluded. Papers marked as uncertain were discussed by three senior authors (XH, LL, and MX) to determine inclusion. Unlike a systematic review, the bibliometric analysis required only abstract screening, with full-text review performed only when necessary. After excluding the 387 papers that did not meet the criteria (2,111 minus 1,724), 1,724 papers were retained for final analysis.

Visualization and analysis: VOSviewer was used to construct cooperation network maps between countries, institutions, and authors, as well as co-citation networks. CiteSpace generated dual maps showing the relationship between citing and cited journal fields and produced citation burst rankings to identify references with sudden spikes in citation frequency. The online SRplot platform was used for in-depth keyword analysis across multiple dimensions: disease type, data modality, clinical goals, and AI methods. Microsoft Excel 2019 was used for basic tabulation of targeted variables.

The study analyzed the top 10 rankings for countries, institutions, authors, journals, and cited references. Total link strength (TLS), a measure of collaboration intensity between nodes in a bibliometric network, was used throughout the cooperation analyses to quantify the strength of relationships.
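As a rough illustration of total link strength, the sketch below sums, for each node, the strengths of all links that touch it. The link list is entirely hypothetical (placeholder country names and joint-paper counts), and the function name `total_link_strength` is illustrative rather than part of VOSviewer.

```python
from collections import defaultdict

# Hypothetical collaboration links: (node_a, node_b, number of joint papers).
links = [
    ("Country A", "Country B", 12),
    ("Country A", "Country C", 5),
    ("Country B", "Country C", 3),
]

def total_link_strength(links):
    """Sum the strength of every link attached to each node."""
    tls = defaultdict(int)
    for a, b, strength in links:
        tls[a] += strength  # a link counts toward both of its endpoints
        tls[b] += strength
    return dict(tls)

tls = total_link_strength(links)
print(tls)
```

In this toy network, Country A has the highest TLS because it shares the most joint papers with the other two nodes.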

TL;DR: WoSCC was the sole database. A Boolean search on January 18, 2022 yielded 2,111 papers, which were manually screened down to 1,724. VOSviewer mapped cooperation networks, CiteSpace analyzed journal relationships and citation bursts, and SRplot performed multi-dimensional keyword analyses across disease type, data modality, clinical goals, and AI methods.
Pages 3-4
Global Growth Patterns: China Leads in Volume, the USA Leads in Impact

Research on AI in liver cancer began in 2003 and increased steadily each year, with a sharp acceleration from 2017 onward. Publications from 2017 through early 2022 account for almost 70% of all 1,724 papers. As of the search date, the collected papers had been cited 27,049 times in total, with an overall H-index of 67 and an average of 15.69 citations per paper. This growth trajectory reflects the broader explosion of deep learning applications in medicine following the 2012 ImageNet breakthrough and the subsequent adoption of CNNs in medical imaging.
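The two impact metrics quoted above can be reproduced in a few lines. The sketch below recomputes the average citations per paper from the reported totals and shows how an H-index is derived from a list of per-paper citation counts. The per-paper list is hypothetical, since the summary does not include the underlying data.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Reported totals from the summary.
total_citations = 27049
total_papers = 1724
average = round(total_citations / total_papers, 2)

# Hypothetical per-paper citation counts, for illustration only.
toy_citations = [25, 8, 5, 3, 3, 1, 0]
toy_h = h_index(toy_citations)

print(average, toy_h)
```

An H-index of 67 for the full collection therefore means that 67 of the 1,724 papers had each been cited at least 67 times.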

Country contributions: A total of 75 countries/regions published related articles. China led in raw output with 608 publications (35.33%), followed by the USA with 470 (27.31%), India with 129 (7.49%), Germany with 122 (7.09%), and Japan with 118 (6.86%). Italy (105), England (75), South Korea (75), Canada (74), and France (73) rounded out the top 10. However, the USA ranked first in H-index (49), total citations (10,228), and average citations per paper (21.76). China, despite having the most publications, had a lower H-index (38) and fewer total citations (7,298), suggesting that many Chinese publications have limited global impact.
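The volume-versus-impact contrast can be checked directly from the reported figures. The sketch below uses only the paper and citation counts given for China and the USA; China's average of about 12 citations per paper is derived here and is not stated in the summary.

```python
# Reported figures for the two leading countries.
countries = {
    "China": {"papers": 608, "citations": 7298, "h_index": 38},
    "USA": {"papers": 470, "citations": 10228, "h_index": 49},
}

# Average citations per paper = total citations / number of papers.
averages = {
    name: round(stats["citations"] / stats["papers"], 2)
    for name, stats in countries.items()
}

by_volume = sorted(countries, key=lambda c: countries[c]["papers"], reverse=True)
by_impact = sorted(averages, key=averages.get, reverse=True)

print(averages)
print("by volume:", by_volume, "| by impact:", by_impact)
```

The two rankings come out in opposite order, which is the pattern the paragraph describes.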

Collaboration patterns: The top five total link strengths for international collaboration were associated with the USA, China, India, Italy, and Canada. China's leading publication count aligns with its high incidence of liver cancer, which creates both clinical urgency and abundant patient data. The gap between China's volume and the USA's citation impact may reflect differences in study design quality, journal selection, and research focus.

France, despite ranking 10th in publication count (73 papers), had the second-highest average citations per paper (21.11), closely trailing the USA. Italy similarly demonstrated high per-paper impact (20.68 average citations). This suggests that smaller but more focused research programs in European countries are producing highly cited, influential work.

TL;DR: Publications surged from 2017, with nearly 70% of all 1,724 papers published from 2017 through early 2022. China led with 608 papers (35.33%) but the USA dominated in impact: H-index 49, 10,228 total citations, 21.76 average citations per paper. Total citations across all papers reached 27,049 with an overall H-index of 67.
Pages 4-5
Who Is Driving AI in Liver Cancer Research

Top institutions: Over 2,000 institutions participated in AI liver cancer research. The League of European Research Universities led with 109 publications (H-index 25, 2,746 total citations, 25.49 average per paper). Sun Yat Sen University ranked second with 62 articles, and Zhejiang University third with 58. Among American institutions, the University of Texas System (57 papers, H-index 17), Harvard University (42 papers), and Stanford University (40 papers, H-index 19) were the most productive. Sun Yat Sen University had the highest institutional total link strength (TLS = 187), followed by Zhejiang University (TLS = 173) and the Mayo Clinic (TLS = 124), indicating strong collaborative networks.

Collaboration insight: The authors found that cooperation between medical and engineering institutions, such as the partnership between Sun Yat Sen University and the Chinese Academy of Sciences (the 2nd and 5th most productive institutions), produces stronger research output. Similarly, Fudan University and Shanghai Jiaotong University showed strong collaborative ties. This cross-disciplinary pattern supports the view that integrating medicine with engineering is critical for advancing AI in liver cancer.

Top authors: A total of 9,916 authors and 37,290 co-cited authors were included. The three most productive authors were Jasjit S. Suri (USA, 18 papers, 456 total citations), Luca Saba (Italy, 17 papers, 371 citations), and Udyavara Rajendra Acharya (Singapore, 15 papers, 519 citations). Acharya had the highest co-citation TLS at 2,274, indicating his work is among the most frequently referenced in the field. Among co-cited authors, 78 had more than 45 citations each.

Top journals: All related studies were published across 585 journals. Frontiers in Oncology led with 50 papers (2.90%), followed by European Radiology (45 papers, 2.61%) and Scientific Reports (41 papers, 2.38%). Four of the top 10 journals were in the Q1 quartile of the Journal Citation Reports (JCR), including Scientific Reports, Medical Physics, Cancers, and Computers in Biology and Medicine. The dual-map analysis revealed four main citation paths, with citing papers concentrated in molecular biology/immunology, medicine/clinical, and neurology/sports/ophthalmology, while cited papers clustered in molecular biology/genetics, health/nursing/medicine, and dermatology/dentistry/surgery.

TL;DR: The League of European Research Universities led with 109 papers and 2,746 citations. Jasjit S. Suri was the most prolific author (18 papers). Frontiers in Oncology was the top journal (50 papers). A total of 9,916 authors published across 585 journals. Medicine-engineering collaboration between institutions proved essential for high-impact research.
Pages 5-7
What AI in Liver Cancer Actually Studies: Diseases, Data Types, and Clinical Goals

Disease focus: The in-depth keyword analysis revealed that most articles focused on liver cancer, either broadly or as hepatocellular carcinoma (HCC), which was the most widely studied single disease type. The distribution of publications by disease category was: HCC (33.52%), liver cancer in general (30.76%), cirrhosis (11.10%), fatty liver disease (9.45%), liver fibrosis (8.39%), liver transplantation (4.41%), and hepatectomy (2.37%). The prominence of chronic liver diseases like cirrhosis and fibrosis reflects their role as precursors to liver cancer, making them important targets for early detection and prevention.

Data modalities: Computed tomography (CT) was the most commonly used data type at 46.79% of studies, followed by ultrasound (23.58%), magnetic resonance imaging (MRI, 22.83%), and biopsy (6.79%). CT's dominance stems from its fast acquisition speed, cost-effectiveness, and established role as the basis for clinical treatment strategies in liver cancer guidelines. MRI, while highly informative, is more expensive and less accessible. Ultrasound, despite being widely used clinically for screening (recommended every six months for high-risk patients), was less common in AI research because image acquisition depends heavily on the operator's technique and machine model, and image resolution is relatively low.

Cross-analysis of disease and data type: CT was primarily used for liver cancer and HCC research. Ultrasound was the first choice for fatty liver disease studies, given its high sensitivity for diffuse fatty liver, convenience, and safety. Biopsy was most often used in liver fibrosis research, as histopathological examination of liver biopsy remains the gold standard for fibrosis diagnosis. Although ultrasound elastography and magnetic resonance elastography (MRE) are effective non-invasive fibrosis assessment tools, a unified MRE liver elasticity value for fibrosis across different etiologies had not yet been established.
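A cross-analysis of this kind amounts to counting (disease, data type) pairs across keyword-tagged papers. The sketch below shows the idea on a handful of hypothetical records; the records, counts, and function names are invented for illustration and do not reproduce the study's data or the SRplot workflow.

```python
from collections import Counter

# Hypothetical keyword-tagged papers: (disease keyword, data modality keyword).
records = [
    ("HCC", "CT"),
    ("HCC", "CT"),
    ("HCC", "MRI"),
    ("fatty liver disease", "ultrasound"),
    ("fatty liver disease", "ultrasound"),
    ("liver fibrosis", "biopsy"),
    ("liver fibrosis", "ultrasound"),
    ("liver fibrosis", "biopsy"),
]

pair_counts = Counter(records)

def top_modality(disease):
    """Most frequent data modality among papers tagged with `disease`."""
    per_modality = {m: n for (d, m), n in pair_counts.items() if d == disease}
    return max(per_modality, key=per_modality.get)

def modality_shares():
    """Percentage of all records that use each modality."""
    totals = Counter(modality for _, modality in records)
    return {m: round(100 * n / len(records), 2) for m, n in totals.items()}

shares = modality_shares()
print(top_modality("liver fibrosis"), shares)
```

With real data, the same pair counts could feed a heatmap or a stacked bar chart of modality by disease.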

Clinical goals: Approximately three-quarters of all papers focused on diagnosis, classification, segmentation, or prediction, with considerably less attention to prognosis. The diagnosis and differential diagnosis of liver cancer on medical imaging were the dominant research priorities. However, the clinical diagnosis of liver cancer is a comprehensive process, especially because focal liver lesions can be atypical. Dysplastic nodules in cirrhotic livers, for example, have strong malignant potential and are difficult to distinguish from early liver cancer on imaging alone, often requiring evaluation of clinical biomarkers like alpha-fetoprotein and abnormal prothrombin.

AI methods: Convolutional neural networks (CNNs) were the predominant technical approach, with a minority of studies exclusively using more traditional methods like support vector machines and decision trees. Most segmentation tasks employed U-Net architectures, which have demonstrated strong performance in medical image segmentation, particularly when training data is limited. Traditional machine learning methods (SVM, random forest) were concentrated in earlier research, while deep learning methods dominated from 2012 onward.

TL;DR: HCC was the most studied disease (33.52%). CT dominated as the data source (46.79%), followed by ultrasound (23.58%) and MRI (22.83%). Three-quarters of papers addressed diagnosis, classification, segmentation, or prediction. CNNs were the primary AI method, with U-Net favored for segmentation tasks.
Pages 7-8
Citation Bursts Reveal Shifting Research Frontiers Over Time

CiteSpace identified the top 25 references with the strongest citation bursts, that is, papers that experienced sudden spikes in citation frequency and therefore mark emerging research hotspots. Citation bursts in this field began in 2003, and a large cluster of co-cited references was concentrated in the period from 2015 to 2019, indicating that this was the period of most intense activity and foundational work in AI for liver cancer.
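To make the idea of a citation burst concrete, the sketch below flags years in which a reference's citation count jumps well above its earlier average. This is a deliberately simple threshold heuristic, not the burst-detection algorithm CiteSpace actually uses (which is based on Kleinberg's method). The yearly counts, the `ratio` and `min_count` parameters, and the function name `burst_years` are all hypothetical.

```python
def burst_years(yearly_citations, ratio=2.0, min_count=5):
    """Flag years whose count is >= ratio x the mean of all earlier years."""
    years = sorted(yearly_citations)
    flagged = []
    for i, year in enumerate(years):
        if i == 0:
            continue  # no earlier years to compare against
        earlier = [yearly_citations[y] for y in years[:i]]
        baseline = sum(earlier) / len(earlier)
        count = yearly_citations[year]
        if count >= min_count and count >= ratio * baseline:
            flagged.append(year)
    return flagged

# Hypothetical citations per year for a single reference.
toy_reference = {2014: 2, 2015: 3, 2016: 12, 2017: 20, 2018: 9, 2019: 4}
bursts = burst_years(toy_reference)
print(bursts)
```

A production analysis would use the burst strength and duration reported by CiteSpace rather than a fixed ratio, since a fixed threshold is sensitive to how noisy the early years are.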

Key reference milestones: Early foundational papers included work on gene expression profiling for HCC recurrence prediction (2003) and hepatitis B-related metastatic HCC prediction using supervised machine learning (2003). The period from 2016 to 2018 saw bursts from landmark deep learning papers, including Shin et al.'s 2016 work on CNN architectures and transfer learning for computer-aided detection, Kermany et al.'s 2018 Cell paper on image-based deep learning for medical diagnosis, and Chaudhary et al.'s 2018 deep learning-based multi-omics integration for liver cancer survival prediction. These papers collectively established the methodological foundations that most subsequent studies built upon.

Journal field relationships: The dual-map analysis of citing and cited journals revealed that papers in this field draw on knowledge from molecular biology and genetics, health and nursing sciences, and surgery, while the citing (output) literature concentrates in molecular biology/immunology, clinical medicine, and neurology-related fields. This cross-field pattern underscores that AI in liver cancer is inherently interdisciplinary, bridging computer science, radiology, oncology, and molecular biology.

The citation burst analysis also highlights that treatment-related and prognosis-related research received relatively fewer strong bursts compared to diagnostic work, reflecting the field's concentration on imaging-based diagnosis and the relative scarcity of studies addressing treatment decision-making with AI.

TL;DR: Citation bursts clustered between 2015 and 2019, marking the peak of foundational work. Landmark papers on CNNs (Shin, 2016), image-based deep learning (Kermany, 2018), and multi-omics survival prediction (Chaudhary, 2018) drove the field. Diagnostic research generated the strongest bursts, while treatment and prognosis studies lagged behind.
Pages 8-9
Multi-Type Data Fusion and Treatment Decision-Making Remain Underexplored

Despite rapid growth in AI liver cancer research, the bibliometric analysis identified several significant gaps. First, very few studies combined multiple types of data, such as genetic data, molecular data, imaging data, and clinical indicators, into integrated models. Most AI research operated within a single data modality. However, clinical diagnosis of liver cancer is inherently a multi-modal process that requires synthesizing imaging findings with biomarkers like alpha-fetoprotein and abnormal prothrombin, especially for distinguishing early-stage HCC from dysplastic nodules in cirrhotic livers.

Treatment and prognosis underserved: Studies on treatment planning and prognosis were notably underrepresented relative to diagnostic work. Most treatment-related studies focused narrowly on survival prediction after a specific treatment method, such as radiofrequency ablation or transarterial chemoembolization (TACE). Modern liver cancer therapy increasingly integrates multiple neoadjuvant and adjuvant strategies, which have dramatically improved survival for advanced HCC patients. However, AI-assisted approaches for stratifying patient populations, identifying novel biomarkers, and making precision treatment decisions tailored to individual patients had not yet emerged in the literature.

Ultrasound and contrast-enhanced ultrasound: Although ultrasound is widely used clinically as a screening tool for high-risk patients (recommended every six months), it was underrepresented in AI research compared to CT. Contrast-enhanced ultrasound has now been included as a recommended imaging modality for liver cancer diagnosis in clinical guidelines and is also used in the development and evaluation of ultrasound-guided radiofrequency ablation. The authors suggest that future AI research should pay more attention to ultrasound's clinical role.

Pathological and genetic data: Few studies used pathological, genetic, or other non-imaging clinical data. The high cost of genetic examination and the difficulty of implementing AI in multi-omics research are likely contributing factors. However, as genomic sequencing becomes more affordable and multi-omics integration methods mature, this is expected to become a growing area of investigation.

TL;DR: Multi-type data fusion (combining imaging, genetics, molecular, and clinical data) is rare but critically needed. Treatment decision-making and prognosis studies are underrepresented compared to diagnosis-oriented work (diagnosis, classification, segmentation, and prediction together account for about three-quarters of papers). Contrast-enhanced ultrasound and multi-omics approaches are identified as key areas for future growth.
Pages 9-10
Database Constraints and the Path Toward Multimodal AI

Single-database limitation: The study relied exclusively on the Web of Science Core Collection and included only English-language articles. This means publications indexed solely in Scopus, PubMed, or non-English databases were missed, potentially leading to the omission of relevant studies, particularly from countries where liver cancer research may be published in local languages. The authors acknowledge this may affect the comprehensiveness of the analysis.

Keyword screening imperfection: Despite a comprehensive Boolean search strategy, keyword screening may not have captured every relevant study. AI terminology evolves rapidly, and some papers using novel or unconventional technical terms may have been excluded. Additionally, because bibliometric analysis relies primarily on abstract screening rather than full-text review, some relevant studies with non-standard abstracts could have been missed.

Snapshot in time: The data retrieval was performed on January 18, 2022, meaning any publications after that date were not included. Given that almost 70% of the cumulative output was published from 2017 onward, the research landscape is likely to have changed substantially since the data cutoff. Citation counts and H-indices are also dynamic metrics that shift over time, so the rankings presented reflect a specific moment rather than a permanent state.

Future directions: The authors conclude that multi-type data fusion analysis, combining imaging, genetic, molecular, and clinical data into integrated AI models, represents the most promising and underdeveloped research direction. The development of multimodal treatment plans for liver cancer, leveraging AI to synthesize diverse data sources for personalized therapeutic decision-making, could become the major trend of future research. Imaging will remain an indispensable tool, but the field must move beyond single-modality diagnosis toward comprehensive, multi-data-source approaches that address the full clinical pathway from screening through treatment planning and prognosis.

TL;DR: Key limitations include WoSCC-only and English-only coverage, imperfect keyword screening, and a January 2022 data cutoff. The most important future direction is multi-type data fusion, integrating imaging, genetics, molecular data, and clinical indicators into comprehensive AI models for personalized liver cancer diagnosis and treatment.
Citation: Xiong M, Xu Y, Zhao Y, et al. Quantitative analysis of artificial intelligence on liver cancer: A bibliometric analysis. Open Access, 2023. Available at: PMC9978515. DOI: 10.3389/fonc.2023.990306. License: CC BY.