Breast cancer remains one of the most prevalent and complex diseases in oncology, with diverse subtypes that respond differently to treatment protocols. Accurate diagnosis, prognosis, and prediction of treatment outcomes are essential for effective management, yet traditional microscopic examination is limited by well-documented intra- and inter-observer variability. In the era of artificial intelligence (AI), machine learning (ML) and deep learning (DL) algorithms are enhancing the ability of histopathologists to make more accurate and reproducible diagnoses across the full diagnostic pipeline.
The growing complexity of breast cancer reporting: Modern breast cancer pathology reports now include far more than basic tumor type and grade. The emergence of biomarkers evaluable through immunohistochemistry (IHC), the inclusion of tumor-infiltrating lymphocyte (TIL) percentage, and the assessment of treatment effects in synoptic reports have made reporting increasingly detailed and labor-intensive. Evaluation of these parameters is relatively subjective, creating a strong need for standardized methods and objective tools that ensure consistency and reliability across institutions and pathologists.
What AI brings to the table: AI encompasses ML and DL techniques that provide robust tools for analyzing complex datasets and uncovering patterns that may be imperceptible to human observers. In breast cancer care, these applications range from automating histopathological analysis to predicting treatment outcomes. By addressing the need for reproducibility and leveraging the vast datasets generated from histological slides, AI can augment the capabilities of histopathologists and oncologists, leading to enhanced accuracy and efficiency in breast cancer management.
Scope of this review: This systematic review covers the current state of AI in breast pathological analysis across its diagnostic, prognostic, and predictive aspects. The authors examine AI techniques for cancer detection and classification, histological grading, biomarker quantification, lymph node metastasis detection, and molecular prediction, while addressing the clinical implications and challenges that must be resolved for broader clinical implementation.
Accurate classification of breast cancer is critical because each subtype responds differently to treatment protocols. Misclassification can lead to suboptimal treatment decisions and compromised patient outcomes. AI models have demonstrated remarkable success in this area. Cruz-Roa et al. and Fondon et al. showed AI's potential in detecting invasive ductal carcinoma within surrounding breast parenchyma. Han et al. further illustrated how AI algorithms can distinguish between ductal, lobular, mucinous, and papillary morphology, as well as benign proliferative lesions of both stroma and epithelium, achieving 93.2% accuracy across multi-class histopathology images.
Distinguishing early-stage lesions: Sandbank et al. developed an algorithm capable of distinguishing between low- and high-grade in situ ductal and lobular carcinoma, differentiating in situ from invasive carcinoma, and separating atypical ductal hyperplasia from ductal carcinoma in situ. This addresses one of the most critical challenges in histopathology, as these early-stage lesions carry different prognostic implications and require different treatment pathways. The algorithm achieved an area under the curve (AUC) of 0.99 on a dataset of 436 patients, which could reduce the likelihood of both overtreatment and undertreatment.
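For readers less familiar with the metric: assuming the AUC here refers to the receiver operating characteristic curve, as is conventional, it can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The following is a minimal sketch of that calculation. The function name `roc_auc` and the scores are illustrative assumptions, not data or code from Sandbank et al.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = invasive carcinoma, 0 = in situ lesion.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

An AUC of 1.0 means every positive case outranks every negative case, and 0.5 corresponds to chance-level ranking.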
AI for histological grading: Cancer grading is widely recognized as one of the most important prognostic factors. However, intra- and inter-observer variability makes histological grading far from perfect. AI algorithms capable of stratifying tumors based on features beyond traditional morphology offer a promising solution. DL models have enhanced accuracy, reproducibility, and efficiency across all three components of the grade: mitotic figure count, tubule formation, and nuclear grading. These improvements translate directly into more accurate patient prognoses and better-informed treatment decisions.
Tubule formation and mitosis counting: Romo-Bucheli et al. demonstrated the potential of DL classifiers in identifying tubule formation in estrogen receptor-positive breast cancer whole slide images; their tubule formation indicator showed a strong correlation with genetic risk categories (89% accuracy). For mitosis counting, one of the most time-consuming tasks for pathologists at all expertise levels, Balkenhol et al. reported a correlation coefficient of R = 0.810 (95% CI: 0.76-0.86) for DL-based automated counting, and Pantanowitz et al. showed significant improvements in accuracy, precision, and sensitivity in tumor proliferation rate assessment.
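As an illustration of how a correlation coefficient and its confidence interval can be obtained when comparing automated and manual mitotic counts, the sketch below computes Pearson's r and an approximate 95% interval via the Fisher z-transformation. This is one common approach and not necessarily the method used by Balkenhol et al. The counts and the function names `pearson_r` and `fisher_ci` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via Fisher's z (requires n > 3 and |r| < 1)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Made-up mitotic counts per case: pathologist vs. automated counter.
manual = [3, 8, 12, 20, 25, 31]
automated = [4, 7, 14, 18, 27, 29]
r = pearson_r(manual, automated)
low, high = fisher_ci(r, len(manual))
print(f"r = {r:.3f}, 95% CI: {low:.2f} to {high:.2f}")
```

The interval narrows as the number of cases grows, which is why the same r is more convincing in a large validation cohort than in a small one.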
Nuclear grading involves assessing nuclear size, shape, and pleomorphism, and it is notoriously subjective due to the variations in human interpretation. Distinguishing nuclear grade 1 from 2, or grade 2 from 3, is particularly challenging because the differences can be subtle. As a result, grade 2 has become something of a "safety net" for many pathologists when they are uncertain. This subjectivity introduces variability into the diagnostic process, directly impacting both grading accuracy and prognostic evaluations.
Stratifying Nottingham Histological Grade 2: A significant advance lies in using DL models to refine the stratification of intermediate Nottingham Histological Grade (NHG) 2 cases, which have historically posed challenges because of their variability and intermediate prognostic value. Wang et al. analyzed whole-slide histopathology images with AI models trained on over 1,000 patients and achieved an AUC of 0.91 (95% CI: 0.88-0.93) in identifying subtle morphological patterns that separate NHG 2 tumors into lower- and higher-risk groups, mirroring the characteristics of NHG 1 and NHG 3. This approach offers prognostic insights comparable to molecular assays but is faster, more cost-effective, and uses routine hematoxylin and eosin (H&E) slides.
Matching pathologist performance: Mantrala et al. confirmed that AI could match human performance in grading nuclear pleomorphism, achieving 65.9% concordance with experienced pathologists. While this number may appear modest, it reflects the inherent difficulty of the task, as even expert pathologists frequently disagree on nuclear grade assignments. Their work showed that AI could successfully detect key morphological attributes of the nucleus, provide survival stratification across various patient cohorts, and mitigate inconsistencies among pathologists.
A complement, not a replacement: These AI tools are not designed to replace the human eye but to enhance the histopathologist's ability to detect subtle changes that can significantly affect the course of treatment. For example, the deep convolutional neural network (CNN) model by Zewdie et al. achieved 96.75% accuracy in classifying breast cancer types and grades on a dataset of 82 samples. Integrating such models into routine practice supports more informed clinical decision-making and facilitates personalized treatment strategies, ultimately improving patient care and outcomes.
Accurate and objective assessment of biomarkers plays a vital role in breast cancer diagnosis, prognosis prediction, and treatment planning. The success of targeted therapies and endocrine therapy relies heavily on the precise quantification of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2). Traditional evaluation methods are subjective and prone to errors. AI algorithms have evolved from basic IHC evaluation tasks, such as counting positive cells in manually selected regions, to fully integrated systems that handle tumor area detection, cell detection, cell quantification, and staining intensity assessment.
ER, PR, and HER2 results: Multiple groups have developed algorithms with performance comparable to expert histopathologists. Abele et al. reported agreement rates of 87.6% for Ki-67 and 89.4% for ER/PR across 204 patients using CNN-based analysis. Shafi et al. validated automated ER status determination with a Pearson correlation of r = 0.72 on 97 patients. For HER2, Hartage et al. validated their algorithm on 612 patients, showing substantial agreement with fluorescence in situ hybridization (FISH) results (Cohen's kappa = 0.71). Li et al. demonstrated that digital image analysis of HER2 IHC could predict response to anti-HER2 neoadjuvant chemotherapy, revealing a significant association with pathological complete response rates (odds ratio = 136.08, p = 0.002).
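Cohen's kappa, reported in several of these validation studies, measures agreement between two raters (for example, an algorithm and a reference assay) after correcting for the agreement expected by chance. A minimal sketch follows. The labels are invented for illustration, and `cohens_kappa` is a hypothetical helper rather than code from any cited study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1.0 - expected)


# Made-up HER2 calls: algorithm on IHC vs. FISH reference.
algorithm = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg"]
fish = ["pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos"]
print(f"kappa = {cohens_kappa(algorithm, fish):.3f}")
```

Because kappa discounts chance agreement, it is typically lower than the raw percentage agreement on the same cases; here the raters agree on 75% of cases but kappa is noticeably lower.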
Ki-67 proliferation assessment: Ki-67 is a well-established prognostic marker for breast cancer, but traditional assessment involves manual counting, which is time-consuming and error-prone. Boden et al. demonstrated that AI-based Ki-67 assessment achieved high agreement with manual counts (Cohen's kappa = 0.84) on 200 patients using deep CNN-based object detection. Dy et al. further improved accuracy with an error rate of just 0.6% across 420 patients. Unlike manual counting, AI provides comprehensive analysis of the entire slide rather than selected fields, offering a more objective and robust approach.
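The difference between whole-slide and selected-field assessment can be made concrete with a small calculation. The sketch below assumes an upstream detector has already counted Ki-67-positive and Ki-67-negative tumor nuclei per image tile; the counts, and the names `ki67_index`, `global_index`, and `hotspot_index`, are hypothetical.

```python
def ki67_index(positive, negative):
    """Ki-67 labeling index: percentage of tumor nuclei that stain positive."""
    total = positive + negative
    if total == 0:
        raise ValueError("no tumor nuclei counted")
    return 100.0 * positive / total


# Made-up per-tile counts: (positive nuclei, negative nuclei).
tiles = [(12, 188), (40, 160), (5, 195), (90, 110)]

# Whole-slide index pools every counted nucleus across all tiles.
global_index = ki67_index(sum(p for p, _ in tiles), sum(n for _, n in tiles))

# Hotspot index uses only the single most proliferative tile.
hotspot_index = max(ki67_index(p, n) for p, n in tiles)

print(f"global: {global_index:.1f}%, hotspot: {hotspot_index:.1f}%")
```

The two values can differ widely on a heterogeneous tumor, which illustrates why the choice of fields in manual counting is a source of variability.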
PD-L1 scoring: AI-assisted programmed death-ligand 1 (PD-L1) scoring has garnered significant attention for its potential to standardize and enhance the accuracy of IHC-based evaluations. While its application is better established in non-small cell lung cancer, where dual-scale categorization-based DL methods have shown high concordance rates with pathologists, initial multi-institutional studies in breast cancer show promise. AI-assisted models have boosted concordance from moderate to excellent levels, aiding in overcoming the subjectivity of human evaluation when scoring tumor-infiltrating immune cells, which is key in determining patient eligibility for immunotherapy.
AI has transformed how tumor-infiltrating lymphocytes (TILs) and the broader tumor microenvironment (TME) are assessed in breast cancer. TILs are key immune response markers that play a critical role in the prognosis of HER2-positive and triple-negative breast cancer (TNBC). Traditionally, TIL evaluation was subjective and prone to variability, but AI offers a standardized and objective approach that provides a consistent evaluation of the immune response within the TME.
Quantifying spatial organization: AI-powered methods can quantify the spatial organization and interactions of TILs with other immune and tumor cells, which is vital when stratifying patients for immunotherapy. Studies have shown that AI-driven analysis of H&E and multiplex IHC images enhances the ability to predict treatment responses, such as pathological complete response (pCR) to chemotherapy, especially in HER2-positive and TNBC subtypes. AI models developed for this purpose have demonstrated higher accuracy in predicting pCR compared to manual assessments by histopathologists, underscoring the potential of AI to guide personalized treatment strategies.
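One simple example of a spatial feature is the fraction of detected lymphocytes that lie within a fixed distance of any tumor cell. The sketch below is an illustrative assumption about what such a feature might look like, not a description of the metrics used in the cited studies. The coordinates, the radius, and the function name `fraction_tils_near_tumor` are hypothetical.

```python
import math


def fraction_tils_near_tumor(lymphocytes, tumor_cells, radius):
    """Fraction of lymphocytes within `radius` of at least one tumor cell.

    Coordinates are (x, y) cell centroids in the same units as `radius`.
    Brute-force search; a spatial index would be needed for whole slides.
    """
    if not lymphocytes:
        raise ValueError("no lymphocytes detected")
    near = sum(
        1
        for cell in lymphocytes
        if any(math.dist(cell, tumor) <= radius for tumor in tumor_cells)
    )
    return near / len(lymphocytes)


# Made-up centroids (micrometers) from a hypothetical cell detector.
lymphocytes = [(0, 0), (10, 0), (100, 100), (52, 50)]
tumor_cells = [(5, 0), (50, 50)]
print(fraction_tils_near_tumor(lymphocytes, tumor_cells, radius=10))
```

A value near 1 suggests lymphocytes intermixed with tumor cells, whereas a low value suggests lymphocytes confined to stroma away from the tumor.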
Mapping the immune landscape: AI plays a critical role in advancing our understanding of the TME by identifying spatial patterns and cellular interactions that are difficult for human observers to discern. This includes quantifying the presence and behavior of immune cells such as TILs, as well as mapping their interactions with tumor cells. This deeper analysis provides a more comprehensive view of the immune landscape, which is essential for optimizing treatment plans and enhancing the precision of immunotherapies.
Clinical significance: The ability to objectively and reproducibly assess TILs has direct implications for patient care. In TNBC, higher TIL levels are associated with better prognosis and improved response to chemotherapy. By removing the subjectivity from TIL assessment, AI enables more consistent patient stratification for clinical trials and treatment decisions, ensuring that patients who would benefit most from immunotherapy or chemotherapy combinations are correctly identified.
The accurate detection of lymph node metastasis is a key factor in staging and treatment planning for breast cancer. For small occult tumor foci in lymph nodes, traditional pathological assessment can be difficult and often requires additional IHC studies, consuming both time and resources. AI offers promising solutions for more precise detection, potentially eliminating the need for additional IHC steps in many cases.
Performance of DL algorithms: Liu et al. developed a DL algorithm for identifying metastatic cancer cells in sentinel lymph node biopsies. The algorithm detected metastases with high sensitivity and few false positives, even for small foci, and remained robust to common tissue sample variations such as staining differences and tissue preparation artifacts, significantly reducing missed metastases compared to traditional methods. Steiner et al. evaluated the impact of DL assistance on histopathologists' evaluations, finding that the AI model significantly improved diagnostic accuracy for challenging micrometastases while reducing both errors and review time.
Macro- and micrometastases: AI models trained on large datasets of H&E-stained slides demonstrated high sensitivity and specificity in detecting lymph node metastases, significantly reducing false negatives. Importantly, these models accurately identified both macro- and micrometastases, leading to more precise diagnoses. Bandi et al. explored continual learning strategies for cancer-independent detection of lymph node metastases across breast, colon, and head-and-neck cancers, demonstrating high accuracy without requiring cancer-specific retraining.
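Sensitivity and specificity follow directly from counts of correct and incorrect calls against a reference diagnosis, and a false negative (a missed metastasis) lowers sensitivity. A minimal sketch with invented slide-level labels follows; the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary labels (1 = metastasis)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both positive and negative reference cases")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: 4 nodes with metastasis, 6 without.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```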
Radiomics integration: Radiomics presents a complementary AI-driven approach for improving axillary lymph node staging in breast cancer, leveraging medical imaging to create predictive models with high sensitivity, specificity, and efficiency. Despite its potential to replace invasive procedures, limited validation and retrospective study designs highlight the need for robust clinical trials. When combined with AI-powered pathology tools, radiomics can integrate seamlessly into digital pathology workflows, offering a scalable solution for precise diagnosis and treatment planning.
The application of AI now extends beyond traditional histopathological analysis into molecular-level prediction. Farahmand et al. used AI to predict HER2 status directly from H&E sections with high accuracy, which is vital for determining eligibility for targeted therapies such as trastuzumab. Wang et al. demonstrated AI's ability to predict BRCA mutation status from histological images, indicating its potential for genetic risk assessment and personalized medicine. These advances hold promise for identifying patients carrying BRCA1 and BRCA2 mutations who are at high risk for developing hereditary breast cancer.
Molecular subtyping and recurrence risk: Several studies have shown promise in detecting molecular subtypes, particularly in distinguishing the basal-like subtype from luminal-A. Whitney et al. demonstrated that computer-extracted nuclear morphology features from routine H&E-stained images could accurately predict Oncotype DX risk categories for ER-positive breast cancer patients, achieving an area under the curve of up to 0.83 in distinguishing between low and high recurrence risk groups. This method offers a faster, cost-effective, and nondestructive alternative to molecular assays, particularly valuable in resource-limited settings where access to molecular testing may be constrained.
Homologous recombination deficiency (HRD): HRD status is crucial for determining optimal treatment, especially regarding platinum-based chemotherapies and PARP inhibitors. Traditional molecular methods to identify HRD status are time-consuming, costly, and require specialized equipment. AI-powered tools now use H&E slides to predict HRD status directly, analyzing tissue samples with high accuracy and often surpassing traditional methods. By automating the detection process, AI enables faster, more scalable, and more accessible HRD testing, potentially expanding the pool of patients who can benefit from targeted therapies.
PIK3CA/AKT pathway alterations: DL models have shown progress in detecting actionable genetic alterations directly from H&E-stained slides. In triple-negative breast cancer, DL models have proven highly effective in predicting PIK3CA mutations. These models use convolutional neural networks to analyze thousands of image tiles from histopathology slides, recognizing patterns linked to genetic alterations and enabling real-time molecular prediction. This positions AI as a valuable tool for advancing pathology practices and reducing the costs and turnaround times associated with traditional molecular testing.
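A common pattern for slide-level prediction is to split the whole slide image into small tiles (patches), score each tile with a CNN, and then aggregate the tile scores into one slide-level score. The sketch below shows only the tiling and aggregation steps, with placeholder probabilities standing in for CNN output. The aggregation rule (mean, optionally over the top-k tiles) is an assumption chosen for illustration and is not taken from the cited studies; `tile_origins` and `slide_score` are hypothetical names.

```python
def tile_origins(width, height, tile, stride=None):
    """Yield (x, y) top-left corners of tiles that fit fully inside the slide."""
    stride = stride or tile
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            yield x, y


def slide_score(tile_probs, top_k=None):
    """Mean tile probability; if top_k is given, average only the k highest."""
    if not tile_probs:
        raise ValueError("no tile predictions to aggregate")
    probs = sorted(tile_probs, reverse=True)
    if top_k is not None:
        probs = probs[:top_k]
    return sum(probs) / len(probs)


# A 1024 x 512 pixel region cut into non-overlapping 256-pixel tiles.
origins = list(tile_origins(1024, 512, 256))
print(len(origins), "tiles")

# Placeholder per-tile mutation probabilities (a real pipeline would use a CNN).
tile_probs = [0.1, 0.9, 0.8, 0.2]
print(slide_score(tile_probs), slide_score(tile_probs, top_k=2))
```

Averaging only the highest-scoring tiles makes the slide score sensitive to small focal regions, whereas a plain mean reflects the slide as a whole.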
Algorithmic bias and generalizability: AI models trained on limited datasets may not generalize well to diverse populations, resulting in disparities in diagnostic accuracy for underrepresented demographic groups. Ensuring diverse, representative, and well-annotated datasets is vital to avoid bias and deliver equitable AI-driven diagnostics. Validation in diverse clinical settings is equally important to ensure that AI tools perform consistently across different laboratories, imaging systems, and staining techniques. Improperly calibrated AI tools or over-reliance on AI without adequate human oversight could lead to misdiagnoses, particularly in borderline or equivocal cases.
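One practical way to surface such disparities during validation is to report performance separately for each subgroup, such as laboratory, scanner, or demographic group, rather than as a single pooled figure. A minimal sketch with invented labels follows; the group names and the function `accuracy_by_group` are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(groups, truth, predicted):
    """Return {group: accuracy} so that gaps between subgroups are visible."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, t, p in zip(groups, truth, predicted):
        total[group] += 1
        correct[group] += int(t == p)
    return {group: correct[group] / total[group] for group in total}


# Made-up validation results from two hypothetical sites.
groups = ["site_A"] * 4 + ["site_B"] * 4
truth = [1, 0, 1, 0, 1, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 1, 0, 0]
print(accuracy_by_group(groups, truth, predicted))
```

In this invented example the pooled accuracy of 75% hides the fact that the model is perfect at one site and no better than chance at the other.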
Training and workforce development: Integrating AI into pathology workflows necessitates a strategic approach that includes specialized training for histopathologists and laboratory personnel. Practitioners must become proficient in using AI-assisted tools, interpreting AI-generated insights, and understanding the limitations of these systems. Institutions must invest in educational programs and workshops to ensure a smooth transition into AI-enhanced diagnostics. Without this investment, the gap between AI's potential and its practical clinical benefit will persist.
Cost and infrastructure: While AI has the potential to improve efficiency and accuracy, the initial investment in infrastructure, software licensing, and continuous updates can be substantial. Pathology laboratories will need to conduct cost-benefit analyses to determine the financial viability of AI integration and explore funding or reimbursement models to support implementation. AI tools must also be compatible with various digital pathology platforms, whole slide imaging systems, and laboratory information management systems to facilitate seamless data exchange.
Regulatory compliance and the human-AI partnership: AI-driven diagnostic tools must meet strict regulatory requirements, such as those of the FDA in the United States and CE marking in the European Union, to ensure patient safety, reliability, and ethical use. The future of breast cancer pathology lies in a synergistic relationship between AI and pathologists, where algorithms operate as an adjunct to the pathologist rather than as a final decision maker. Human-in-the-loop systems position the algorithm as an augmented diagnostic assistant or second reader. Pathologists, with their clinical expertise and nuanced understanding of patient care, remain essential for guiding AI model development, interpreting insights, and ensuring ethical application in clinical practice.