Diffuse large B-cell lymphoma (DLBCL) is the most common type of B-cell lymphoma, encompassing a diverse and incompletely characterized group of clinico-pathological entities. A subgroup of 5 to 15% of DLBCL cases harbors a MYC oncogene rearrangement. When MYC rearrangement co-occurs with BCL2 and/or BCL6 rearrangement, the disease is classified as high-grade B-cell lymphoma (commonly referred to as "double-hit" or "triple-hit" lymphoma), which carries a significantly worse prognosis under standard R-CHOP chemotherapy and may require alternative treatment regimens such as dose-adjusted EPOCH-R.
Clinical necessity of genetic testing: Current diagnostic guidelines require genetic testing for MYC rearrangement in all DLBCL patients. If MYC is positive, additional testing for BCL2 and BCL6 rearrangements follows. These tests, typically performed using fluorescence in situ hybridization (FISH), are expensive, time-consuming, and not universally available. The authors hypothesized that a deep learning algorithm could predict MYC rearrangement directly from standard hematoxylin and eosin (H&E)-stained tissue slides, potentially eliminating the need for molecular testing in the majority of predicted-negative cases.
Core premise: Many pathological classifications rest on the principle that genetic changes in tumor cells are reflected in aberrant transcription, altered protein expression, and often characteristic morphological features. Several morphologic variants of DLBCL are recognized, though their clinical relevance remains under investigation. This study tested whether a trained computer algorithm could detect the subtle morphological signatures of MYC rearrangement that may be invisible to the human eye.
The authors assembled a cohort of 245 patients diagnosed with DLBCL from 11 hospitals in the Netherlands. All H&E glass slides and MYC FISH test results were collected, with slides cut and stained at Radboud University Medical Center, where FISH was performed and interpreted between 2015 and 2019. Three hospitals contributed the most cases: Hospital A (Radboud), Hospital B (contributing 23% of cases), and Hospital C (Rijnstate Hospital, contributing 16%).
Morphological classification: Each case was assigned to one of four morphological categories where possible: (1) high-grade morphology, including blastoid and Burkitt-like patterns; (2) centroblastic, including large centrocytic, lobated, and elongated variants; (3) immunoblastic, defined as containing at least 10% immunoblasts; and (4) anaplastic, including cases with very large, polymorphic, or Reed-Sternberg-like cells. The presence of a strong inflammatory component (histiocytes, eosinophils, or small lymphocytes) and extensive fibrosis were also recorded. In 31 cases, adequate morphological classification was not possible due to crush artifacts or poor fixation.
Molecular subtyping: The germinal center B-cell (GC) or activated B-cell (non-GC) subtype was recorded based on expression of CD10, BCL6, and MUM-1 using the Hans algorithm. EBV status was determined by in situ hybridization for EBER. An additional external validation set of 42 H&E slides from Rijnstate Hospital, with an equal proportion of MYC-rearranged and non-rearranged cases (21 each), was included to test generalization across different tissue processing and staining protocols.
The internal set of 245 whole-slide images (WSIs) was randomly divided into three subsets: a training set of 140 WSIs (31 MYC-positive, 109 MYC-negative), a tuning set of 31 WSIs (9 MYC-positive, 22 MYC-negative), and an internal validation set of 74 WSIs (20 MYC-positive, 54 MYC-negative). All slides were digitized using a Pannoramic 250 Flash II scanner (3DHistech, Hungary) at a pixel size of 0.24 micrometers, corresponding to 20x objective magnification.
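The split described above can be sketched in a few lines. This is only an illustration: the text says the slides were divided randomly and gives the per-class counts, but whether the authors stratified by MYC status is an assumption made here to reproduce those counts, and the slide identifiers and the `split_slides` helper are hypothetical.

```python
import random

# Subset sizes per class as reported in the text:
# (training, tuning, internal validation).
SPLIT_SIZES = {
    "MYC-positive": (31, 9, 20),
    "MYC-negative": (109, 22, 54),
}

def split_slides(slide_ids_by_class, seed=0):
    """Randomly assign slide IDs to train/tune/validation, class by class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    subsets = {"train": [], "tune": [], "validation": []}
    for label, ids in slide_ids_by_class.items():
        n_train, n_tune, n_val = SPLIT_SIZES[label]
        if len(ids) != n_train + n_tune + n_val:
            raise ValueError(f"unexpected number of {label} slides: {len(ids)}")
        shuffled = list(ids)
        rng.shuffle(shuffled)
        subsets["train"] += [(s, label) for s in shuffled[:n_train]]
        subsets["tune"] += [(s, label) for s in shuffled[n_train:n_train + n_tune]]
        subsets["validation"] += [(s, label) for s in shuffled[n_train + n_tune:]]
    return subsets

# Hypothetical slide identifiers, used only to exercise the function.
slides = {
    "MYC-positive": [f"pos_{i:03d}" for i in range(60)],
    "MYC-negative": [f"neg_{i:03d}" for i in range(185)],
}
subsets = split_slides(slides)
subset_sizes = {name: len(items) for name, items in subsets.items()}
```

Splitting per class keeps the proportion of MYC-positive slides roughly stable across subsets, which matters when the positive class is small.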
Annotation process: For the training and tuning sets, CD20 immunohistochemically stained slides from the departmental archive were used to define tumor areas. Medical students, trained and supervised by a pathologist, digitally annotated tumor regions and artifacts on the slides. This dual-stain annotation approach helped ensure that the algorithm learned from genuine tumor tissue rather than background or artifact regions.
Algorithm architecture: The pipeline combined deep learning with classical machine learning. A U-Net neural network was trained on small patches extracted from annotated WSIs. For each pixel in a slide, the U-Net output a probability for the presence of MYC rearrangement, generating a translocation likelihood map across the entire slide. A separate deep learning model was applied as a preprocessing step to eliminate artifacts such as tissue folds, ink marks, and staining inconsistencies. Finally, a Random Forest (RF) classifier aggregated the pixel-level predictions into a binary whole-slide classification of MYC-positive or MYC-negative.
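To make the aggregation step concrete, the sketch below reduces a per-pixel likelihood map to a handful of slide-level numbers of the kind a slide-level classifier could consume. The specific features (mean, maximum, 90th percentile, fraction above a threshold), the `slide_features` helper, and the 0.5 threshold are assumptions for illustration; the text does not state which features the authors passed to the Random Forest.

```python
from statistics import mean

def slide_features(likelihood_map, tissue_mask, threshold=0.5):
    """Summarize a per-pixel likelihood map into a few slide-level features.

    likelihood_map: 2D list of probabilities in [0, 1].
    tissue_mask:    2D list of booleans, False where artifacts were removed.
    """
    # Keep only pixels that survived artifact removal, sorted for percentiles.
    values = sorted(
        p
        for row_p, row_m in zip(likelihood_map, tissue_mask)
        for p, keep in zip(row_p, row_m)
        if keep
    )
    if not values:
        raise ValueError("no usable tissue pixels on this slide")
    p90_index = min(len(values) - 1, int(0.9 * len(values)))
    return {
        "mean": mean(values),
        "max": values[-1],
        "p90": values[p90_index],
        "fraction_above_threshold": sum(v >= threshold for v in values) / len(values),
    }

# Toy 2x3 map; one pixel is masked out as an artifact.
toy_map = [[0.1, 0.2, 0.9], [0.8, 0.7, 0.3]]
toy_mask = [[True, True, True], [True, False, True]]
features = slide_features(toy_map, toy_mask)
```

A feature vector like this, computed once per slide, is the sort of fixed-length input a classifier such as scikit-learn's `RandomForestClassifier` expects.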
Evaluation design: Performance was evaluated as a binary whole-slide classification task using receiver-operating characteristic (ROC) analysis on both the internal validation set (74 cases) and the external validation set (42 cases). This two-stage evaluation tested whether the model could generalize across tissue processed and stained at different hospitals.
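For readers unfamiliar with ROC analysis, the following minimal sketch shows how the curve and its area are obtained from slide-level scores. The scores and labels are toy values invented for the example, not study data, and the helper names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs, one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Sweep the decision threshold from the highest score to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy slide-level scores; label 1 = MYC-positive, 0 = MYC-negative.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0]
toy_auc = auc(roc_points(toy_scores, toy_labels))
```

Each point on the curve corresponds to one operating threshold; a prescreening tool would typically pick a threshold on the high-sensitivity end of the curve and accept the lower specificity that comes with it.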
The algorithm successfully identified MYC-positive DLBCL areas within whole-slide images of tissue biopsies and resections. Across both validation sets combined, the overall sensitivity for detecting MYC rearrangement reached 0.93, meaning the model correctly flagged 93% of truly MYC-rearranged cases. The specificity was 0.52, meaning 48% of MYC-negative cases were incorrectly flagged as positive (false positives).
Internal vs. external performance: The internal validation set yielded a sensitivity of 0.90 and specificity of 0.52, while the external validation set achieved a sensitivity of 0.95 and specificity of 0.53. The slightly higher sensitivity on external data is notable because those slides were processed and stained at a different hospital (Rijnstate), suggesting the model generalized well despite variability in tissue fixation, staining, and processing across the 11 contributing institutions.
False negatives: Only 3 false-negative cases were observed across the combined internal and external validation sets of 116 cases. All three were lymph node biopsies from three different hospitals. The low false-negative rate is clinically important because missing a MYC-rearranged case could lead to inappropriate treatment selection. The reported false-positive rate ranged from 17% to 33% across the three hospitals with the most cases (A, B, and C), with a total of 31% (36 of 116 cases). Note that this total is expressed as a share of all cases; relative to the 75 MYC-negative cases alone, the same 36 false positives correspond to the 48% false-positive rate implied by the specificity of 0.52.
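The reported rates can be checked from the counts given in the text. The sketch below derives the remaining confusion-matrix cells by subtraction and shows that the 31% and 48% figures describe the same 36 false positives with different denominators. Variable names are illustrative.

```python
# Combined validation counts taken from the text: 116 cases, 41 MYC-positive,
# 3 false negatives, 36 false positives. TP and TN follow by subtraction.
total, positives = 116, 41
fn, fp = 3, 36
tp = positives - fn             # 38 true positives
tn = (total - positives) - fp   # 39 true negatives

sensitivity = tp / (tp + fn)          # share of MYC-positive cases flagged
specificity = tn / (tn + fp)          # share of MYC-negative cases cleared
fpr_among_negatives = fp / (fp + tn)  # false positives / all MYC-negative cases
fp_share_of_all_cases = fp / total    # false positives / all cases
```

Rounded to two decimals, these give a sensitivity of 0.93, a specificity of 0.52, a false-positive rate of 0.48 among MYC-negative cases, and a false-positive share of 0.31 among all cases.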
The authors analyzed morphological and molecular characteristics across MYC-positive cases, false-positive cases, and true MYC-negative cases to understand what the algorithm was detecting. Among the 41 MYC-positive cases, the distribution of morphological subtypes was: high-grade 24%, centroblastic 46%, immunoblastic 12%, and anaplastic 5%. In comparison, false-positive cases showed a different profile: high-grade 6%, centroblastic 64%, immunoblastic 19%, and anaplastic 19%.
Germinal center phenotype enrichment: Approximately 85% of MYC-positive cases exhibited a germinal center (GC) phenotype, compared to 50% of false-positive cases and 48% of MYC-negative cases. This enrichment of GC phenotype in the MYC-positive group is expected, likely driven by the presence of "double-hit lymphoma" cases that carry additional BCL6 or BCL2 rearrangements alongside MYC. The tissue type (lymph node vs. extranodal) did not clearly influence the false-positive rate, with 42% and 38% false-positive rates respectively.
EBV status: EBV status was not comprehensively available for all cases, but among the 5 known EBV-positive DLBCL cases, 2 were false positives and 1 was MYC-positive. Due to the small number of EBV-positive cases, no definitive conclusions could be drawn about the relationship between EBV status and algorithm performance. The higher proportion of high-grade morphology in MYC-positive cases compared to MYC-negative cases (24% vs. 7%) aligns with known biology, as MYC rearrangement is associated with more aggressive tumor morphology.
The practical value of the algorithm lies in its use as a prescreening tool rather than a definitive diagnostic. In the proposed workflow, the algorithm would analyze H&E slides first, and only cases predicted to be MYC-positive would be referred for confirmatory FISH testing. Cases predicted as MYC-negative would skip the genetic test entirely. Applied to the validation cohorts, the reported figures correspond to 66% of cases being sent for FISH confirmation (41 MYC-positive plus 36 false-positive cases, 77 of 116 total), saving approximately 34% of genetic tests. This count includes the 3 false negatives, which the algorithm would not have referred; counting only algorithm-flagged cases (38 true positives plus 36 false positives, 74 of 116) gives about 64% referred and about 36% of tests saved.
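How many FISH tests a prescreen saves depends on how common MYC rearrangement is in the population screened. The sketch below computes the referred and missed fractions from sensitivity, specificity, and prevalence. It assumes the reported sensitivity (0.93) and specificity (0.52) carry over unchanged to other populations, which the study does not establish; the 10% prevalence is an illustrative value within the 5 to 15% range quoted for DLBCL, and `prescreen_outcome` is a hypothetical helper.

```python
def prescreen_outcome(sensitivity, specificity, prevalence):
    """Fractions of all cases referred to FISH, spared FISH, and missed."""
    # Referred cases = true positives + false positives.
    referred = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    # Missed cases = MYC-rearranged cases predicted negative.
    missed = (1 - sensitivity) * prevalence
    return {"referred": referred, "fish_saved": 1 - referred, "missed": missed}

# Case mix of the combined validation sets (41 of 116 MYC-positive).
validation_mix = prescreen_outcome(0.93, 0.52, 41 / 116)

# Assumed 10% prevalence, closer to an unselected DLBCL population.
low_prevalence = prescreen_outcome(0.93, 0.52, 0.10)
```

Because most referrals in a low-prevalence population are false positives, the share of tests saved moves toward the specificity as prevalence falls, while the absolute number of missed cases shrinks.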
Comparison to immunohistochemistry prescreening: An alternative prescreening approach uses immunohistochemistry (IHC) for c-myc protein expression. Published data show that IHC-based prescreening achieves a lower sensitivity of 0.88 compared to the algorithm's 0.93, with a comparable specificity of 0.52. The algorithm therefore outperforms IHC-based prescreening on the critical metric of sensitivity, reducing the risk that MYC-rearranged cases are missed and consequently do not receive appropriate therapy.
Cost and speed advantages: The algorithm operates on standard H&E-stained slides that are already prepared as part of routine diagnostic workup, requiring no additional staining or reagents. Once a slide is digitized, analysis can be completed rapidly, compared to the multi-day turnaround time for FISH testing. In resource-limited settings where FISH may not be available, this prescreening approach could help prioritize which cases most urgently need molecular testing.
Sample size constraints: The total cohort of 287 cases (245 internal, 42 external) is relatively small for a deep learning study. The internal cohort contained only 60 MYC-positive cases (31 training, 9 tuning, and 20 internal validation), so just 31 were available for training and the model had limited exposure to the full spectrum of MYC-rearranged morphology. The class imbalance, with MYC-positive cases representing roughly 25% of the internal cohort, further compounds this challenge. Larger, more balanced datasets would likely improve both sensitivity and specificity.
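One way to see what the small sample means for the headline numbers is to put a confidence interval around them. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not something reported in the source; the counts (38 of 41 MYC-positive cases detected, 39 of 75 MYC-negative cases cleared) follow from the figures given for the combined validation sets.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# Sensitivity: 38 of 41 MYC-positive validation cases were flagged.
sens_low, sens_high = wilson_interval(38, 41)

# Specificity: 39 of 75 MYC-negative validation cases were cleared.
spec_low, spec_high = wilson_interval(39, 75)
```

Under these assumptions the interval for sensitivity spans roughly 0.81 to 0.97, and the one for specificity roughly 0.41 to 0.63, which illustrates how much uncertainty remains with 41 positive and 75 negative validation cases.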
Moderate specificity: The specificity of approximately 0.52 means that nearly half of MYC-negative cases were falsely flagged as positive. While this is acceptable for a prescreening tool (where confirmatory FISH testing follows), it limits the cost savings and could create bottlenecks in laboratories that are already resource-constrained. False positives amounted to 31% of all cases in the combined validation sets (36 of 116), so a substantial number of unnecessary FISH tests would still be ordered.
Incomplete EBV and molecular data: EBV status was not available for all cases, preventing a thorough analysis of EBV's effect on algorithm performance. Similarly, the study did not explore whether the algorithm's predictions correlated with specific molecular subtypes beyond the GC/non-GC distinction. In 31 cases, morphological classification was impossible due to tissue quality issues (crush artifacts, poor fixation), highlighting the real-world variability in pathology specimens that any clinical-grade algorithm must handle.
Single-scanner digitization: All slides were digitized using a single scanner model (Pannoramic 250 Flash II at 20x magnification). Scanner-to-scanner variability in color reproduction, resolution, and compression can affect deep learning model performance. Validation on slides digitized with different scanners would be necessary before broader clinical deployment.
This study represents the first proof-of-principle that a conventional H&E slide of DLBCL contains morphologic information sufficient to predict the presence of genetic changes. The authors highlight that larger studies will be necessary to improve both sensitivity and specificity, and to investigate whether DLBCL with different molecular backgrounds, as defined by recent genomic classification efforts, can also be detected from morphology alone.
Beyond MYC detection: The ultimate clinical ambition is to train algorithms that can predict a DLBCL patient's response to a specific treatment regimen without any knowledge of the tumor's molecular background. If H&E morphology alone proves insufficient for this task, combining morphological features with clinical data (patient age, stage, performance status) or other biological markers could create more powerful multimodal prediction models.
Scaling and validation needs: Achieving clinical-grade performance will require training on substantially larger and more diverse datasets, including cases from multiple countries and scanner platforms. Prospective validation studies, where the algorithm is tested in real-time clinical workflows alongside standard FISH testing, would be needed to confirm that prescreening does not compromise diagnostic accuracy in practice. Integration with digital pathology platforms already being adopted in many hospitals could accelerate clinical translation.