Artificial Neural Networks Predicted the Overall Survival and Molecular Subtypes of Diffuse Large B-Cell Lymphoma Using a Pancancer Immune-Oncology Panel

Cancers 2022

Plain-English Explanations
Pages 1-3
Why DLBCL Needs Better Prognostic Tools, and How AI Can Help

Diffuse large B-cell lymphoma (DLBCL) is one of the most common subtypes of non-Hodgkin lymphoma, accounting for roughly 25% of all NHL cases in Western and Asian countries. It is a heterogeneous disease with diverse histological features, genetic alterations, and clinical outcomes. With standard rituximab-based chemotherapy (R-CHOP), DLBCL is curable in approximately 50% of cases. The critical clinical challenge is identifying, at the time of diagnosis, which patients will respond well to treatment and which will not.

Current prognostic tools: The International Prognostic Index (IPI), which incorporates age, LDH level, ECOG performance status, clinical stage, and number of extranodal sites, is the standard clinical risk stratification tool. Additionally, molecular subtyping based on gene expression profiling divides DLBCL into germinal center B-cell-like (GCB), activated B-cell-like (ABC), and unclassified subtypes. These subtypes carry distinct prognostic and therapeutic implications. The gold standard for molecular subtyping was originally the "lymphochip" microarray (requiring frozen tissue), but the Lymph2Cx assay on the nCounter NanoString platform now enables classification from formalin-fixed paraffin-embedded tissue (FFPET).

The pancancer immune-oncology panel: This study used the nCounter pancancer immune profiling panel, which measures 730 immune-oncology genes and 40 housekeeping genes. The panel covers B-cells, T-cells, CD4+ Th1 cells, regulatory T-lymphocytes, CD8+ cytotoxic T-lymphocytes, dendritic cells, macrophages, mast cells, neutrophils, and NK cells. This comprehensive immune coverage makes it well suited for investigating how the tumor immune microenvironment influences prognosis and molecular subtypes in DLBCL.

Artificial neural networks: The authors applied two types of neural networks: the multilayer perceptron (MLP) and the radial basis function (RBF). Both are supervised feedforward architectures with an input layer (gene expression values), one or more hidden layers, and an output layer. The MLP can model more complex nonlinear relationships, while the RBF (with a single hidden layer) is typically faster. Since neural networks function as "black-box" models, the study also employed explainable AI (XAI) techniques, combining conventional statistics and other machine learning methods to interpret and validate the neural network outputs.
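
To make the architectural difference concrete, the sketch below contrasts a single MLP hidden unit (a weighted sum passed through a hyperbolic tangent) with a single RBF hidden unit (a Gaussian of the distance to a center). The weights, center, width, and input values are illustrative assumptions, not values from the study.

```python
import math

def mlp_hidden_unit(x, weights, bias):
    """MLP hidden unit: tanh of a weighted sum of the inputs."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return math.tanh(z)

def rbf_hidden_unit(x, center, width):
    """RBF hidden unit: Gaussian of the squared distance to a center."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

# Hypothetical standardized expression values for three genes.
x = [0.5, -1.2, 0.3]

# Hypothetical weights; the MLP unit responds to a direction in input space.
mlp_out = mlp_hidden_unit(x, weights=[0.4, -0.2, 0.1], bias=0.0)

# The RBF unit responds to proximity; here the input sits exactly on the center.
rbf_out = rbf_hidden_unit(x, center=[0.5, -1.2, 0.3], width=1.0)
```

The MLP unit's output depends on the direction of the input relative to its weight vector, while the RBF unit's output peaks when the input lies at its center and decays with distance.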

TL;DR: DLBCL accounts for ~25% of NHL and is curable in ~50% of cases with R-CHOP. This study used a 730-gene pancancer immune profiling panel analyzed with multilayer perceptron and radial basis function neural networks to predict overall survival and molecular subtypes in 106 DLBCL cases from Tokai University Hospital.
Pages 3-6
Patient Cohort, Gene Expression Profiling, and Neural Network Design

Patient cohort: The study included 106 patients from Tokai University Hospital, diagnosed between 2006 and 2016 (74% from 2008 to 2016) following the 2016 WHO classification criteria. The cohort was 62.5% male with a male-to-female ratio of 1.67. Age ranged from 23 to 97 years, with 67.3% older than 60. IPI distribution was: low 27.8%, low-intermediate 30.9%, high-intermediate 14.4%, and high 18.6%. Most patients received R-CHOP (72.4%) or R-CHOP-like therapy (22.4%). By the Lymph2Cx assay, molecular subtypes were GCB 49%, ABC 29.8%, and unclassified 21.2%. EBER positivity (EBV infection) was found in 19.4% of cases.

Gene expression data: RNA was extracted from whole tissue sections of FFPE blocks containing more than 70% tumor cells and analyzed on the nCounter pancancer immune profiling panel (NanoString Technologies). Gene expression values were normalized, calibrated, and log2-transformed. Housekeeping gene normalization used the formula log2(normData / hkGeomMeans), where hkGeomMeans denotes the geometric mean of the housekeeping genes. All 730 immune-oncology genes served as predictor variables (covariates) for the neural networks.
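
The housekeeping normalization formula can be sketched as follows. This is a minimal illustration that assumes strictly positive counts; the function names and the example counts are hypothetical, and the study's actual preprocessing pipeline may include further calibration steps.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize_sample(gene_counts, housekeeping_counts):
    """Apply log2(normData / hkGeomMeans) to every gene of one sample."""
    hk_geo_mean = geometric_mean(housekeeping_counts)
    return {
        gene: math.log2(count / hk_geo_mean)
        for gene, count in gene_counts.items()
    }

# Hypothetical counts for one sample (not data from the study).
sample = {"CD55": 800.0, "ARG1": 50.0}
housekeeping = [100.0, 400.0]  # geometric mean is 200

normalized = normalize_sample(sample, housekeeping)
```

A gene expressed at four times the housekeeping geometric mean maps to 2, and one at a quarter of it maps to -2, so the normalized values are symmetric around the housekeeping level.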

MLP architecture: The dataset was partitioned with 70% assigned to training and 30% to testing (no holdout partition). For overall survival prediction, the hidden layer contained 6 nodes with hyperbolic tangent activation. For three-class molecular subtype prediction (GCB, ABC, and Unspecified, the last corresponding to the unclassified Lymph2Cx category), the hidden layer had 11 nodes. For two-class subtype prediction (GCB vs. ABC+Unspecified), the hidden layer had 14 nodes. All output layers used softmax activation with cross-entropy error. Covariates were rescaled by standardization. Training stopped after 1 consecutive step with no decrease in error.
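
The pieces described above (a 70/30 partition, standardized covariates, a tanh hidden layer, and a softmax output scored by cross-entropy) can be sketched in plain Python. This is an untrained forward pass with random weights and only 4 hypothetical inputs instead of 730 genes; it illustrates the structure and is not the authors' software or fitted model.

```python
import math
import random

def split_train_test(cases, train_fraction=0.7, seed=0):
    """Randomly partition cases into training and testing sets."""
    rng = random.Random(seed)
    shuffled = cases[:]
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def standardize(column):
    """Rescale one covariate (one gene across patients) to mean 0, SD 1."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / (len(column) - 1)
    return [(v - mean) / math.sqrt(variance) for v in column]

def softmax(scores):
    """Convert output scores to class probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, hidden_w, hidden_b, output_w, output_b):
    """One hidden layer with tanh activation, then a softmax output layer."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    scores = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(output_w, output_b)
    ]
    return softmax(scores)

def cross_entropy(probabilities, true_class):
    """Cross-entropy error for a single case."""
    return -math.log(probabilities[true_class])

# Hypothetical setup: 4 inputs, 6 hidden nodes, 2 classes (alive, dead).
rng = random.Random(42)
n_inputs, n_hidden, n_classes = 4, 6, 2
hidden_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
hidden_b = [0.0] * n_hidden
output_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_classes)]
output_b = [0.0] * n_classes

train_ids, test_ids = split_train_test(list(range(100)))
gene_column = standardize([7.1, 5.4, 9.8, 6.3])      # one gene, four patients
patient = [0.5, -1.2, 0.3, 0.9]                      # already standardized inputs
probs = forward(patient, hidden_w, hidden_b, output_w, output_b)
error = cross_entropy(probs, true_class=0)
```

Training would adjust the weights to reduce the summed cross-entropy over the training partition; the testing partition is only used to score the fitted network.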

Sensitivity analysis and explainability: Independent variables were ranked by their normalized importance, which measures how much the model's predicted value changes across different values of the variable. This ranking identified the top 20 genes driving each prediction. To make the black-box neural network results more interpretable, the authors also performed multivariate Cox regression, gene set enrichment analysis (GSEA), logistic regression, and decision tree analyses. Synaptic weights were exported for reproducibility.
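
One generic way to obtain such a ranking is to vary each input over a grid while holding the others at a baseline, record how far the prediction moves, and divide by the largest effect. The sketch below follows that idea with a hypothetical two-gene model; the exact procedure used by the authors' software is not described in this summary, so treat this as an assumption-laden illustration.

```python
def raw_sensitivity(predict, baseline, value_grid):
    """Range of the prediction as each input varies with the others fixed."""
    raw = {}
    for name, values in value_grid.items():
        outputs = []
        for value in values:
            probe = dict(baseline)
            probe[name] = value
            outputs.append(predict(probe))
        raw[name] = max(outputs) - min(outputs)
    return raw

def normalize_importance(raw):
    """Scale importances so the most influential input equals 1.0."""
    top = max(raw.values())
    return {name: value / top for name, value in raw.items()}

def toy_model(features):
    """Hypothetical stand-in for a fitted network's predicted value."""
    return 0.8 * features["gene_a"] - 0.2 * features["gene_b"]

baseline = {"gene_a": 0.0, "gene_b": 0.0}
grid = {"gene_a": [-1.0, 0.0, 1.0], "gene_b": [-1.0, 0.0, 1.0]}

importance = normalize_importance(raw_sensitivity(toy_model, baseline, grid))
ranking = sorted(importance, key=importance.get, reverse=True)
```

The top-ranked input always receives a normalized importance of 1.0, which matches how the leading gene in each of the study's rankings is reported.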

TL;DR: 106 DLBCL patients (62.5% male, 67.3% age >60, 72.4% treated with R-CHOP) were profiled with 730 immune-oncology genes. MLPs with 6 to 14 hidden nodes used hyperbolic tangent activation and softmax output. A 70/30 train/test split was used, and sensitivity analysis ranked gene importance for explainability.
Pages 7-10
The MLP Predicted Overall Survival with an AUC of 0.898

The multilayer perceptron classified patients into alive or dead survival outcomes using all 730 genes. The training set included 72 of 105 cases and the testing set included 33 cases. The network achieved 84.7% correct classification in training and 81.8% in testing, with an area under the curve (AUC) of 0.898 for both the alive and dead categories. Only 15.3% of predictions were incorrect in training.
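
The two metrics quoted above can be computed as in the following sketch. The labels and predicted probabilities are hypothetical, and the pairwise AUC shown here is one standard formulation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one).

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of cases classified correctly."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

def auc(true_labels, scores):
    """Pairwise AUC: share of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(true_labels, scores) if y == 1]
    negatives = [s for y, s in zip(true_labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical outcomes (1 = dead, 0 = alive) and predicted P(dead).
labels = [1, 1, 0, 0, 1, 0]
p_dead = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]

predicted = [1 if p >= 0.5 else 0 for p in p_dead]
acc = accuracy(labels, predicted)
auc_dead = auc(labels, p_dead)

# Scoring the "alive" class with the complementary probability.
auc_alive = auc([1 - y for y in labels], [1.0 - p for p in p_dead])
```

In a two-class problem the AUC for one class equals the AUC for the other when the complementary probability is used as the score, which is why a single value (0.898) is reported for both the alive and dead categories.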

Top predictive genes: The 10 most important genes by normalized importance were CD55 (1.000), ARG1 (0.982), SPANXB1 (0.949), CTAG1B (0.946), IFNA8 (0.853), CASP1 (0.851), IL2 (0.834), TNFSF12 (0.819), ANP32B (0.795), and CTSG (0.784). Other notable genes in the top 20 included REL and CD8A, both with known roles in lymphoma pathogenesis.

Multivariate Cox regression: To explain the neural network findings, the top 20 MLP genes were subjected to backward conditional Cox regression. The final model (Step 14) retained seven genes. Four were associated with favorable survival: ARG1, TNFSF12, REL, and NRP1 (hazard ratios 0.3 to 0.5). Three were associated with poor survival: IFNA8, CASP1, and CTSG (hazard ratios 1.0 to 3.8, with CASP1 the highest at HR = 3.8). Gene set enrichment analysis (GSEA) confirmed enrichment toward the dead phenotype, corroborating the neural network's identification of high-risk genes.

Risk stratification: Using a risk-score formula based on the 7-gene or 20-gene expression profiles, patients were divided into high-risk and low-risk groups with significantly different overall survival (p < 0.001). The high-risk group showed elevated CD163 expression (a marker of M2-like tumor-associated macrophages, 1.7 vs. 0.4, p = 0.002) and higher MYD88 expression (a marker of NF-kappa-B activation, 1.2 vs. 0.9, p = 0.008), linking poor prognosis to the immunosuppressive tumor microenvironment.
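
The summary does not give the risk-score formula itself. A common construction, assumed here, is a linear combination of each gene's expression weighted by its Cox coefficient (the natural log of the hazard ratio), followed by a median split. The gene names, hazard ratios, and expression values below are hypothetical placeholders.

```python
import math
import statistics

def risk_score(expression, hazard_ratios):
    """Sum of ln(HR) * expression over the signature genes (assumed form)."""
    return sum(math.log(hr) * expression[gene] for gene, hr in hazard_ratios.items())

def stratify(scores):
    """Label each patient high or low risk relative to the median score."""
    cutoff = statistics.median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical two-gene signature: one protective gene, one adverse gene.
hazard_ratios = {"GENE_GOOD": 0.5, "GENE_BAD": 2.0}

# Hypothetical normalized expression per patient.
patients = {
    "p1": {"GENE_GOOD": 2.0, "GENE_BAD": 0.0},
    "p2": {"GENE_GOOD": 0.0, "GENE_BAD": 2.0},
    "p3": {"GENE_GOOD": 1.0, "GENE_BAD": 1.0},
    "p4": {"GENE_GOOD": 0.0, "GENE_BAD": 1.0},
}

scores = {pid: risk_score(expr, hazard_ratios) for pid, expr in patients.items()}
groups = stratify(scores)
```

Under this construction, genes with HR below 1 pull the score down and genes with HR above 1 push it up, so high expression of adverse genes moves a patient into the high-risk group.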

TL;DR: The MLP predicted survival with AUC = 0.898 and 84.7% training accuracy. Seven key genes emerged from Cox regression: ARG1, TNFSF12, REL, and NRP1 (favorable, HR 0.3-0.5) and IFNA8, CASP1, and CTSG (poor, HR 1.0-3.8). High-risk patients had elevated CD163 (p = 0.002) and MYD88 (p = 0.008).
Pages 11-13
Seven-Gene Signature Validated Across DLBCL Subtypes and an Independent Cohort of 414 Cases

The prognostic value of the 7-gene signature was tested across clinically defined DLBCL subgroups. It maintained predictive power within IPI low/low-intermediate and high-intermediate/high strata, within EBER-negative cases, within both MYC translocation-positive and -negative cases, and within DLBCL not otherwise specified (NOS). However, the signature did not retain significance in the small EBER-positive subgroup or in the 11 cases of high-grade B-cell lymphoma with MYC and BCL2/BCL6 rearrangements, suggesting these entities may have distinct biological drivers.

Multivariate analysis with clinical variables: When the 7-gene set was combined with IPI and EBV status in a Cox regression model, the gene signature remained the strongest predictor (p < 0.001, HR = 3.6, 95% CI: 1.8-7.1), while IPI (p = 0.055, HR = 1.9) and EBER (p = 0.054) were borderline. Adding molecular subtypes to the model yielded: 7-gene set p < 0.001, HR = 2.3; EBER p = 0.036, HR = 2.3; IPI p = 0.134; and molecular subtypes p = 0.107. In the full model including high-grade B-cell lymphoma status, only the 7-gene set (p < 0.001, HR = 5.4) and EBER (p = 0.006, HR = 5.3) retained independent significance.

Neural network-based multivariate analysis: A separate MLP was built with 15 input variables (the 7 genes plus IPI, EBER, molecular subtypes, and HGBL status). This nonlinear multivariate model achieved AUC = 0.880 and 84.4% correct classification. The normalized importance ranking confirmed ARG1 (100%), REL (63.5%), and CTSG (54.2%) as the top predictors, with the clinical variables IPI (6.7%) and HGBL (8.7%) contributing far less than the gene-based features.

External validation: The risk-score formula was applied to the GSE10846 series of 414 DLBCL cases from Europe and North America. Using the 20-gene signature, the two risk groups showed significantly different survival (log rank p < 0.0001, HR = 3.6). Using the 7-gene signature alone, the separation was also highly significant (p < 0.0001, HR = 2.4). This external validation confirmed that the gene signature is reproducible across independent populations.
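
Survival comparisons of this kind rest on Kaplan-Meier curves for each risk group. The sketch below implements only the Kaplan-Meier estimator, not the log-rank test or hazard ratio reported above, and uses hypothetical follow-up data rather than GSE10846.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct death time.

    times:  follow-up time per patient
    events: 1 if the patient died at that time, 0 if censored
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve = []
    for t in death_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up in months for one risk group.
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
```

Censored patients (event 0) leave the at-risk set without lowering the curve, which is what distinguishes this estimator from a simple proportion of survivors.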

TL;DR: The 7-gene signature was an independent prognostic factor (HR = 5.4, p < 0.001) outperforming IPI and molecular subtypes. It validated in 414 independent DLBCL cases (p < 0.0001, HR = 2.4 for the 7-gene model and HR = 3.6 for the 20-gene model), confirming cross-population reproducibility.
Pages 14-16
Near-Perfect Prediction of DLBCL Molecular Subtypes with AUC Up to 1.0

Three-class prediction (GCB, ABC, Unspecified): The MLP achieved 98.7% correct classification in training and 81.5% in testing when predicting all three Lymph2Cx-defined subtypes. The AUC was 0.995 for GCB, 0.994 for ABC, and 0.989 for Unspecified. This exceptionally high discrimination suggests the 730 immune-oncology genes contain strong signals that distinguish the molecular origins of DLBCL subtypes.

Two-class prediction (GCB vs. ABC+Unspecified): The MLP performed even better in this binary classification, achieving 100% correct classification in training and 96.4% in testing. The AUC was a perfect 1.0 for both GCB and ABC+Unspecified categories. The hidden layer used 14 nodes with hyperbolic tangent activation. The cross-entropy error was only 0.594, and incorrect predictions dropped to less than 0.0001% in training.

Key subtype-discriminating genes: For the binary classification, the top 10 genes by normalized importance were CD37 (1.000), STAT6 (0.867), ATF2 (0.830), ROPN1 (0.819), C4B (0.814), NOTCH1 (0.805), CTAG1B (0.797), ICAM3 (0.796), CEACAM1 (0.783), and NOD2 (0.773). Among the top-ranked genes, STAT6 and REL were associated with the GCB subtype, while CD37, GNLY, CD46, and IL17RB were associated with the ABC/Unspecified phenotype. Binary logistic regression confirmed CD37 (OR = 2.9, p = 0.004), GNLY (OR = 2.7, p < 0.001), and IL17RB (OR = 1.5, p = 0.012) as positively associated with ABC+Unspecified, while STAT6 (OR = 0.1, p = 0.009) and REL (OR = 0.3, p = 0.022) were inversely associated.
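
To read these odds ratios: in logistic regression, a one-unit increase in a predictor multiplies the odds of the outcome by the OR. The sketch below converts that into a probability shift. The baseline probability of 0.5 is an assumption chosen for illustration; only the OR values come from the text above.

```python
def shifted_probability(base_probability, odds_ratio, delta=1.0):
    """Probability after the predictor rises by delta units."""
    base_odds = base_probability / (1.0 - base_probability)
    new_odds = base_odds * odds_ratio ** delta
    return new_odds / (1.0 + new_odds)

# Assumed baseline: even odds of ABC+Unspecified versus GCB.
baseline = 0.5

p_after_cd37 = shifted_probability(baseline, odds_ratio=2.9)   # CD37, OR = 2.9
p_after_stat6 = shifted_probability(baseline, odds_ratio=0.1)  # STAT6, OR = 0.1
```

From even odds, one additional unit of CD37 expression moves the probability of ABC+Unspecified to about 0.74, while one additional unit of STAT6 moves it to about 0.09, consistent with STAT6 marking the GCB subtype.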

GSEA findings: Gene set enrichment analysis for the molecular subtype prediction produced a distinctive sinusoidal-like enrichment plot, with some genes enriched toward the GCB phenotype and others toward the ABC+Unspecified phenotype. This bidirectional pattern reflects the biological reality that these subtypes arise from different stages of B-cell differentiation and carry distinct transcriptional programs.

TL;DR: The MLP predicted molecular subtypes with AUC of about 0.99 (0.989 to 0.995) for three-class and AUC = 1.0 for binary classification (100% training, 96.4% testing accuracy). STAT6 and REL marked the GCB subtype, while CD37 (OR = 2.9), GNLY (OR = 2.7), and IL17RB (OR = 1.5) marked the ABC+Unspecified subtype.
Pages 17-19
Multiple ML Algorithms Confirmed Findings, and MAPK3 Was Validated at the Protein Level

Comparison of machine learning techniques: Beyond the MLP, the authors tested logistic regression, discriminant analysis, support vector machines (SVM), CHAID trees, C5 trees, C&R trees, KNN algorithm, and Bayesian networks. For overall survival, logistic regression, discriminant analysis, and SVM all achieved 100% accuracy using all 730 genes. Among decision trees, CHAID achieved 97.1% accuracy using only 10 genes (RUNX1, TBK1, ATF1, CSF2, CXCL14, SMAD2, POU2F2, ADORA2A, FCGR2B, and CXCR1), and C5 achieved 96.2% with 12 genes. For molecular subtype prediction, the same trio (logistic regression, discriminant analysis, SVM) reached 100%, while C5 and CHAID trees achieved 96.2% using just 7 and 8 genes, respectively.

Restricted gene-set modeling: When only the top 20 MLP-identified genes were used as inputs, the Bayesian network was the most accurate model at 93% overall accuracy for survival prediction. This demonstrates that the MLP successfully identified the most informative genes, as a small subset could still drive high-accuracy predictions across diverse algorithms.

Radial basis function (RBF) comparison: The RBF neural network was also tested on all tasks. Its performance for survival prediction was poor (AUC = 0.628), substantially below the MLP's 0.898. However, the RBF achieved acceptable AUCs of 0.83 and 0.85 for molecular subtype predictions. The MLP outperformed the RBF on every task, suggesting that its ability to model more complex nonlinear relationships mattered for this dataset.

MAPK3 immunohistochemistry validation: MAPK3 (ERK1), identified as an important gene in the neural network's subtype prediction, was validated at the protein level in a separate tissue microarray of 96 DLBCL cases. Phospho-p44/42 MAPK immunohistochemistry revealed staining in 64.4% of cases (33.3% at 1+ and 31.1% at 2+). High MAPK3 expression was significantly associated with the GCB phenotype (odds ratio of non-GCB = 0.543, 95% CI: 0.3-0.96, p = 0.037). MAPK3 also correlated with LMO2, a known germinal center marker (OR = 2.8, 95% CI: 1.1-7.2, p = 0.039). Interestingly, despite showing a macrophage-like staining pattern, MAPK3 did not correlate with M2-like tumor-associated macrophage markers (CD163, CSF1R, TNFAIP8, CASP8, PD-L1, PTX3, and IL-10; all p > 0.05).

TL;DR: Logistic regression, discriminant analysis, and SVM all reached 100% accuracy with 730 genes. CHAID trees achieved 97.1% survival accuracy with just 10 genes. The RBF neural network underperformed the MLP (AUC 0.628 vs. 0.898). MAPK3 protein expression was validated in 96 cases, confirming association with GCB subtype (p = 0.037) and LMO2 (OR = 2.8, p = 0.039).
Pages 20-22
Single-Center Design, Small Sample Size, and Black-Box Interpretability Challenges

Small, single-center cohort: The primary discovery cohort comprised only 106 patients from a single institution (Tokai University Hospital). While the authors validated findings in an external cohort of 414 cases (GSE10846), the initial training and testing were performed on a relatively small dataset. With 730 gene predictors and only 106 samples, there is inherent risk of overfitting, particularly for models like logistic regression, discriminant analysis, and SVM that achieved perfect 100% accuracy on the full gene set. The 70/30 train/test split without a separate holdout partition further limits confidence in generalization estimates.
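
The overfitting concern can be illustrated with a small experiment: when predictors outnumber samples, even a simple linear classifier can fit purely random labels perfectly. The sketch below uses a basic perceptron on random noise with the same dimensions as the study (106 samples, 730 features). It is a stand-in demonstration, not a reproduction of the authors' logistic regression, discriminant analysis, or SVM models.

```python
import random

def dot(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

def train_perceptron(rows, labels, max_epochs=1000):
    """Classic perceptron updates until no training mistakes remain."""
    weights = [0.0] * len(rows[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(rows, labels):
            if y * dot(weights, x) <= 0:
                weights = [w + y * xi for w, xi in zip(weights, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return weights

def training_accuracy(weights, rows, labels):
    correct = sum(1 for x, y in zip(rows, labels) if y * dot(weights, x) > 0)
    return correct / len(rows)

rng = random.Random(0)
n_samples, n_features = 106, 730

# Pure noise: features and labels are unrelated by construction.
rows = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
labels = [rng.choice([-1, 1]) for _ in range(n_samples)]

weights = train_perceptron(rows, labels)
fit = training_accuracy(weights, rows, labels)
```

Because 106 random points in a 730-dimensional space are almost surely linearly separable under any labeling, perfect accuracy on the data a model was fitted to carries little information by itself; performance on held-out or external cases is the meaningful check.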

Retrospective design: All data were collected retrospectively. The patients were diagnosed over a 10-year span (2006-2016), during which treatment protocols, diagnostic criteria, and supportive care may have evolved. No prospective validation was performed, and the study did not account for potential temporal biases in treatment response or survival outcomes.

Neural network interpretability: Despite the authors' commendable use of XAI approaches (Cox regression, GSEA, decision trees, Bayesian networks), the core MLP model remains a black box. The synaptic weights, such as those reported for CD55 (-0.441, -0.204, 0.168, 0.199, -0.458, -0.733 across six hidden nodes), are not directly interpretable in biological terms. The relationship between ranked gene importance in the MLP and actual causal significance in lymphoma biology cannot be established from this study design alone.

Data availability and reproducibility: The raw gene expression data from Tokai University are not publicly available due to patient data protection policies and are only accessible upon reasonable request. This limits independent verification of the neural network models. Additionally, exact replication requires matching the same random number initialization, data order, variable order, and procedure settings, which adds practical barriers to reproducibility.

Limited subgroup analyses: The 7-gene signature did not retain prognostic significance in EBER-positive cases (only 19 of 98 cases) or in high-grade B-cell lymphoma with MYC/BCL2/BCL6 rearrangements (only 11 cases). These subgroups were likely too small for meaningful statistical power, and the signature's applicability to these biologically distinct entities remains uncertain.

TL;DR: Key limitations include a small single-center discovery cohort (106 cases, 730 genes), retrospective design over a 10-year span, risk of overfitting (100% accuracy models with 730 predictors), restricted raw data availability, and insufficient power for EBV-positive (n = 19) and HGBL (n = 11) subgroup analyses.
Pages 23-25
Toward Prospective Validation, Larger Cohorts, and Clinical Integration of AI-Driven Gene Signatures

Multi-center prospective validation: The most immediate next step would be prospective validation of the 7-gene prognostic signature in large, multi-center cohorts that include diverse populations beyond Japan and the GSE10846 European/North American series. Such studies could determine whether the signature adds clinical utility beyond the IPI and molecular subtyping in real-world treatment decisions, particularly for selecting patients who may benefit from intensified or novel therapies.

Integration with clinical risk models: In this study the 7-gene signature outperformed IPI: it had HR = 5.4 in the full Cox model, where IPI did not retain independent significance, and IPI contributed only 6.7% normalized importance in the MLP model. Even so, a composite score combining genomic and clinical variables could yield superior risk stratification. Future work could integrate the gene signature with IPI, cell-of-origin classification, MYC/BCL2 rearrangement status, and circulating biomarkers into a unified predictive framework, potentially using ensemble machine learning approaches.

Explainable AI in hematopathology: The authors emphasized that XAI is essential for clinical trust. Advances in model-agnostic explanation methods, such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), could provide per-patient feature attributions that are more informative than the global normalized importance rankings used here. Applying these newer XAI techniques to the MLP's predictions could help clinicians understand why a specific patient is classified as high-risk.
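
The idea behind per-patient attribution can be sketched with exact Shapley values on a tiny model. This is not the SHAP library, which approximates these values efficiently for real models; it is a brute-force computation over all feature orderings, feasible only for a handful of features. The model, gene names, and values are hypothetical.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions by averaging over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in ordering:
            current[name] = instance[name]  # switch this feature to the patient's value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_risk_model(features):
    """Hypothetical risk model with an interaction between two genes."""
    return (
        0.5 * features["gene_a"]
        - 0.3 * features["gene_b"]
        + 0.2 * features["gene_a"] * features["gene_c"]
    )

patient = {"gene_a": 2.0, "gene_b": 1.0, "gene_c": 1.0}
reference = {"gene_a": 0.0, "gene_b": 0.0, "gene_c": 0.0}

attributions = shapley_values(toy_risk_model, patient, reference)
```

The attributions sum to the difference between this patient's prediction and the reference prediction, so each value can be read as that gene's share of why this particular patient scores above or below the reference, which a global importance ranking cannot provide.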

Deeper biological investigation: The study revealed intriguing associations between high-risk genes and the tumor microenvironment, specifically the link between the high-risk group and M2-like tumor-associated macrophages (CD163) and NF-kappa-B activation (MYD88). The MAPK3 association with GCB subtype, despite showing macrophage-like histological expression without correlating with macrophage markers, deserves further functional investigation. Understanding these biological mechanisms could open therapeutic avenues, particularly in the context of checkpoint inhibitor therapy and microenvironment-targeting strategies in DLBCL.

TL;DR: Future priorities include multi-center prospective validation of the 7-gene signature, integration with composite clinical-genomic risk models, application of newer XAI methods (SHAP, LIME) for per-patient interpretability, and functional studies of the CD163/MYD88 microenvironment links and the MAPK3-GCB association to identify potential therapeutic targets.
Citation: Carreras J, Hiraiwa S, Kikuti YY, et al. Open Access, 2021. Available at: PMC8699516. DOI: 10.3390/cancers13246384. License: CC BY.