CNN Bladder Wall Segmentation in CT Urography

European Radiology 2019

Plain-English Explanations
Pages 1-3
Why Bladder Wall Segmentation Matters and What This Study Sets Out to Do

Clinical motivation: Bladder cancer is among the most common malignancies, with the American Cancer Society estimating 76,900 new diagnoses and 16,390 deaths in 2017 alone. CT urography (CTU) is a primary imaging modality that captures the bladder, kidneys, and ureters in a single scan, but interpreting these exams is labor-intensive. A typical CTU scan contains approximately 300 slices (range: 200 to 600), every one of which must be inspected for lesions. This workload, combined with numerous benign anomalies that can mimic cancer, produces large inter-radiologist variability, with reported sensitivity for bladder cancer detection ranging from 64% to 97%.

The segmentation problem: Before a computer-aided detection (CAD) system can identify bladder lesions, it must first accurately delineate the bladder wall. Under-segmentation risks excluding lesions from the search region entirely, while over-segmentation introduces false positives from non-bladder structures. This study by Gordon et al. at the University of Michigan tackles both the inner wall (boundary between bladder interior and wall) and the outer wall (boundary between wall and surrounding tissue). Segmenting these two surfaces is considerably harder than segmenting the bladder as a whole because the wall itself is often only a few pixels thick, and bladders can be partially filled with contrast material, fully filled, or unfilled, creating inconsistent boundaries.

Prior approaches and their limitations: Previous bladder segmentation methods include level sets by Hadjiiski et al. and the CLASS (Conjoint Level Set Analysis and Segmentation System) by Cha et al., as well as MR-based techniques by Li et al., Duan et al., and Ma et al. Most earlier MR studies used small cohorts of no more than 22 patients. The CLASS method improved results but required two manually marked bounding boxes as starting points. In a pilot study, Cha et al. showed that a DL-CNN with level sets outperformed CLASS and other standard methods for whole-bladder segmentation.

Study goal: This paper extends the pilot work to segment not just the outer bladder contour but both the inner and outer bladder walls. The authors train a deep-learning convolutional neural network (DL-CNN) to generate a bladder wall likelihood map and then use that map to guide cascaded level set propagation. The method is evaluated on a dataset of 172 CTU cases (81 training, 91 testing) containing approximately 16,000 total slices, making it substantially larger than most previous segmentation studies in this domain.

TL;DR: CTU interpretation is time-consuming and has wide variability in sensitivity (64-97%). This study develops a DL-CNN combined with cascaded level sets to segment both inner and outer bladder walls across 172 CTU cases (~16,000 slices), addressing a critical prerequisite for automated bladder cancer detection.
Pages 3-4
Patient Cohort, Case Distribution, and Reference Standard

Dataset composition: The study uses 172 CTU cases collected retrospectively with IRB approval from the Abdominal Imaging Division at Michigan Medicine. These cases were split into 81 training cases and 91 test cases, balanced by difficulty based on the appearance and shape of each bladder. In the training set, 42 bladders contained focal mass-like lesions (40 malignant), 21 had wall thickenings (16 malignant), and 18 were normal. The test set included 42 focal mass-like lesions (42 malignant), 36 wall thickenings (23 malignant), and 13 normal bladders.

Contrast fill status: A major source of segmentation difficulty is the variable presence of intravenous contrast material in the bladder. In the training set, 61 bladders were partially filled with IV contrast, 8 were fully filled, and 12 had no contrast. In the test set, 84 were partially filled, 4 were fully filled, and 3 were unfilled. The boundary between contrast-filled regions, the bladder wall, and surrounding structures varies drastically depending on contrast status, creating weak edges that challenge both human and automated segmentation.

Reference standard: Both the inner and outer bladder walls were hand-outlined on each 2D axial slice by an abdominal radiologist with over 20 years of experience, using an in-house graphical tool called MiViewer. These slice-by-slice outlines formed 3D surface contours for both walls. The 172 bladders contained a total of approximately 16,000 slices, averaging about 100 slices per bladder. This comprehensive manual delineation served as the ground truth for training the DL-CNN and evaluating segmentation performance.

TL;DR: 172 CTU cases (81 training, 91 test) were manually outlined by an expert radiologist across ~16,000 slices. The dataset includes a mix of malignant lesions, wall thickenings, and normal bladders with variable contrast fill, providing a realistic and challenging benchmark for automated segmentation.
Pages 4-6
Network Design and Bladder Wall Likelihood Map Generation

Architecture overview: The DL-CNN is based on Cuda-Convnet, originally developed by Krizhevsky. It consists of five main layers: two convolutional layers, two locally connected convolutional layers, and one fully connected layer, followed by a Softmax output. The first two convolutional layers each contain 64 kernels of size 5x5, and each is followed by a pooling layer (using overlapping pooling to reduce overfitting) and a local response normalization layer. The locally connected layers contain 64 and 32 kernels of 3x3, respectively. Unlike standard convolutional layers, locally connected layers apply different kernels at each spatial location, allowing position-specific feature extraction.
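
To make the difference between shared and position-specific kernels concrete, the following minimal sketch counts weights for a 3x3 layer with 64 input and 64 output channels. The 4x4 output grid and the 64 input channels are hypothetical values chosen for illustration, since this summary does not give the feature-map sizes; biases are ignored.

```python
def conv_weights(in_channels: int, out_channels: int, kernel: int) -> int:
    """Weights in a standard convolutional layer: one kernel set shared by all positions."""
    return out_channels * in_channels * kernel * kernel


def locally_connected_weights(in_channels: int, out_channels: int, kernel: int,
                              out_height: int, out_width: int) -> int:
    """Weights in a locally connected layer: a separate kernel set per output position."""
    per_position = conv_weights(in_channels, out_channels, kernel)
    return out_height * out_width * per_position


# Hypothetical sizes, for illustration only.
shared = conv_weights(64, 64, 3)
unshared = locally_connected_weights(64, 64, 3, out_height=4, out_width=4)
print(shared, unshared, unshared // shared)
```

Under these assumed sizes the locally connected layer holds one full kernel set per output position, so its weight count grows with the size of the output grid.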

Activation and normalization: All neurons use the ReLU (Rectified Linear Unit) activation function, f(x) = max(0, x), which converges faster than the sigmoid function. Local response normalization is applied with parameters N = 9, s = 0.001, and epsilon = 0.75, values demonstrated to be effective in the original Krizhevsky architecture and confirmed through the authors' own experimentation. The Softmax layer outputs a single value in [0, 1] representing the likelihood that an input ROI belongs to the bladder wall.
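
As a small illustration of the two functions named above, the sketch below implements ReLU and a two-class softmax in plain Python. Treating the output as a softmax over two raw scores ("non-wall" and "wall") is an assumption; the summary only says that a single likelihood in [0, 1] is produced. The function and argument names are hypothetical.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)


def wall_likelihood(z_nonwall: float, z_wall: float) -> float:
    """Two-class softmax; returns the probability assigned to the 'wall' class."""
    m = max(z_nonwall, z_wall)  # subtract the maximum for numerical stability
    e_nonwall = math.exp(z_nonwall - m)
    e_wall = math.exp(z_wall - m)
    return e_wall / (e_nonwall + e_wall)


print(relu(-2.5), relu(1.5))
print(wall_likelihood(0.0, 0.0))  # equal scores give an even split
```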

ROI extraction and labeling: Approximately 240,000 ROIs of 16x16 pixels were extracted from the training cases, exactly half labeled as within the bladder wall and half as not. The labeling used a "jittering" scheme: only the central 8x8 pixel region of each 16x16 ROI was evaluated against the hand-outlined contours. If 70% of this central region fell between the inner and outer wall outlines, the ROI was labeled as wall. ROIs with 95% or more of their central area inside the bladder interior, or less than 10% inside the outer wall boundary, were labeled as non-wall. This approach allowed thin walls to be captured without excessive noise.
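
The labeling rule can be sketched as follows. This is one reading of the description above, not the authors' code: it assumes "70%" means at least 70%, that the two contours are available as 16x16 boolean masks ("inside the inner wall contour" and "inside the outer wall contour"), and that ROIs meeting neither rule are left unlabeled. All names are hypothetical.

```python
from typing import List, Optional

Mask = List[List[bool]]  # 16x16 boolean mask for one ROI


def central_fraction(mask: Mask, roi: int = 16, core: int = 8) -> float:
    """Fraction of True pixels inside the central core x core region of the ROI."""
    start = (roi - core) // 2
    count = sum(mask[r][c]
                for r in range(start, start + core)
                for c in range(start, start + core))
    return count / (core * core)


def label_roi(inside_inner: Mask, inside_outer: Mask) -> Optional[str]:
    """Return 'wall', 'non-wall', or None when neither rule applies (assumed)."""
    # A pixel is in the wall if it lies inside the outer contour but not the inner one.
    wall = [[o and not i for i, o in zip(row_i, row_o)]
            for row_i, row_o in zip(inside_inner, inside_outer)]
    if central_fraction(wall) >= 0.70:
        return "wall"
    if central_fraction(inside_inner) >= 0.95 or central_fraction(inside_outer) < 0.10:
        return "non-wall"
    return None


all_true = [[True] * 16 for _ in range(16)]
all_false = [[False] * 16 for _ in range(16)]
print(label_roi(all_false, all_true))  # ROI lies entirely between the two contours
print(label_roi(all_true, all_true))   # ROI lies entirely in the bladder interior
```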

Likelihood map generation: After training for 1,500 iterations (approximately 7 to 8 hours on an Nvidia Tesla K20 GPU), the DL-CNN was applied voxel-by-voxel within a rectangular volume of interest (VOI) enclosing the bladder. For each voxel, the centered 16x16 ROI was fed through the network, producing a likelihood score. The collection of all scores formed a 3D bladder wall likelihood map where brighter pixels indicated higher probability of being within the wall. Generating a single likelihood map took approximately 4 minutes during deployment.
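
The voxel-by-voxel deployment step amounts to a sliding window. The sketch below shows the idea on a single 2D slice, with a stand-in scoring function in place of the trained network. The border handling (clamping to the nearest valid pixel) and the window convention for an even-sized ROI (rows r-8 to r+7) are assumptions; the summary does not specify either.

```python
from typing import Callable, List

Image = List[List[float]]


def extract_roi(img: Image, row: int, col: int, size: int = 16) -> Image:
    """Return the size x size window around (row, col), clamping indices at the borders."""
    half = size // 2
    height, width = len(img), len(img[0])
    roi = []
    for r in range(row - half, row + half):
        rr = min(max(r, 0), height - 1)
        roi.append([img[rr][min(max(c, 0), width - 1)]
                    for c in range(col - half, col + half)])
    return roi


def likelihood_map(img: Image, classify: Callable[[Image], float], size: int = 16) -> Image:
    """Score every pixel of the slice by classifying the ROI centered on it."""
    return [[classify(extract_roi(img, r, c, size)) for c in range(len(img[0]))]
            for r in range(len(img))]


def mean_intensity_stub(roi: Image) -> float:
    """Placeholder for the trained DL-CNN; returns the mean ROI intensity."""
    values = [v for row in roi for v in row]
    return sum(values) / len(values)


slice_2d = [[0.5] * 20 for _ in range(20)]
scores = likelihood_map(slice_2d, mean_intensity_stub)
print(len(scores), len(scores[0]))
```

Stacking the per-slice score maps over all slices in the VOI would give the 3D likelihood map described above.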

TL;DR: The DL-CNN uses a 5-layer Cuda-Convnet architecture with ReLU activations and Softmax output, trained on ~240,000 balanced ROIs (16x16 pixels). A "jittering" labeling scheme using the central 8x8 region captures thin walls accurately. The network produces a 3D likelihood map in ~4 minutes per case.
Pages 5-7
Cascaded Level Sets for Inner and Outer Wall Contour Extraction

Four-stage pipeline: After generating the DL-CNN likelihood map, the system segments the bladder walls using a cascaded level set method with four stages: (a) preprocessing, (b) initial segmentation, (c) 3D level set segmentation, and (d) 2D level set refinement. The preprocessing stage applies smoothing, anisotropic diffusion, gradient filters, and rank transforms to create gradient vector images that guide level set propagation.

Initial segmentation: First, the likelihood map is thresholded at h = 0.85 (determined experimentally via histogram analysis) to create a binary mask separating wall from non-wall voxels. An ellipsoid with axes 1.5 times the VOI dimensions is placed at the centroid of the resulting mask, and its intersection with the mask defines the object region. This ellipsoid prevents leakage into surrounding structures like the pelvic bone, which often receives high likelihood scores. Morphological dilation (2-voxel-radius sphere), 3D flood fill, and erosion then connect neighboring components and extract the initial segmentation surface.
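
The thresholding and ellipsoid steps can be sketched as below. This is a simplified illustration, not the authors' implementation: it assumes voxels with likelihood at or above h are kept, reads "axes 1.5 times the VOI dimensions" as full axis lengths (so each semi-axis is 0.75 times the corresponding VOI dimension), works in voxel indices rather than millimeters, and omits the dilation, flood fill, and erosion steps. All names are hypothetical.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]          # (z, y, x) voxel index
Volume = List[List[List[float]]]      # likelihood[z][y][x]


def threshold_mask(likelihood: Volume, h: float = 0.85) -> Set[Voxel]:
    """Voxels whose likelihood is at least h (the >= comparison is assumed)."""
    return {(z, y, x)
            for z, plane in enumerate(likelihood)
            for y, row in enumerate(plane)
            for x, value in enumerate(row)
            if value >= h}


def centroid(voxels: Set[Voxel]) -> Tuple[float, ...]:
    """Mean voxel index of the mask along each axis."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))


def inside_ellipsoid(v, center, semi_axes) -> bool:
    """Standard ellipsoid test: sum of squared normalized offsets is at most 1."""
    return sum(((v[i] - center[i]) / semi_axes[i]) ** 2 for i in range(3)) <= 1.0


def object_region(likelihood: Volume, h: float = 0.85, scale: float = 1.5) -> Set[Voxel]:
    """Intersect the thresholded mask with an ellipsoid centered on the mask centroid."""
    mask = threshold_mask(likelihood, h)
    if not mask:
        return set()
    dims = (len(likelihood), len(likelihood[0]), len(likelihood[0][0]))
    semi_axes = tuple(scale * d / 2 for d in dims)
    center = centroid(mask)
    return {v for v in mask if inside_ellipsoid(v, center, semi_axes)}


volume = [[[0.1] * 3 for _ in range(3)] for _ in range(3)]
volume[1][1][1] = 0.9
print(object_region(volume))
```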

Cascading 3D level sets: Four sequential 3D level sets propagate the initial contour toward the bladder walls. The level set equation combines three terms: an advection term (drives the contour toward high-gradient regions), a propagation term (controls expansion or contraction based on local pixel information), and a curvature term (maintains smooth contour shape). The first three level sets use the original CTU volume as the gradient image. The crucial fourth level set incorporates the DL-CNN likelihood map directly into the energy equation, which is what enables differentiation between inner and outer walls. The propagation coefficient is positive for outer wall segmentation (expanding outward) and negative for inner wall segmentation (contracting inward).
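
For orientation, a commonly used general form of a three-term level set equation is shown below. It is given here as an assumed, generic formulation; this summary does not state the exact equation, coefficient values, or sign conventions used in the paper. Here \Psi is the level set function, \mathbf{A} the advection vector field, P the propagation (speed) term, \kappa the mean curvature, and \alpha, \beta, \gamma the scalar weights of the three terms.

```latex
\frac{\partial \Psi}{\partial t}
  = -\,\alpha\,\mathbf{A}(\mathbf{x}) \cdot \nabla \Psi
    \;-\; \beta\,P(\mathbf{x})\,\lvert \nabla \Psi \rvert
    \;+\; \gamma\,\kappa\,\lvert \nabla \Psi \rvert
```

In this form, changing the sign of the propagation weight \beta switches the contour between expanding and contracting, which corresponds to the outer wall versus inner wall settings described above.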

2D refinement: As a final step, 2D level sets are applied to each slice of the 3D segmented object, using the 3D contours as initial conditions, to refine the segmentation on a per-slice basis. The entire level set cascade runs twice: once to extract the outer wall contour and once for the inner wall, using different parameter settings for each. The level set computation takes approximately 2 to 5 minutes per contour.

TL;DR: A four-stage cascaded level set method uses the DL-CNN likelihood map (thresholded at 0.85) to propagate contours toward inner and outer bladder walls. The fourth of the four 3D level sets integrates the likelihood map directly, with positive propagation for outer wall expansion and negative for inner wall contraction. 2D level sets then refine each slice. The level set computation takes 2-5 minutes per contour.
Pages 7-8
Performance Metrics: Volume Intersection, Volume Error, and Average Distance

Volume intersection ratio: This metric measures the overlap between the computer-segmented volume and the radiologist's reference volume. It is calculated as the ratio of the intersection of the two volumes to the reference volume: R3D = (VR ∩ VU) / VR, where VR is the reference volume and VU is the computer-segmented volume. A score of 100% means the segmented volume fully covers the reference; because the denominator is the reference volume alone, the ratio does not penalize over-segmentation. This metric is computed independently for the inner wall contour, the outer wall contour, and the shell region between them.

Volume error and absolute volume error: The volume error is the signed difference between the reference and segmented volumes divided by the reference volume (E3D = (VR - VU) / VR). A positive error indicates under-segmentation (the computer contour is too small), while a negative error indicates over-segmentation (the contour extends beyond the reference). The absolute volume error |E3D| captures the magnitude of deviation regardless of direction, providing a clearer picture of overall accuracy since positive and negative errors can cancel out in the signed metric.
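
The two volume metrics can be sketched on voxel sets as follows. Representing each volume as a set of voxel index tuples is an assumption made for brevity; the function names are hypothetical. The example also shows why the signed error can hide deviations that the absolute error reveals.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def volume_intersection_ratio(ref: Set[Voxel], seg: Set[Voxel]) -> float:
    """R3D: voxels shared by reference and segmentation, relative to the reference."""
    return len(ref & seg) / len(ref)


def volume_error(ref: Set[Voxel], seg: Set[Voxel]) -> float:
    """E3D = (VR - VU) / VR: positive means under-segmentation, negative over-segmentation."""
    return (len(ref) - len(seg)) / len(ref)


reference = {(0, 0, i) for i in range(10)}
under = {(0, 0, i) for i in range(2, 10)}   # 8 voxels, all inside the reference
over = {(0, 0, i) for i in range(12)}       # 12 voxels, covers the reference

errors = [volume_error(reference, under), volume_error(reference, over)]
print(volume_intersection_ratio(reference, under), errors)
print(sum(errors) / 2, sum(abs(e) for e in errors) / 2)  # signed mean vs. absolute mean
```

In this toy example one case under-segments and one over-segments by the same amount, so the mean signed error is zero while the mean absolute error is not.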

Average distance (AVDIST): This is the bidirectional average of minimum Euclidean distances between the reference and segmented contour surfaces. For every voxel on the reference contour, the minimum distance to the segmented contour is computed, and vice versa. The two resulting averages are combined to produce AVDIST. This metric is particularly informative for thin structures like the bladder wall because it captures spatial displacement directly in millimeters.
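
A brute-force sketch of this metric is given below. It assumes the two directed averages are combined as a simple mean, and that contour points are given as (x, y, z) coordinates in millimeters; neither detail is spelled out in this summary. It uses math.dist, which requires Python 3.8 or later.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in mm (assumed)


def directed_mean_min_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Average, over points in a, of the distance to the closest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def avdist(ref: Sequence[Point], seg: Sequence[Point]) -> float:
    """Bidirectional average distance: mean of the two directed averages (assumed)."""
    return 0.5 * (directed_mean_min_distance(ref, seg)
                  + directed_mean_min_distance(seg, ref))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
segmented = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(avdist(reference, segmented))
```

The nested loop is quadratic in the number of contour points; a real implementation would more likely use a distance transform or a spatial index.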

Lesion intersection: To evaluate whether the segmentation reliably encloses bladder lesions (a requirement for downstream CAD), the volume intersection between each lesion and the outer wall contour was also computed. This determines what percentage of each lesion falls within the segmented region, which directly impacts the sensitivity of any subsequent lesion detection step.

TL;DR: Performance is measured by volume intersection ratio (overlap with reference), signed and absolute volume error (under- vs. over-segmentation), average bidirectional distance in mm, and lesion intersection ratio (percentage of lesions enclosed by the outer wall contour).
Pages 8-10
Segmentation Performance for Inner Wall, Outer Wall, and Lesion Enclosure

Inner wall performance: On the training set, the inner wall contour achieved a volume intersection of 90.2 +/- 8.7%, volume error of 4.3 +/- 18.2%, absolute volume error of 12.6 +/- 13.7%, and average distance of 3.0 +/- 1.6 mm. On the test set, the corresponding values were 87.2 +/- 10.5%, 5.3 +/- 28.2%, 15.6 +/- 24.0%, and 3.2 +/- 1.7 mm. The positive volume error indicates a slight tendency toward under-segmentation for the inner wall, meaning the contour tends to sit slightly inside the true wall boundary.

Outer wall performance: The outer wall showed higher overlap than the inner wall. The training set achieved volume intersection of 93.2 +/- 5.8%, volume error of 7.2 +/- 12.3%, absolute volume error of 10.4 +/- 9.6%, and average distance of 3.0 +/- 1.2 mm. The test set achieved 89.5 +/- 9.8%, 6.2 +/- 20.5%, 14.6 +/- 15.6%, and 3.5 +/- 2.0 mm. Volume intersection and absolute volume error were better than for the inner wall in both sets, which is consistent with the outer boundary typically having stronger contrast against surrounding structures than the often ambiguous inner wall. Average distance did not improve, however: it was equal on the training set (3.0 mm) and slightly larger on the test set (3.5 mm vs. 3.2 mm).

Combined wall shell: When evaluating the thin shell between the inner and outer contours, performance dropped substantially: volume intersection was 61.0 +/- 11.3% (training) and 54.6 +/- 10.4% (test), with absolute volume errors of 34.5 +/- 37.3% and 25.1 +/- 15.8%, respectively. This is expected because the wall is only a few pixels thick, so even small contour displacements produce large percentage errors in the narrow shell region.

Lesion enclosure: The DL-CNN-assisted level sets enclosed 80.3 +/- 23.8% of lesion volume in the training set and 81.6 +/- 16.6% in the test set. Of all lesions, 70.2% had volume intersection ratios above 75% with the computer-segmented contours, compared to 89.1% for the hand-outlined reference. While lesion enclosure is not yet at radiologist level, the majority of lesions are captured within the segmented region, supporting the method's utility as a first step in a CAD pipeline.

TL;DR: The outer wall achieved 89.5% volume intersection and 3.5 mm average distance on the test set; the inner wall reached 87.2% and 3.2 mm. The thin wall shell had lower overlap (54.6%) due to its narrow width. Lesion enclosure averaged 81.6% on the test set, with 70.2% of lesions above 75% overlap.
Pages 10-12
DL-CNN with Level Sets vs. DL-CNN Alone, CLASS, and ITK-SNAP

DL-CNN with vs. without level sets: A critical comparison shows the complementary roles of the DL-CNN and level sets. Without level sets, the DL-CNN tended to over-segment the inner wall (volume error: -33.8 +/- 36.1% on training) but under-segment the outer wall (volume error: 24.3 +/- 8.8% on training). With level set refinement, these errors were significantly reduced (P < 0.01): inner wall volume error improved to 4.3 +/- 18.2%, and outer wall absolute volume error dropped from 24.3% to 10.4 +/- 9.6%. Average distance also improved from 5.2 +/- 1.6 mm to 3.0 +/- 1.2 mm for the outer wall in training.

Likelihood map in fourth level set only vs. all level sets: The authors tested using the DL-CNN likelihood map in all four 3D level sets versus only in the fourth. For the outer wall, restricting the likelihood map to the fourth level set was significantly better (P < 0.01) in both training and test sets. The test set outer wall volume intersection improved from 76.1 +/- 11.9% (all level sets) to 89.5 +/- 9.8% (fourth only), and average distance decreased from 4.7 +/- 2.4 mm to 3.5 +/- 2.0 mm. For the inner wall, the difference was not statistically significant.

DL-CNN vs. CLASS: Compared to the CLASS level set system (which does not use deep learning), the DL-CNN method achieved significantly better outer wall volume intersection on the full test set: 89.5 +/- 9.8% versus 84.0 +/- 11.4%. On the same dataset used in a prior study by Cha et al., the current method achieved volume intersection ratios of 93.7% and 89.5% for training and test, compared to 84.2% and 78.0% with the earlier approach (P < 0.01).

DL-CNN vs. ITK-SNAP: On a subset of 30 bladders, the DL-CNN achieved 94.4 +/- 3.2% volume intersection and 3.0 +/- 1.2 mm average distance for the outer wall, compared to 78.8 +/- 16.3% and 5.2 +/- 2.6 mm for ITK-SNAP and 79.0 +/- 8.2% and 3.5 +/- 1.3 mm for CLASS. The improvements over both ITK-SNAP and CLASS were statistically significant (P < 0.01) for volume intersection and absolute volume error.

TL;DR: Level sets corrected DL-CNN over-segmentation of the inner wall and under-segmentation of the outer wall (P < 0.01). Using the likelihood map only in the fourth level set was optimal for the outer wall (89.5% vs. 76.1% test-set volume intersection). For outer wall segmentation, the DL-CNN method significantly outperformed CLASS on the full test set (89.5% vs. 84.0%) and both ITK-SNAP and CLASS on a 30-bladder subset (94.4% vs. 78.8% and 79.0%).
Pages 12-15
Challenges with Thin Walls, Prostate Overlap, and Paths Forward

ROI size trade-offs: The authors experimented with 8x8, 16x16, and 32x32 pixel ROI sizes. The 8x8 ROIs produced excessive noise in the likelihood maps and gaps in thin wall regions. The 32x32 ROIs generated walls that were far too thick, unable to represent the actual narrow wall structure. The chosen 16x16 ROIs provided a middle ground with acceptable noise levels and sufficient spatial resolution, with only a minor increase in training time (from 5.5 to 6.5 hours vs. 32x32) and no difference in deployment time (~4 minutes per map).

Thin wall evaluation problem: While the inner and outer contours individually achieved good overlap with the reference standard, the wall shell between them showed much lower metrics (54.6% volume intersection on test). This is inherent to thin structures: even small contour displacements of 1-2 pixels translate to large percentage errors in the narrow region. Cases with individually good inner and outer contour performance still showed noticeably poorer wall shell metrics due to these small spatial shifts.
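
A worked example with a simplified geometry shows the size of this effect. The numbers are hypothetical and not taken from the paper: the bladder is modeled as a sphere with a 30 mm outer radius and a 3 mm wall, and the segmented outer contour is assumed to sit 1 mm inside the reference while the inner contour is exact.

```python
def sphere_volume_factor(radius: float) -> float:
    """Radius cubed; the 4/3*pi factor cancels in every ratio below."""
    return radius ** 3


outer_radius = 30.0   # mm, hypothetical
wall_thickness = 3.0  # mm, hypothetical
shift = 1.0           # mm, inward displacement of the segmented outer contour

ref_outer = sphere_volume_factor(outer_radius)
seg_outer = sphere_volume_factor(outer_radius - shift)
inner = sphere_volume_factor(outer_radius - wall_thickness)

# Relative volume error of the whole outer-contour volume.
whole_error = (ref_outer - seg_outer) / ref_outer
# Relative volume error of the shell between inner and outer contours.
shell_error = ((ref_outer - inner) - (seg_outer - inner)) / (ref_outer - inner)

print(f"whole volume error: {whole_error:.1%}")
print(f"wall shell error:   {shell_error:.1%}")
```

Under these assumptions the same 1 mm shift gives an error of roughly 10% for the whole volume but roughly 36% for the shell, because the missing volume is divided by a much smaller reference volume.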

Prostate and lesion challenges: In male patients, the prostate often protrudes into the bladder and has a similar CT appearance to the bladder wall, causing the DL-CNN-assisted level sets to sometimes incorrectly segment prostate tissue as part of the outer wall. For lesions, the inner wall contours tended to pass through lesions rather than around them, even though the likelihood maps typically included lesions accurately. This means the method may underestimate wall thickness at lesion sites, which has implications for downstream tumor detection.

Thick-slice limitation: For CT scans with thick slices (5 mm) and unusually small bladders, the level sets could not adapt quickly enough to account for rapid anatomical changes between slices. The authors propose developing an automated triage stage to detect these outlier cases and apply optimized parameter sets. Additionally, the reference standard was based on outlines from a single radiologist; using consensus outlines from multiple radiologists would reduce bias and provide more robust evaluation.

Future directions: The authors conclude that DL-CNN-assisted level set segmentation is a viable first step toward automated bladder lesion detection in CTU. Ongoing work focuses on improving wall segmentation accuracy, particularly for lesion inclusion within wall contours, and developing the subsequent steps of a full CAD pipeline for bladder cancer diagnosis and treatment planning. This work was supported by NIH grant U01CA179106.

TL;DR: Key limitations include thin wall evaluation sensitivity to small displacements, prostate overlap in male patients, and inner contours passing through lesions. Thick-slice CT with small bladders also posed difficulties. Future work targets improved lesion enclosure, multi-radiologist reference standards, and development of a complete CAD pipeline for bladder cancer detection.
Citation: Gordon MN, Hadjiiski LM, Cha KH, et al. Open Access, 2019. Available at: PMC6367014. DOI: 10.1002/mp.13326. License: Open Access.