Performance enhancement method by using the probabilistic estimation for kidney tumor segmentation

2026 · 0 citations · Journal Article · Gold Open Access

Authors

Wonjoong Cheon · St. Mary's Hospital

Meangee Kim · Proton (Malaysia)

Mira Han · Seoul Metropolitan Government

Se Byeong Lee · Proton (Malaysia)

Dongho Shin · Proton (Malaysia)

Young Kyung Lim · Proton (Malaysia)

Jong Hwi Jeong ·

Abstract

Purpose: Ensemble methods can enhance segmentation performance, but their effectiveness depends on the integration strategy. We investigated whether the STAPLE algorithm’s probabilistic framework could effectively leverage model diversity from different loss functions to improve kidney tumor segmentation accuracy compared to individual models and conventional soft voting.

Methods: We utilized CT scans from 210 patients in the KiTS19 dataset with expert-annotated kidney and tumor structures. Five model variants were developed using the nnU-Net framework: two 2D U-Nets and three 3D U-Nets, each trained with different hybrid loss functions ($L_{\text{CE+Dice}}$, $L_{\text{TopK+Dice}}$, $L_{\text{CE+GDice}}$). Five approaches were compared: individual 2D U-Net, individual 3D U-Net, majority voting ensemble, soft voting ensemble, and STAPLE ensemble. Models underwent 5-fold cross-validation, and performance was evaluated using DSC, JI, HD95, precision, and recall on 63 test patients. Statistical significance was assessed using Wilcoxon signed-rank tests with Benjamini–Hochberg correction. Generalizability was evaluated on liver tumor segmentation using the LiTS17 dataset.

Results: In KiTS19 tumor segmentation, individual model DSCs ranged from 0.64 ± 0.27 (2D models) to 0.70 ± 0.24 (3D models). Majority voting achieved DSC of 0.70 ± 0.27 and soft voting achieved 0.71 ± 0.26, while STAPLE reached 0.74 ± 0.23 (adjusted p < 0.05). JI improved from 0.53–0.59 (individual models) to 0.63 ± 0.24 (STAPLE). HD95 decreased to 11.81 ± 13.43 with STAPLE. Precision and recall reached 0.88 ± 0.20 and 0.72 ± 0.24, respectively. In LiTS17 liver tumor segmentation, STAPLE similarly outperformed soft voting (DSC: 0.76 ± 0.10 vs. 0.71 ± 0.18, adjusted p < 0.05).

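The adjusted p-values reported above come from a Benjamini–Hochberg correction applied to Wilcoxon signed-rank tests. The following is a minimal sketch of that correction step only, assuming the raw p-values have already been obtained (for example from paired per-patient DSC comparisons via `scipy.stats.wilcoxon`). The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices that sort p ascending
    # Scale the i-th smallest p-value by m / i (i is the 1-based rank).
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)  # restore original order
    return adjusted

# Hypothetical raw p-values, one per pairwise comparison (illustration only).
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = adj_p < 0.05
```

A comparison would then be reported as significant when its adjusted value falls below the chosen threshold, here 0.05 as in the abstract.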
Conclusions: The STAPLE algorithm achieved superior performance in primary segmentation metrics compared to individual models, majority voting, and soft voting (STAPLE > soft voting > majority voting), demonstrating the benefits of probabilistic ensemble methods for kidney tumor segmentation. Stratified analysis revealed that STAPLE’s advantage was most pronounced for medium-sized tumors, where performance variability was reduced by 45%. The approach showed consistent effectiveness in liver tumor segmentation, suggesting potential for broader clinical applications.
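To make the difference between the three integration strategies concrete, here is a minimal sketch of majority voting, soft voting, and a simplified binary STAPLE on toy masks. It rests on several assumptions not stated in the abstract: a single foreground class, a global foreground prior estimated from the input masks, and fixed initial sensitivities and specificities of 0.9. All function names (`dice`, `majority_vote`, `soft_vote`, `staple_binary`) and the toy data are hypothetical; the authors' actual implementation is not described here and may differ.

```python
import numpy as np

def dice(pred, truth):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, truth).sum() / denom

def majority_vote(masks):
    """Foreground where more than half of the binary masks agree."""
    return np.asarray(masks, dtype=float).mean(axis=0) > 0.5

def soft_vote(probs, threshold=0.5):
    """Average per-model foreground probabilities, then threshold."""
    return np.mean(probs, axis=0) >= threshold

def staple_binary(masks, n_iter=50, tol=1e-6):
    """Simplified binary STAPLE via expectation-maximization.

    Returns (W, p, q): per-voxel foreground probability, and the estimated
    sensitivity p and specificity q of each input segmentation.
    """
    D = np.asarray(masks, dtype=float).reshape(len(masks), -1)  # (models, voxels)
    prior = D.mean()                 # assumption: global foreground prior
    p = np.full(D.shape[0], 0.9)     # assumption: initial sensitivities
    q = np.full(D.shape[0], 0.9)     # assumption: initial specificities
    W = D.mean(axis=0)
    eps = 1e-12
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly foreground.
        a = prior * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(D == 1, 1 - q[:, None], q[:, None]), axis=0)
        W_new = a / (a + b + eps)
        # M-step: re-estimate each model's sensitivity and specificity.
        p = (D @ W_new) / (W_new.sum() + eps)
        q = ((1 - D) @ (1 - W_new)) / ((1 - W_new).sum() + eps)
        converged = np.max(np.abs(W_new - W)) < tol
        W = W_new
        if converged:
            break
    return W.reshape(np.shape(masks)[1:]), p, q

# Toy data (illustration only): two accurate models and one noisy model.
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
masks = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
])
W, sens, spec = staple_binary(masks)
staple_mask = W > 0.5
```

Unlike the two voting rules, which weight every model equally, the STAPLE estimate weights each model by its inferred sensitivity and specificity, so a consistently unreliable model contributes less to the consensus.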

Topics & Keywords

Advanced Neural Network Applications Renal cell carcinoma treatment Advanced Radiotherapy Techniques

Publication Details

Published in: Frontiers in Oncology

Volume 16

DOI: 10.3389/fonc.2026.1764408

Field-Weighted Citation Impact: 0.00
