Correlation analysis between deep learning-based feature extraction of spiculation signs in lung cancer and pathological subtypes

2026 | 0 citations | Journal Article | Gold Open Access

Authors

Abstract

To achieve automated feature extraction of spiculation signs from lung cancer CT images and establish accurate associations with pathological subtypes, this study retrospectively included CT data from 640 pathologically confirmed lung cancer patients and constructed an analytical framework based on the ConvNeXt deep learning model. Features of spiculation signs were extracted using an improved 7 × 7 depthwise separable convolution combined with a hybrid attention mechanism. After a three-stage screening process to identify key features, an attention-based deep belief network (DBN) was employed to construct an association model between the extracted features and pathological subtypes. To assess model stability, 5-fold cross-validation was performed on the training set of a 7:2:1 stratified split, yielding an average accuracy of 88.6% and an average area under the receiver operating characteristic curve (AUC) of 0.938. In addition, across 10 repeated random sampling validations, the model obtained an average accuracy of 88.9% and an average AUC of 0.940, supporting its generalization capability. Model performance was evaluated through comparative and ablation experiments. The spiculation features extracted by ConvNeXt significantly outperformed those from residual network 50 (ResNet50) and traditional radiomics methods in terms of repeatability (intraclass correlation coefficient, ICC = 0.91), discriminative power (analysis of variance F-value = 28.7), and robustness (coefficient of variation = 8.7%), with all improvements being statistically significant (P < 0.05). The attention-based DBN model achieved a pathological subtyping accuracy of 89.1% on the test set, with a mean AUC of 0.942 and a mean F1-score of 0.883. To address class imbalance in the dataset, performance metrics were reported separately for each pathological subtype. The accuracy and AUC were 0.94 and 0.97 for adenocarcinoma, 0.91 and 0.95 for squamous cell carcinoma, 0.85 and 0.90 for small cell carcinoma, and 0.80 and 0.86 for large cell carcinoma. Applying a weighted cross-entropy loss improved the F1-score for large cell carcinoma by 0.12 compared with the unoptimized version, enhancing diagnostic performance for rare subtypes. Overall, the attention-based DBN represented a performance improvement of over 10% compared with the extreme gradient boosting (XGBoost) model (P < 0.05). Ablation studies confirmed that the attention mechanism contributed 8.4% to the overall performance gain. Furthermore, a case-level analysis of typical cases was conducted in conjunction with a SHAP value ranking of feature importance. This analysis clarified the model's core decision-making basis for each subtype: adenocarcinoma relied on short spicule density, squamous cell carcinoma on the proportion of long spicules, and small cell carcinoma on spicule grayscale entropy, which improved the model's clinical interpretability. The study thus achieves high-quality feature extraction of spiculation signs and establishes a high-accuracy association with pathological subtypes, providing an objective basis for the precise diagnosis of lung cancer and potentially reducing reliance on invasive biopsies.
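The abstract credits a weighted cross-entropy loss with the F1 gain on the rarest subtype but does not say how the class weights were chosen. The sketch below is one common way to do it, assuming inverse-frequency ("balanced") weights. The class counts are toy values and the function names are illustrative; none of them come from the paper.

```python
import math

# Toy class counts, purely hypothetical: the abstract does not report how many
# of the 640 patients fall into each subtype.
SUBTYPES = ["adenocarcinoma", "squamous cell", "small cell", "large cell"]
TOY_COUNTS = [50, 30, 15, 5]


def inverse_frequency_weights(counts):
    """Return one weight per class: w_c = N / (K * n_c).

    N is the total sample count, K the number of classes and n_c the count of
    class c, so rarer classes get larger weights. This is an assumed scheme,
    not necessarily the one used in the study.
    """
    total = sum(counts)
    k = len(counts)
    return [total / (k * n) for n in counts]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of per-sample negative log-likelihoods.

    Each sample's loss is scaled by the weight of its true class, and the sum
    is divided by the sum of those weights.
    """
    num = 0.0
    den = 0.0
    for logits, y in zip(logits_batch, labels):
        p = softmax(logits)
        num += weights[y] * -math.log(p[y])
        den += weights[y]
    return num / den


weights = inverse_frequency_weights(TOY_COUNTS)
uniform = [1.0] * len(TOY_COUNTS)

# Two samples receive the same logits favouring class 0. The first really is
# class 0 (a correct call); the second is class 3 (a missed rare case).
logits_batch = [[2.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
labels = [0, 3]

loss_plain = weighted_cross_entropy(logits_batch, labels, uniform)
loss_weighted = weighted_cross_entropy(logits_batch, labels, weights)
print(f"unweighted: {loss_plain:.3f}  weighted: {loss_weighted:.3f}")
```

With these toy counts a large cell sample carries ten times the weight of an adenocarcinoma sample (5.0 versus 0.5), so the missed rare case dominates the weighted loss and training is pushed harder toward correcting it.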

Topics & Keywords

Radiomics and Machine Learning in Medical Imaging; Lung Cancer Diagnosis and Treatment; AI in cancer detection

UN Sustainable Development Goals

Reduced inequalities

Publication Details

Published in: Journal of Radiation Research and Applied Sciences

Volume 19, Issue 2, Article 102310

DOI: 10.1016/j.jrras.2026.102310

Field-Weighted Citation Impact: 0.00