Objective To screen potential biomarkers of neonatal sepsis (NS) by combining bioinformatics with machine learning algorithms, and to explore the correlation between these biomarkers and immune cells. Methods Expression profiling data containing NS and control samples were downloaded from the Gene Expression Omnibus (GEO) database. First, differentially expressed genes (DEGs) were identified. Gene set enrichment analysis (GSEA) was then performed to reveal significantly enriched pathways, and the core genes within these pathways were aggregated. Subsequently, weighted gene co-expression network analysis (WGCNA) was used to identify gene modules significantly correlated with NS, from which key genes were screened. Feature genes were selected using three machine learning algorithms: least absolute shrinkage and selection operator (LASSO) regression, support vector machine recursive feature elimination (SVM-RFE), and random forest cross-validation (RF-CV). Potential biomarkers of NS were then identified by logistic regression analyses. The dataset was randomly split into training and test sets. A logistic regression model based on the biomarkers was constructed using the training set, and its diagnostic value was evaluated separately on the training and test sets. In addition, immune infiltration analysis was performed using the immune cell deconvolution algorithm CIBERSORT to explore the correlation between biomarkers and immune cells. Results Overlapping the filtered DEGs, the core genes from GSEA, and the key genes from WGCNA yielded 14 intersecting genes. The three machine learning algorithms further selected 4 feature genes, and logistic regression analyses identified ACSL1 and CD3D as potential biomarkers of NS. The classification accuracies of the logistic regression model were 0.883 and 0.902 on the training set and test set, respectively.
The areas under the receiver operating characteristic (ROC) curve (AUC) were 0.961 and 0.966 on the training and test sets, respectively, and the calibration curves were close to the ideal curve. These AUC values were estimated within this discovery GEO cohort and require confirmation in independent external cohorts. Immune infiltration analysis showed significant changes (P < 0.05) in the infiltration abundance of 13 immune cell types, and strong correlations (>0.6) were observed between the two biomarkers (ACSL1 and CD3D) and neutrophils, naive CD4+ T cells, and CD8+ T cells. Conclusion ACSL1 and CD3D are potential biomarkers of NS and may play an important role in its pathogenesis by modulating the function of immune cells such as neutrophils, naive CD4+ T cells, and CD8+ T cells. These conclusions should be regarded as exploratory biomarker discovery rather than evidence of diagnostic readiness.
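
For illustration only, the following Python sketch shows two of the computational steps named in the abstract: overlapping candidate gene lists, and scoring a classifier by accuracy and AUC. The abstract does not state how these steps were implemented, so the function names are assumptions introduced for this example, and the gene identifiers, labels, and predicted probabilities are hypothetical toy values rather than data from the study. The AUC is computed here from its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one), which is one of several equivalent ways to obtain it.

```python
from itertools import product


def intersect_gene_sets(*gene_sets):
    """Return the genes present in every input set, sorted for stable output."""
    if not gene_sets:
        return []
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)
    return sorted(common)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for y, pred in zip(y_true, predictions) if y == pred)
    return correct / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for pos, neg in product(positives, negatives):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical candidate lists standing in for DEGs, GSEA core genes,
# and WGCNA key genes (placeholder identifiers, not genes from the study).
toy_degs = {"G1", "G2", "G3", "G4"}
toy_gsea_core = {"G2", "G3", "G4", "G5"}
toy_wgcna_key = {"G3", "G4", "G6"}
toy_overlap = intersect_gene_sets(toy_degs, toy_gsea_core, toy_wgcna_key)

# Hypothetical labels (1 = NS, 0 = control) and model probabilities.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

print("Intersecting genes:", toy_overlap)
print("Accuracy:", round(accuracy(toy_labels, toy_probs), 3))
print("AUC:", round(roc_auc(toy_labels, toy_probs), 3))
```

Computing both metrics separately on the training and test splits, as the abstract describes, would amount to calling `accuracy` and `roc_auc` once per split. A large gap between the two sets of values would suggest overfitting.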