Background: Breast cancer (BC) is one of the most commonly diagnosed malignancies and a leading cause of cancer-related mortality among women worldwide, posing a substantial threat to women’s health. However, clinically robust diagnostic biomarkers with high sensitivity and specificity, and well-validated molecular targets for targeted therapy, remain limited. Methods: BC transcriptomic data from seven GEO datasets and the TCGA-BRCA cohort (n = 1231) were integrated for analysis. After batch-effect correction, candidate genes were screened through differential expression analysis (DEA), weighted gene co-expression network analysis (WGCNA), and protein-protein interaction (PPI) network analysis. An ensemble machine learning (ML) framework incorporating 127 algorithmic combinations was constructed, and SHapley Additive exPlanations (SHAP) analysis was applied to identify hub genes. Further analyses included functional enrichment, immune infiltration, miRNA regulatory network analysis, and summary-data-based Mendelian randomization (SMR) analysis. Expression patterns were validated using single-cell transcriptome data. Drug repositioning analysis and AI-assisted virtual screening were performed to prioritize compounds with favorable drug-like properties. The predicted binding modes of candidate compounds with CHEK1 were assessed by molecular docking. Results: Thirty core genes were obtained through DEA, WGCNA, and PPI screening. The integrated ML framework (127 algorithmic combinations) identified the optimal model (AUC = 0.919), and SHAP identified nine feature genes, among which CHEK1 and KIF23 showed preliminary diagnostic potential across four external cohorts (AUC: 0.625–0.938). Functional enrichment indicated that both genes were enriched in the cell cycle and p53 pathways and closely associated with BRCA1/ATR; immune infiltration analysis revealed significant correlations with macrophages and CD8+ T cells, and hsa-miR-15a-5p and hsa-miR-607 were identified as shared upstream regulatory miRNAs. 
SMR analysis supported a causal relationship between CHEK1 expression and BC genetic susceptibility (p_SMR < 0.05, p_HEIDI > 0.05), and single-cell analysis confirmed its heterogeneous expression. AI-assisted virtual screening identified 25 A-grade computational candidate compounds from 171 candidates. Molecular docking suggested that Olaparib and LY294002 could form favorable interactions with the CHEK1 active pocket. Conclusions: The study identified CHEK1 as a key diagnostic gene for BC through an ensemble of 127 ML algorithmic combinations and SMR causal inference. By combining AI-assisted virtual screening and molecular docking, computational candidate compounds targeting CHEK1 were prioritized. These findings represent hypothesis-generating in silico predictions and require experimental validation before any therapeutic conclusions can be drawn.
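As an illustration of the single-gene diagnostic evaluation summarized above, the following minimal Python sketch computes an AUC for one gene's expression values against tumor/normal labels, using the rank-based definition (the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample). The expression values, the labels, and the function name `auc_from_scores` are hypothetical and are not taken from the study; the study's actual evaluation pipeline is not described here and may differ.

```python
def auc_from_scores(scores, labels):
    """Return the AUC of `scores` for binary `labels` (1 = tumor, 0 = normal).

    Uses the pairwise (Mann-Whitney) definition: the fraction of
    tumor/normal pairs in which the tumor sample has the higher score,
    with ties counted as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one tumor and one normal sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical log-scale expression values for a single gene (illustrative only).
expression = [5.1, 6.3, 7.2, 4.0, 3.8, 5.5, 6.9, 4.4]
is_tumor = [1, 1, 1, 0, 0, 0, 1, 0]

auc = auc_from_scores(expression, is_tumor)
print(f"single-gene AUC on toy data: {auc:.3f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is the scale on which the reported range of 0.625–0.938 across external cohorts should be read.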