Effective feature selection is critical for building robust and interpretable predictive models, particularly in medical applications where identifying risk factors in the most extreme patient strata is essential. Traditional methods often focus on average associations, potentially overlooking predictors whose importance is concentrated in the tails of the data distribution. In this study, we introduce a novel, computationally efficient supervised filter that leverages a Gumbel-copula-implied upper-tail concordance score ([Formula: see text], a monotone transformation of Kendall's τ) to rank features by their tendency to be simultaneously extreme with the positive class. We evaluated this method against four standard baselines (Mutual Information, mRMR, ReliefF, and L1/Elastic-Net) across four classifiers on two diabetes datasets: a large-scale public health survey (CDC, [Formula: see text]) and a classic clinical benchmark (PIMA, [Formula: see text]). Our analysis included comprehensive statistical tests, permutation importance, and robustness checks. On the CDC dataset, our method was the fastest selector and reduced the feature space by ≈52%. Although this resulted in a small but statistically significant performance trade-off compared with using all 21 features, our filter significantly outperformed standard filters (Mutual Information, mRMR) and was statistically indistinguishable from the strong ReliefF baseline. On the PIMA dataset (8 predictors), our method's ranking produced the numerically highest ROC-AUC, although paired DeLong tests showed no statistically significant differences versus strong baselines. PIMA thus serves as a ranking-only sanity check that our upper-tail criterion behaves sensibly in a low-dimensional clinical setting. Across both datasets, the Gumbel-[Formula: see text] selector consistently identified clinically coherent and impactful predictors.
We conclude that feature selection via upper-tail dependence is an efficient and interpretable screening approach that can complement standard feature-selection baselines in public health and clinical risk prediction.
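To make the ranking idea concrete, the following is a minimal sketch of a filter of this kind, not the study's implementation. It assumes the score is the standard Gumbel upper-tail dependence coefficient, λ_U = 2 − 2^(1/θ) with θ = 1/(1 − τ), which simplifies to λ_U = 2 − 2^(1 − τ) and is monotone in Kendall's τ; the exact formula used in the study is not recoverable from the text above. Clipping negative τ to zero (the Gumbel family covers only non-negative dependence) is likewise an assumption, and the function names `kendall_tau_b`, `gumbel_upper_tail`, and `rank_features` are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b via an O(n^2) pairwise count (illustration only)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither term
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom > 0 else 0.0


def gumbel_upper_tail(tau):
    """Gumbel upper-tail coefficient implied by tau: 2 - 2**(1 - tau).

    Assumption: tau is clipped to [0, 1], since the Gumbel copula
    only represents non-negative dependence.
    """
    tau = max(0.0, min(1.0, tau))
    return 2.0 - 2.0 ** (1.0 - tau)


def rank_features(columns, y):
    """Score each feature column against the binary label y, highest first."""
    scores = {name: gumbel_upper_tail(kendall_tau_b(values, y))
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    toy = {
        "glucose": [1, 2, 3, 4, 5, 6, 7, 8],
        "noise": [5, 1, 7, 3, 2, 8, 4, 6],
        "inverse": [8, 7, 6, 5, 4, 3, 2, 1],
    }
    for name, score in rank_features(toy, labels):
        print(f"{name}: {score:.3f}")
```

The pairwise loop is quadratic in the number of samples and is used here only to keep the sketch dependency-free; on data at the scale of a public health survey, an O(n log n) routine such as `scipy.stats.kendalltau` would be the natural replacement.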