Feature selection is essential for improving classification performance and reducing overfitting in high-dimensional learning tasks. However, conventional importance-based methods often suffer from instability, model bias, and sensitivity to threshold settings. To address these limitations, we propose EFSHB (Ensemble Feature Selection using Hierarchical Binning), a hybrid ensemble framework that integrates importance-based sorting, bin-level greedy evaluation, iterative hierarchical refinement, and union-based integration of model-wise selected features. At each iteration, five tree-based models independently perform bin-wise greedy selection, and their selected subsets are merged through a union operation to form the feature set for the next iteration. This iterative process progressively refines the feature space while mitigating model-specific bias and promoting robust predictive performance across heterogeneous models. EFSHB was evaluated on nine high-dimensional benchmark datasets, including biomedical gene-expression, synthetic, proteomics, and speech-feature data. Across all datasets, EFSHB achieved the highest or near-highest classification accuracy, outperforming traditional Greedy Feature Selection (GFS), binning-based GFS (GFSB), and hierarchical binning GFS (GFSHB). On average, EFSHB improved accuracy for all classifiers, achieving mean gains of 14.0% over GFS and 13.3% over GFSHB. EFSHB also provided balanced feature reduction by avoiding excessive feature retention while preserving complementary informative features identified across models. In terms of computational efficiency, EFSHB reduced average feature selection time from 266 min (GFS) to 11 min, corresponding to a 24-fold speed-up. These results demonstrate that EFSHB achieves robust predictive performance and high computational efficiency, making it suitable for diverse high-dimensional applications.
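The per-iteration procedure described above (importance-based sorting, bin-level greedy evaluation, and union-based integration across models) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the bin count, the toy scoring function, and all function names are assumptions introduced for illustration.

```python
def split_into_bins(ranked_features, n_bins):
    """Partition an importance-ranked feature list into roughly equal bins."""
    size = max(1, len(ranked_features) // n_bins)
    return [ranked_features[i:i + size] for i in range(0, len(ranked_features), size)]

def greedy_bin_selection(bins, score_fn):
    """Bin-level greedy step: keep a bin only if adding it improves the score."""
    selected, best = [], score_fn([])
    for b in bins:
        candidate = selected + b
        s = score_fn(candidate)
        if s > best:
            selected, best = candidate, s
    return selected

def efshb_iteration(features, importance_fns, score_fns, n_bins=4):
    """One EFSHB-style round: per-model bin-wise greedy selection, then union merge.

    Each (importance_fn, score_fn) pair stands in for one tree-based model;
    the union of the per-model subsets becomes the feature set for the next round.
    """
    union = set()
    for imp, score in zip(importance_fns, score_fns):
        ranked = sorted(features, key=imp, reverse=True)  # importance-based sorting
        bins = split_into_bins(ranked, n_bins)
        union |= set(greedy_bin_selection(bins, score))   # union-based integration
    return sorted(union)

# Toy usage: 12 features, of which {0, 1, 2} are informative (an assumed setup).
informative = {0, 1, 2}

def toy_score(subset):
    # Hypothetical scorer: rewards informative features, lightly penalizes noise.
    s = set(subset)
    return len(s & informative) - 0.1 * len(s - informative)

toy_importance = lambda f: 1.0 if f in informative else 0.0
selected = efshb_iteration(list(range(12)),
                           [toy_importance, toy_importance],  # two "models"
                           [toy_score, toy_score],
                           n_bins=4)
# → [0, 1, 2]: the first bin is kept, the noise-only bins are rejected.
```

In the full method this round is applied iteratively, with the union feeding the next iteration until the feature set stabilizes; real importance scores and accuracy estimates would come from the five tree-based models rather than the toy functions used here.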