Tree ensembles such as Random Forests (RFs) and Gradient Boosting Machines (GBMs) often reach stable predictions before all trees have been evaluated. We study early stopping as a nonparametric change-point problem on prediction increments. The P2-STOP method family monitors a robust interquartile-range (IQR) scale of the prediction increments online and stops when a relative-scale criterion is met. The default variant uses a rolling-window exact-quantile estimator (O(w) memory) that yields a clean finite-sample stopping guarantee; a full-prefix P2 streaming approximation (O(1) memory) is available as a memory-light alternative. The stopping rule applies to both RFs and GBMs without model-specific distributional assumptions. On four RF benchmarks (MNIST, Covertype, HIGGS, and Credit Card Fraud), P2-STOP achieves a mean work reduction of 44.8% (range: 0.7–71.7%), with accuracy changes between −0.53 and +0.02 percentage points relative to full-ensemble inference. On XGBoost (T=500), work reduction is dataset-dependent, ranging from 41.4% on Covertype to 89.0% on Credit Card, with corresponding accuracy trade-offs. Under random-tree contamination at 5%, 15%, and 25%, performance remains stable, although the IQR scale does not uniformly outperform a standard-deviation baseline; the differences are mixed. Because P2-STOP is designed for compiled inference engines (e.g., C++/Numba), its theoretical work reduction translates into consistent wall-clock speedups (4.14×–4.82× over a compiled full RF on MNIST, Covertype, and HIGGS for T=500). Native Python implementations serve only as logical reference baselines because of interpreter loop overhead. Credit Card, where work reduction is near zero, shows the expected slowdown. All comparisons use five seeds with 95% confidence intervals and seed-level paired tests; with only five seeds, statistical power is limited, and p-values should be interpreted cautiously.
Relative to the Dirichlet RF baseline, our contribution is not a larger RF-specific work reduction but a robust, nonparametric IQR-scale stopping criterion, framed as a change-point (sequential-inference) problem, that works as a post hoc wrapper in both RF and GBM settings.
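Because the stopping rule only consumes a stream of ensemble predictions, it can wrap an already trained RF or GBM post hoc. The sketch below illustrates this together with the memory-light full-prefix variant. It assumes that "P2" refers to the P² (P-square) streaming quantile estimator of Jain and Chlamtac (1985), which tracks one quantile with five markers, and it uses an assumed relative-scale rule (stop once the estimated IQR of all increments seen so far is at most `tau * (|F_t| + eps)`), since the exact criterion is not given here. The names `P2Quantile`, `cumulative_predictions`, and `p2_iqr_stop` are hypothetical; the `gbm` mode assumes per-tree outputs are already scaled by the learning rate and takes the base score as 0.

```python
from typing import Iterable, Iterator, Optional


class P2Quantile:
    """P-square streaming estimate of quantile p using five markers (O(1) memory)."""

    def __init__(self, p: float):
        self.p = p
        self.q: list = []                              # marker heights
        self.n = [0, 1, 2, 3, 4]                       # actual marker positions (0-based)
        self.np = [0.0, 2 * p, 4 * p, 2 + 2 * p, 4.0]  # desired marker positions
        self.dn = [0.0, p / 2, p, (1 + p) / 2, 1.0]    # desired-position increments

    @property
    def ready(self) -> bool:
        return len(self.q) == 5

    def add(self, x: float) -> None:
        if not self.ready:                             # buffer the first five samples
            self.q.append(x)
            if self.ready:
                self.q.sort()
            return
        q, n = self.q, self.n
        if x < q[0]:                                   # new minimum
            q[0], k = x, 0
        elif x >= q[4]:                                # new maximum
            q[4], k = x, 3
        else:                                          # cell k with q[k] <= x < q[k+1]
            k = next(i for i in range(4) if q[i] <= x < q[i + 1])
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.np[i] += self.dn[i]
        for i in (1, 2, 3):                            # adjust the three inner markers
            d = self.np[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                cand = self._parabolic(i, s)
                if q[i - 1] < cand < q[i + 1]:
                    q[i] = cand
                else:                                  # fall back to linear interpolation
                    q[i] += s * (q[i + s] - q[i]) / (n[i + s] - n[i])
                n[i] += s

    def _parabolic(self, i: int, s: int) -> float:
        q, n = self.q, self.n
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]))

    def value(self) -> float:
        if not self.ready:
            raise ValueError("P2Quantile needs at least five samples")
        return self.q[2]


def cumulative_predictions(tree_outputs: Iterable[float], mode: str = "rf") -> Iterator[float]:
    """Ensemble prediction after each tree: running mean (RF) or additive margin (GBM)."""
    total = 0.0
    for t, out in enumerate(tree_outputs, start=1):
        total = total + (out - total) / t if mode == "rf" else total + out
        yield total


def p2_iqr_stop(cumulative_preds: Iterable[float], tau: float = 0.01,
                min_trees: int = 20, eps: float = 1e-12) -> Optional[int]:
    """Full-prefix variant: P2 estimates of Q1 and Q3 over all increments seen so far."""
    q1, q3 = P2Quantile(0.25), P2Quantile(0.75)
    prev = None
    for t, f_t in enumerate(cumulative_preds, start=1):
        if prev is not None:
            q1.add(f_t - prev)
            q3.add(f_t - prev)
        prev = f_t
        if t >= min_trees and q1.ready:
            if q3.value() - q1.value() <= tau * (abs(f_t) + eps):  # assumed rule
                return t
    return None


# Post hoc use on per-tree outputs of an RF (running mean) and a GBM (additive margin).
print(p2_iqr_stop(cumulative_predictions([0.8] * 100, mode="rf")))   # 20
print(p2_iqr_stop(cumulative_predictions([0.1] * 100, mode="gbm")))  # 20
```

Each estimator stores five heights and five positions, so memory stays constant however many trees are evaluated; the trade-off is that the quartiles are approximations over the whole prefix rather than exact values over a recent window.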