Deployed machine learning models frequently encounter degradation in predictive accuracy when the statistical properties of incoming data evolve over time, a condition known as data drift. This phenomenon can manifest in several forms, most notably concept drift, which occurs when the functional relationship linking predictor variables to the outcome changes, thereby undermining model reliability. Conventional drift detection strategies often rely on aggregate performance indicators or univariate distributional summaries, approaches that may overlook nuanced yet consequential shifts in the data-generating mechanism. Within the broader machine learning operations (MLOps) framework, continuous model monitoring has emerged as a critical practice for safeguarding the stability and dependability of production systems (Biecek, 2019; Mougan & Nielsen, 2023).

datadriftR is an open-source R package designed to address these challenges by providing real-time detection of data drift in univariate streaming data. The package implements a comprehensive suite of widely recognized statistical methods for monitoring distributional changes, including error-rate-based detectors that track classification performance (DDM (Gama et al., 2004), EDDM (Baena-García et al., 2006)), Hoeffding-bound methods that employ adaptive windowing to detect mean shifts (HDDM-A and HDDM-W (Frías-Blanco et al., 2015)), adaptive windowing for change detection (ADWIN (Bifet & Gavaldà, 2007)), a sliding-window Kolmogorov-Smirnov test for distribution comparison (KSWIN (Raab et al., 2020)), the cumulative-sum-based Page-Hinkley test for detecting persistent shifts (Page, 1954), histogram-based Kullback-Leibler divergence monitoring for measuring distributional divergence (Kullback & Leibler, 1951), and a functional profile comparison method for analyzing temporal patterns (Kobylińska et al., 2023).
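To make the idea of a cumulative-sum detector concrete, the following is a minimal sketch of a one-sided Page-Hinkley test for an upward shift in the mean of a univariate stream. It is written in Python for illustration only and is not the datadriftR API: the class name `PageHinkley`, the helper `first_drift_index`, and the parameters `delta`, `threshold`, and `min_instances` are assumptions introduced here, and the package's own implementation and defaults may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PageHinkley:
    """One-sided Page-Hinkley test for an increase in the stream mean.

    Illustrative sketch; parameter names and defaults are assumptions.
    """
    delta: float = 0.005      # magnitude of change that is tolerated
    threshold: float = 50.0   # alarm level (often written as lambda)
    min_instances: int = 30   # warm-up length before alarms are allowed

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.n = 0            # number of observations seen
        self.mean = 0.0       # running mean of the stream
        self.cum = 0.0        # cumulative deviation m_t
        self.cum_min = 0.0    # running minimum M_t of m_t

    def update(self, x: float) -> bool:
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # Accumulate deviations from the running mean, minus the tolerance.
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n < self.min_instances:
            return False
        # Drift is flagged when m_t rises far enough above its minimum.
        return (self.cum - self.cum_min) > self.threshold


def first_drift_index(stream: Iterable[float],
                      detector: PageHinkley) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if none occurs."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: the mean jumps from 0 to 1 at index 100.
    stream = [0.0] * 100 + [1.0] * 100
    print(first_drift_index(stream, PageHinkley(threshold=5.0)))
```

In this sketch a smaller `threshold` yields faster detection at the cost of more false alarms, while `delta` controls how much slow fluctuation is absorbed before deviations start to accumulate. Detecting a downward shift would require the symmetric statistic (tracking the running maximum instead of the minimum).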
Published in: The Journal of Open Source Software
Volume 11, Issue 119, pp. 9481-9481
DOI: 10.21105/joss.09481