datadriftR: An R package for data drift detection

2026 · 0 citations · Journal Article · Diamond Open Access

Authors

Ugur Dar · Eskisehir Technical University

Mustafa Çavuş · Eskisehir Technical University

Abstract

Deployed machine learning models frequently encounter degradation in predictive accuracy when the statistical properties of incoming data evolve over time, a condition known as data drift. This phenomenon can manifest in several forms, most notably concept drift, which occurs when the functional relationship linking predictor variables to the outcome changes, thereby undermining model reliability. Conventional drift detection strategies often rely on aggregate performance indicators or univariate distributional summaries, approaches that may overlook nuanced yet consequential shifts in the data-generating mechanism. Within the broader machine learning operations (MLOps) framework, continuous model monitoring has emerged as a critical practice for safeguarding the stability and dependability of production systems (Biecek, 2019; Mougan & Nielsen, 2023). datadriftR is an open-source R package designed to address these challenges by providing real-time detection of data drift in univariate streaming data. The package implements a comprehensive suite of widely recognized statistical methods for monitoring distributional changes, including error-rate-based detectors that track classification performance (DDM (Gama et al., 2004), EDDM (Baena-García et al., 2006)), Hoeffding-bound methods that employ adaptive windowing to detect mean shifts (HDDM-A and HDDM-W (Frías-Blanco et al., 2015)), adaptive windowing for change detection (ADWIN (Bifet & Gavaldà, 2007)), a sliding-window Kolmogorov-Smirnov test for distribution comparison (KSWIN (Raab et al., 2020)), the cumulative-sum-based Page-Hinkley test for detecting persistent shifts (Page, 1954), histogram-based Kullback-Leibler divergence monitoring for measuring distributional divergence (Kullback & Leibler, 1951), and a functional profile comparison method for analyzing temporal patterns (Kobylińska et al., 2023).
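To make the idea of a cumulative-sum detector concrete, the following is a minimal sketch of a one-sided Page-Hinkley test for an upward shift in the mean of a univariate stream. It is written in Python purely as a language-neutral illustration and does not reproduce the datadriftR API. The class name `PageHinkley`, the helper `first_alarm`, and the parameters `delta`, `threshold`, and `min_samples` are assumptions introduced here, and the default values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Illustrative one-sided Page-Hinkley detector (upward mean shift only)."""
    delta: float = 0.005      # tolerance: small deviations below this are absorbed
    threshold: float = 50.0   # alarm level for the test statistic (hypothetical default)
    min_samples: int = 30     # warm-up period before an alarm may be raised

    n: int = 0                # number of observations seen so far
    mean: float = 0.0         # running mean of the stream
    cum: float = 0.0          # cumulative deviation from the running mean
    cum_min: float = 0.0      # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one observation and return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n       # incremental mean update
        self.cum += x - self.mean - self.delta      # accumulate deviations
        self.cum_min = min(self.cum_min, self.cum)  # track the running minimum
        statistic = self.cum - self.cum_min
        return self.n >= self.min_samples and statistic > self.threshold


def first_alarm(stream, **kwargs):
    """Return the 0-based index of the first alarm, or None if none occurs."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

Under these assumptions, a stream that holds a constant level never raises an alarm, while a stream whose mean jumps and stays at the new level does so shortly after the change, for example `first_alarm([0.0] * 200 + [1.0] * 200, threshold=10.0)`. A persistent shift keeps adding to the cumulative deviation, whereas an isolated outlier contributes only once, which is why this kind of test targets sustained changes.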

Topics & Keywords

Data Stream Mining Techniques · Data Analysis with R · Time Series Analysis and Forecasting

Publication Details

Published in: The Journal of Open Source Software

Volume 11, Issue 119, pp. 9481-9481

DOI: 10.21105/joss.09481

Field-Weighted Citation Impact: 0.00
