How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets

2026, 0 citations, Journal Article

Authors

Abstract

We introduce and analyze active learning markets as a way to purchase labels in situations where analysts aim to acquire additional data to improve model fitting or to better train models for predictive analytics applications. This contrasts with the many existing proposals for purchasing features and examples. By formalizing market clearing as an optimization problem, we integrate budget constraints and improvement thresholds into the label acquisition process. We focus on a single-buyer, multiple-seller setup and propose two active learning strategies (variance based and query-by-committee based), each paired with a distinct pricing mechanism. They are compared against benchmark baselines, including random sampling and a greedy knapsack heuristic. The proposed strategies are validated on real-world data sets from two critical application domains: real estate pricing and energy forecasting. Results demonstrate the robustness of our approach, which consistently achieves superior performance with fewer labels acquired than conventional methods. Our proposal offers an easy-to-implement, practical solution for optimizing data acquisition in resource-constrained environments.

History: Ding Yu served as the senior editor for this article.

Funding: Pierre Pinson acknowledges the support of UKRI through the Global NSF-UKRI Centre EPICS [Electric Power Innovation for a Carbon-free Society Centre–EPICS-UK; Grant EP/Y025946/1].

Data Ethics & Reproducibility Note: The code and data are available at https://github.com/xiwenhuang123/Active_learning_market_IJDS/tree/main and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2025.0093).
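As a rough illustration of budget-constrained label purchasing, the sketch below implements a generic greedy knapsack heuristic of the kind the abstract names as a benchmark. It is not the authors' market-clearing formulation or pricing mechanism. The `Offer` class, the `greedy_purchase` function, the uncertainty `score` (for example, a predictive variance), and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    seller: str
    score: float  # hypothetical uncertainty score, e.g. predictive variance
    price: float  # asking price for the label; assumed strictly positive


def greedy_purchase(offers, budget):
    """Buy labels in decreasing score-per-price order while they fit the budget."""
    ranked = sorted(offers, key=lambda o: o.score / o.price, reverse=True)
    bought, spent = [], 0.0
    for offer in ranked:
        # Skip any offer that would exceed the remaining budget.
        if spent + offer.price <= budget:
            bought.append(offer)
            spent += offer.price
    return bought, spent


# Illustrative offers only; these are not data from the article.
offers = [
    Offer("s1", score=0.90, price=3.0),
    Offer("s2", score=0.40, price=1.0),
    Offer("s3", score=0.75, price=5.0),
    Offer("s4", score=0.10, price=1.0),
]
bought, spent = greedy_purchase(offers, budget=5.0)
print([o.seller for o in bought], spent)
```

Ranking by score per unit price is a common knapsack approximation. An active learning strategy would typically recompute the scores after each purchased label, which this static sketch does not attempt.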

Topics & Keywords

Machine Learning and Algorithms; Data Stream Mining Techniques; Advanced Bandit Algorithms Research

UN Sustainable Development Goals

Industry, innovation and infrastructure

Publication Details

Published in: INFORMS Journal on Data Science

DOI: 10.1287/ijds.2025.0093

Field-Weighted Citation Impact: 0.00