Optimizing energy-intensive manufacturing under time-varying electricity tariffs requires scheduling strategies that reduce cost without compromising operational feasibility. This study is grounded in readily available industrial sensing: we exclusively use time-series measurements of aggregated active power and energy at the main distribution board of a quicklime production plant. We propose a tariff-aware load-shifting framework in which a Proximal Policy Optimization (PPO) reinforcement learning agent is trained in a custom Gymnasium environment to apply discrete consumption-scaling actions constrained to 80-125% of a baseline profile during the operating shift (08:00-16:00), explicitly accounting for demand-charge exposure in the time-of-use (TOU) peak window (13:00-15:00). The reward design combines instantaneous electricity cost with cumulative energy-tracking penalties and terms associated with operational constraints. Multi-day validation over N=30 working days shows consistent economic benefits, with a median total cost reduction on the order of 10% (narrow interquartile range), driven by reduced peak-window energy and lower demand peaks. However, the script-based binary compliance indicators (viol_energy, viol_prod_min) reveal deviations from the energy-balance criterion and occasional minimum-production shortfalls under the tolerances used, highlighting the cost-production trade-off and the need for stricter constraint handling before industrial deployment. In addition, we benchmark PPO against dynamic programming (DP), an alternative RL policy (DQN), and a greedy heuristic (GREEDY), comparing cost, operational performance, and, where applicable, computational efficiency; this comparison positions PPO as a competitive alternative among the considered methods. Overall, this work demonstrates how learning-based decision making can be coupled with real-world industrial sensing infrastructure, providing a data-driven, tariff-aware scheduling layer for industrial energy management under practical constraints.
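To make the reward structure concrete, the following is a minimal, self-contained sketch of the per-step reward described above (instantaneous tariff cost plus a cumulative energy-tracking penalty). All names, tariff prices, and weights here are illustrative assumptions, not the paper's actual environment; only the action range (80-125% scaling), the shift (08:00-16:00), and the TOU peak window (13:00-15:00) come from the abstract.

```python
# Hypothetical parameters inferred from the abstract; numbers are illustrative.
TOU_PEAK = (13, 15)                          # demand-charge window, 13:00-15:00
SHIFT = (8, 16)                              # operating shift, 08:00-16:00
ACTIONS = [0.80, 0.90, 1.00, 1.10, 1.25]     # discrete scaling of baseline load

def tariff(hour: int) -> float:
    """Toy time-of-use price (currency/kWh): higher inside the peak window."""
    return 0.30 if TOU_PEAK[0] <= hour < TOU_PEAK[1] else 0.12

def step_reward(hour: int, baseline_kw: float, action_idx: int,
                energy_gap_kwh: float, w_cost: float = 1.0,
                w_track: float = 0.05) -> tuple[float, float]:
    """One environment step on an hourly grid.

    Reward = -(instantaneous electricity cost)
             - (penalty on cumulative deviation from the baseline energy).
    Returns (reward, updated cumulative energy gap in kWh).
    """
    load_kw = baseline_kw * ACTIONS[action_idx]
    cost = tariff(hour) * load_kw                      # 1 h step: kW -> kWh
    new_gap = energy_gap_kwh + (load_kw - baseline_kw)  # energy-balance tracker
    reward = -w_cost * cost - w_track * abs(new_gap)
    return reward, new_gap
```

The tracking term is what pushes the agent to shift, rather than simply shed, energy: scaling down inside the peak window is only rewarded if the deficit is later compensated off-peak, mirroring the energy-balance criterion checked by the compliance indicators.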