TokenPowerBench: Benchmarking the Power Consumption of LLM Inference

2026 · 0 citations · Journal Article · Diamond Open Access

Authors

Chenxu Niu · Texas Tech University

Wei Emma Zhang · Texas Advanced Computing Center

Jie Li · Texas Tech University

Yongjian Zhao · Texas Tech University

Tai Wang · Texas Tech University

Xi Wang

Yong Ru Chen · Texas Tech University

Abstract

Large language model (LLM) services now answer billions of queries per day, and industry reports show that inference, not training, accounts for more than 90% of total power consumption. However, existing benchmarks focus on either training/fine-tuning or inference performance, and provide little support for measuring and analyzing the power consumption of inference. We introduce TokenPowerBench, the first lightweight and extensible benchmark designed for LLM-inference power consumption studies. The benchmark combines a declarative configuration interface covering model choice, prompt set, and inference engine; a measurement layer that captures GPU-, node-, and system-level power without specialized power meters; and a phase-aligned metrics pipeline that attributes energy to the prefill and decode stages of every request. These elements make it straightforward to explore the power consumed by an LLM inference run. By varying batch size, context length, parallelism strategy, and quantization, users can quickly assess how each setting affects joules per token and other energy-efficiency metrics. We evaluate TokenPowerBench on four of the most widely used model series (Llama, Falcon, Qwen, and Mistral), with experiments covering models from 1 billion parameters up to the frontier-scale Llama3-405B model. We release TokenPowerBench as open source to help users measure power consumption, forecast operating expenses, and meet sustainability targets when deploying LLM services.
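
To make the phase-aligned attribution and the joules-per-token metric concrete, the following is a minimal sketch and not the TokenPowerBench implementation. It assumes power is available as timestamped samples in watts, that the prefill phase ends when the first output token is produced, and that energy is obtained by trapezoidal integration of power over time. All names (`PowerSample`, `energy_joules`, `phase_energy`) are hypothetical and do not come from the benchmark.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PowerSample:
    t: float      # timestamp in seconds (assumed strictly increasing)
    watts: float  # instantaneous power reading


def _power_at(samples: List[PowerSample], t: float) -> float:
    """Linearly interpolate power at time t, clamping outside the sampled range."""
    if t <= samples[0].t:
        return samples[0].watts
    if t >= samples[-1].t:
        return samples[-1].watts
    for a, b in zip(samples, samples[1:]):
        if a.t <= t <= b.t:
            frac = (t - a.t) / (b.t - a.t)
            return a.watts + frac * (b.watts - a.watts)
    raise ValueError("samples must be sorted by timestamp")


def energy_joules(samples: List[PowerSample], start: float, end: float) -> float:
    """Trapezoidal integral of power over [start, end], in joules."""
    if end <= start:
        return 0.0
    points = [(start, _power_at(samples, start))]
    points += [(s.t, s.watts) for s in samples if start < s.t < end]
    points.append((end, _power_at(samples, end)))
    return sum((t1 - t0) * (p0 + p1) / 2.0
               for (t0, p0), (t1, p1) in zip(points, points[1:]))


def phase_energy(samples: List[PowerSample], t_start: float, t_first_token: float,
                 t_end: float, prompt_tokens: int, output_tokens: int) -> Dict[str, float]:
    """Split one request's energy into prefill and decode (assumed phase boundary:
    the time of the first output token) and derive joules per token."""
    prefill = energy_joules(samples, t_start, t_first_token)
    decode = energy_joules(samples, t_first_token, t_end)
    total_tokens = prompt_tokens + output_tokens
    return {
        "prefill_joules": prefill,
        "decode_joules": decode,
        "joules_per_token": (prefill + decode) / total_tokens if total_tokens else float("nan"),
    }


# Illustrative data only: a constant 100 W trace sampled once per second.
trace = [PowerSample(t=float(i), watts=100.0) for i in range(5)]
result = phase_energy(trace, t_start=0.5, t_first_token=1.5, t_end=3.5,
                      prompt_tokens=10, output_tokens=20)
print(result)
```

Dividing total energy by prompt plus output tokens is one possible definition of joules per token; a study could equally normalize decode energy by output tokens alone, so the denominator should be stated alongside any reported figure.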

Topics & Keywords

Big Data and Digital Economy · Green IT and Sustainability · Machine Learning in Materials Science

Publication Details

Published in: Proceedings of the AAAI Conference on Artificial Intelligence

Volume 40, Issue 38, pp. 32582-32590

DOI: 10.1609/aaai.v40i38.40535
