Restless multi-armed bandits (RMABs) can model several resource allocation problems. To cite only a few applications, RMABs are used to solve optimization problems related to stochastic scheduling in a queue, opportunistic scheduling, machine maintenance [5], healthcare, and recommendation systems. An RMAB is composed of N independent arms, each of which is a Markov process evolving in discrete time. At each time instant the decision maker chooses a set of arms to activate, with the goal of maximizing the reward accumulated across all arms, while activating no more than a certain number of arms at a time. Introduced by Gittins and Jones, the Gittins index policy and its further extension, the Whittle index policy, consist of associating a number, called an index, with each state of each arm, and then activating the arms with the highest indices in priority. Indices can be computed for all arms that satisfy a broad condition called indexability. While these and other index-based policies are not optimal in general, they have been shown to be asymptotically optimal as the number of arms goes to infinity, under indexability and a technical condition that is often verified in practice. Moreover, they work extremely well in practice.
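As an illustration, the activation step of an index policy can be sketched as follows. This is a minimal sketch assuming the indices (e.g. Whittle indices) have already been computed; the arm model, index values, and function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def index_policy_action(indices, states, budget):
    """Activate the `budget` arms whose current state has the highest index.

    indices: (N, S) array -- precomputed index of each state of each arm
             (hypothetical values for illustration).
    states:  length-N array of each arm's current state.
    budget:  maximum number of arms that may be activated per time step.
    Returns a boolean activation mask of length N.
    """
    n = len(states)
    # Look up the index of each arm's current state.
    current = indices[np.arange(n), states]
    # Pick the `budget` arms with the highest current indices.
    top = np.argsort(current)[::-1][:budget]
    action = np.zeros(n, dtype=bool)
    action[top] = True
    return action

# Example: 4 arms, 2 states each, at most 2 activations per step.
idx = np.array([[0.1, 0.9],
                [0.2, 0.5],
                [0.3, 0.4],
                [0.0, 0.8]])
states = np.array([1, 0, 1, 1])
print(index_policy_action(idx, states, budget=2))
# Arms 0 and 3 have the highest current indices (0.9 and 0.8), so they are activated.
```

Note that this greedy top-`budget` selection is what makes index policies attractive in practice: the per-step decision is a simple sort over N scalars, regardless of the complexity of each arm's underlying Markov process.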
Published in: ACM SIGMETRICS Performance Evaluation Review
Volume 53, Issue 4, pp. 103-104