Restless multi-armed bandits (RMABs) can model several resource allocation problems. To cite only a few applications, RMABs are used to solve optimization problems related to stochastic scheduling in a queue, opportunistic scheduling, machine maintenance [5], healthcare, and recommendation systems. An RMAB is composed of N independent arms, each of which is a Markov process evolving in discrete time. At each time instant the decision maker chooses a set of arms to activate, with the goal of maximizing the reward accumulated across all arms, while activating no more than a certain number of arms at a time. Introduced by Gittins and Jones, the Gittins index policy and its further extension, the Whittle index policy, consist of associating a number, called an index, with each state of each arm, and then activating the arms with the highest indices in priority. Indices can be computed for all arms that satisfy a broad condition called indexability. While these and other index-based policies are not optimal in general, they have been shown to be asymptotically optimal as the number of arms goes to infinity, under indexability and a technical condition that is often verified in practice. Moreover, they work extremely well in practice.
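As an illustration, the activation step of an index policy can be sketched as follows. This is a minimal sketch assuming the indices (e.g. Whittle indices) have already been computed; the arm model, index values, and function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def index_policy_action(indices, states, budget):
    """Activate the `budget` arms whose current state has the highest index.

    indices: (N, S) array -- precomputed index of each state of each arm
             (hypothetical values for illustration).
    states:  length-N array of each arm's current state.
    budget:  maximum number of arms that may be activated per time step.
    Returns a boolean activation mask of length N.
    """
    n = len(states)
    # Look up the index of each arm's current state.
    current = indices[np.arange(n), states]
    # Pick the `budget` arms with the highest current indices.
    top = np.argsort(current)[::-1][:budget]
    action = np.zeros(n, dtype=bool)
    action[top] = True
    return action

# Example: 4 arms, 2 states each, at most 2 activations per step.
idx = np.array([[0.1, 0.9],
                [0.2, 0.5],
                [0.3, 0.4],
                [0.0, 0.8]])
states = np.array([1, 0, 1, 1])
print(index_policy_action(idx, states, budget=2))
# Arms 0 and 3 have the highest current indices (0.9 and 0.8), so they are activated.
```

Note that this greedy top-`budget` selection is what makes index policies attractive in practice: the per-step decision is a simple sort over N scalars, regardless of the complexity of each arm's underlying Markov process.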
Published in: ACM SIGMETRICS Performance Evaluation Review
Volume 53, Issue 4, pp. 103-104