A Model predictive control approach for asymptotic optimality of Restless Multi-armed Bandits

Organiser:
Rahul Vaze
Date:
Wednesday, 29 Oct 2025, 16:00 to 17:00
Venue:
A-201 (STCS Seminar Room)
Category:
Abstract

The classical restless multi-armed bandit problem comprises of $N$ bandit arms (slot machines) each with an associated state. A player chooses a collection of at most $\alpha N $ arms to play, the remaining arms are left untouched. At the next instance each arm evolves to a new state according to a stochastic transition process leaving the player with a reward depending on the player's action. Concretely, the arms provide a reward that is dependent on both the state and action of the arm at each time instance. This reward process is repeated over an infinite time horizon. The player wishes to maximize the average reward obtained from this process. Restless Multi-armed Bandits are known to be PSPACE hard combinatorial problems. Such problems find applications in a number of different fields including wireless communication, clinical trial design and web-crawling.This problem was first proposed by Whittle in 1988, he conjectured that a priority order based on what is called the Whittle's index will provide an asymptotically optimal solution to this problem under a condition called \emph{indexability}. Unfortunately, we now know of counter-examples where this is not true. In this talk I will introduce a relaxation to the RMAB problem with a corresponding extension to the Whittle index policy that provably gives us asymptotically optimal solutions under easy to verify conditions.

Short Bio:
Dheeraj Narasimha obtained his PhD degree from Arizona State University under Prof. Lei Ying in 2021 while working on mean-field ames for density dependent continuous time markov processes. He then went on to do a post-doctoral fellowship under Prof. Srinivas Shakkottai at Texas A & M University and then completed a second post-doc at INRIA, CNRS under Dr. Nicolas Gast where he worked on restless multi-armed bandits . His work on mean-field games has won runner-up best-paper prizes in ACM mobihoc as well as most recently in IEEE INFOCOM.The main focus of his work revolves around solving decision problems with many participants under weak interaction, be it through weak graph limits, potential games or mean field structures.