Tata Institute of Fundamental Research

Regret minimization in stochastic multi-armed bandits

Student Seminar
Speaker: Agniv Bandyopadhyay (TIFR)
Organiser: Malhar Ajit Managoli
Date: Friday, 9 Aug 2024, 16:00 to 17:00
Venue: A-201 (STCS Seminar Room)

Abstract: 
In the stochastic K-armed bandit framework, we are given K unknown distributions, or arms. At each time step, we select one arm and observe one sample from it. Our goal is to maximize the cumulative reward over a finite time horizon, which is equivalent to minimizing the regret. Regret minimization is central to many applications where we must make sequential decisions under uncertainty while optimizing some objective: clinical trials, recommendation systems, portfolio selection in financial markets, and so on. We will analyze some basic regret-minimizing algorithms, such as explore-then-commit, successive reject, and upper confidence bound, and derive an information-theoretic lower bound on regret.
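As a concrete illustration of the framework, the sketch below runs the classical UCB1 index policy on Bernoulli arms and tallies the (pseudo-)regret against always playing the best arm. The arm means, horizon, and exploration constant here are illustrative assumptions, not taken from the talk; UCB1 is one standard instance of the upper-confidence-bound family mentioned in the abstract.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return (pseudo-regret, pull counts).

    arm_means and horizon are illustrative inputs; this is a textbook
    UCB1 sketch, not necessarily the exact variant analyzed in the talk.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # cumulative reward of each arm
    pulls = []            # sequence of arms played

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            # UCB1 index = empirical mean + exploration bonus
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)

    # pseudo-regret: gap between always playing the best arm and our choices
    best = max(arm_means)
    regret = sum(best - arm_means[a] for a in pulls)
    return regret, counts
```

On a two-armed instance with a large gap (e.g. means 0.9 and 0.5), the suboptimal arm is pulled only O(log T) times, so the regret grows logarithmically rather than linearly in the horizon.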
 
The talk will be based on the following references: 
1. Lattimore, Tor, and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, 2020, Chapters 6 and 7.
2. Kaufmann, Emilie. Contributions to the Optimal Solution of Several Bandit Problems. Dissertation, Université de Lille, 2020, Chapter 1.