Abstract:
The K-armed bandit problem is a sequential decision-making problem in which one repeatedly samples from a given set of K probability distributions (belonging to a known family), informally called the 'arms' of the bandit. The goal is to minimize the regret: the cumulative opportunity cost of not always selecting the arm with the highest expected reward. We shall look at the Kullback-Leibler Upper Confidence Bound (KL-UCB) algorithm for regret minimization in K-armed bandits, and see how it asymptotically matches the lower bound on expected regret for this problem.
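For concreteness, the regret, the KL-UCB index, and the lower bound in question admit a standard formulation; the notation below (arm means \(\mu_a\), pull counts \(N_a(T)\), gaps \(\Delta_a\), and the exploration constant \(c\)) is assumed for illustration and is not taken from the abstract itself.

% Assumed notation (not from the abstract): K arms with means mu_1, ..., mu_K;
% mu^* = max_a mu_a; A_t is the arm pulled at round t; N_a(T) counts pulls of
% arm a up to round T; Delta_a = mu^* - mu_a is the suboptimality gap.
\[
  R_T \;=\; T\mu^{*} - \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{A_t}\Big]
       \;=\; \sum_{a=1}^{K} \Delta_a \, \mathbb{E}[N_a(T)].
\]
% At round t, KL-UCB pulls the arm maximizing the upper-confidence index
\[
  U_a(t) \;=\; \sup\Big\{ q \;:\; N_a(t)\,\mathrm{kl}\big(\hat{\mu}_a(t),\, q\big) \;\le\; \log t + c \log\log t \Big\},
\]
% where hat{mu}_a(t) is the empirical mean of arm a and kl(p, q) denotes the
% KL divergence between the distributions in the known family with means p, q.
% The asymptotic (Lai-Robbins) lower bound that KL-UCB matches reads
\[
  \liminf_{T \to \infty} \frac{R_T}{\log T} \;\ge\; \sum_{a :\, \Delta_a > 0} \frac{\Delta_a}{\mathrm{kl}(\mu_a,\, \mu^{*})}.
\]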