In this talk, I will present algorithms for the contextual bandit problem with generalized linear rewards. Motivated by practical considerations, I will discuss batched versions of the problem and develop algorithms that scale to very large action sets while attaining optimal regret with respect to both the number of rounds and the non-linearity of the reward model. Our techniques combine constructions of appropriate optimal designs with linear optimization oracles and action scaling to account for reward nonlinearity. I will sketch the proof ideas behind some of the main results and, time permitting, present empirical results.
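For readers unfamiliar with the setting, the following is a minimal, generic sketch of a generalized linear contextual bandit with a logistic link: fit a regularized maximum-likelihood estimate of the reward parameter, then select the action maximizing an optimistic (UCB-style) score. This is only an illustration of the problem class, not the batched, optimal-design-based algorithm described in the talk; the helper names (`fit_glm`, `ucb_action`) and all constants are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Logistic link function mu(z) = 1 / (1 + e^{-z}).
    return 1.0 / (1.0 + np.exp(-z))

def fit_glm(X, y, n_iters=50, lam=1.0):
    # Ridge-regularized logistic MLE via Newton's method (hypothetical helper).
    # X: (n, d) feature matrix of played actions; y: (n,) binary rewards.
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(n_iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta          # gradient of penalized NLL
        W = p * (1.0 - p)                            # GLM variance weights
        H = X.T @ (X * W[:, None]) + lam * np.eye(d) # Hessian of penalized NLL
        theta -= np.linalg.solve(H, grad)
    return theta

def ucb_action(actions, theta, V_inv, alpha=1.0):
    # Optimistic score: estimated reward plus exploration bonus
    # alpha * ||x||_{V^{-1}}, where V is a (regularized) design matrix.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', actions, V_inv, actions))
    return int(np.argmax(actions @ theta + alpha * bonus))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_true = np.array([1.0, -1.0, 0.5])          # unknown reward parameter
    X = rng.standard_normal((500, 3))                # past action features
    y = (rng.random(500) < sigmoid(X @ theta_true)).astype(float)
    theta_hat = fit_glm(X, y)
    V_inv = np.linalg.inv(X.T @ X + np.eye(3))
    candidates = rng.standard_normal((20, 3))        # current action set
    print(ucb_action(candidates, theta_hat, V_inv))
```

The exploration bonus is the familiar elliptical confidence width from linear bandits; handling the link function's nonlinearity well (e.g., avoiding dependence on worst-case curvature) is part of what the talk's algorithms address.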
Short Bio:
I am a Principal Researcher at Microsoft Research, working in the areas of Reinforcement Learning, Causal Inference, and Learning Theory. I received my Ph.D. in Mathematics from the California Institute of Technology in 2016, where I was advised by Prof. Eric Rains, and my Integrated M.Sc. in Mathematics and Scientific Computing from the Indian Institute of Technology (IIT) Kanpur in 2011. My primary research interest lies in sequential decision-making, particularly as applied to real-world applications such as online advertising and recommendation systems. I focus on scenarios where decision-makers have access to both observational and interventional data. While observational data is abundant and inexpensive, interventional data, though costly, provides more granular insights into the impact of specific decisions. My research aims to develop algorithms that optimally balance the use of these data types, ensuring decision-making that is both theoretically sound and practically viable over extended periods. To achieve this, I employ a diverse toolkit encompassing causal inference, nonlinear and stochastic optimization, and optimal experimental design.