Suppose you want to find the least congested route in an ad hoc network. Each link's rate is unknown and stochastic, and each time you get to see the minimum rate (i.e., bottleneck) along any route you pick. More generally, given such partial feedback, suppose you want to find the maximum multicommodity flow in a network with stochastic links. A direct approach is to treat this as a standard multi-armed bandit with #arms = #routes, but which leads to a potentially huge number of actions. Moreover, this ignores the structure of the problem -- the actual number of independent parameters (i.e., the unknown links) is much smaller.
We model a very general class of such "complex bandit" online problems, and present the natural, Bayesian-inspired Thompson sampling algorithm to solve them. A key contribution is to provide a general, frequentist bound on the regret of this algorithm which explicitly encodes the information structure of the complex bandit. This general bound is used to give non-trivial regret guarantees for a class of complex, subset selection bandit problems.
Bio: Aditya Gopalan received the Ph.D. degree in electrical engineering at The University of Texas at Austin. He received the B.Tech. and M.Tech. degrees in electrical engineering from the Indian Institute of Technology Madras in 2006. He was a summer intern at the Corporate Research and Development Center, Qualcomm Inc., in 2009. His current research interests include machine learning, network algorithms and stochastic control. He is currently a postdoctoral researcher research fellow in the Faculty of Electrical Engineering at the Technion- Israel Institute of Technology, Haifa, Israel.