Abstract:
Scheduling problems in queueing systems have been long studied under the assumption that the service rates are known a priori. We consider the problem of scheduling in multi-class parallel server systems when the service rates are unknown but can be dynamically learned through feedback from past scheduling decisions. We pose this problem in the multi-armed bandit framework with a notion of regret that is appropriate for queueing systems. I will talk about the fundamental differences in the explore-exploit tradeoff between the classical bandit problem and the queueing bandit problem along with some decision strategies for the queueing bandits.