In this talk, we consider the following online decision making problem: Given a set of uncertain alternatives how must one take decisions so that one minimizes the error probability of choosing a suboptimal alternative given a fixed sample budget. This is motivated by practical applications in diverse fields like drug trials, online advertising and sensor placement and more generally in areas of online decision making where the samples cannot be generated in a sequential manner and have a fixed limited budget. This problem is called the Fixed Budget Best Arm identification (FB-BAI) in the Multi Armed Bandit (MAB) community but has also been studied by the research communities in online decision making, simulation, operations research and information theory.
In Komiyama et. al (2022) an information theoretic lower bound and a matching algorithm assuming access to a computationally and analytically intractable oracle was proposed. As this is a difficult problem to handle in this level of generality, we instead study a special case of FB-BAI: Active Simple Hypothesis Testing (ASHT). Even in this simplified setting, optimal algorithms and their information theoretic limits are not well understood. In the ASHT setting, we place Komiyama et al's bounds in a game theoretic framework and show that it is the value function of a certain differential game. We show that this value function is the viscosity solution to an associated Hamilton-Jacobi-Isaac(HJI) Partial Differential Equation (PDE). This PDE formulation allows us to derive new bounds on error exponents, produce a counterexample to a conjecture of Komiyama et al and create a computationally tractable $\epsilon$-optimal algorithm when the action set (A) of the hypotheses is small. As the PDE approach is computationally difficult for moderate size A, we use a novel link of ASHT with approachability problems to propose a new approachability algorithm that is provably better than the best static algorithms. Further, we numerically observe that in certain problem instances, the proposed algorithm attains the optimal exponent.
We will give a broad introduction to the problem. The technical details will be kept to a minimum to make the talk more accessible.
Reference: Komiyama et al.: https://arxiv.org/pdf/2206.04646