We present a very simple and efficient sampling-based algorithm for estimating the union of sets in the streaming setting. Suppose we have a collection of sets S_1, . . . , S_M subsets of T, arriving one by one in a stream; the sets are not given explicitly to us but rather defined implicitly via the following oracles: for each set, we can know the size of the set, get a uniform sample from the set, and given a point check whether it belongs to the set. The goal is to estimate the size of the union of the sets S_1, . . . , S_M.
We present a simple algorithm that estimates the size of the union, upto a (1 + \epsilon) factor, in space complexity and update time complexity O(log(M)^2/\epsilon^2).
Our algorithm provides the first algorithm with polynomial dependence on the dimension for Klee’s measure problem in streaming setting and independent of the stream size, thereby settling the open problem of Woodruff and Tirthpura (PODS-12).
This talk will be based on works with Kuldeep Meel and Vinodchandran (PODS21, PODS22 and ESA22).