Given a data stream of m elements, the Distinct Elements problem is to estimate the number of distinct elements in the stream. Distinct Elements has been a subject of theoretical and empirical investigations over the past four decades resulting in space-optimal algorithms for it. However, all the current state-of-the-art algorithms are often difficult to analyze or impractical.
I will present a simple, intuitive, sampling-based space-efficient algorithm whose description and the proof are accessible to undergraduates with a knowledge of basic probability theory.
In addition to the simplicity, the approach has significant theoretical and practical implications: our approach allowed us to resolve the open problem of (Discrete) Klee's Measure Problem in the streaming setting and build a state-of-the-art DNF counter in practice.