Abstract: We consider the problem of interactively visualizing a distributed tabular dataset with billions of rows. Our goal is to allow a user to browse such datasets in real time, at any resolution, with just a few mouse clicks (much like navigating an online map service).
We hypothesize that the sketching model of computation is tailor-made for this setting: interesting visualizations are amenable to efficient sketching protocols whose communication cost is bounded by the size of the desired output (which is small, since it must fit on a screen). In this talk, we present Hillview, an open-source tool for interactive visualization of large distributed datasets, built around this premise. We focus on the algorithmic challenges that arise in rendering common visualizations in the sketching model. [Based on joint work with Mihai Budiu, Udi Wieder, Marcos Aguilera, Lalith Suresh and Han Kruiger (VMware Research).]
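
To make the sketching premise concrete, here is a minimal illustration in Python, assuming a histogram as the target visualization. It is not Hillview's implementation; the names `histogram_sketch` and `merge`, the bucket count, and the sample partitions are all hypothetical. Each worker summarizes its own partition as a fixed-size list of bucket counts, and summaries are combined by element-wise addition, so the data sent per worker depends on the number of buckets on screen rather than on the number of rows.

```python
from functools import reduce

def histogram_sketch(values, lo, hi, buckets):
    """Summarize one partition as a fixed-size list of bucket counts."""
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bucket.
            i = min(int((v - lo) / width), buckets - 1)
            counts[i] += 1
    return counts

def merge(a, b):
    """Combine two sketches; the result has the same size as each input."""
    return [x + y for x, y in zip(a, b)]

# Three hypothetical partitions, as if held by three different workers.
partitions = [[1, 2, 3, 9], [4, 4, 5], [0, 7, 8, 10]]
sketches = [histogram_sketch(p, lo=0, hi=10, buckets=5) for p in partitions]
total = reduce(merge, sketches)
print(total)
```

Because `merge` is associative and commutative, the partial summaries can be combined in any order (for example, up an aggregation tree), and the merged result equals the histogram of the full dataset computed in one place.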