In a 2002 work, Goldreich and Ron gave sublinear time algorithms for testing graph expansion. In 2015, Czumaj, Peng, and Sohler extended these ideas to test -clusterability in roughly
time. However, no sublinear time algorithms are known that give any information about the cluster structure of a
-clusterable graph. That is, no such algorithms are known for understanding how clusters connect to each other. As a simple example, one may wonder whether it is possible to locally distinguish between the "cluster graph" forming a line from a cluster graph which forms a clique.
In this talk, I will present sublinear time algorithms for estimating the hierarchical cluster structure of -clusterable graphs. The measure we use is Dasgupta cost, which is a standard way to evaluate hierarchical clustering. Our main result is a sublinear time algorithm that approximates the Dasgupta cost of a
-clusterable graph using a small number of randomly chosen seed vertices with known cluster labels.
In particular, I will describe an algorithm that gives an approximation to the Dasgupta cost of
in roughly
time, using about
seeds. This can be viewed as a sublinear time simulation of the algorithm of Charikar and Chatziafratis (SODA 2017) on clusterable graphs.
This is joint work with Michael Kapralov, Silvio Lattanzi, Aida Mousavifar, and Weronika Kaminska.
Short Bio:
Akash Kumar is a James R. Isaac Chair Assistant Professor in the Department of Computer Science and Engineering at IIT Bombay, where he has been on the faculty since December 2022. His research interests are in spectral graph theory and property testing. He is also one of the co-editors of the Property Testing Blog, where besides sublinear algorithms, he enjoys writing about recent developments in topics such as Boolean functions and Markov chains.