Tata Institute of Fundamental Research

$D^2$ Sampling and the $k$-means Problem

STCS Colloquium
Speaker: Amit Kumar (Indian Institute of Technology Department of Computer Science and Engineering Hauz Khas New Delhi 110016)
Organiser: Prahladh Harsha
Date: Tuesday, 29 Mar 2016, 16:00 to 17:00
Venue: A-201 (STCS Seminar Room)

(Scan to add to calendar)
Abstract:  Abstract: Given a set of points $P$ in a $d$-dimensional Euclidean space, the $k$-means clustering problem seeks to find a set $C$ of $k$ centers such that the sum over all points in $P$ of the square of the distance to the closest center in $C$ is minimized. A natural random sampling based heuristic for finding the $k$ centers is as follows -- pick the first center uniformly at random from the set of points, the next one with probability proportional to the square of the distance from the first center and so on. In this talk, I will survey algorithms for the $k$-means problem based on this random sampling technique. I will show that this idea can be used to obtain fast PTAS for this problem. We extend this result to constrained $k$-means clustering problems, where there may be additional constraints on  valid clusterings of the input points (this is joint work with Anup Bhattacharya and Ragesh Jaiswal).