Tata Institute of Fundamental Research

Streaming Algorithms for Embedding and Computing Edit Distance in the Low Distance Regime

STCS Seminar
Speaker: Michal Koucky (Charles University Department of Computer Science Malosranske nam. 25, 118 00 Praha 1 Czech Republic)
Organiser: Arkadev Chattopadhyay
Date: Wednesday, 6 Apr 2016, 16:00 to 17:00
Venue: A-201 (STCS Seminar Room)

(Scan to add to calendar)
Abstract:  Abstract: The Hamming and the edit metrics are two common notions of measuring distances between pairs of strings $x,y$ lying in the Boolean hypercube. The edit distance between $x$ and $y$ is defined as the minimum number of character insertion, deletion, and bit flips needed for converting $x$ into $y$. Whereas, the Hamming distance between $x$ and $y$ is the number of bit flips needed for converting $x$ to $y$.

In this talk I will present a randomized injective embedding of the editdistance into the Hamming distance with a small (qudratic) distortion. Namely, for any satisfying that their edit distance equals , the Hamming distance between the embedding of and is with high probability. This improves over the distortion ratio of $O(\log n \log^* n)$ obtained by Jowhari (2012) for small values of . Moreover, the embedding output size is linear in the input size and the embedding can be computed using a single pass over the input.

I will mention several applications for this embedding. Among our results we provide a one-pass (streaming) algorithm for edit distance running in space and computing edit distance exactly up-to distance. This algorithm is based on kernelization for edit distance that is of independent (joint work with Diptarka Chakraborty and Elazar Goldenberg).