论文标题

近似文本到图案锤距

Approximating Text-to-Pattern Hamming Distances

论文作者

Chan, Timothy M., Golan, Shay, Kociumaka, Tomasz, Kopelowitz, Tsvi, Porat, Ely

论文摘要

我们在字符串匹配中重新审视一个基本问题:给定长度为m和长度n的文本,均在大小$σ$的字母上,计算每个位置的图案和文本之间的锤击距离。文献中已经提出了几种$(1+ε)$ - 近似算法,并使用快速傅立叶变换(FFT),并使用$ o(ε^{ - o(1)} n \ log n \ log m)$的运行时间。我们描述了一种简单的$(1+ε)$ - 近似算法,该算法更快且不需要FFT。将我们的方法与其他想法相结合会带来许多新的结果: - 我们获得了第一个线性时间近似算法;运行时间为$ O(ε^{ - 2} n)$。 - 我们获得了更快的精确算法计算所有锤距,直至给定阈值k;它的运行时间改善了对数因素的先前结果,如果$ k \ le \ sqrt m $,则是线性的。 - 我们使用矩形矩阵乘法获得了具有更好$ε$依赖性的近似算法。当图案足够长时,时间限制为$É(n)$:$ m \geε^{ - 28} $。以前的算法需要$É(ε^{ - 1} n)$时间。 - 当k不太小时,我们会获得一种真正的额定时间算法,以在$ o((n/k^{ω(1)+occ)n^{occ)n^{o(1))$ o(n/k^{ω(1)})$ o(n N^{OCC)N^{O(1)})$时间$ o(n/k^{ω(1)+occ)$ in occ sime,cops inc occ s occ的输出尺寸。该算法导致属性测试仪,如果存在确切的匹配,则返回true,如果锤距距离在每个位置都超过$δm$,则以$é(δ^^{ - 1/3} N^{2/3}+δ{2/3}+δ^{ - 1} n/m)$时间运行。 - 我们获得了一种流算法,以使用$É(ε^{ - 2} \ sqrt k)$ space报告所有大约小于K的位置。以前,流算算法以(k)空间的确切问题或$É(ε^{ - o(1)} \ sqrt m)$ space的近似问题而闻名。

We revisit a fundamental problem in string matching: given a pattern of length m and a text of length n, both over an alphabet of size $σ$, compute the Hamming distance between the pattern and the text at every location. Several $(1+ε)$-approximation algorithms have been proposed in the literature, with running time of the form $O(ε^{-O(1)}n\log n\log m)$, all using fast Fourier transform (FFT). We describe a simple $(1+ε)$-approximation algorithm that is faster and does not need FFT. Combining our approach with additional ideas leads to numerous new results: - We obtain the first linear-time approximation algorithm; the running time is $O(ε^{-2}n)$. - We obtain a faster exact algorithm computing all Hamming distances up to a given threshold k; its running time improves previous results by logarithmic factors and is linear if $k\le\sqrt m$. - We obtain approximation algorithms with better $ε$-dependence using rectangular matrix multiplication. The time-bound is $Õ(n)$ when the pattern is sufficiently long: $m\ge ε^{-28}$. Previous algorithms require $Õ(ε^{-1}n)$ time. - When k is not too small, we obtain a truly sublinear-time algorithm to find all locations with Hamming distance approximately (up to a constant factor) less than k, in $O((n/k^{Ω(1)}+occ)n^{o(1)})$ time, where occ is the output size. The algorithm leads to a property tester, returning true if an exact match exists and false if the Hamming distance is more than $δm$ at every location, running in $Õ(δ^{-1/3}n^{2/3}+δ^{-1}n/m)$ time. - We obtain a streaming algorithm to report all locations with Hamming distance approximately less than k, using $Õ(ε^{-2}\sqrt k)$ space. Previously, streaming algorithms were known for the exact problem with Õ(k) space or for the approximate problem with $Õ(ε^{-O(1)}\sqrt m)$ space.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源