论文标题
重新访问广义模式匹配
Generalised Pattern Matching Revisited
论文作者
论文摘要
In the problem of $\texttt{Generalised Pattern Matching}\ (\texttt{GPM})$ [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $Σ_T$, a pattern $P$ of length $m$ over an alphabet $Σ_P$, and a matching relationship $\subseteq σ_t\ timesσ_p$,并且必须返回匹配$ p $(报告)的$ t $的所有子字符串或每个$ t长度$ m $ $ m $和$ p $(计数)的每个子字符串之间的不匹配数。在这项工作中,我们改进了所有以前已知的算法的此问题,以描述输入实例的各种参数: * $ \ mathcal {d} \,$是匹配固定字符的最大字符数, * $ \ Mathcal {s} \,$是匹配字符的数量, * $ \ MATHCAL {i} \,$是与模式$ p $的$ m $字符匹配的字符间隔的总数。 $ \ Mathcal {d} \,$和$ \ Mathcal {s} \,$的新确定性上限的核心是叠加的代码的更快构造,该代码构造了[focs'97,indyk]中提出的一个开放问题,并且可以独立感兴趣。总而言之,我们演示了$ \ texttt {gpm} $的第一个下限。我们首先显示$ \ texttt {gpm} $的任何确定性或蒙特卡洛算法都必须使用$ω(\ Mathcal {s})$时间,然后继续显示组合算法的更高下限。这些界限表明,除非开发了一种新的新方法,否则我们的算法几乎是最佳的。
In the problem of $\texttt{Generalised Pattern Matching}\ (\texttt{GPM})$ [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $Σ_T$, a pattern $P$ of length $m$ over an alphabet $Σ_P$, and a matching relationship $\subseteq Σ_T \times Σ_P$, and must return all substrings of $T$ that match $P$ (reporting) or the number of mismatches between each substring of $T$ of length $m$ and $P$ (counting). In this work, we improve over all previously known algorithms for this problem for various parameters describing the input instance: * $\mathcal{D}\,$ being the maximum number of characters that match a fixed character, * $\mathcal{S}\,$ being the number of pairs of matching characters, * $\mathcal{I}\,$ being the total number of disjoint intervals of characters that match the $m$ characters of the pattern $P$. At the heart of our new deterministic upper bounds for $\mathcal{D}\,$ and $\mathcal{S}\,$ lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest. To conclude, we demonstrate first lower bounds for $\texttt{GPM}$. We start by showing that any deterministic or Monte Carlo algorithm for $\texttt{GPM}$ must use $Ω(\mathcal{S})$ time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.