重新访问广义模式匹配

论文标题

重新访问广义模式匹配

Generalised Pattern Matching Revisited

论文作者

Dudek, Bartłomiej, Gawrychowski, Paweł, Starikovskaya, Tatiana

论文摘要

In the problem of $\texttt{Generalised Pattern Matching}\ (\texttt{GPM})$ [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $Σ_T$, a pattern $P$ of length $m$ over an alphabet $Σ_P$, and a matching relationship $\subseteq σ_t\ timesσ_p$，并且必须返回匹配$ p $（报告）的$ t $的所有子字符串或每个$ t长度$ m $ $ m $和$ p $（计数）的每个子字符串之间的不匹配数。在这项工作中，我们改进了所有以前已知的算法的此问题，以描述输入实例的各种参数： * $ \ mathcal {d} \，$是匹配固定字符的最大字符数， * $ \ Mathcal {s} \，$是匹配字符的数量， * $ \ MATHCAL {i} \，$是与模式$ p $的$ m $字符匹配的字符间隔的总数。 $ \ Mathcal {d} \，$和$ \ Mathcal {s} \，$的新确定性上限的核心是叠加的代码的更快构造，该代码构造了[focs'97，indyk]中提出的一个开放问题，并且可以独立感兴趣。总而言之，我们演示了$ \ texttt {gpm} $的第一个下限。我们首先显示$ \ texttt {gpm} $的任何确定性或蒙特卡洛算法都必须使用$ω（\ Mathcal {s}）$时间，然后继续显示组合算法的更高下限。这些界限表明，除非开发了一种新的新方法，否则我们的算法几乎是最佳的。

In the problem of $\texttt{Generalised Pattern Matching}\ (\texttt{GPM})$ [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $Σ_T$, a pattern $P$ of length $m$ over an alphabet $Σ_P$, and a matching relationship $\subseteq Σ_T \times Σ_P$, and must return all substrings of $T$ that match $P$ (reporting) or the number of mismatches between each substring of $T$ of length $m$ and $P$ (counting). In this work, we improve over all previously known algorithms for this problem for various parameters describing the input instance: * $\mathcal{D}\,$ being the maximum number of characters that match a fixed character, * $\mathcal{S}\,$ being the number of pairs of matching characters, * $\mathcal{I}\,$ being the total number of disjoint intervals of characters that match the $m$ characters of the pattern $P$. At the heart of our new deterministic upper bounds for $\mathcal{D}\,$ and $\mathcal{S}\,$ lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest. To conclude, we demonstrate first lower bounds for $\texttt{GPM}$. We start by showing that any deterministic or Monte Carlo algorithm for $\texttt{GPM}$ must use $Ω(\mathcal{S})$ time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.

下载PDF全文

下载文献需遵守相关版权规定

论文标题