论文标题

近似圆形图案匹配

Approximate Circular Pattern Matching

论文作者

Charalampopoulos, Panagiotis, Kociumaka, Tomasz, Radoszewski, Jakub, Pissis, Solon P., Rytter, Wojciech, Waleń, Tomasz, Zuba, Wiktor

论文摘要

We consider approximate circular pattern matching (CPM, in short) under the Hamming and edit distance, in which we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a threshold $k>0$, and we are to report all starting positions of fragments of $T$ (called occurrences) that are at distance at most $k$ from some cyclic rotation of $P$.在问题的决策版本中,我们将检查是否存在此类事件。除了Charalampopopoulos等人的工作外,大约CPM的所有先前结果都是平均上限或启发式方法。 [CKP $^+$,JCSS'21],他只考虑了锤距。对于大约cpm问题的报告版本,在限制距离下,我们从$ {\ cal o}(\ cal o}(n+(n/m)\ cdot k^4)$提高了[Ckp $^+$,JCSS'21]的主要算法。对于编辑距离,我们给出$ {\ cal o}(nk^2)$ - 时间算法。我们还考虑了近似CPM问题的决策版本。在锤距离下,我们获得了$ {\ cal o}(n+(n/m)\ cdot k^2 \ log k/\ log \ log \ log k)$ - 时间算法,该算法几乎与Chan等人的算法相匹配。 [CGKKP,stoc'20]对于问题的标准对应物。在编辑距离下,我们算法的$ {\ cal o}(nk \ log^2 k)$运行时间几乎与$ {\ cal o}(nk)$ landau-vishkin algorithm的运行时间[lv,j. algorithms'89]。作为垫脚石,我们提出了一个$ {\ cal o}(nk \ log^2 k)$ - 最长前缀$ k'$ - 近似匹配问题的时间算法,由Landau等人提出。 [lms,sicomp'98],对于所有$ k'\ in \ {1,\ dots,k \} $。我们给出了一个条件下限,这表明二进制字母上的锤距距离和其非圆形对应物之间的近似CPM之间的多项式分离。我们还表明,在编辑距离下的近似CPM的决策版本的强烈次级时间算法将反驳Seth。

We consider approximate circular pattern matching (CPM, in short) under the Hamming and edit distance, in which we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a threshold $k>0$, and we are to report all starting positions of fragments of $T$ (called occurrences) that are at distance at most $k$ from some cyclic rotation of $P$. In the decision version of the problem, we are to check if any such occurrence exists. All previous results for approximate CPM were either average-case upper bounds or heuristics, except for the work of Charalampopoulos et al. [CKP$^+$, JCSS'21], who considered only the Hamming distance. For the reporting version of the approximate CPM problem, under the Hamming distance we improve upon the main algorithm of [CKP$^+$, JCSS'21] from ${\cal O}(n+(n/m)\cdot k^4)$ to ${\cal O}(n+(n/m)\cdot k^3)$ time; for the edit distance, we give an ${\cal O}(nk^2)$-time algorithm. We also consider the decision version of the approximate CPM problem. Under the Hamming distance, we obtain an ${\cal O}(n+(n/m)\cdot k^2\log k/\log\log k)$-time algorithm, which nearly matches the algorithm by Chan et al. [CGKKP, STOC'20] for the standard counterpart of the problem. Under the edit distance, the ${\cal O}(nk\log^2 k)$ running time of our algorithm nearly matches the ${\cal O}(nk)$ running time of the Landau-Vishkin algorithm [LV, J. Algorithms'89]. As a stepping stone, we propose an ${\cal O}(nk\log^2 k)$-time algorithm for the Longest Prefix $k'$-Approximate Match problem, proposed by Landau et al. [LMS, SICOMP'98], for all $k'\in \{1,\dots,k\}$. We give a conditional lower bound that suggests a polynomial separation between approximate CPM under the Hamming distance over the binary alphabet and its non-circular counterpart. We also show that a strongly subquadratic-time algorithm for the decision version of approximate CPM under edit distance would refute SETH.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源