论文标题
近似圆形图案匹配
Approximate Circular Pattern Matching
论文作者
论文摘要
We consider approximate circular pattern matching (CPM, in short) under the Hamming and edit distance, in which we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a threshold $k>0$, and we are to report all starting positions of fragments of $T$ (called occurrences) that are at distance at most $k$ from some cyclic rotation of $P$.在问题的决策版本中,我们将检查是否存在此类事件。除了Charalampopopoulos等人的工作外,大约CPM的所有先前结果都是平均上限或启发式方法。 [CKP $^+$,JCSS'21],他只考虑了锤距。对于大约cpm问题的报告版本,在限制距离下,我们从$ {\ cal o}(\ cal o}(n+(n/m)\ cdot k^4)$提高了[Ckp $^+$,JCSS'21]的主要算法。对于编辑距离,我们给出$ {\ cal o}(nk^2)$ - 时间算法。我们还考虑了近似CPM问题的决策版本。在锤距离下,我们获得了$ {\ cal o}(n+(n/m)\ cdot k^2 \ log k/\ log \ log \ log k)$ - 时间算法,该算法几乎与Chan等人的算法相匹配。 [CGKKP,stoc'20]对于问题的标准对应物。在编辑距离下,我们算法的$ {\ cal o}(nk \ log^2 k)$运行时间几乎与$ {\ cal o}(nk)$ landau-vishkin algorithm的运行时间[lv,j. algorithms'89]。作为垫脚石,我们提出了一个$ {\ cal o}(nk \ log^2 k)$ - 最长前缀$ k'$ - 近似匹配问题的时间算法,由Landau等人提出。 [lms,sicomp'98],对于所有$ k'\ in \ {1,\ dots,k \} $。我们给出了一个条件下限,这表明二进制字母上的锤距距离和其非圆形对应物之间的近似CPM之间的多项式分离。我们还表明,在编辑距离下的近似CPM的决策版本的强烈次级时间算法将反驳Seth。
We consider approximate circular pattern matching (CPM, in short) under the Hamming and edit distance, in which we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a threshold $k>0$, and we are to report all starting positions of fragments of $T$ (called occurrences) that are at distance at most $k$ from some cyclic rotation of $P$. In the decision version of the problem, we are to check if any such occurrence exists. All previous results for approximate CPM were either average-case upper bounds or heuristics, except for the work of Charalampopoulos et al. [CKP$^+$, JCSS'21], who considered only the Hamming distance. For the reporting version of the approximate CPM problem, under the Hamming distance we improve upon the main algorithm of [CKP$^+$, JCSS'21] from ${\cal O}(n+(n/m)\cdot k^4)$ to ${\cal O}(n+(n/m)\cdot k^3)$ time; for the edit distance, we give an ${\cal O}(nk^2)$-time algorithm. We also consider the decision version of the approximate CPM problem. Under the Hamming distance, we obtain an ${\cal O}(n+(n/m)\cdot k^2\log k/\log\log k)$-time algorithm, which nearly matches the algorithm by Chan et al. [CGKKP, STOC'20] for the standard counterpart of the problem. Under the edit distance, the ${\cal O}(nk\log^2 k)$ running time of our algorithm nearly matches the ${\cal O}(nk)$ running time of the Landau-Vishkin algorithm [LV, J. Algorithms'89]. As a stepping stone, we propose an ${\cal O}(nk\log^2 k)$-time algorithm for the Longest Prefix $k'$-Approximate Match problem, proposed by Landau et al. [LMS, SICOMP'98], for all $k'\in \{1,\dots,k\}$. We give a conditional lower bound that suggests a polynomial separation between approximate CPM under the Hamming distance over the binary alphabet and its non-circular counterpart. We also show that a strongly subquadratic-time algorithm for the decision version of approximate CPM under edit distance would refute SETH.