论文标题

BWT-RUNS边界空间中的最佳时间RLBWT结构

An Optimal-Time RLBWT Construction in BWT-runs Bounded Space

论文作者

Nishimoto, Takaaki, Kanda, Shunsuke, Tabei, Yasuo

论文摘要

高度重复的字符串的压缩(即具有许多重复的字符串)一直是字符串处理中的一个中心研究主题,到目前为止,已经提出了许多针对这些字符串的压缩方法。其中,有效的压缩格式收集越来越多的关注是奔跑的洞穴 - 轮毂变换(RLBWT),它是一种运行的长度编码BWT,作为在后缀词典上的输入字符串的可逆置换。 RLBWT的最新结构算法在(i)非最佳计算时间或(ii)与输入字符串长度线性成比例的工作空间有关。在本文中,我们介绍\ emph {r-comp},这是BWT-RUNS边界空间中RLBWT的第一个最佳时间构建算法。也就是说,R-comp的计算复杂性为$ O(n + r \ log {r})$ time和$ o(r \ log {n})$工作空间,用于长度的输入字符串的$ n $和bwt中的均等字符串的数字$ r $。对于带有属性$ r = o(n/\ log {n})$的字符串的字符串的计算时间(即$ o(n)$)是最佳的,它适用于大多数重复的字符串。使用高度重复字符串的现实世界数据集的实验显示了R-Comp在计算时间和空间方面的有效性。

The compression of highly repetitive strings (i.e., strings with many repetitions) has been a central research topic in string processing, and quite a few compression methods for these strings have been proposed thus far. Among them, an efficient compression format gathering increasing attention is the run-length Burrows--Wheeler transform (RLBWT), which is a run-length encoded BWT as a reversible permutation of an input string on the lexicographical order of suffixes. State-of-the-art construction algorithms of RLBWT have a serious issue with respect to (i) non-optimal computation time or (ii) a working space that is linearly proportional to the length of an input string. In this paper, we present \emph{r-comp}, the first optimal-time construction algorithm of RLBWT in BWT-runs bounded space. That is, the computational complexity of r-comp is $O(n + r \log{r})$ time and $O(r\log{n})$ bits of working space for the length $n$ of an input string and the number $r$ of equal-letter runs in BWT. The computation time is optimal (i.e., $O(n)$) for strings with the property $r=O(n/\log{n})$, which holds for most highly repetitive strings. Experiments using a real-world dataset of highly repetitive strings show the effectiveness of r-comp with respect to computation time and space.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源