Paper Title
C3-STISR: Scene Text Image Super-resolution with Triple Clues
Paper Authors
Paper Abstract
Scene text image super-resolution (STISR) has been regarded as an important pre-processing task for text recognition from low-resolution scene text images. Most recent approaches use the recognizer's feedback as clues to guide super-resolution. However, directly using the recognition clue has two problems: 1) Compatibility: it comes in the form of a probability distribution, which has an obvious modality gap with STISR, a pixel-level task; 2) Inaccuracy: it usually contains wrong information, which misleads the main task and degrades super-resolution performance. In this paper, we present a novel method, C3-STISR, that jointly exploits the recognizer's feedback, visual information, and linguistic information as clues to guide super-resolution. Here, the visual clue comes from images of the texts predicted by the recognizer, which are informative and more compatible with the STISR task, while the linguistic clue is generated by a pre-trained character-level language model, which is able to correct the predicted texts. We design effective extraction and fusion mechanisms for the triple cross-modal clues to generate comprehensive and unified guidance for super-resolution. Extensive experiments on TextZoom show that C3-STISR outperforms the SOTA methods in fidelity and recognition performance. Code is available at https://github.com/zhaominyiz/C3-STISR.
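To make the clue-fusion idea concrete, below is a minimal, hypothetical PyTorch sketch of combining three cross-modal clues into one guidance tensor. It is not the paper's actual implementation (see the linked repository for that); it assumes each clue branch has already been projected to a common feature-map shape and merges them with learned per-pixel weights. The module and argument names are illustrative only.

```python
# Hypothetical sketch: fusing three clue feature maps into unified guidance.
# Names (ClueFusion, rec/vis/lng) are illustrative, not from the C3-STISR code.
import torch
import torch.nn as nn


class ClueFusion(nn.Module):
    """Fuse recognition, visual, and linguistic clue features with learned gates."""

    def __init__(self, dim: int):
        super().__init__()
        # Predict a per-pixel weight for each of the three clues.
        self.gate = nn.Sequential(
            nn.Conv2d(3 * dim, 3, kernel_size=1),
            nn.Softmax(dim=1),  # weights over the three clues sum to 1 per pixel
        )

    def forward(self, rec, vis, lng):
        # rec / vis / lng: (B, dim, H, W) feature maps from the three clue branches
        w = self.gate(torch.cat([rec, vis, lng], dim=1))  # (B, 3, H, W)
        return w[:, 0:1] * rec + w[:, 1:2] * vis + w[:, 2:3] * lng


# Usage example with dummy feature maps.
fusion = ClueFusion(dim=32)
guidance = fusion(torch.randn(2, 32, 16, 64),
                  torch.randn(2, 32, 16, 64),
                  torch.randn(2, 32, 16, 64))
print(guidance.shape)  # torch.Size([2, 32, 16, 64])
```

The soft per-pixel gating here is one plausible way to realize "comprehensive and unified guidance": it lets the network down-weight an inaccurate clue (e.g., a wrong recognition) locally rather than discarding it everywhere.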