Paper title
AutoSTR: Efficient Backbone Search for Scene Text Recognition
Paper authors
Paper abstract
Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes. The community has paid increasing attention to boosting performance by improving either the image pre-processing module, such as rectification and deblurring, or the sequence translator. However, another critical module, the feature sequence extractor, has not been extensively explored. In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR), which searches for data-dependent backbones to boost text recognition performance. First, we design a domain-specific search space for STR, which contains both choices of operations and constraints on the downsampling path. Then, we propose a two-step search algorithm, which decouples the operations from the downsampling path, for an efficient search in the given space. Experiments demonstrate that, by searching for data-dependent backbones, AutoSTR can outperform state-of-the-art approaches on standard benchmarks with far fewer FLOPs and model parameters.
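The decoupling idea in the abstract can be illustrated with a minimal toy sketch. It is not the paper's implementation: the layer count, the operation names, the "exactly two stride-2 layers" constraint, the order of the two steps, and the `toy_score` function are all hypothetical stand-ins chosen for illustration. A real search would train and evaluate candidate backbones instead of calling a closed-form score.

```python
from itertools import combinations, product

# Hypothetical toy search space (not the paper's actual configuration).
NUM_LAYERS = 4
OPS = ["conv3x3", "conv5x5", "skip"]  # candidate operations per layer
NUM_DOWNSAMPLES = 2                   # assumed constraint: exactly two stride-2 layers


def downsampling_paths(num_layers, num_downsamples):
    """Enumerate every placement of stride-2 layers that satisfies the constraint."""
    for positions in combinations(range(num_layers), num_downsamples):
        yield tuple(2 if i in positions else 1 for i in range(num_layers))


def toy_score(ops, strides):
    """Stand-in for validation accuracy (hypothetical, for illustration only)."""
    op_value = {"conv3x3": 1.0, "conv5x5": 1.2, "skip": 0.1}
    # Arbitrary assumption: downsampling later in the network scores higher.
    path_bonus = sum(i for i, s in enumerate(strides) if s == 2)
    return sum(op_value[o] for o in ops) + 0.5 * path_bonus


def two_step_search(score=toy_score):
    """Search the path and the operations separately instead of jointly."""
    # Step 1: fix a default operation everywhere and search only the path.
    default_ops = ("conv3x3",) * NUM_LAYERS
    best_path = max(
        downsampling_paths(NUM_LAYERS, NUM_DOWNSAMPLES),
        key=lambda path: score(default_ops, path),
    )
    # Step 2: keep that path fixed and search only the per-layer operations.
    best_ops = max(
        product(OPS, repeat=NUM_LAYERS),
        key=lambda ops: score(ops, best_path),
    )
    return best_ops, best_path


if __name__ == "__main__":
    ops, path = two_step_search()
    print("operations:", ops)
    print("strides:   ", path)
```

In this toy setting the decoupled search scores 6 + 81 = 87 candidates, whereas a joint search over paths and operations would score 6 × 81 = 486. The saving grows quickly with the number of layers, which is the motivation for separating the two decisions.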