Paper Title
TextRay: Contour-based Geometric Modeling for Arbitrary-shaped Scene Text Detection
Paper Authors
Paper Abstract
Arbitrary-shaped text detection is a challenging task due to the complex geometric layouts of text, such as large aspect ratios, various scales, random rotations, and curved shapes. Most state-of-the-art methods approach this problem from a bottom-up perspective, seeking to model text instances with complex geometric layouts using simple local units (e.g., local boxes or pixels) and to generate detections with heuristic post-processing. In this work, we propose an arbitrary-shaped text detection method, namely TextRay, which conducts top-down contour-based geometric modeling and geometric parameter learning within a single-shot anchor-free framework. The geometric modeling is carried out under a polar coordinate system with a bidirectional mapping scheme between shape space and parameter space, encoding complex geometric layouts into unified representations. For effective learning of the representations, we design a central-weighted training strategy and a content loss that builds propagation paths between geometric encodings and visual content. TextRay outputs simple polygon detections in a single pass with only one NMS post-processing step. Experiments on several benchmark datasets demonstrate the effectiveness of the proposed approach. The code is available at https://github.com/LianaWang/TextRay.
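The "bidirectional mapping scheme between shape space and parameter space" can be pictured as describing a text contour by ray lengths sampled at fixed angles around a center point, and reconstructing a polygon from those lengths. The sketch below is a minimal NumPy illustration under that assumption; the function names `encode_contour`/`decode_contour`, the uniform angle sampling, and the nearest-angle lookup are illustrative choices, not the paper's actual parameterization, which TextRay learns inside its detection network.

```python
import numpy as np

def encode_contour(contour_xy, center, num_rays=36):
    """Shape space -> parameter space: sample a closed contour as ray
    lengths at `num_rays` uniformly spaced angles around `center`.

    Illustrative sketch only; TextRay's real encoding may differ.
    """
    pts = np.asarray(contour_xy, dtype=np.float64)
    center = np.asarray(center, dtype=np.float64)

    # Densely resample the polygon edges so every ray direction has a
    # nearby boundary point to snap to.
    dense = []
    for p, q in zip(pts, np.roll(pts, -1, axis=0)):
        t = np.linspace(0.0, 1.0, 50, endpoint=False)[:, None]
        dense.append(p + t * (q - p))
    dense = np.concatenate(dense, axis=0)

    # Polar coordinates of boundary points relative to the center.
    rel = dense - center
    theta = np.arctan2(rel[:, 1], rel[:, 0])
    rho = np.linalg.norm(rel, axis=1)

    # For each target angle, pick the boundary point with the closest
    # (wrap-aware) angle; assumes the contour is star-shaped w.r.t. center.
    targets = np.linspace(-np.pi, np.pi, num_rays, endpoint=False)
    diff = np.abs((theta[None, :] - targets[:, None] + np.pi) % (2 * np.pi) - np.pi)
    return rho[diff.argmin(axis=1)]            # (num_rays,) ray lengths


def decode_contour(ray_lengths, center):
    """Parameter space -> shape space: turn ray lengths back into a polygon."""
    ray_lengths = np.asarray(ray_lengths, dtype=np.float64)
    targets = np.linspace(-np.pi, np.pi, len(ray_lengths), endpoint=False)
    xs = center[0] + ray_lengths * np.cos(targets)
    ys = center[1] + ray_lengths * np.sin(targets)
    return np.stack([xs, ys], axis=1)          # (num_rays, 2) polygon vertices
```

Under this sketch, `decode_contour(encode_contour(poly, c), c)` yields a fixed-vertex polygon approximating `poly`, i.e., a fixed-length representation of an arbitrary contour; in TextRay the corresponding parameters are regressed by the network at each location rather than sampled from ground-truth contours at test time.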