Domain Adaptive Scene Text Detection via Subcategorization

Title

Domain Adaptive Scene Text Detection via Subcategorization

Authors

Zichen Tian, Chuhui Xue, Jingyi Zhang, Shijian Lu

Abstract

Most existing scene text detectors require large-scale training data, which cannot scale well due to two major factors: 1) scene text images often have domain-specific distributions; 2) collecting large-scale annotated scene text images is laborious. We study domain adaptive scene text detection, a largely neglected yet very meaningful task that aims for optimal transfer of labelled scene text images while handling unlabelled images in various new domains. Specifically, we design SCAST, a subcategory-aware self-training technique that effectively mitigates network overfitting and noisy pseudo labels in domain adaptive scene text detection. SCAST consists of two novel designs. For labelled source data, it introduces pseudo subcategories for both foreground text and background stuff, which help train more generalizable source models with multi-class detection objectives. For unlabelled target data, it mitigates network overfitting by co-regularizing the binary and subcategory classifiers trained in the source domain. Extensive experiments show that SCAST consistently achieves superior detection performance across multiple public benchmarks, and that it also generalizes well to other domain adaptive detection tasks such as vehicle detection.
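The abstract names SCAST's two ingredients but not their exact form, so the following is only a minimal sketch under our own assumptions, not the authors' implementation. We assume that pseudo subcategories come from clustering region features separately for text and background (plain k-means here), and that co-regularization penalizes disagreement between the binary text/background classifier and the subcategory classifier once the subcategory classifier's probabilities for all text subcategories are summed (mean absolute difference here). The names `kmeans`, `pseudo_subcategories` and `co_regularization_loss`, the first-k-rows initialization, and the choice of loss are all hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=20):
    """Plain Lloyd's k-means (assumed clustering method); the initial
    centres are simply the first k rows, which keeps the sketch deterministic."""
    centres = x[:k].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centre.
        d = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return labels

def pseudo_subcategories(features, is_text, k_fg, k_bg):
    """Hypothetical helper: give each labelled source region a pseudo
    subcategory id. Text regions get ids 0..k_fg-1 and background regions
    get ids k_fg..k_fg+k_bg-1, turning the binary task into a multi-class one."""
    labels = np.empty(len(features), dtype=int)
    labels[is_text] = kmeans(features[is_text], k_fg)
    labels[~is_text] = k_fg + kmeans(features[~is_text], k_bg)
    return labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def co_regularization_loss(binary_text_prob, subcat_logits, k_fg):
    """Hypothetical co-regularization term on unlabelled target regions:
    the subcategory head's text probability is its total mass on the k_fg
    text subcategories, and we penalize its gap to the binary head."""
    subcat_text_prob = softmax(subcat_logits)[:, :k_fg].sum(axis=1)
    return np.abs(binary_text_prob - subcat_text_prob).mean()

# Toy usage: 4 text regions forming 2 clusters, 4 background regions forming 2.
features = np.array([[0, 0], [10, 0], [50, 50], [0, 1],
                     [-50, -50], [10, 1], [50, 51], [-50, -51]], dtype=float)
is_text = np.array([True, True, False, True, False, True, False, False])
print(pseudo_subcategories(features, is_text, k_fg=2, k_bg=2))
```

Summing the text subcategory probabilities is one simple way to make the two classifiers comparable on the same binary decision; any disagreement measure (L1, KL divergence, and so on) could play the same role in this sketch.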
