A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition

Paper title

A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition

Authors

Gürkan Soykan, Deniz Yuret, Tevfik Metin Sezgin

Abstract

This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset, the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for Western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We evaluated state-of-the-art text detection and recognition models on these datasets and found a significant improvement in word accuracy and normalized edit distance compared to the text in COMICS. We also created a new dataset called "COMICS Text+", which contains the text extracted from the textboxes in the COMICS dataset. Using the improved text data of COMICS Text+ in an existing comics processing model yielded state-of-the-art performance on cloze-style tasks without changing the model architecture. The COMICS Text+ dataset can be a valuable resource for researchers working on text detection, text recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All the data and inference instructions can be accessed at https://github.com/gsoykan/comics_text_plus.
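
The abstract reports results in terms of word accuracy and normalized edit distance. As a rough illustration of what these two metrics measure, here is a minimal sketch in Python. It is not the authors' evaluation code: the function names are hypothetical, and the normalization used (edit distance divided by the length of the longer string, averaged over samples) is an assumption. Some benchmarks report one minus this value instead.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def word_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that match their reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    exact = sum(p == r for p, r in zip(predictions, references))
    return exact / len(references)


def normalized_edit_distance(predictions: Sequence[str],
                             references: Sequence[str]) -> float:
    """Mean of edit distance / max(len(pred), len(ref)); assumed normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    total = 0.0
    for p, r in zip(predictions, references):
        longest = max(len(p), len(r))
        total += levenshtein(p, r) / longest if longest else 0.0
    return total / len(references)


if __name__ == "__main__":
    # Made-up OCR outputs for illustration only, not data from the paper.
    preds = ["HELLO", "W0RLD", "POW!"]
    refs = ["HELLO", "WORLD", "POW!"]
    print(word_accuracy(preds, refs))
    print(normalized_edit_distance(preds, refs))
```

Word accuracy is strict: a single wrong character makes the whole word count as an error. Normalized edit distance gives partial credit, which is why the two are usually reported together.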
