Paper Title
Towards Automatic Parsing of Structured Visual Content through the Use of Synthetic Data
Paper Authors
Paper Abstract
Structured Visual Content (SVC) such as graphs, flow charts, and the like is used by authors to illustrate various concepts. While such depictions allow the average reader to better understand the contents, images containing SVCs are typically not machine-readable. This, in turn, hinders not only automated knowledge aggregation but also the perception of the displayed information for visually impaired people. In this work, we propose a synthetic dataset containing SVCs in the form of images as well as ground truths. We demonstrate the use of this dataset with an application that automatically extracts a graph representation from an SVC image, achieved by training a model via common supervised learning methods. As there currently exists no large-scale public dataset for the detailed analysis of SVCs, we propose the Synthetic SVC (SSVC) dataset, comprising 12,000 images with respective bounding box annotations and detailed graph representations. Our dataset enables the development of strong models for the interpretation of SVCs while skipping time-consuming dense data annotation. We evaluate our model on both synthetic and manually annotated data and, for the presented application, show the transferability from synthetic to real data via various metrics. We show that this proof of concept is feasible to some extent and lay down a solid baseline for this task. We discuss the limitations of our approach with a view to further improvements. Our metrics can serve as a tool for future comparisons in this domain. To enable further research on this task, the dataset is publicly available at https://bit.ly/3jN1pJJ.
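The abstract describes ground truths consisting of bounding box annotations plus a graph representation. As a minimal sketch of how such a record might be modeled (all class and field names here are illustrative assumptions, not the SSVC dataset's actual schema):

```python
from dataclasses import dataclass, field

# Hypothetical structure for one SVC ground-truth record: each node of the
# flow chart carries a bounding box, and edges encode the graph topology.

@dataclass
class Node:
    node_id: int
    label: str
    bbox: tuple  # (x_min, y_min, x_max, y_max) in pixel coordinates

@dataclass
class SVCGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (source_id, target_id) pairs

    def adjacency(self):
        """Return an adjacency list mapping node_id -> list of successors."""
        adj = {n.node_id: [] for n in self.nodes}
        for src, dst in self.edges:
            adj[src].append(dst)
        return adj

# Toy two-node flow chart: "Start" -> "End"
g = SVCGraph()
g.nodes.append(Node(0, "Start", (10, 10, 110, 50)))
g.nodes.append(Node(1, "End", (10, 100, 110, 140)))
g.edges.append((0, 1))
print(g.adjacency())  # {0: [1], 1: []}
```

Pairing each node with its bounding box is what allows a detection model's output to be matched against the graph-level ground truth during evaluation.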