Paper Title
Cross-Domain Document Layout Analysis Using Document Style Guide
Authors
Abstract
Document layout analysis (DLA) aims to decompose document images into high-level semantic areas (i.e., figures, tables, texts, and background). Creating a DLA framework with strong generalization capabilities is challenging because document objects vary widely in layout, size, aspect ratio, texture, etc. Many researchers have addressed this challenge by synthesizing data to build large training sets. However, synthetic training data varies in style and is erratic in quality. Moreover, there is a large gap between the source data and the target data. In this paper, we propose an unsupervised cross-domain DLA framework based on document style guidance. We integrate document quality assessment and cross-domain document analysis into a unified framework. Our framework is composed of three components: the Document Layout Generator (GLD), the Document Elements Decorator (GED), and the Document Style Discriminator (DSD). The GLD generates document layouts, the GED fills the layout elements, and the DSD performs document quality assessment and cross-domain guidance. First, we apply the GLD to predict the positions of the elements in the generated document. Then, we design a novel algorithm based on aesthetic guidance to fill those positions. Finally, we use contrastive learning to assess the quality of the generated document. In addition, we design a new strategy that turns the document quality assessment component into a cross-domain document style guide component. The overall framework is unsupervised. Extensive experiments show that the proposed method achieves remarkable performance.