Paper title
Semantic Parsing of Interpage Relations
Paper authors
Abstract
Page-level analysis of documents has been a topic of interest in digitization efforts, and multimodal approaches have been applied to both classification and page stream segmentation. In this work, we focus on capturing finer semantic relations between pages of a multi-page document. To this end, we formalize the task as semantic parsing of interpage relations and propose an end-to-end approach for interpage dependency extraction, inspired by the dependency parsing literature. We further design a multi-task training approach to jointly optimize for page embeddings to be used in segmentation, classification, and parsing of the page dependencies using textual and visual features extracted from the pages. Moreover, we also combine the features from the two modalities to obtain multimodal page embeddings. To the best of our knowledge, this is the first study to extract rich semantic interpage relations from multi-page documents. Our experimental results show that, over a naive baseline, the proposed method increased the labeled attachment score (LAS) by 41 percentage points for semantic parsing, and increased accuracy by 33 percentage points for page stream segmentation and by 45 percentage points for page classification.
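The abstract reports parsing quality as LAS, the labeled attachment score used in dependency parsing: the fraction of items whose predicted head and relation label both match the gold annotation. The sketch below shows how that metric could be computed when the parsed items are pages rather than words. It is only an illustration under stated assumptions: the function name `attachment_scores`, the convention that head index 0 denotes the root, and the relation labels (`continuation`, `attachment`) are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# One arc per page: (index of the head page, relation label).
# Assumption: head index 0 marks a page that starts a new document (root).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two aligned lists of per-page arcs.

    UAS counts pages whose predicted head is correct.
    LAS counts pages whose predicted head AND label are both correct.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain one arc per page")
    if not gold:
        raise ValueError("cannot score an empty page stream")

    n = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / n, las_hits / n


# Hypothetical four-page stream; labels are made up for illustration.
gold = [(0, "root"), (1, "continuation"), (1, "attachment"), (0, "root")]
pred = [(0, "root"), (1, "continuation"), (1, "continuation"), (3, "attachment")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.75 LAS=0.50
```

In this toy stream, page 3 has the right head but the wrong label, so it counts toward UAS but not LAS. Page 4 has the wrong head and counts toward neither.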