Paper Title
TRIE: End-to-End Text Reading and Information Extraction for Document Understanding
Paper Authors
Paper Abstract
Since ubiquitous real-world documents (e.g., invoices, tickets, resumes and leaflets) contain rich information, automatic document image understanding has become a hot topic. Most existing works decouple the problem into two separate tasks: (1) text reading, for detecting and recognizing texts in images, and (2) information extraction, for analyzing and extracting key elements from the previously extracted plain text. However, they mainly focus on improving the information extraction task, while neglecting the fact that text reading and information extraction are mutually correlated. In this paper, we propose a unified end-to-end text reading and information extraction network, in which the two tasks can reinforce each other. Specifically, the multimodal visual and textual features from text reading are fused for information extraction and, in turn, the semantics from information extraction contribute to the optimization of text reading. On three real-world datasets with diverse document images (from fixed layout to variable layout, from structured text to semi-structured text), our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
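The abstract's core idea is fusing the visual and textual features produced by the text-reading branch before feeding them to the information-extraction branch. The following minimal sketch illustrates one common fusion scheme (concatenation followed by a linear projection) in plain Python; the function name `fuse_features` and the concatenation-plus-projection operator are illustrative assumptions, not the paper's actual fusion module.

```python
# A minimal sketch of multimodal feature fusion, assuming a simple
# concatenate-then-project operator. The actual TRIE fusion module may
# differ; this only illustrates the idea of combining visual and textual
# features into one representation for the information-extraction branch.

def fuse_features(visual, textual, weights, bias):
    """Concatenate a visual and a textual feature vector, then apply a
    linear projection (weights: out_dim x (len(visual)+len(textual)))."""
    combined = visual + textual  # list concatenation = feature concat
    return [
        sum(w * x for w, x in zip(row, combined)) + b
        for row, b in zip(weights, bias)
    ]

# Toy example: 2-dim visual + 2-dim textual features -> 2-dim fused feature.
visual = [1.0, 0.5]
textual = [0.2, 0.8]
weights = [[0.25, 0.25, 0.25, 0.25],
           [0.5, 0.0, 0.5, 0.0]]
bias = [0.0, 0.1]
fused = fuse_features(visual, textual, weights, bias)
```

In a real network the projection would be a learned layer and the fused vector would feed the entity-tagging head, while gradients from that head flow back into the text-reading features, which is the mutual reinforcement the abstract describes.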