Paper Title
TSR-DSAW: Table Structure Recognition via Deep Spatial Association of Words
Paper Authors
Paper Abstract
Existing methods for Table Structure Recognition (TSR) from camera-captured or scanned documents perform poorly on complex tables consisting of nested rows/columns, multi-line text and missing cell data. This is because current data-driven methods work by simply training deep models on large volumes of data and fail to generalize when an unseen table structure is encountered. In this paper, we propose to train a deep network to capture the spatial associations between different word pairs present in the table image for unravelling the table structure. We present an end-to-end pipeline, named TSR-DSAW: TSR via Deep Spatial Association of Words, which outputs a digital representation of a table image in a structured format such as HTML. Given a table image as input, the proposed method begins with the detection of all the words present in the image using a text-detection network like CRAFT, followed by the generation of word pairs using dynamic programming. These word pairs are highlighted in individual images and subsequently fed into a DenseNet-121 classifier trained to capture spatial associations such as same-row, same-column, same-cell or none. Finally, we perform post-processing on the classifier output to generate the table structure in HTML format. We evaluate our TSR-DSAW pipeline on two public table-image datasets -- PubTabNet and ICDAR 2013 -- and demonstrate improvement over previous methods such as TableNet and DeepDeSRT.
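The post-processing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the classifier has already labelled each word pair as same-row, same-column, same-cell or none, and uses union-find to merge words into cells and cells into rows before emitting minimal HTML. All function names and data layouts (`table_to_html`, the id-to-text dict, the pair-label dict) are hypothetical; a real pipeline would also order columns by the words' x-coordinates rather than by word id.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def table_to_html(words, pair_labels):
    """words: {word_id: text}; pair_labels: {(id_a, id_b): label}.

    Labels are the four classes from the abstract:
    'same-row', 'same-column', 'same-cell', 'none'.
    """
    # 1. Merge words that the classifier says share a cell.
    parent = {w: w for w in words}
    for (a, b), label in pair_labels.items():
        if label == "same-cell":
            union(parent, a, b)
    cells = defaultdict(list)
    for w in sorted(words):
        cells[find(parent, w)].append(w)

    # 2. Group cells into rows using same-row labels (union-find again,
    #    this time over cell representatives).
    row_parent = {c: c for c in cells}
    for (a, b), label in pair_labels.items():
        if label == "same-row":
            union(row_parent, find(parent, a), find(parent, b))
    rows = defaultdict(list)
    for c in sorted(cells):
        rows[find(row_parent, c)].append(c)

    # 3. Emit minimal HTML: one <tr> per row group, one <td> per cell.
    html = ["<table>"]
    for row_cells in rows.values():
        tds = "".join(
            "<td>{}</td>".format(" ".join(words[w] for w in cells[c]))
            for c in row_cells
        )
        html.append("<tr>{}</tr>".format(tds))
    html.append("</table>")
    return "".join(html)
```

For a 2x2 table with words `Name`, `Age`, `Alice`, `30` and the corresponding same-row/same-column labels, this yields `<table><tr><td>Name</td><td>Age</td></tr><tr><td>Alice</td><td>30</td></tr></table>`.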