Fourier Document Restoration for Robust Document Dewarping and Recognition

Paper Title

Fourier Document Restoration for Robust Document Dewarping and Recognition

Authors

Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai

Abstract

State-of-the-art document dewarping techniques learn to predict 3-dimensional information of documents, which is prone to errors when dealing with documents with irregular distortions or large variations in depth. This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions and improve document recognition in a more reliable and simpler manner. FDRNet focuses on high-frequency components in the Fourier space, which capture most structural information but are largely free of degradation in appearance. It dewarps documents with a flexible Thin-Plate Spline transformation that can handle various deformations effectively without requiring deformation annotations in training. These features allow FDRNet to learn from a small amount of simply labeled training images, and the learned model can dewarp documents with complex geometric distortion and recognize the restored texts accurately. To facilitate document restoration research, we create a benchmark dataset consisting of over one thousand camera documents with different types of geometric and photometric distortion. Extensive experiments show that FDRNet outperforms the state-of-the-art by large margins on both dewarping and text recognition tasks. In addition, FDRNet requires only a small amount of simply labeled training data and is easy to deploy.
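
The abstract's point that high-frequency Fourier components keep structure (text strokes, edges) while dropping slowly varying appearance (shading, uneven lighting) can be illustrated with a plain high-pass filter. The following is a minimal sketch using NumPy, not the FDRNet implementation: the function name `high_frequency_component`, the `cutoff` parameter, and the hard circular mask are all assumptions made for illustration.

```python
import numpy as np


def high_frequency_component(image, cutoff=0.1):
    """Return the high-frequency part of a 2-D grayscale image.

    `cutoff` is a hypothetical parameter (not from the paper): spatial
    frequencies with magnitude <= cutoff, in cycles per pixel, are zeroed.
    """
    image = np.asarray(image, dtype=np.float64)
    height, width = image.shape

    # 2-D FFT, shifted so the zero-frequency (DC) term sits at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Frequency of every spectrum bin, in cycles per pixel, in the same
    # shifted layout as `spectrum`.
    fy = np.fft.fftshift(np.fft.fftfreq(height))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(width))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Hard high-pass mask: keep only bins outside the low-frequency disc.
    mask = radius > cutoff

    # Back to the spatial domain; the imaginary part is rounding noise
    # because the input is real and the mask is symmetric.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.real(filtered)
```

With this sketch, a uniformly bright page maps to an all-zero image, while fine stripes that alternate every pixel pass through unchanged even when a constant brightness offset is added. That mirrors, in a very simplified form, why a high-frequency representation is insensitive to photometric degradation.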
