Paper title
Understanding Scanned Receipts
Paper authors
Paper abstract
Tasking machines with understanding receipts can have important applications, such as enabling detailed analytics on purchases, enforcing expense policies, and inferring patterns of purchase behavior across large collections of receipts. In this paper, we focus on the task of Named Entity Linking (NEL) of scanned receipt line items; specifically, the task entails associating shorthand text from OCR'd receipts with a knowledge base (KB) of grocery products. For example, the scanned item "STO BABY SPINACH" should be linked to the catalog item labeled "Simple Truth Organic Baby Spinach". Experiments that employ a variety of Information Retrieval techniques in combination with statistical phrase detection show promise for effective understanding of scanned receipt data.
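To make the linking task concrete, the following is a minimal sketch that ranks catalog entries by character-trigram Jaccard overlap with the receipt text. This is a simple stand-in retrieval baseline chosen for illustration, not the method evaluated in the paper. The function names `char_ngrams` and `link_item` are hypothetical, and every catalog entry other than "Simple Truth Organic Baby Spinach" is invented for the example.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def link_item(receipt_text, catalog, n=3):
    """Return the catalog label with the highest n-gram Jaccard overlap with receipt_text."""
    query = char_ngrams(receipt_text, n)

    def score(label):
        candidate = char_ngrams(label, n)
        union = query | candidate
        # Jaccard similarity: shared n-grams divided by all distinct n-grams.
        return len(query & candidate) / len(union) if union else 0.0

    return max(catalog, key=score)


# Toy knowledge base; only the first label comes from the abstract (assumption).
catalog = [
    "Simple Truth Organic Baby Spinach",
    "Kroger Baby Carrots",
    "Simple Truth Organic Whole Milk",
    "Kroger Fresh Spinach Bunch",
]

best_match = link_item("STO BABY SPINACH", catalog)
print(best_match)
```

On this toy catalog the shared substring "baby spinach" is enough to select the intended entry. The abbreviation "STO" contributes nothing to the score, which hints at why plain string overlap is likely insufficient on real receipts and why the abstract pairs retrieval with phrase detection.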