Paper Title

Neural Normalizer of Dates and Addresses

Normalizador Neural de Datas e Endereços

Authors

Gustavo Plensack, Paulo Finardi

Abstract

Documents of any kind present a wide variety of date and address formats. Dates may be written out entirely in words or use different types of separators. The disorder in addresses is even greater, because streets, neighborhoods, cities, and states can appear in interchangeable order. In natural language processing, problems of this nature are usually handled by rigid tools such as regular expressions or DateParser, which are effective as long as the expected input format is pre-configured. When these tools are given an unexpected format, they produce errors and unwanted outputs. To circumvent this limitation, we present a solution based on T5, a state-of-the-art deep neural network, that handles non-preconfigured date and address formats with accuracy above 90% in some cases. With this model, our proposal brings generalization to the task of normalizing dates and addresses. We also address the problem using noisy data that simulates possible errors in the text.
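The limitation of rigid tools described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the pattern, the function name `normalize_date_rigid`, and the sample strings are all hypothetical, chosen only to show how a rule configured for one format rejects other separators, dates written in full, and noisy input.

```python
import re
from typing import Optional

# Hypothetical rigid rule: accepts only DD/MM/YYYY with a slash separator.
DATE_PATTERN = re.compile(r"^(\d{2})/(\d{2})/(\d{4})$")


def normalize_date_rigid(text: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if the input is not in the pre-configured format."""
    match = DATE_PATTERN.match(text.strip())
    if match is None:
        return None
    day, month, year = match.groups()
    return f"{year}-{month}-{day}"


# Illustrative inputs (assumed, not taken from the paper's dataset).
samples = [
    "25/12/2020",              # the pre-configured format
    "25.12.2020",              # different separator
    "25 de dezembro de 2020",  # written in full (Portuguese)
    "25/l2/2020",              # noisy input: letter "l" in place of digit "1"
]

for sample in samples:
    print(f"{sample!r} -> {normalize_date_rigid(sample)}")
```

Only the first sample is normalized; the other three return `None`. Covering each new variant would require another hand-written rule, which is the gap the abstract proposes to close with a learned model.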
