Paper Title

Analysing similarities between legal court documents using natural language processing approaches based on Transformers

Authors

Raphael Souza de Oliveira, Erick Giovani Sperandio Nascimento

Abstract

Recent advances in Artificial Intelligence (AI) have yielded promising results in solving complex problems in Natural Language Processing (NLP), and NLP has become an important tool for expediting the resolution of judicial proceedings in the legal domain. In this context, this work addresses the problem of detecting the degree of similarity between judicial documents that can be achieved in the inference group, by applying six NLP techniques based on the Transformer architecture to a case study of legal proceedings in the Brazilian judicial system. The Transformer-based models, namely BERT, GPT-2 and RoBERTa, were pre-trained on general-purpose corpora of Brazilian Portuguese and then fine-tuned and specialised for the legal sector using 210,000 legal proceedings. A vector representation of each legal document was computed from its embeddings and used to cluster the lawsuits; the quality of each model was assessed by the cosine distance between the elements of each cluster and its centroid. Transformer-based models performed better than previous traditional NLP techniques, with the RoBERTa model specialised for Brazilian Portuguese achieving the best results. This methodology can also be applied to case studies in other languages, helping to advance the state of the art in NLP applied to the legal sector.
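The evaluation pipeline described in the abstract (document vectors from embeddings, clustering, cosine distance to the centroid) can be sketched in a few lines. This is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation: the abstract does not say how token embeddings are pooled, which clustering algorithm is used, or how per-document distances are aggregated, so mean pooling, a K-means variant with cosine-distance assignment and farthest-first seeding, and the average distance to the centroid are all assumptions. Every function name is hypothetical, and the noisy synthetic vectors merely stand in for real BERT, GPT-2 or RoBERTa outputs.

```python
import math
import random


def mean_pool(token_embeddings):
    """Average a document's token embeddings into one vector (assumed pooling strategy)."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def farthest_first_init(vectors, k):
    """Deterministic seeding: start from the first vector, then repeatedly add
    the vector whose cosine distance to its nearest chosen center is largest."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(cosine_distance(v, c) for c in centers)))
    return centers


def kmeans(vectors, k, max_iters=50):
    """K-means variant (assumption): cosine-distance assignment, mean centroids."""
    centers = farthest_first_init(vectors, k)
    labels = []
    for _ in range(max_iters):
        # Assign each document vector to its nearest center.
        labels = [min(range(k), key=lambda c: cosine_distance(v, centers[c])) for v in vectors]
        # Recompute each center as the mean of its members (keep the old center if empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return labels, centers


def cluster_quality(vectors, labels, centers):
    """Mean cosine distance from each document to its cluster centroid (lower is tighter)."""
    total = sum(cosine_distance(v, centers[lab]) for v, lab in zip(vectors, labels))
    return total / len(vectors)


if __name__ == "__main__":
    rng = random.Random(42)

    def fake_document(direction, n_tokens=5):
        # Hypothetical stand-in for Transformer token embeddings: noisy copies of a direction.
        return [[d + rng.gauss(0, 0.05) for d in direction] for _ in range(n_tokens)]

    docs = [fake_document([1.0, 0.0, 0.0, 0.0]) for _ in range(3)]
    docs += [fake_document([0.0, 0.0, 1.0, 0.0]) for _ in range(3)]
    doc_vectors = [mean_pool(d) for d in docs]
    labels, centers = kmeans(doc_vectors, k=2)
    print("cluster labels:", labels)
    print("mean cosine distance to centroid:", round(cluster_quality(doc_vectors, labels, centers), 4))
```

Passing each model's document vectors through the same clustering and scoring step yields one comparable number per model; a lower value means documents sit closer to their cluster centroids.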