Paper Title

AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach

Authors

Dhrubajyoti Pathak, Sukumar Nandi, Priyankoo Sarmah

Abstract

Part-of-speech (POS) tagging is crucial to Natural Language Processing (NLP). It is a well-studied topic in several resource-rich languages. However, for many languages that are rich in history and literature, the development of computational linguistic resources is still in its infancy. Assamese, a scheduled language of India spoken by more than 25 million people, falls into this category. In this paper, we present a Deep Learning (DL)-based POS tagger for Assamese. The development process is divided into two phases. In the first phase, several pre-trained word embeddings are used to train multiple tagging models, which allows us to evaluate the performance of the word embeddings on the POS tagging task. The top-performing model from the first phase is then used to annotate a new set of sentences. In the second phase, the model is trained further using the new dataset. Finally, we attain an F1 score of 86.52%. The model may serve as a baseline for further study of DL-based Assamese POS tagging.
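The two-phase workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. A most-frequent-tag lookup stands in for the deep learning tagger. Two hypothetical word-normalization settings stand in for the competing pre-trained word embeddings. Token accuracy stands in for the F1 score. Retraining on the combined data stands in for further training. The toy English data and all function names are invented for illustration.

```python
from collections import Counter, defaultdict

def train(tagged_sentences, normalize):
    """Fit a most-frequent-tag lookup (a stand-in for a neural tagger)."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[normalize(word)][tag] += 1
            all_tags[tag] += 1
    lookup = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    fallback = all_tags.most_common(1)[0][0]  # tag used for unseen words
    return {"lookup": lookup, "fallback": fallback, "normalize": normalize}

def tag(model, words):
    """Assign a tag to every word, falling back to the most common tag."""
    norm = model["normalize"]
    return [(w, model["lookup"].get(norm(w), model["fallback"])) for w in words]

def token_accuracy(model, tagged_sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in tagged_sentences:
        predicted = tag(model, [w for w, _ in sentence])
        for (_, gold), (_, pred) in zip(sentence, predicted):
            correct += gold == pred
            total += 1
    return correct / total

# Toy gold-annotated data (hypothetical, English for readability).
gold_train = [
    [("The", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("A", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("Cats", "NOUN"), ("chase", "VERB"), ("mice", "NOUN")],
]
dev = [[("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")]]
unlabeled = [["a", "dog", "sleeps"]]

# Phase 1: train one model per candidate configuration and keep the best.
candidates = {"cased": lambda w: w, "lowercased": str.lower}
models = {name: train(gold_train, norm) for name, norm in candidates.items()}
scores = {name: token_accuracy(m, dev) for name, m in models.items()}
best_name = max(scores, key=scores.get)

# The best phase-1 model annotates a new set of sentences.
silver = [tag(models[best_name], words) for words in unlabeled]

# Phase 2: continue training with the newly annotated data
# (here: retrain the lookup on gold plus automatically tagged sentences).
final_model = train(gold_train + silver, candidates[best_name])
print(best_name, token_accuracy(final_model, dev))
```

In a real setting the automatically annotated sentences would usually be reviewed or filtered before reuse, since tagging errors from the first phase otherwise carry over into the second.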
