A Comprehensive Survey on Word Representation Models: From Classical to State-Of-The-Art Word Representation Language Models

Paper Title

A Comprehensive Survey on Word Representation Models: From Classical to State-Of-The-Art Word Representation Language Models

Authors

Usman Naseem, Imran Razzak, Shah Khalid Khan, Mukesh Prasad

Abstract

Word representation has always been an important research area in the history of natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore different word representation models and their expressive power, from classical approaches to modern state-of-the-art (SOTA) word representation language models (LMs). We describe a variety of text representation methods and model designs that have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations that capture the same semantic information. Further, such representations can be utilized by various machine learning (ML) algorithms for a variety of NLP-related tasks. Finally, this survey briefly discusses the commonly used classifiers based on ML and deep learning (DL), evaluation metrics, and the applications of these word embeddings in different NLP tasks.

Paper Title