Paper title
Development of Word Embeddings for Uzbek Language
Authors
Abstract
In this paper, we share the process of developing word embeddings for the Cyrillic variant of the Uzbek language. The result of our work is the first publicly available set of word vectors for this variant, trained with the word2vec, GloVe, and fastText algorithms on a high-quality web crawl corpus developed in-house. The resulting word embeddings can be used in many downstream natural language processing tasks.
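The abstract does not say how the released vectors are packaged or loaded. As a minimal sketch, assuming they are distributed as plain-text files in the common word2vec/fastText `.vec` layout (an optional header line giving vocabulary size and dimension, then one word per line followed by its vector components), the stdlib-only Python below reads such a file and ranks neighbours by cosine similarity, one of the simplest ways embeddings feed a downstream task. The function names `load_vectors` and `most_similar` and the three toy Cyrillic Uzbek vectors are hypothetical, invented for illustration; they are not taken from the paper's released embeddings.

```python
import io
import math


def load_vectors(stream):
    """Read word vectors in the plain-text word2vec/fastText .vec layout.

    If the first line holds exactly two integers (vocabulary size, dimension),
    it is treated as a header and skipped; GloVe text files have no header.
    """
    vectors = {}
    for i, line in enumerate(stream):
        parts = line.split()  # tolerate trailing spaces and the newline
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # header line: "<vocab_size> <dim>"
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, query, topn=3):
    """Rank every other word by cosine similarity to the query word."""
    q = vectors[query]
    scores = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical 3-dimensional toy vectors with invented values (illustration only).
TOY_VEC = """3 3
китоб 0.9 0.1 0.0
дафтар 0.8 0.2 0.1
олма 0.0 0.1 0.9
"""

vectors = load_vectors(io.StringIO(TOY_VEC))
result = most_similar(vectors, "китоб", topn=2)
print(result)  # "дафтар" (notebook) ranks above "олма" (apple) for "китоб" (book)
```

Real embedding files hold hundreds of thousands of words, so in practice a vectorized library (for example NumPy, or gensim's `KeyedVectors`) would replace the pure-Python loops; the sketch only shows the file layout and the similarity computation.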