Paper title
BITS Pilani at HinglishEval: Quality Evaluation for Code-Mixed Hinglish Text Using Transformers
Authors
Abstract
Code-mixed text consists of sentences that contain words or phrases from more than one language. Most multilingual communities worldwide communicate in several languages, and English is usually one of them. Hinglish is code-mixed text composed of Hindi and English and written in Roman script. This paper aims to determine the factors that influence the quality of system-generated code-mixed text. For the HinglishEval task, the proposed model uses multilingual BERT to measure the similarity between synthetically generated and human-generated sentences, and uses that similarity to predict the quality of the synthetically generated Hinglish sentences.
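The similarity step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how similarity is computed, so cosine similarity between sentence embeddings is an assumption here. The names `toy_embed`, `cosine`, and `similarity_feature` are hypothetical, and `toy_embed` is a hashed character n-gram stand-in used only so the example runs without a model download. It is not multilingual BERT, and in a real system it would be replaced by an embedder built on multilingual BERT outputs.

```python
import math
import zlib
from collections import Counter
from typing import Callable, Dict

# Sparse vector: bucket index -> weight.
Vector = Dict[int, float]


def toy_embed(sentence: str, n: int = 3, buckets: int = 4096) -> Vector:
    """Stand-in embedder (assumption): hashed character n-gram counts.

    This only makes the sketch self-contained; it is NOT multilingual BERT.
    """
    text = f" {sentence.lower().strip()} "
    grams = (text[i:i + n] for i in range(len(text) - n + 1))
    # crc32 is deterministic across runs, unlike the built-in hash().
    counts = Counter(zlib.crc32(g.encode("utf-8")) % buckets for g in grams)
    return {bucket: float(count) for bucket, count in counts.items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(bucket, 0.0) for bucket, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_feature(
    synthetic: str,
    human: str,
    embed: Callable[[str], Vector] = toy_embed,
) -> float:
    """Similarity between a synthetic sentence and its human-written reference."""
    return cosine(embed(synthetic), embed(human))


if __name__ == "__main__":
    human = "mujhe yeh movie bahut pasand hai"
    close = "mujhe yeh film bahut pasand hai"
    far = "the weather is cold today"
    print(similarity_feature(close, human))
    print(similarity_feature(far, human))
```

Under these assumptions, the returned score would serve as one input to whatever model predicts the quality rating. The abstract does not describe that predictor, so it is left out of the sketch.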