Title

BITS Pilani at HinglishEval: Quality Evaluation for Code-Mixed Hinglish Text Using Transformers

Authors

Shaz Furniturewala, Vijay Kumari, Amulya Ratna Dash, Hriday Kedia, Yashvardhan Sharma

Abstract

Code-Mixed text data consists of sentences containing words or phrases from more than one language. Most multilingual communities worldwide communicate using multiple languages, with English usually one of them. Hinglish is Code-Mixed text composed of Hindi and English but written in Roman script. This paper aims to determine the factors that influence the quality of system-generated Code-Mixed text. For the HinglishEval task, the proposed model uses multilingual BERT (mBERT) to measure the similarity between synthetically generated and human-generated sentences and uses it to predict the quality of the synthetically generated Hinglish sentences.
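The abstract says only that multilingual BERT is used to measure similarity between synthetic and human-generated sentences. One common way to turn a BERT-style encoder's output into such a score is to mean-pool the token vectors (skipping padding via the attention mask) and compare the two sentence vectors with cosine similarity. The sketch below shows that computation under stated assumptions: the pooling choice, the function names, and the toy vectors are illustrative and hypothetical, not the authors' implementation. In practice the token vectors would come from an mBERT encoder's last hidden layer.

```python
import math
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask value is 1; padding tokens (mask 0) are skipped.

    Assumption: mean pooling is one common choice; the paper's pooling is not stated.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                summed[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [s / count for s in summed]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_similarity(tokens_a, mask_a, tokens_b, mask_b) -> float:
    """Hypothetical helper: pool each sentence's token vectors, then compare them."""
    return cosine_similarity(mean_pool(tokens_a, mask_a), mean_pool(tokens_b, mask_b))


if __name__ == "__main__":
    # Toy 3-dimensional "token embeddings" standing in for real mBERT outputs.
    synthetic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [9.0, 9.0, 9.0]]  # last row is padding
    human = [[1.0, 1.0, 0.0]]
    score = sentence_similarity(synthetic, [1, 1, 0], human, [1])
    print(round(score, 6))  # approximately 1.0: same direction once padding is ignored
```

Masking before averaging keeps padding tokens from diluting the sentence vector. How such similarity scores are mapped to a predicted quality rating is not described in the abstract.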