Paper Title

MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset

Authors

Marina Fomicheva, Shuo Sun, Erick Fonseca, Chrysoula Zerva, Frédéric Blain, Vishrav Chaudhary, Francisco Guzmán, Nina Lopatina, Lucia Specia, André F. T. Martins

Abstract

We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains eleven language pairs, with human labels for up to 10,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, the titles of the articles from which the sentences were extracted, and the neural MT models used to translate the text.
