Title
Practical Perspectives on Quality Estimation for Machine Translation
Authors
Abstract
Sentence-level quality estimation (QE) for machine translation (MT) attempts to predict the translation edit rate (TER) cost of the post-editing work required to correct MT output. We describe our view of sentence-level QE as dictated by several practical setups encountered in industry. We find that consumers of MT output, whether human or algorithmic, are primarily interested in a binary quality metric: is the translated sentence adequate as-is, or does it need post-editing? Motivated by this, we propose a quality classification (QC) view of sentence-level QE, in which we focus on maximizing recall at precision above a given threshold. We demonstrate that, while classical QE regression models fare poorly on this task, they can be repurposed by replacing the output regression layer with a binary classification layer, achieving 50-60% recall at 90% precision. For a high-quality MT system producing 75-80% correct translations, this promises a significant reduction in post-editing work.
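The quality classification operating point (maximize recall subject to precision at or above a target such as 90%) can be illustrated as a threshold sweep over a binary classifier's scores. This is a minimal sketch, not the authors' implementation. The function names `recall_at_precision` and `auto_accept_fraction` are hypothetical, as are the conventions used here: label 1 means "adequate as-is", label 0 means "needs post-editing", and a higher score means the model is more confident that the sentence is adequate. The second helper works through the arithmetic that links recall, precision and the share of correct MT output to the fraction of sentences that could skip post-editing.

```python
from typing import List, Sequence, Tuple


def recall_at_precision(
    scores: Sequence[float],
    labels: Sequence[int],
    min_precision: float = 0.9,
) -> Tuple[float, float]:
    """Sweep decision thresholds and return (best_recall, threshold).

    Assumed (hypothetical) conventions:
      labels[i] == 1 -> sentence i is adequate as-is; 0 -> needs post-editing.
      scores[i] is the classifier's confidence that sentence i is adequate.
    A sentence is accepted without post-editing when score >= threshold.
    Returns (0.0, inf) when no threshold reaches min_precision.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0, float("inf")
    # Visit candidate thresholds from the most to the least confident score.
    pairs: List[Tuple[float, int]] = sorted(
        zip(scores, labels), key=lambda p: p[0], reverse=True
    )
    best_recall, best_threshold = 0.0, float("inf")
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores are accepted or rejected together, so absorb them all.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision and recall > best_recall:
            best_recall, best_threshold = recall, threshold
    return best_recall, best_threshold


def auto_accept_fraction(recall: float, precision: float, good_rate: float) -> float:
    """Share of all sentences accepted without post-editing.

    Per sentence, TP = recall * good_rate, and accepted = TP + FP = TP / precision.
    """
    return recall * good_rate / precision


if __name__ == "__main__":
    scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40]
    labels = [1, 1, 1, 0, 1, 1, 0, 0]
    print(recall_at_precision(scores, labels, 0.9))  # (0.6, 0.85)
    print(round(auto_accept_fraction(0.6, 0.9, 0.8), 3))  # 0.533
```

As an illustration (this arithmetic is not a result reported in the source), suppose 80% of translations are correct and the classifier reaches 60% recall at 90% precision. About 53% of all sentences would then be accepted without post-editing. Roughly one in ten of those accepted sentences, or about 5% of the total, would be erroneous passes.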