Paper title
SilverAlign: MT-Based Silver Data Algorithm For Evaluating Word Alignment
Paper authors
Abstract
Word alignments are essential for a variety of NLP tasks. Therefore, choosing the best approaches for their creation is crucial. However, the scarce availability of gold evaluation data makes the choice difficult. We propose SilverAlign, a new method to automatically create silver data for the evaluation of word aligners by exploiting machine translation and minimal pairs. We show that performance on our silver data correlates well with gold benchmarks for 9 language pairs, making our approach a valid resource for evaluation of different domains and languages when gold data are not available. This addresses the important scenario of missing gold data alignments for low-resource languages.