Paper Title
Designing Precise and Robust Dialogue Response Evaluators
Paper Authors
Paper Abstract
Automatic dialogue response evaluators have been proposed as an alternative to automated metrics and human evaluation. However, existing automatic evaluators achieve only moderate correlation with human judgement, and they are not robust. In this work, we propose to build a reference-free evaluator and to exploit the power of semi-supervised training and pretrained (masked) language models. Experimental results demonstrate that the proposed evaluator achieves a strong correlation (> 0.6) with human judgement and generalizes robustly to diverse responses and corpora. We open-source the code and data at https://github.com/ZHAOTING/dialog-processing.