Paper Title
GRUEN for Evaluating Linguistic Quality of Generated Text
Paper Authors
Paper Abstract
Automatic evaluation metrics are indispensable for evaluating generated text. To date, these metrics have focused almost exclusively on the content selection aspect of the system output, ignoring the linguistic quality aspect altogether. We bridge this gap by proposing GRUEN for evaluating Grammaticality, non-Redundancy, focUs, structure and coherENce of generated text. GRUEN utilizes a BERT-based model and a class of syntactic, semantic, and contextual features to examine the system output. Unlike most existing evaluation metrics, which require human references as input, GRUEN is reference-less and requires only the system output. Moreover, it has the advantage of being unsupervised, deterministic, and adaptable to various tasks. Experiments on seven datasets over four language generation tasks show that the proposed metric correlates highly with human judgments.
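To make the reference-less interface concrete, below is a minimal Python sketch of how a GRUEN-style metric consumes only the system output (no human reference) and combines a BERT-derived grammaticality score with penalties for redundancy and lack of focus. All function names, weights, and returned values here are illustrative assumptions, not the authors' implementation; the real metric scores grammaticality with a BERT-based model rather than the stub used here.

```python
# Hypothetical sketch of a GRUEN-style reference-less quality score.
# Assumption: sub-scores are combined additively in [0, 1]; the actual
# paper's formulation and feature set are not reproduced here.
from typing import List


def grammaticality(sentence: str) -> float:
    """Stub: the paper uses a BERT-based model to score each sentence;
    a fixed value stands in for that model's output here."""
    return 0.9  # assumed value, for illustration only


def redundancy_penalty(sentences: List[str]) -> float:
    """Stub: would penalize pairs of highly similar sentences."""
    return 0.0


def focus_penalty(sentences: List[str]) -> float:
    """Stub: would penalize weak semantic links between adjacent sentences."""
    return 0.0


def gruen_style_score(system_output: str) -> float:
    # Key property from the abstract: only the system output is read;
    # no human reference is required anywhere in the computation.
    sentences = [s.strip() for s in system_output.split(".") if s.strip()]
    avg_grammar = sum(grammaticality(s) for s in sentences) / max(len(sentences), 1)
    score = avg_grammar - redundancy_penalty(sentences) - focus_penalty(sentences)
    return max(0.0, min(1.0, score))  # clamp to [0, 1]


if __name__ == "__main__":
    print(gruen_style_score("The cat sat on the mat. It purred."))
```

Because every component is a deterministic function of the system output alone, the sketch also illustrates the abstract's claims that the metric is unsupervised and deterministic.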