Title
Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling
Authors
Abstract
By positing a relationship between naturalistic reading times and information-theoretic surprisal, surprisal theory (Hale, 2001; Levy, 2008) provides a natural interface between language models and psycholinguistic models. This paper re-evaluates a claim due to Goodkind and Bicknell (2018) that a language model's ability to model reading times is a linear function of its perplexity. By extending Goodkind and Bicknell's analysis to modern neural architectures, we show that the proposed relation does not always hold for Long Short-Term Memory networks, Transformers, and pre-trained models. We introduce an alternate measure of language modeling performance called predictability norm correlation based on Cloze probabilities measured from human subjects. Our new metric yields a more robust relationship between language model quality and psycholinguistic modeling performance that allows for comparison between models with different training configurations.
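The two metrics contrasted in the abstract can be illustrated concretely. The sketch below computes per-word surprisal and perplexity from a language model's word probabilities, and a "predictability norm correlation" as the Pearson correlation between model probabilities and human Cloze probabilities. All numbers are invented toy values for illustration; this is a minimal sketch of the general idea, not the paper's actual evaluation pipeline.

```python
import math

# Hypothetical per-word probabilities a language model assigns to each
# word token in a small evaluation text (toy values, for illustration).
model_probs = [0.20, 0.05, 0.40, 0.10, 0.25]

# Hypothetical human Cloze probabilities ("predictability norms")
# for the same word tokens (toy values, for illustration).
cloze_probs = [0.30, 0.02, 0.50, 0.08, 0.20]

# Surprisal of a word: -log2 p(word | context) (Hale, 2001; Levy, 2008).
surprisals = [-math.log2(p) for p in model_probs]

# Perplexity: 2 raised to the mean per-word surprisal.
perplexity = 2 ** (sum(surprisals) / len(surprisals))

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Predictability norm correlation: how well the model's probabilities
# track human Cloze probabilities for the same words.
norm_corr = pearson(model_probs, cloze_probs)

print(f"perplexity: {perplexity:.2f}")
print(f"predictability norm correlation: {norm_corr:.3f}")
```

Lower perplexity means the model assigns higher average probability to the observed words, while a higher norm correlation means its probability estimates align more closely with human expectations; the paper's claim is that the latter is the more robust predictor of psycholinguistic modeling performance.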