Paper Title

Characterizing Verbatim Short-Term Memory in Neural Language Models

Authors

Kristijan Armeni, Christopher Honey, Tal Linzen

Abstract

When a language model is trained to predict natural language sequences, its prediction at each moment depends on a representation of prior context. What kind of information about the prior context can language models retrieve? We tested whether language models could retrieve the exact words that occurred previously in a text. In our paradigm, language models (transformers and an LSTM) processed English text in which a list of nouns occurred twice. We operationalized retrieval as the reduction in surprisal from the first to the second list. We found that the transformers retrieved both the identity and ordering of nouns from the first list. Further, the transformers' retrieval was markedly enhanced when they were trained on a larger corpus and with greater model depth. Lastly, their ability to index prior tokens was dependent on learned attention patterns. In contrast, the LSTM exhibited less precise retrieval, which was limited to list-initial tokens and to short intervening texts. The LSTM's retrieval was not sensitive to the order of nouns and it improved when the list was semantically coherent. We conclude that transformers implemented something akin to a working memory system that could flexibly retrieve individual token representations across arbitrary delays; conversely, the LSTM maintained a coarser and more rapidly-decaying semantic gist of prior tokens, weighted toward the earliest items.
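The abstract operationalizes retrieval as the reduction in surprisal on a noun list from its first to its second occurrence in the same context. Below is a minimal sketch of that measurement, assuming the HuggingFace `transformers` library and a GPT-2 checkpoint; the stimulus sentence and helper function are illustrative only and are not the authors' actual materials or code.

```python
# Minimal sketch (not the authors' code): measuring "retrieval" as the drop in
# per-token surprisal when a noun list is repeated later in the same context.
# Assumes the HuggingFace `transformers` library and a GPT-2 checkpoint.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def token_surprisals(text: str) -> list[tuple[str, float]]:
    """Return (token, surprisal in bits) for every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits          # shape: (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nll = -log_probs[torch.arange(targets.size(0)), targets]  # nats
    surprisal_bits = (nll / math.log(2.0)).tolist()
    tokens = tokenizer.convert_ids_to_tokens(targets.tolist())
    return list(zip(tokens, surprisal_bits))


# Illustrative stimulus: the same noun list occurs twice, separated by filler text.
noun_list = "patience, notion, movie, cloth"
text = (
    f"Mary wrote down a list: {noun_list}. "
    "After the meeting she walked home and, before leaving, "
    f"she read the list again: {noun_list}."
)

# Retrieval is operationalized as the surprisal reduction on the nouns between
# their first and second occurrence (lower surprisal on the repeat = better recall).
for tok, s in token_surprisals(text):
    print(f"{tok:>12s}  {s:6.2f} bits")
```

In the paper's paradigm this comparison is made across many lists and intervening-text lengths; the sketch above only shows how per-token surprisal for a single repeated list can be read off a causal language model.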
