Paper Title

Towards Tracing Factual Knowledge in Language Models Back to the Training Data

Paper Authors

Ekin Akyürek, Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, Kelvin Guu

Paper Abstract

Language models (LMs) have been shown to memorize a great deal of factual knowledge contained in their training data. But when an LM generates an assertion, it is often difficult to determine where it learned this information and whether it is true. In this paper, we propose the problem of fact tracing: identifying which training examples taught an LM to generate a particular factual assertion. Prior work on training data attribution (TDA) may offer effective tools for identifying such examples, known as "proponents". We present the first quantitative benchmark to evaluate this. We compare two popular families of TDA methods -- gradient-based and embedding-based -- and find that much headroom remains. For example, both methods have lower proponent-retrieval precision than an information retrieval baseline (BM25) that does not have access to the LM at all. We identify key challenges that may be necessary for further improvement such as overcoming the problem of gradient saturation, and also show how several nuanced implementation details of existing neural TDA methods can significantly improve overall fact tracing performance.
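As a concrete illustration of one of the two TDA families compared in the abstract, below is a minimal, hedged sketch of gradient-based proponent scoring in the style of TracIn (Pruthi et al., 2020), which the paper evaluates: a training example scores highly as a proponent when its loss gradient aligns with the query's loss gradient. The names `model`, `loss_fn`, and the example objects are hypothetical placeholders, not the paper's actual experimental setup.

```python
# Sketch of gradient-based training data attribution (TracIn-style
# gradient dot products). `model`, `loss_fn`, `query`, and `train_examples`
# are hypothetical placeholders; the paper's exact method may differ.
import torch


def flat_grad(model: torch.nn.Module, loss: torch.Tensor) -> torch.Tensor:
    """Flatten the gradient of `loss` over all trainable parameters into one vector."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])


def proponent_scores(model, loss_fn, query, train_examples):
    """Score each training example by the dot product of its loss gradient with
    the query's; the highest-scoring examples are candidate 'proponents'."""
    q = flat_grad(model, loss_fn(model, query))
    return [torch.dot(q, flat_grad(model, loss_fn(model, ex))).item()
            for ex in train_examples]
```

Embedding-based TDA methods, the other family compared, swap the gradient vectors for model hidden-state embeddings and typically score by similarity; only the featurization changes. The BM25 baseline that outperforms both is purely lexical and needs no access to the LM; a toy version using the third-party `rank_bm25` package (an assumption, since the abstract does not name an implementation) looks like:

```python
# Toy BM25 retrieval over training passages, using the `rank_bm25` package.
# The corpus and query strings are invented examples.
from rank_bm25 import BM25Okapi

train_texts = ["paris is the capital of france",
               "the eiffel tower was completed in 1889"]
bm25 = BM25Okapi([t.split() for t in train_texts])
print(bm25.get_scores("the capital of france is paris".split()))
```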
