Paper Title

Finding Patterns in Knowledge Attribution for Transformers

Authors

Jeevesh Juneja, Ritu Agarwal

Abstract

We analyze the Knowledge Neurons framework for the attribution of factual and relational knowledge to particular neurons in the transformer network. We use a 12-layer multilingual BERT model for our experiments. Our study reveals several interesting phenomena. We observe that most factual knowledge can be attributed to the middle and higher layers of the network ($\ge 6$). Further analysis reveals that the middle layers ($6$-$9$) are mostly responsible for relational information, which is refined into the actual factual knowledge, or the "correct answer", in the last few layers ($10$-$12$). Our experiments also show that the model handles prompts in different languages that represent the same fact similarly, providing further evidence for the effectiveness of multilingual pre-training. Applying the attribution scheme to grammatical knowledge, we find that grammatical knowledge is far more dispersed among the neurons than factual knowledge.
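The Knowledge Neurons framework scores each FFN neuron by an integrated-gradients-style attribution: the neuron's activation is scaled from zero to its observed value, and gradients of the model's answer probability are accumulated along the way. The following is a minimal, self-contained sketch of that recipe on a toy readout function rather than a real BERT model; `toy_prob` and its weights are hypothetical stand-ins, and the gradients are taken numerically for simplicity.

```python
import numpy as np

def neuron_attribution(prob_fn, activations, steps=20):
    """Riemann-sum approximation of integrated-gradients attribution.

    Scales the activation vector from 0 to its observed value and
    accumulates numerical gradients of prob_fn along the path; the
    attribution of neuron i is activations[i] times the mean gradient.
    """
    acts = np.asarray(activations, dtype=float)
    grad_sum = np.zeros_like(acts)
    eps = 1e-6
    for k in range(1, steps + 1):
        scaled = acts * (k / steps)
        # finite-difference gradient of prob_fn w.r.t. each neuron
        for i in range(len(acts)):
            bumped = scaled.copy()
            bumped[i] += eps
            grad_sum[i] += (prob_fn(bumped) - prob_fn(scaled)) / eps
    return acts * grad_sum / steps

# Hypothetical stand-in for P(correct answer | neuron activations):
# a logistic readout over three "neurons".
def toy_prob(acts):
    w = np.array([2.0, 0.0, -1.0])  # assumed readout weights
    return 1.0 / (1.0 + np.exp(-acts @ w))

scores = neuron_attribution(toy_prob, [1.0, 1.0, 1.0])
# Neuron 0 (positive weight) gets a positive score, neuron 1 (zero
# weight) a near-zero score, neuron 2 (negative weight) a negative one.
```

In the paper's setting, `prob_fn` would be the masked-LM probability of the correct answer token as a function of one layer's FFN activations, computed with autograd rather than finite differences.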
