Paper Title

Finding Patterns in Knowledge Attribution for Transformers

Authors

Jeevesh Juneja, Ritu Agarwal

Abstract

We analyze the Knowledge Neurons framework for the attribution of factual and relational knowledge to particular neurons in the transformer network. We use a 12-layer multilingual BERT model for our experiments. Our study reveals several interesting phenomena. We observe that most factual knowledge can be attributed to the middle and higher layers of the network ($\ge 6$). Further analysis reveals that the middle layers ($6$-$9$) are mostly responsible for relational information, which is refined into the actual factual knowledge, or the "correct answer", in the last few layers ($10$-$12$). Our experiments also show that the model handles prompts in different languages that represent the same fact similarly, providing further evidence for the effectiveness of multilingual pre-training. Applying the attribution scheme to grammatical knowledge, we find that grammatical knowledge is far more dispersed among the neurons than factual knowledge.
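The Knowledge Neurons framework scores each FFN neuron by an integrated-gradients-style attribution: the neuron's activation is scaled from zero to its observed value, and gradients of the model's answer probability are accumulated along the way. The following is a minimal, self-contained sketch of that recipe on a toy readout function rather than a real BERT model; `toy_prob` and its weights are hypothetical stand-ins, and the gradients are taken numerically for simplicity.

```python
import numpy as np

def neuron_attribution(prob_fn, activations, steps=20):
    """Riemann-sum approximation of integrated-gradients attribution.

    Scales the activation vector from 0 to its observed value and
    accumulates numerical gradients of prob_fn along the path; the
    attribution of neuron i is activations[i] times the mean gradient.
    """
    acts = np.asarray(activations, dtype=float)
    grad_sum = np.zeros_like(acts)
    eps = 1e-6
    for k in range(1, steps + 1):
        scaled = acts * (k / steps)
        # finite-difference gradient of prob_fn w.r.t. each neuron
        for i in range(len(acts)):
            bumped = scaled.copy()
            bumped[i] += eps
            grad_sum[i] += (prob_fn(bumped) - prob_fn(scaled)) / eps
    return acts * grad_sum / steps

# Hypothetical stand-in for P(correct answer | neuron activations):
# a logistic readout over three "neurons".
def toy_prob(acts):
    w = np.array([2.0, 0.0, -1.0])  # assumed readout weights
    return 1.0 / (1.0 + np.exp(-acts @ w))

scores = neuron_attribution(toy_prob, [1.0, 1.0, 1.0])
# Neuron 0 (positive weight) gets a positive score, neuron 1 (zero
# weight) a near-zero score, neuron 2 (negative weight) a negative one.
```

In the paper's setting, `prob_fn` would be the masked-LM probability of the correct answer token as a function of one layer's FFN activations, computed with autograd rather than finite differences.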
