Paper Title

Attention Flows for General Transformers

Paper Authors

Niklas Metzger, Christopher Hahn, Julian Siber, Frederik Schmitt, Bernd Finkbeiner

Paper Abstract

In this paper, we study the computation of how much an input token in a Transformer model influences its prediction. We formalize a method to construct a flow network out of the attention values of encoder-only Transformer models and extend it to general Transformer architectures including an auto-regressive decoder. We show that running a maxflow algorithm on the flow network construction yields Shapley values, which determine the impact of a player in cooperative game theory. By interpreting the input tokens in the flow network as players, we can compute their influence on the total attention flow leading to the decoder's decision. Additionally, we provide a library that computes and visualizes the attention flow of arbitrary Transformer models. We show the usefulness of our implementation on various models trained on natural language processing and reasoning tasks.
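
As a rough illustration of the construction the abstract describes, the sketch below builds a layered flow network from a Transformer encoder's attention matrices and runs a max-flow computation from a single input token to the output layer. It uses HuggingFace Transformers and networkx; the function name attention_flow, the averaging over heads, and the omission of residual connections are simplifying assumptions of this sketch in the spirit of encoder-only attention flow, not the API of the paper's library.

```python
# Minimal attention-flow sketch (assumptions: head-averaged attention,
# no residual edges, single sentence, encoder-only model).
import networkx as nx
import torch
from transformers import AutoModel, AutoTokenizer

def attention_flow(attentions, input_index):
    """Max-flow from one input token to the output layer, treating each
    attention weight as an edge capacity in a layered flow network."""
    num_layers = len(attentions)          # one attention matrix per layer
    seq_len = attentions[0].shape[-1]
    graph = nx.DiGraph()
    # Node (l, i) is token position i entering layer l; layer 0 is the input.
    for l, attn in enumerate(attentions):
        a = attn[0].mean(dim=0)           # (batch, heads, seq, seq) -> (seq, seq)
        for i in range(seq_len):          # target position in layer l + 1
            for j in range(seq_len):      # source position in layer l
                graph.add_edge((l, j), (l + 1, i), capacity=float(a[i, j]))
    # Collect the flow arriving at the top layer in a single sink node.
    for i in range(seq_len):
        graph.add_edge((num_layers, i), "sink", capacity=float("inf"))
    flow_value, _ = nx.maximum_flow(graph, (0, input_index), "sink")
    return flow_value

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Attention flows for transformers", return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs, output_attentions=True).attentions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for idx, tok in enumerate(tokens):
    print(f"{tok:>15s}: {attention_flow(attentions, idx):.3f}")
```

Reading each input token's max-flow value as its payoff when the tokens are the players of a cooperative game is what connects this construction to the Shapley values mentioned in the abstract; the paper's extension to auto-regressive decoders would additionally route flow through the decoder's self- and cross-attention.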
