Paper Title

Guiding Attention for Self-Supervised Learning with Transformers

Paper Authors

Ameet Deshpande, Karthik Narasimhan

Paper Abstract

In this paper, we propose a simple and effective technique to allow for efficient self-supervised learning with bi-directional Transformers. Our approach is motivated by recent studies demonstrating that self-attention patterns in trained models contain a majority of non-linguistic regularities. We propose a computationally efficient auxiliary loss function to guide attention heads to conform to such patterns. Our method is agnostic to the actual pre-training objective and results in faster convergence of models as well as better performance on downstream tasks compared to the baselines, achieving state of the art results in low-resource settings. Surprisingly, we also find that linguistic properties of attention heads are not necessarily correlated with language modeling performance.
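To make the idea of an auxiliary attention-guidance loss concrete, below is a minimal sketch in PyTorch. It assumes the auxiliary term penalizes the distance between each head's attention map and a fixed, non-linguistic target pattern (here, "attend to the previous token") and is simply added to the pre-training loss with a weighting coefficient. The function names, the choice of pattern, the MSE form of the penalty, and the weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def previous_token_pattern(seq_len):
    """Row-stochastic target pattern where each position attends to the
    previous token (the first position attends to itself). Hypothetical
    example of a simple non-linguistic attention regularity."""
    pattern = torch.zeros(seq_len, seq_len)
    pattern[0, 0] = 1.0
    pattern[torch.arange(1, seq_len), torch.arange(0, seq_len - 1)] = 1.0
    return pattern


def attention_guidance_loss(attn_probs, target_pattern):
    """Penalize the deviation of attention maps from a fixed target pattern.

    attn_probs:     (batch, heads, seq_len, seq_len) softmax attention weights
    target_pattern: (seq_len, seq_len) row-stochastic target pattern
    """
    target = target_pattern.unsqueeze(0).unsqueeze(0)  # broadcast over batch and heads
    return F.mse_loss(attn_probs, target.expand_as(attn_probs))


if __name__ == "__main__":
    batch, heads, seq_len = 2, 4, 8
    # Stand-in attention weights; in practice these come from the Transformer layers.
    attn = torch.softmax(torch.randn(batch, heads, seq_len, seq_len), dim=-1)
    mlm_loss = torch.tensor(2.3)  # placeholder for the actual pre-training objective
    aux = attention_guidance_loss(attn, previous_token_pattern(seq_len))
    total_loss = mlm_loss + 1.0 * aux  # auxiliary weight is a free hyper-parameter
    print(float(total_loss))
```

Because the guidance term depends only on the attention weights and a fixed pattern, it adds little computation on top of the pre-training objective and can be combined with any choice of that objective, consistent with the abstract's claim that the method is agnostic to the pre-training loss.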
