Paper Title
Guiding Attention for Self-Supervised Learning with Transformers
Paper Authors
Paper Abstract
In this paper, we propose a simple and effective technique to allow for efficient self-supervised learning with bi-directional Transformers. Our approach is motivated by recent studies demonstrating that self-attention patterns in trained models contain a majority of non-linguistic regularities. We propose a computationally efficient auxiliary loss function to guide attention heads to conform to such patterns. Our method is agnostic to the actual pre-training objective and results in faster convergence of models as well as better performance on downstream tasks compared to the baselines, achieving state-of-the-art results in low-resource settings. Surprisingly, we also find that linguistic properties of attention heads are not necessarily correlated with language modeling performance.
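To make the idea of an attention-guidance auxiliary loss concrete, below is a minimal PyTorch sketch, not the authors' implementation. It constructs a few simple position-based target patterns (attend to self, previous token, next token, first token), assigns one pattern to each head, and penalizes the squared distance between each head's attention map and its target. The specific pattern set, the head-to-pattern assignment, and the use of MSE (rather than, say, a KL divergence) are illustrative assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F


def fixed_attention_patterns(seq_len: int, device=None) -> torch.Tensor:
    """Build simple position-based attention patterns (illustrative targets).

    Returns a tensor of shape (num_patterns, seq_len, seq_len); each row of
    each pattern is a valid attention distribution over positions.
    """
    eye = torch.eye(seq_len, device=device)
    prev_tok = torch.roll(eye, shifts=-1, dims=1)   # attend to previous token
    prev_tok[0] = eye[0]                            # first token attends to itself
    next_tok = torch.roll(eye, shifts=1, dims=1)    # attend to next token
    next_tok[-1] = eye[-1]                          # last token attends to itself
    first_tok = torch.zeros(seq_len, seq_len, device=device)
    first_tok[:, 0] = 1.0                           # every token attends to position 0
    return torch.stack([eye, prev_tok, next_tok, first_tok])


def attention_guidance_loss(attn_probs: torch.Tensor,
                            patterns: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss pushing each head toward its assigned fixed pattern.

    attn_probs: (batch, num_heads, seq_len, seq_len) attention distributions
                from one Transformer layer.
    patterns:   (num_patterns, seq_len, seq_len) target distributions; heads
                are assigned to patterns round-robin here for simplicity.
    """
    batch, num_heads, seq_len, _ = attn_probs.shape
    # Assign pattern (k mod num_patterns) to head k -- an illustrative choice.
    head_targets = patterns[torch.arange(num_heads) % patterns.size(0)]
    head_targets = head_targets.unsqueeze(0).expand(batch, -1, -1, -1)
    return F.mse_loss(attn_probs, head_targets)


if __name__ == "__main__":
    batch, heads, seq_len = 2, 8, 16
    attn = torch.softmax(torch.randn(batch, heads, seq_len, seq_len), dim=-1)
    patterns = fixed_attention_patterns(seq_len)
    aux = attention_guidance_loss(attn, patterns)
    # In pre-training this term would be added to the main objective, e.g.
    # total_loss = mlm_loss + lambda_guidance * aux, with lambda_guidance a
    # hyperparameter controlling how strongly heads are pulled to the patterns.
    print(aux.item())
```

Because the loss compares attention maps against fixed, input-independent patterns, it adds only a small constant overhead per layer and can be combined with any pre-training objective, which is consistent with the abstract's claim that the method is objective-agnostic.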