Paper title
BayesFormer: Transformer with Uncertainty Estimation
Paper authors
Paper abstract
The Transformer has become ubiquitous due to its dominant performance in a wide range of NLP and image processing tasks. However, there is little understanding of how to generate mathematically grounded uncertainty estimates for Transformer architectures. Models equipped with such uncertainty estimates can typically improve predictive performance, make networks robust, avoid over-fitting, and serve as acquisition functions in active learning. In this paper, we introduce BayesFormer, a Transformer model with dropout designed by Bayesian theory. We propose a new theoretical framework to extend approximate variational inference-based dropout to Transformer-based architectures. Through extensive experiments, we validate the proposed architecture in four paradigms and show improvements across the board: language modeling and classification, long-sequence understanding, machine translation, and acquisition functions for active learning.
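The abstract's key ingredient, dropout derived from approximate variational inference, is commonly realized as Monte Carlo dropout: keep dropout active at prediction time, run several stochastic forward passes, and read the spread of the outputs as an uncertainty estimate. A minimal NumPy sketch of that idea, with a toy linear layer standing in for the Transformer (all names, shapes, and hyperparameters here are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": one fixed linear layer preceded by dropout, standing in
# for a Transformer block. All randomness comes from the dropout mask,
# as in MC-dropout inference.
W = rng.normal(size=(8, 3))

def stochastic_forward(x, p_drop=0.5):
    """One forward pass with dropout kept ON (inverted dropout scaling)."""
    mask = rng.random(x.shape) >= p_drop
    return (x * mask / (1.0 - p_drop)) @ W

def mc_dropout_predict(x, n_samples=100):
    """Predictive mean and per-output variance over stochastic passes."""
    samples = np.stack([stochastic_forward(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.var(axis=0)

x = rng.normal(size=8)
mean, var = mc_dropout_predict(x)
print(mean.shape, var.shape)  # (3,) (3,)
```

In the active-learning paradigm the abstract mentions, a score derived from `var` (e.g. its mean across outputs) can act as the acquisition function: unlabeled inputs with the highest predictive variance are queried for labels first.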