Paper Title

Attention as Activation

Paper Authors

Dai, Yimian, Oehmcke, Stefan, Gieseke, Fabian, Wu, Yiquan, Barnard, Kobus

Paper Abstract

Activation functions and attention mechanisms are typically treated as having different purposes and have evolved differently. However, both concepts can be formulated as a non-linear gating function. Inspired by their similarity, we propose a novel type of activation units called attentional activation (ATAC) units as a unification of activation functions and attention mechanisms. In particular, we propose a local channel attention module for the simultaneous non-linear activation and element-wise feature refinement, which locally aggregates point-wise cross-channel feature contexts. By replacing the well-known rectified linear units by such ATAC units in convolutional networks, we can construct fully attentional networks that perform significantly better with a modest number of additional parameters. We conducted detailed ablation studies on the ATAC units using several host networks with varying network depths to empirically verify the effectiveness and efficiency of the units. Furthermore, we compared the performance of the ATAC units against existing activation functions as well as other attention mechanisms on the CIFAR-10, CIFAR-100, and ImageNet datasets. Our experimental results show that networks constructed with the proposed ATAC units generally yield performance gains over their competitors given a comparable number of parameters.
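The abstract describes the ATAC unit as a local channel attention gate that aggregates point-wise cross-channel context and is applied element-wise in place of ReLU. Below is a minimal PyTorch sketch of such a unit under those assumptions; the class name ATACUnit, the 1x1-convolution bottleneck structure, and the reduction ratio r are illustrative choices for this sketch, not details confirmed by the paper.

```python
import torch
import torch.nn as nn


class ATACUnit(nn.Module):
    """Sketch of an attentional activation (ATAC) unit: a local channel
    attention gate used as a drop-in replacement for ReLU.

    The gate uses only point-wise (1x1) convolutions, so cross-channel
    context is aggregated locally at each spatial position rather than
    via global pooling. The reduction ratio `r` is an assumed
    hyper-parameter for illustration.
    """

    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        hidden = max(channels // r, 1)
        self.gate = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise gating: the sigmoid output serves simultaneously as a
        # non-linear activation and as an attention map over the features.
        return x * self.gate(x)


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)
    y = ATACUnit(64)(x)
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

In a host network such as a ResNet, each ReLU would be swapped for an ATACUnit with the matching channel count, which is how the abstract's "fully attentional networks" would be obtained in this sketch.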
