Paper Title

Omni-Dimensional Dynamic Convolution

Authors

Chao Li, Aojun Zhou, Anbang Yao

Abstract

Learning a single static convolutional kernel in each convolutional layer is the common training paradigm of modern Convolutional Neural Networks (CNNs). Instead, recent research in dynamic convolution shows that learning a linear combination of $n$ convolutional kernels weighted with their input-dependent attentions can significantly improve the accuracy of light-weight CNNs, while maintaining efficient inference. However, we observe that existing works endow convolutional kernels with the dynamic property through one dimension (regarding the convolutional kernel number) of the kernel space, but the other three dimensions (regarding the spatial size, the input channel number and the output channel number for each convolutional kernel) are overlooked. Inspired by this, we present Omni-dimensional Dynamic Convolution (ODConv), a more generalized yet elegant dynamic convolution design, to advance this line of research. ODConv leverages a novel multi-dimensional attention mechanism with a parallel strategy to learn complementary attentions for convolutional kernels along all four dimensions of the kernel space at any convolutional layer. As a drop-in replacement for regular convolutions, ODConv can be plugged into many CNN architectures. Extensive experiments on the ImageNet and MS-COCO datasets show that ODConv brings solid accuracy boosts for various prevailing CNN backbones including both light-weight and large ones, e.g., 3.77%~5.71%|1.86%~3.72% absolute top-1 improvements to the MobileNetV2|ResNet family on the ImageNet dataset. Intriguingly, thanks to its improved feature learning ability, ODConv with even one single kernel can compete with or outperform existing dynamic convolution counterparts with multiple kernels, substantially reducing extra parameters. Furthermore, ODConv is also superior to other attention modules for modulating the output features or the convolutional weights.
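To make the four-dimensional attention concrete, here is a minimal pure-Python sketch of the kernel aggregation the abstract describes: each of the $n$ candidate kernels is modulated by per-kernel attentions along the spatial, input-channel, output-channel, and kernel-number dimensions, and the modulated kernels are summed into one kernel that a regular convolution would then use. All function and variable names here are illustrative assumptions, not the authors' code, and the attention values are taken as given rather than produced by the paper's attention branch.

```python
def aggregate_odconv_kernel(kernels, a_spatial, a_in, a_out, a_kernel):
    """Sum n candidate kernels, each scaled elementwise by four attentions.

    kernels:   list of n nested lists with shape [c_out][c_in][k][k]
    a_spatial: [n][k][k]  -- spatial attention per kernel
    a_in:      [n][c_in]  -- input-channel attention per kernel
    a_out:     [n][c_out] -- output-channel attention per kernel
    a_kernel:  [n]        -- kernel-number attention (as in vanilla
                             dynamic convolution)
    Returns a single aggregated kernel of shape [c_out][c_in][k][k].
    """
    n = len(kernels)
    c_out = len(kernels[0])
    c_in = len(kernels[0][0])
    k = len(kernels[0][0][0])
    agg = [[[[0.0] * k for _ in range(k)]
            for _ in range(c_in)] for _ in range(c_out)]
    for i in range(n):
        for o in range(c_out):
            for c in range(c_in):
                for u in range(k):
                    for v in range(k):
                        # All four attentions multiply the same weight entry.
                        agg[o][c][u][v] += (a_kernel[i] * a_out[i][o] *
                                            a_in[i][c] * a_spatial[i][u][v] *
                                            kernels[i][o][c][u][v])
    return agg
```

With $n = 1$ and all attentions fixed at 1, this reduces to a plain static kernel, which mirrors the abstract's point that ODConv's gains with a single kernel come from the extra attention dimensions rather than from kernel count.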
