SplitMixer: Fat Trimmed From MLP-like Models

Paper Title

SplitMixer: Fat Trimmed From MLP-like Models

Authors

Ali Borji, Sikun Lin

Abstract

We present SplitMixer, a simple and lightweight isotropic MLP-like architecture for visual recognition. It contains two types of interleaving convolutional operations to mix information across spatial locations (spatial mixing) and channels (channel mixing). The first sequentially applies two depthwise 1D kernels, instead of a 2D kernel, to mix spatial information. The second splits the channels into overlapping or non-overlapping segments, with or without shared parameters, and applies our proposed channel mixing approaches or 3D convolution to mix channel information. Depending on design choices, a number of SplitMixer variants can be constructed to balance accuracy, the number of parameters, and speed. We show, both theoretically and experimentally, that SplitMixer performs on par with the state-of-the-art MLP-like models while having significantly fewer parameters and FLOPs. For example, without strong data augmentation and optimization, SplitMixer achieves around 94% accuracy on CIFAR-10 with only 0.28M parameters, while ConvMixer achieves the same accuracy with about 0.6M parameters. The well-known MLP-Mixer achieves 85.45% with 17.1M parameters. On the CIFAR-100 dataset, SplitMixer achieves around 73% accuracy, on par with ConvMixer, but with about 52% fewer parameters and FLOPs. We hope that our results spark further research towards finding more efficient vision architectures and facilitate the development of MLP-like models. Code is available at https://github.com/aliborji/splitmixer.
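
To make the two parameter-saving ideas in the abstract concrete, the sketch below counts layer weights for each. It is not the authors' code: the function names, the width of 256 channels, and the kernel size of 5 are hypothetical, biases are ignored, and channel mixing is simplified to non-overlapping, equally sized segments that are each mixed by their own 1x1 convolution with unshared weights. The overlapping, shared-parameter, and 3D-convolution variants mentioned in the abstract are not modeled.

```python
def spatial_mixing_params(channels: int, k: int, separable: bool) -> int:
    """Weights (no biases) in one depthwise spatial-mixing layer."""
    if separable:
        # Two depthwise 1D kernels per channel: one 1 x k and one k x 1.
        return channels * 2 * k
    # A single depthwise 2D kernel per channel: k x k.
    return channels * k * k


def channel_mixing_params(channels: int, segments: int = 1) -> int:
    """Weights (no biases) in pointwise channel mixing.

    Assumption for illustration only: non-overlapping, equally sized
    segments, each mixed by its own 1x1 convolution (unshared weights).
    segments=1 is an ordinary 1x1 convolution over all channels.
    """
    if channels % segments != 0:
        raise ValueError("channels must be divisible by segments")
    seg = channels // segments
    return segments * seg * seg


if __name__ == "__main__":
    c, k = 256, 5  # hypothetical width and kernel size
    print("2D depthwise:      ", spatial_mixing_params(c, k, separable=False))
    print("two 1D depthwise:  ", spatial_mixing_params(c, k, separable=True))
    print("full 1x1 mixing:   ", channel_mixing_params(c))
    print("2-segment mixing:  ", channel_mixing_params(c, segments=2))
```

Under these assumptions, the spatial-mixing cost per channel drops from k² to 2k weights, and splitting the channels into s segments divides the channel-mixing weights by s. These counts illustrate the mechanism only and do not reproduce the totals reported in the abstract.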
