Paper Title


Passive Batch Injection Training Technique: Boosting Network Performance by Injecting Mini-Batches from a different Data Distribution

Authors

Pravendra Singh, Pratik Mazumder, Vinay P. Namboodiri

Abstract


This work presents a novel training technique for deep neural networks that makes use of additional data from a distribution that is different from that of the original input data. This technique aims to reduce overfitting and improve the generalization performance of the network. Our proposed technique, namely Passive Batch Injection Training Technique (PBITT), even reduces the level of overfitting in networks that already use standard techniques for reducing overfitting, such as $L_2$ regularization and batch normalization, resulting in significant accuracy improvements. Passive Batch Injection Training Technique (PBITT) introduces a few passive mini-batches into the training process that contain data from a distribution that is different from the input data distribution. This technique does not increase the number of parameters in the final model and does not increase the inference (test) time, but still improves the performance of deep CNNs. To the best of our knowledge, this is the first work that makes use of a different data distribution to aid the training of convolutional neural networks (CNNs). We thoroughly evaluate the proposed approach on standard architectures: VGG, ResNet, and WideResNet, and on several popular datasets: CIFAR-10, CIFAR-100, SVHN, and ImageNet. We observe consistent accuracy improvement by using the proposed technique. We also show experimentally that a model trained by our technique generalizes well to other tasks, such as object detection on the MS-COCO dataset using Faster R-CNN. We present extensive ablations to validate the proposed approach. Our approach improves the accuracy of VGG-16 by a significant margin of 2.1% on the CIFAR-100 dataset.
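The core idea of injecting passive mini-batches from a different data distribution into the regular training stream can be sketched in plain Python. This is a minimal illustration only: the `pbitt_batch_schedule` helper and the fixed `inject_every` interval are assumptions made for clarity, and the paper's actual injection schedule, loss treatment, and the precise meaning of "passive" are defined in the full paper, not reproduced here.

```python
def pbitt_batch_schedule(main_batches, passive_batches, inject_every=10):
    """Interleave occasional 'passive' mini-batches (drawn from a
    distribution different from the training data) into the stream of
    regular training batches.

    Hypothetical helper for illustration: after every `inject_every`
    regular batches, one passive batch is injected. Each yielded item
    is tagged so the training loop can treat passive batches
    differently (e.g., with a separate update rule).
    """
    schedule = []
    passive_iter = iter(passive_batches)
    for i, batch in enumerate(main_batches, start=1):
        schedule.append(("main", batch))
        if i % inject_every == 0:
            try:
                schedule.append(("passive", next(passive_iter)))
            except StopIteration:
                pass  # no passive data left; continue with main batches only
    return schedule
```

A training loop would iterate over this schedule and branch on the tag, applying the regular optimizer step to `"main"` batches and the passive treatment to `"passive"` ones.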
