Paper Title
Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks
Paper Authors
Paper Abstract
Due to the diverse architectures of deep neural networks (DNNs) and their severe overparameterization, regularization techniques are critical for finding optimal solutions in the huge hypothesis space. In this paper, we propose an effective regularization technique called Neighborhood Region Smoothing (NRS). NRS leverages the finding that models benefit from converging to flat minima, and regularizes the neighborhood region in weight space to yield approximate outputs. Specifically, the gap between the outputs of models in the neighborhood region is gauged by a metric based on the Kullback-Leibler divergence. This metric provides insights similar to the minimum description length principle for interpreting flat minima. By minimizing both this divergence and the empirical loss, NRS explicitly drives the optimizer towards flat minima. We confirm the effectiveness of NRS on image classification tasks across a wide range of model architectures on commonly used datasets such as CIFAR and ImageNet, where generalization ability is universally improved. We also show empirically that the minima found by NRS have relatively smaller Hessian eigenvalues than those found by conventional training, which is considered evidence of flat minima.
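To illustrate the idea described in the abstract, below is a minimal sketch of an NRS-style training step in PyTorch. It is not the paper's exact algorithm: the random weight perturbation scheme, the neighborhood radius `sigma`, and the regularization weight `lambda_nrs` are illustrative assumptions. It shows the two ingredients the abstract names, the empirical loss and a KL-divergence term between the outputs of the current model and a neighboring model in weight space, being minimized together.

```python
# Sketch of an NRS-style step (assumptions: PyTorch classification with
# cross-entropy; perturbation scheme and hyperparameters are illustrative).
import torch
import torch.nn.functional as F

def nrs_step(model, x, y, optimizer, sigma=0.01, lambda_nrs=1.0):
    optimizer.zero_grad()

    # Empirical loss at the current point in weight space.
    logits = model(x)
    loss = F.cross_entropy(logits, y)
    # Outputs of the unperturbed model, used as the KL target.
    target = F.softmax(logits.detach(), dim=-1)
    loss.backward()

    # Temporarily move to a random neighbor in weight space.
    noises = []
    with torch.no_grad():
        for p in model.parameters():
            eps = sigma * torch.randn_like(p)
            p.add_(eps)
            noises.append(eps)

    # KL divergence between the neighbor's outputs and the original outputs;
    # its gradient, taken at the neighbor, accumulates on the parameters.
    neighbor_logits = model(x)
    kl = F.kl_div(F.log_softmax(neighbor_logits, dim=-1), target,
                  reduction="batchmean")
    (lambda_nrs * kl).backward()

    # Restore the original weights before the optimizer step.
    with torch.no_grad():
        for p, eps in zip(model.parameters(), noises):
            p.sub_(eps)

    optimizer.step()
    return loss.item(), kl.item()
```

In this sketch the KL term penalizes weight-space neighbors whose predictive distributions differ from the current model's, so minimizing it alongside the empirical loss nudges the optimizer towards flatter regions; the paper should be consulted for the precise definition of the divergence metric and the neighborhood sampling.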