Paper Title
Improving Generalization by Controlling Label-Noise Information in Neural Network Weights
Paper Authors
Paper Abstract
In the presence of noisy or incorrect labels, neural networks have the undesirable tendency to memorize information about the noise. Standard regularization techniques such as dropout, weight decay, or data augmentation sometimes help, but do not prevent this behavior. If one views neural network weights as random variables that depend on the training data and the stochasticity of training, the amount of memorized information can be quantified by the Shannon mutual information between the weights and the vector of all training labels given the inputs, $I(w; \mathbf{y} \mid \mathbf{x})$. We show that for any training algorithm, low values of this term correspond to reduced memorization of label noise and to better generalization bounds. To obtain these low values, we propose training algorithms that employ an auxiliary network to predict gradients in the final layers of a classifier without accessing labels. We illustrate the effectiveness of our approach on versions of MNIST, CIFAR-10, and CIFAR-100 corrupted with various noise models, and on Clothing1M, a large-scale dataset with noisy labels.
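For intuition, below is a minimal PyTorch sketch of the gradient-prediction idea the abstract describes: an auxiliary network predicts the gradient of the loss with respect to the classifier's logits without seeing the labels, and the classifier is updated with that predicted gradient. This is an illustrative simplification, not the paper's exact algorithm; the architectures, hyperparameters, and the MSE objective for the auxiliary network are all assumptions made for the sketch.

```python
# Sketch: update a classifier with label-free predicted gradients, while an
# auxiliary network (which does see labels) learns to predict those gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self, in_dim=784, hidden=256, n_classes=10):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        return self.head(self.body(x))

# Auxiliary network: predicts the gradient of the loss w.r.t. the logits
# from the input alone, without access to the label.
class GradPredictor(nn.Module):
    def __init__(self, in_dim=784, hidden=256, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))

    def forward(self, x):
        return self.net(x)

clf, aux = Classifier(), GradPredictor()
opt_clf = torch.optim.SGD(clf.parameters(), lr=0.1)
opt_aux = torch.optim.SGD(aux.parameters(), lr=0.1)

def training_step(x, y):
    logits = clf(x)
    # Label-derived cross-entropy gradient w.r.t. logits: softmax(logits) - onehot(y).
    with torch.no_grad():
        true_grad = F.softmax(logits, dim=1) - F.one_hot(y, 10).float()
    pred_grad = aux(x)  # label-free gradient prediction

    # Update the classifier by backpropagating the *predicted* gradient through
    # the logits, so the classifier's weights never touch the noisy labels directly.
    opt_clf.zero_grad()
    logits.backward(pred_grad.detach())
    opt_clf.step()

    # Train the auxiliary network to match the label-derived gradient.
    opt_aux.zero_grad()
    F.mse_loss(pred_grad, true_grad).backward()
    opt_aux.step()

# Toy usage with random data (shapes only; not a real experiment).
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
training_step(x, y)
```

With this separation, label information enters only the auxiliary network's weights, while the classifier is updated through a label-free pathway; this is the mechanism by which the mutual-information term $I(w; \mathbf{y} \mid \mathbf{x})$ in the abstract can be kept small.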