Paper Title

Deep Learning Generalization, Extrapolation, and Over-parameterization

Authors

Yousefzadeh, Roozbeh

Abstract

We study the generalization of over-parameterized deep networks (for image classification) in relation to the convex hull of their training sets. Despite their great success, the generalization of deep networks is considered a mystery. These models have orders of magnitude more parameters than training samples, and they can achieve perfect accuracy on their training sets even when the training images are randomly labeled or their contents are replaced with random noise. The training loss function of these models has an infinite number of near-zero minimizers, of which only a small subset generalize well. Overall, it is not clear why models need to be over-parameterized, why we should use a very specific training regime to train them, and why their classifications are so susceptible to imperceptible adversarial perturbations (a phenomenon known as adversarial vulnerability) \cite{papernot2016limitations,shafahi2018adversarial,tsipras2018robustness}. Some recent studies have made advances in answering these questions; however, they only consider interpolation. We show that interpolation is not adequate to understand the generalization of deep networks and that we should broaden our perspective.
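The abstract frames interpolation versus extrapolation in terms of convex-hull membership: a test point inside the convex hull of the training set is interpolated, one outside is extrapolated. As a rough illustration (a minimal sketch, not the paper's own procedure), the snippet below tests hull membership by solving a linear-programming feasibility problem with SciPy; the function name `in_convex_hull` and the toy 2-D data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, X):
    """Check whether query point x lies in the convex hull of the rows of X.

    Solves the LP feasibility problem: find weights w >= 0 with
    sum(w) = 1 and X^T w = x. Feasible => x is inside the hull
    (interpolation); infeasible => x is outside (extrapolation).
    """
    n = X.shape[0]
    # Equality constraints stack X^T w = x with the simplex condition 1^T w = 1.
    A_eq = np.vstack([X.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    # Zero objective: we only care about feasibility, not optimality.
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0  # status 0 means a feasible solution was found

# Toy example: training points in R^2 (flattened images in practice).
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(in_convex_hull(np.array([0.2, 0.2]), X_train))  # True: interpolation
print(in_convex_hull(np.array([2.0, 2.0]), X_train))  # False: extrapolation
```

For high-dimensional image data the same LP still applies, with each row of `X` a flattened training image; the point of the paper's argument is that many natural test images fail this test, i.e., classifying them is extrapolation rather than interpolation.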
