Paper Title
Deep Double Descent via Smooth Interpolation
Paper Authors
Paper Abstract
The ability of overparameterized deep networks to interpolate noisy data, while at the same time showing good generalization performance, has been recently characterized in terms of the double descent curve for the test error. Common intuition from polynomial regression suggests that overparameterized networks are able to sharply interpolate noisy data, without considerably deviating from the ground-truth signal, thus preserving generalization ability. At present, a precise characterization of the relationship between interpolation and generalization for deep networks is missing. In this work, we quantify sharpness of fit of the training data interpolated by neural network functions, by studying the loss landscape w.r.t. the input variable locally to each training point, over volumes around cleanly- and noisily-labelled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape, where noisy targets are predicted over large volumes around training data points, in contrast to existing intuition.
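The abstract describes measuring loss sharpness in the input space, locally around each training point, over volumes surrounding the training samples. The exact measure is not specified here, so the sketch below is only an illustrative Monte-Carlo approximation, not the authors' method: it estimates how much the loss increases, on average, over random perturbations within a small ball around one training input. All names (`model`, `x`, `y`, `eps`, `n_samples`) are hypothetical placeholders.

```python
# Illustrative sketch (assumed, not the paper's exact sharpness measure):
# estimate input-space loss sharpness around a single training point by
# sampling random perturbations within a ball of radius `eps` and comparing
# the perturbed loss to the loss at the point itself.
import torch
import torch.nn.functional as F


def input_space_sharpness(model, x, y, eps=0.1, n_samples=64):
    """Average excess cross-entropy loss over a ball of radius `eps` around x.

    Args:
        model: a classifier mapping a batch of inputs to logits (hypothetical).
        x: a single input tensor (no batch dimension).
        y: the (possibly noisy) integer class label for x, as a 0-dim tensor.
    """
    model.eval()
    with torch.no_grad():
        base_loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        losses = []
        for _ in range(n_samples):
            # Sample a random direction and rescale it to lie on the eps-ball.
            delta = torch.randn_like(x)
            delta = eps * delta / delta.norm()
            perturbed = (x + delta).unsqueeze(0)
            losses.append(F.cross_entropy(model(perturbed), y.unsqueeze(0)))
        # Sharpness proxy: mean loss over the ball minus loss at the point.
        return (torch.stack(losses).mean() - base_loss).item()
```

Under this proxy, a sharply interpolating model would show a large loss increase within the ball around a (noisy) training point, while a smoothly interpolating model would keep the loss nearly flat, predicting the same target over the whole volume.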