Paper Title
A Bayesian Perspective on Training Speed and Model Selection
Paper Authors
Paper Abstract
We take a Bayesian perspective to illustrate a connection between training speed and the marginal likelihood in linear models. This provides two major insights: first, that a measure of a model's training speed can be used to estimate its marginal likelihood. Second, that this measure, under certain conditions, predicts the relative weighting of models in linear model combinations trained to minimize a regression loss. We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. We further provide encouraging empirical evidence that the intuition developed in these settings also holds for deep neural networks trained with stochastic gradient descent. Our results suggest a promising new direction towards explaining why neural networks trained with stochastic gradient descent are biased towards functions that generalize well.
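To make the first insight concrete, the sketch below (illustrative only, not code from the paper) demonstrates the identity that links sequential fitting to the marginal likelihood in a conjugate Bayesian linear-regression model: by the chain rule, log p(y) = Σ_i log p(y_i | y_{<i}), so the log marginal likelihood equals a sum of posterior-predictive log-likelihoods accumulated as the model is updated one data point at a time. The toy data and all variable names are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(0)

# Toy data: y = Phi w + noise, with a Gaussian prior w ~ N(0, S0).
n, d = 20, 3
Phi = rng.normal(size=(n, d))   # feature matrix
sigma2 = 0.1                    # known observation-noise variance
S0 = np.eye(d)                  # prior covariance of the weights
y = Phi @ rng.normal(size=d) + np.sqrt(sigma2) * rng.normal(size=n)

# Exact log marginal likelihood: marginally, y ~ N(0, Phi S0 Phi^T + sigma2 I).
logZ_exact = multivariate_normal.logpdf(
    y, mean=np.zeros(n), cov=Phi @ S0 @ Phi.T + sigma2 * np.eye(n)
)

# Chain-rule estimate: accumulate log p(y_i | y_{<i}) while performing
# the conjugate Bayesian update after each observation.
m, S = np.zeros(d), S0.copy()   # running posterior mean and covariance
logZ_seq = 0.0
for i in range(n):
    phi = Phi[i]
    # The posterior predictive for the next observation is Gaussian.
    pred_mean = phi @ m
    pred_var = phi @ S @ phi + sigma2
    logZ_seq += norm.logpdf(y[i], loc=pred_mean, scale=np.sqrt(pred_var))
    # Rank-one conjugate update of the posterior.
    k = S @ phi / pred_var
    m = m + k * (y[i] - pred_mean)
    S = S - np.outer(k, phi @ S)

print(logZ_exact, logZ_seq)     # the two quantities agree up to float error
```

Under these conjugacy assumptions the two printed values match exactly; the abstract's claim is that a training-speed-like measure of this kind remains informative for model selection even beyond the exactly tractable linear setting.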