Paper Title

Contrasting random and learned features in deep Bayesian linear regression

Paper Authors

Jacob A. Zavatone-Veth, William L. Tong, Cengiz Pehlevan

Paper Abstract

Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch. We show that both models display sample-wise double-descent behavior in the presence of label noise. Random feature models can also display model-wise double-descent if there are narrow bottleneck layers, while deep networks do not show these divergences. Random feature models can have particular widths that are optimal for generalization at a given data density, while making neural networks as wide or as narrow as possible is always optimal. Moreover, we show that the leading-order correction to the kernel-limit learning curve cannot distinguish between random feature models and deep networks in which all layers are trained. Taken together, our findings begin to elucidate how architectural details affect generalization performance in this simple class of deep regression models.
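
The following is a minimal numerical sketch of the comparison described in the abstract, not the paper's calculation: the paper derives Bayesian learning curves analytically, whereas here a ridge-regression readout on frozen random features stands in for the posterior-mean readout of the deep random feature model, and plain gradient descent on all layers stands in for the fully trained deep Bayesian linear network. All sizes, the noise level, the ridge, and the learning rate are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not the paper's): input dimension, hidden width, hidden depth.
d_in, width, depth = 50, 30, 2
n_train, n_test = 40, 2000
noise_std, ridge = 0.5, 1e-3

# Unstructured Gaussian inputs with a noisy linear teacher.
w_star = rng.normal(size=d_in) / np.sqrt(d_in)
X_tr = rng.normal(size=(n_train, d_in))
X_te = rng.normal(size=(n_test, d_in))
y_tr = X_tr @ w_star + noise_std * rng.normal(size=n_train)
y_te = X_te @ w_star

dims = [d_in] + [width] * depth

def init_layers():
    """Gaussian weight matrices with 1/fan_in variance scaling."""
    return [rng.normal(size=(dims[i], dims[i + 1])) / np.sqrt(dims[i]) for i in range(depth)]

def forward(X, Ws):
    """Propagate inputs through the linear hidden layers, keeping all activations."""
    acts = [X]
    for W in Ws:
        acts.append(acts[-1] @ W)
    return acts

# --- Deep random-feature model: hidden layers frozen, only the readout fit. ---
# Ridge regression on the frozen features equals the Bayesian posterior-mean readout
# under an isotropic Gaussian prior (ridge = noise variance / prior variance).
Ws_rf = init_layers()
F_tr, F_te = forward(X_tr, Ws_rf)[-1], forward(X_te, Ws_rf)[-1]
a_rf = np.linalg.solve(F_tr.T @ F_tr + ridge * np.eye(width), F_tr.T @ y_tr)
err_rf = np.mean((F_te @ a_rf - y_te) ** 2)

# --- Deep network with all layers trained: gradient descent on every weight ---
# (a simple maximum-likelihood stand-in for the paper's exact Bayesian treatment).
Ws_nn, a_nn = init_layers(), rng.normal(size=width) / np.sqrt(width)
lr = 5e-3
for _ in range(20_000):
    acts = forward(X_tr, Ws_nn)
    r = (acts[-1] @ a_nn - y_tr) / n_train   # dLoss/dprediction for 0.5 * mean squared error
    grad_a = acts[-1].T @ r
    delta = np.outer(r, a_nn)                # gradient w.r.t. the last hidden activation
    grads_W = []
    for i in reversed(range(depth)):         # backpropagate through the linear layers
        grads_W.append(acts[i].T @ delta)
        delta = delta @ Ws_nn[i].T
    grads_W.reverse()
    a_nn -= lr * grad_a
    for i in range(depth):
        Ws_nn[i] -= lr * grads_W[i]

err_nn = np.mean((forward(X_te, Ws_nn)[-1] @ a_nn - y_te) ** 2)

print(f"deep random features test MSE: {err_rf:.3f}")
print(f"all layers trained   test MSE: {err_nn:.3f}")
```

As an informal check of the abstract's claims, one could sweep `n_train` through the interpolation threshold to look for a sample-wise double-descent peak in the presence of label noise, or narrow `width` into a bottleneck to look for a model-wise peak that appears for the random feature model but not for the fully trained network.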
