Paper Title

Every Model Learned by Gradient Descent Is Approximately a Kernel Machine

Authors

Domingos, Pedro

Abstract

Deep learning's successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods. We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel. This improved understanding should lead to better learning algorithms.
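To make the abstract's description concrete, here is a minimal sketch of what a kernel machine does: it memorizes the training examples and predicts by combining their coefficients through a similarity function. The paper's actual kernel is a "path kernel" derived from the gradient descent trajectory; the RBF kernel, the coefficients `a`, and the toy data below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# A kernel machine predicts y(x) = sum_i a_i * K(x, x_i) + b,
# where the x_i are memorized training examples and K measures similarity.
# The Gaussian (RBF) kernel here is a stand-in for the paper's path kernel.

def rbf_kernel(x, xi, gamma=1.0):
    """Gaussian similarity between a query x and a stored example xi."""
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def kernel_machine_predict(x, X_train, a, b=0.0, gamma=1.0):
    """Weighted sum of similarities to the memorized training examples."""
    return sum(a_i * rbf_kernel(x, xi, gamma)
               for a_i, xi in zip(a, X_train)) + b

# Toy setup: two memorized examples with hypothetical coefficients a.
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
a = np.array([1.0, -1.0])

# Querying at the first training point: its own similarity is 1,
# the other example contributes -exp(-2).
print(kernel_machine_predict(np.array([0.0, 0.0]), X_train, a))
```

The point of the sketch is the prediction form itself: the query is never transformed into a learned representation; it is only compared against stored data, which is the sense in which the weights of a gradient-descent-trained network act as a superposition of training examples.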
