Paper Title

Can neural networks extrapolate? Discussion of a theorem by Pedro Domingos

Paper Authors

Adrien Courtois, Jean-Michel Morel, Pablo Arias

Paper Abstract

Neural networks trained on large datasets by minimizing a loss have become the state-of-the-art approach for solving data science problems, particularly in computer vision, image processing and natural language processing. In spite of their striking results, our theoretical understanding of how neural networks operate is limited. In particular, what are the interpolation capabilities of trained neural networks? In this paper we discuss a theorem of Domingos stating that "every machine learned by continuous gradient descent is approximately a kernel machine". According to Domingos, this fact leads to the conclusion that all machines trained on data are mere kernel machines. We first extend Domingos' result to the discrete case and to networks with vector-valued output. We then study its relevance and significance on simple examples. We find that in simple cases, the "neural tangent kernel" arising in Domingos' theorem does provide an understanding of the networks' predictions. Furthermore, when the task given to the network grows in complexity, the interpolation capability of the network can be effectively explained by Domingos' theorem, and is therefore limited. We illustrate this fact on a classic perception theory problem: recovering a shape from its boundary.
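
For readers who want the quoted theorem in symbols, the following is a minimal sketch of how the kernel-machine form arises under continuous gradient descent (gradient flow), in the spirit of Domingos (2020); the notation here (\(f_w\) for the network, \(\ell\) for the pointwise loss) is ours, not taken from the paper. Writing the training loss as \(\sum_i \ell(y_i^*, f_w(x_i))\), gradient flow gives

\[
\dot w(t) = -\sum_i \ell'\big(y_i^*, f_{w(t)}(x_i)\big)\,\nabla_w f_{w(t)}(x_i)
\quad\Longrightarrow\quad
\frac{d}{dt} f_{w(t)}(x) = -\sum_i \ell_i'(t)\, K_{w(t)}(x, x_i),
\]

where \(K_w(x, x') = \langle \nabla_w f_w(x), \nabla_w f_w(x') \rangle\) is the neural tangent kernel. Integrating from \(t = 0\) to \(T\) yields the kernel-machine form

\[
f_{w(T)}(x) = \sum_i a_i\, K^p(x, x_i) + b,
\qquad
K^p(x, x_i) = \int_0^T K_{w(t)}(x, x_i)\, dt,
\]

with \(b = f_{w(0)}(x)\) the initial prediction and \(a_i = -\frac{1}{K^p(x, x_i)} \int_0^T \ell_i'(t)\, K_{w(t)}(x, x_i)\, dt\) a path-weighted average of the loss derivatives. Note that both \(a_i\) and \(b\) depend on the query point \(x\), so the representation is a kernel machine only in an approximate sense.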
