Paper Title

A Tutorial on Neural Networks and Gradient-free Training

Authors

Turibius Rozario, Arjun Trivedi, and Ankit Goel

Abstract

This paper presents a compact, matrix-based representation of neural networks in a self-contained tutorial fashion. Although neural networks are well understood pictorially in terms of interconnected neurons, they are, mathematically, nonlinear functions constructed by composing several vector-valued functions. Using basic results from linear algebra, we represent a neural network as an alternating sequence of linear maps and scalar nonlinear functions, also known as activation functions. Training a neural network requires minimizing a cost function, which in turn requires computing its gradient. Using basic results from multivariable calculus, the cost gradient is likewise shown to be a composition of a sequence of linear maps and nonlinear functions. In addition to this analytical gradient computation, we consider two gradient-free training methods and compare all three training methods in terms of convergence rate and prediction accuracy.
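To make the abstract's central idea concrete, the NumPy sketch below builds a small network as an alternating sequence of linear maps and a scalar activation, x_{k+1} = σ(W_k x_k + b_k), and trains it with a simple random-search loop. This is purely illustrative: the abstract does not name the two gradient-free methods the paper actually studies, so random search is a generic stand-in, and the network shapes, toy data, and hyperparameters below are assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, Ws, bs, sigma=np.tanh):
    # Alternating sequence of linear maps and a scalar activation:
    # x_{k+1} = sigma(W_k @ x_k + b_k); the output layer is kept linear.
    for W, b in zip(Ws[:-1], bs[:-1]):
        x = sigma(W @ x + b)
    return Ws[-1] @ x + bs[-1]

# Toy data: fit y = sin(x) on [-pi, pi] with a 1-8-1 network.
X = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
Y = np.sin(X)

Ws = [rng.standard_normal((8, 1)), rng.standard_normal((1, 8))]
bs = [rng.standard_normal(8), rng.standard_normal(1)]

def cost(Ws, bs):
    # Mean-squared error of the network over the training set.
    preds = np.array([forward(x, Ws, bs) for x in X])
    return np.mean((preds.reshape(-1, 1) - Y) ** 2)

# Generic gradient-free loop (simple random search): perturb all
# parameters and keep the perturbation only if the cost decreases.
best = cost(Ws, bs)
for _ in range(2000):
    trial_Ws = [W + 0.05 * rng.standard_normal(W.shape) for W in Ws]
    trial_bs = [b + 0.05 * rng.standard_normal(b.shape) for b in bs]
    c = cost(trial_Ws, trial_bs)
    if c < best:
        Ws, bs, best = trial_Ws, trial_bs, c
print(f"final training cost: {best:.4f}")
```

For the analytical route, the chain rule differentiates through the same alternating structure, which is why the abstract notes that the cost gradient is itself a composition of linear maps (the Jacobians of the linear layers) and nonlinear factors (the elementwise activation derivatives).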
