Paper Title
Learning Deep ReLU Networks Is Fixed-Parameter Tractable
Paper Authors
Paper Abstract
We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters. Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant of the overall network (we show that some dependence on the Lipschitz constant is necessary). We also give a bound that is doubly exponential in the size of the network but is independent of spectral norm. These results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. In contrast, prior work for learning networks of depth three or higher requires exponential time in the ambient dimension, even when the above parameters are bounded by a constant. Additionally, all prior work for the depth-two case requires well-conditioned weights and/or positive coefficients to obtain efficient run-times. Our algorithm does not require these assumptions. Our main technical tool is a type of filtered PCA that can be used to iteratively recover an approximate basis for the subspace spanned by the hidden units in the first layer. Our analysis leverages new structural results on lattice polynomials from tropical geometry.
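To make the "filtered PCA" idea in the abstract concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm: query an unknown ReLU network on Gaussian inputs, keep only inputs whose output passes a filter, and inspect how the filtered second-moment matrix deviates from the identity. The network shape, the quantile-based filter, and the single (non-iterative) PCA pass are assumptions made purely for illustration; the paper's method recovers the first-layer subspace iteratively with provable guarantees.

```python
# Hedged sketch of a filtered-PCA step (assumed setup, not the authors' exact method).
# Intuition: the filter depends on x only through its projection onto span(rows of W),
# so directions orthogonal to that span keep unit variance under the filter, while
# directions inside the span show second moments that deviate from 1.
import numpy as np

rng = np.random.default_rng(0)

d, k = 50, 3                      # ambient dimension, hidden units in the first layer
W = rng.standard_normal((k, d))   # unknown first-layer weights (rows span the target subspace)
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = rng.standard_normal(k)        # unknown second-layer coefficients

def network(X):
    """Stand-in unknown network: f(x) = a . relu(W x) (depth two for simplicity)."""
    return np.maximum(W @ X.T, 0.0).T @ a

# Draw Gaussian inputs and query the unknown network.
n = 200_000
X = rng.standard_normal((n, d))
y = network(X)

# Filter: keep inputs with unusually large output (an assumed, illustrative filter choice).
threshold = np.quantile(y, 0.95)
Xf = X[y > threshold]

# PCA on the filtered samples, compared against the identity covariance of N(0, I).
M = (Xf.T @ Xf) / len(Xf) - np.eye(d)
eigvals, eigvecs = np.linalg.eigh(M)

# Eigenvalues far from 0 flag directions inside span(rows of W).
# (With mixed-sign coefficients a single pass can miss directions, which is
# one reason the paper's algorithm proceeds iteratively.)
order = np.argsort(-np.abs(eigvals))
recovered = eigvecs[:, order[:k]]          # candidate basis for the first-layer subspace

# Sanity check: overlap of the recovered subspace with the true row space of W.
P_true = W.T @ np.linalg.pinv(W @ W.T) @ W   # projector onto span(rows of W)
P_rec = recovered @ recovered.T              # projector onto the recovered subspace
print("subspace overlap (k means perfect recovery):", np.trace(P_true @ P_rec))
```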