Paper Title

Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs

Paper Authors

Severin Reiz, Tobias Neckel, Hans-Joachim Bungartz

Paper Abstract

Training deep neural networks consumes an increasing share of the computational resources in many compute centers. Often, a brute-force approach is employed to obtain hyperparameter values. Our goal is (1) to improve on this by enabling second-order optimization methods with fewer hyperparameters for large-scale neural networks and (2) to survey the performance of optimizers on specific tasks in order to suggest to users the best one for their problem. We introduce a novel second-order optimization method that requires only the effect of the Hessian on a vector and thus avoids the enormous cost of explicitly setting up the Hessian for large-scale networks. We compare the proposed second-order method with two state-of-the-art optimizers on five representative neural network problems, including regression and very deep networks from computer vision or variational autoencoders. For the largest setup, we efficiently parallelized the optimizers with Horovod and applied them to an 8-GPU NVIDIA P100 (DGX-1) machine.
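The central idea in the abstract, a Newton step whose conjugate gradient inner solver only ever touches the Hessian through Hessian-vector products, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration of that general matrix-free technique, not the authors' implementation or its Horovod-parallel version; all names (flat_grad, hvp, newton_cg_direction, the toy linear model) are made up for the example.

import torch

def flat_grad(loss, params, create_graph=False):
    # Concatenate the gradient of the loss w.r.t. all parameters into one vector.
    grads = torch.autograd.grad(loss, params, create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in grads])

def hvp(grad_flat, params, vec):
    # Hessian-vector product H @ vec: differentiate (grad . vec) w.r.t. the
    # parameters (Pearlmutter's trick); the Hessian itself is never formed.
    hv = torch.autograd.grad(torch.dot(grad_flat, vec), params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])

def newton_cg_direction(loss, params, cg_iters=20, tol=1e-6):
    # Approximately solve H p = -g with conjugate gradients; each CG iteration
    # needs exactly one Hessian-vector product.
    g = flat_grad(loss, params, create_graph=True)  # keep the graph for hvp calls
    p = torch.zeros_like(g)
    r = -g.detach().clone()
    d = r.clone()
    rs_old = r.dot(r)
    for _ in range(cg_iters):
        Hd = hvp(g, params, d)
        alpha = rs_old / d.dot(Hd)
        p = p + alpha * d
        r = r - alpha * Hd
        rs_new = r.dot(r)
        if rs_new.sqrt() < tol:
            break
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return p  # Newton direction; combine with a step size or trust region in practice

# Toy usage: one Newton-CG direction for a tiny least-squares model.
torch.manual_seed(0)
model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
direction = newton_cg_direction(loss, list(model.parameters()))
print(direction.shape)  # flattened direction over all parameters

Computing H·v this way costs roughly one extra backward pass per CG iteration, which is what makes a Newton-CG outer loop feasible for networks whose full Hessian would be far too large to store, as the abstract claims.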
