Paper Title


Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method

Authors

Christopher Thiele, Mauricio Araya-Polo, Detlef Hohl

Abstract


Training in supervised deep learning is computationally demanding, and the convergence behavior is usually not fully understood. We introduce and study a second-order stochastic quasi-Gauss-Newton (SQGN) optimization method that combines ideas from stochastic quasi-Newton methods, Gauss-Newton methods, and variance reduction to address this problem. SQGN provides excellent accuracy without the need for experimenting with many hyper-parameter configurations, which is often computationally prohibitive given the number of combinations and the cost of each training process. We discuss the implementation of SQGN with TensorFlow, and we compare its convergence and computational performance to selected first-order methods using the MNIST benchmark and a large-scale seismic tomography application from Earth science.
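To illustrate the Gauss-Newton ingredient that SQGN builds on, the following is a minimal sketch of a plain stochastic Gauss-Newton iteration on a toy nonlinear least-squares problem. It is not the authors' TensorFlow implementation: the model, data, damping constant, and batch size are all assumptions chosen for illustration, and the quasi-Newton curvature approximation and variance reduction used in SQGN are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = exp(a*x) + b with a = 0.5, b = 1.0 (assumed for illustration).
x = rng.uniform(0.0, 2.0, size=200)
y = np.exp(0.5 * x) + 1.0 + 0.01 * rng.standard_normal(200)

def residuals(theta, xb, yb):
    a, b = theta
    return np.exp(a * xb) + b - yb

def jacobian(theta, xb):
    # Analytic Jacobian of the residuals with respect to (a, b).
    a, _ = theta
    return np.stack([xb * np.exp(a * xb), np.ones_like(xb)], axis=1)

theta = np.array([0.0, 0.0])
lam = 1e-3  # Levenberg-style damping keeps J^T J invertible
for _ in range(50):
    idx = rng.choice(len(x), size=32, replace=False)  # minibatch, stochastic flavor
    r = residuals(theta, x[idx], y[idx])
    J = jacobian(theta, x[idx])
    # Gauss-Newton step: solve (J^T J + lam*I) d = J^T r, then theta <- theta - d
    d = np.linalg.solve(J.T @ J + lam * np.eye(2), J.T @ r)
    theta = theta - d

print(theta)  # should approach the true parameters (0.5, 1.0)
```

The key design point is that the Gauss-Newton matrix `J.T @ J` approximates the Hessian using only first derivatives of the residuals, which is what makes second-order-style steps affordable; SQGN replaces the explicit solve with a quasi-Newton approximation suitable for large networks.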
