Paper Title

Asynchronous Parallel Stochastic Quasi-Newton Methods

Authors

Qianqian Tong, Guannan Liang, Xingyu Cai, Chunjiang Zhu, Jinbo Bi

Abstract

Although first-order stochastic algorithms, such as stochastic gradient descent, have been the main force in scaling up machine learning models such as deep neural nets, second-order quasi-Newton methods have started to draw attention due to their effectiveness in dealing with ill-conditioned optimization problems. The L-BFGS method is one of the most widely used quasi-Newton methods. We propose an asynchronous parallel algorithm for the stochastic quasi-Newton (AsySQN) method. Unlike prior attempts, which parallelize only the gradient calculation or the two-loop recursion of L-BFGS, our algorithm is the first to truly parallelize L-BFGS with a convergence guarantee. By adopting the variance reduction technique, a prior stochastic L-BFGS, which was not designed for parallel computing, attains a linear convergence rate. We prove that our asynchronous parallel scheme maintains the same linear convergence rate while achieving significant speedup. Empirical evaluations on both simulated and benchmark datasets demonstrate the speedup over the non-parallel stochastic L-BFGS, as well as better performance than first-order methods in solving ill-conditioned problems.
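To make the two building blocks mentioned in the abstract concrete, below is a minimal NumPy sketch of (i) the standard L-BFGS two-loop recursion, which turns a gradient and the stored curvature pairs (s_i, y_i) into a quasi-Newton search direction, and (ii) an SVRG-style variance-reduced stochastic gradient. This is an illustration of the well-known components, not the authors' AsySQN implementation; the names grad_fn, batch, x_snapshot, and full_grad_snapshot are hypothetical.

```python
import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    """L-BFGS two-loop recursion: approximates H_k @ grad from the
    stored curvature pairs (s_i, y_i), ordered oldest to newest."""
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)
    # Scale by an initial Hessian approximation H_0 = gamma * I.
    gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    r = gamma * q
    # Second loop: oldest pair to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return r  # approximate H_k @ grad

def svrg_gradient(grad_fn, x, x_snapshot, full_grad_snapshot, batch):
    """SVRG-style variance-reduced gradient:
    grad_B(x) - grad_B(x_snapshot) + full_grad(x_snapshot),
    where grad_B is the gradient on a sampled mini-batch."""
    return grad_fn(x, batch) - grad_fn(x_snapshot, batch) + full_grad_snapshot
```

In a serial variance-reduced stochastic L-BFGS step, one would roughly update x -= eta * two_loop_recursion(svrg_gradient(...), s_list, y_list); the asynchronous parallel scheme described in the abstract additionally lets multiple workers perform such steps concurrently on shared parameters without locking on each other.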
