Title

Efficient-Adam: Communication-Efficient Distributed Adam

Authors

Congliang Chen, Li Shen, Wei Liu, Zhi-Quan Luo

Abstract

Distributed adaptive stochastic gradient methods have been widely used for large-scale nonconvex optimization, such as training deep learning models. However, their communication complexity for finding $\varepsilon$-stationary points has rarely been analyzed in the nonconvex setting. In this work, we present a novel communication-efficient distributed Adam in the parameter-server model for stochastic nonconvex optimization, dubbed {\em Efficient-Adam}. Specifically, we incorporate a two-way quantization scheme into Efficient-Adam to reduce the communication cost between the workers and the server. Simultaneously, we adopt a two-way error feedback strategy to reduce the bias introduced by the two-way quantization on the server and workers, respectively. In addition, we establish the iteration complexity of the proposed Efficient-Adam for a class of quantization operators, and further characterize its communication complexity between the server and workers when an $\varepsilon$-stationary point is achieved. Finally, we apply Efficient-Adam to solve a toy stochastic convex optimization problem and to train deep learning models on real-world vision and language tasks. Extensive experiments together with the theoretical guarantee justify the merits of Efficient-Adam.
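
The core mechanism described in the abstract is compressing messages in both directions (worker-to-server and server-to-worker) while locally accumulating the quantization error so that the resulting bias is corrected over time. Below is a minimal, illustrative sketch of one compression step with error feedback; it assumes a simple uniform quantizer and hypothetical names (`uniform_quantize`, `ErrorFeedbackCompressor`), and it is not the paper's actual Efficient-Adam implementation, which additionally maintains Adam's moment estimates.

```python
import numpy as np

def uniform_quantize(v, levels=16):
    """Toy uniform quantizer; the paper analyzes a general class of
    quantization operators, of which this is only one simple instance."""
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * levels) / levels * scale

class ErrorFeedbackCompressor:
    """Quantize a vector and locally accumulate the quantization error,
    so the bias introduced by compression is corrected in later rounds."""
    def __init__(self, dim):
        self.error = np.zeros(dim)

    def compress(self, v):
        corrected = v + self.error              # add back previously dropped error
        message = uniform_quantize(corrected)   # low-precision message actually sent
        self.error = corrected - message        # keep the new residual for next round
        return message

# Two-way use: a worker compresses its gradient before sending it to the
# server, and the server compresses the aggregated update before broadcasting.
dim = 10
worker = ErrorFeedbackCompressor(dim)
server = ErrorFeedbackCompressor(dim)

grad = np.random.randn(dim)             # worker's local stochastic gradient
to_server = worker.compress(grad)       # worker -> server message
to_workers = server.compress(to_server) # server -> workers broadcast
```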
