Paper Title

Distributed Non-Convex Optimization with One-Bit Compressors on Heterogeneous Data: Efficient and Resilient Algorithms

Authors

Ming Xiang, Lili Su

Abstract


Federated Learning (FL) is a nascent decentralized learning framework under which a massive collection of heterogeneous clients collaboratively train a model without revealing their local data. Scarce communication, privacy leakage, and Byzantine attacks are the key bottlenecks of system scalability. In this paper, we focus on communication-efficient distributed (stochastic) gradient descent for non-convex optimization, a driving force of FL. We propose two algorithms, named {\em Adaptive Stochastic Sign SGD (Ada-StoSign)} and {\em $β$-Stochastic Sign SGD ($β$-StoSign)}, each of which compresses the local gradients into bit vectors. To handle unbounded gradients, Ada-StoSign uses a novel norm tracking function that adaptively adjusts a coarse estimate of the $\ell_{\infty}$ norm of the local gradients, a key parameter used in gradient compression. We show that Ada-StoSign converges in expectation at a rate of $O(\log T/\sqrt{T} + 1/\sqrt{M})$, where $M$ is the number of clients. To the best of our knowledge, when $M$ is sufficiently large, Ada-StoSign outperforms the state-of-the-art sign-based method, whose convergence rate is $O(T^{-1/4})$. Under a bounded gradient assumption, $β$-StoSign achieves quantifiable Byzantine resilience and privacy assurances, and works with partial client participation and mini-batch gradients, which may be unbounded. We corroborate and complement our theory with experiments on the MNIST and CIFAR-10 datasets.
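The one-bit compression underlying both algorithms is stochastic sign quantization: each gradient coordinate is mapped to ±1 with probabilities chosen so that, after rescaling by a bound B on the coordinate magnitudes (the role played by the $\ell_{\infty}$ estimate that Ada-StoSign tracks), the compressor is unbiased. Below is a minimal NumPy sketch of this generic building block under the assumption of a fixed, known bound B; the function names (`stochastic_sign_compress`, `aggregate_and_step`) are illustrative stand-ins and not the paper's actual implementation of Ada-StoSign or $β$-StoSign.

```python
import numpy as np


def stochastic_sign_compress(grad, B, rng):
    """One-bit stochastic sign compression of a gradient vector.

    Each coordinate g_i (clipped to [-B, B]) becomes +1 with probability
    (1 + g_i / B) / 2 and -1 otherwise, so B * E[output_i] = g_i and the
    compressor is unbiased on [-B, B]. Here B is a coarse upper bound on
    the gradient's l_inf norm (the quantity Ada-StoSign tracks adaptively;
    a fixed constant is assumed in this sketch).
    """
    g = np.clip(grad, -B, B)
    prob_plus = (1.0 + g / B) / 2.0                 # P(emit +1) per coordinate
    return np.where(rng.random(g.shape) < prob_plus, 1.0, -1.0)


def aggregate_and_step(model, client_signs, B, lr):
    """Server step: average the clients' sign vectors, rescale by B, and take
    a gradient-descent step (a simplified sketch that ignores Byzantine
    filtering and partial client participation)."""
    avg_grad_estimate = B * np.mean(client_signs, axis=0)   # unbiased estimate
    return model - lr * avg_grad_estimate


# Tiny usage example with synthetic per-client gradients.
rng = np.random.default_rng(0)
model = np.zeros(5)
grads = [rng.normal(size=5) for _ in range(4)]              # 4 clients
signs = [stochastic_sign_compress(g, B=3.0, rng=rng) for g in grads]
model = aggregate_and_step(model, np.stack(signs), B=3.0, lr=0.1)
```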
