Paper Title

Distributed Non-Convex Optimization with One-Bit Compressors on Heterogeneous Data: Efficient and Resilient Algorithms

Authors

Ming Xiang, Lili Su

Abstract


Federated Learning (FL) is a nascent decentralized learning framework under which a massive collection of heterogeneous clients collaboratively train a model without revealing their local data. Scarce communication, privacy leakage, and Byzantine attacks are the key bottlenecks of system scalability. In this paper, we focus on communication-efficient distributed (stochastic) gradient descent for non-convex optimization, a driving force of FL. We propose two algorithms, named {\em Adaptive Stochastic Sign SGD (Ada-StoSign)} and {\em $β$-Stochastic Sign SGD ($β$-StoSign)}, each of which compresses the local gradients into bit vectors. To handle unbounded gradients, Ada-StoSign uses a novel norm tracking function that adaptively adjusts a coarse estimate of the $\ell_{\infty}$ norm of the local gradients, a key parameter used in gradient compression. We show that Ada-StoSign converges in expectation at a rate of $O(\log T/\sqrt{T} + 1/\sqrt{M})$, where $M$ is the number of clients. To the best of our knowledge, when $M$ is sufficiently large, Ada-StoSign outperforms the state-of-the-art sign-based method, whose convergence rate is $O(T^{-1/4})$. Under a bounded gradient assumption, $β$-StoSign achieves quantifiable Byzantine resilience and privacy assurances, and works with partial client participation and mini-batch gradients, which may be unbounded. We corroborate and complement our theory with experiments on the MNIST and CIFAR-10 datasets.
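The one-bit compression underlying both algorithms is stochastic sign quantization: each gradient coordinate is mapped to ±1 with probabilities chosen so that, after rescaling by a bound B on the coordinate magnitudes (the role played by the $\ell_{\infty}$ estimate that Ada-StoSign tracks), the compressor is unbiased. Below is a minimal NumPy sketch of this generic building block under the assumption of a fixed, known bound B; the function names (`stochastic_sign_compress`, `aggregate_and_step`) are illustrative stand-ins and not the paper's actual implementation of Ada-StoSign or $β$-StoSign.

```python
import numpy as np


def stochastic_sign_compress(grad, B, rng):
    """One-bit stochastic sign compression of a gradient vector.

    Each coordinate g_i (clipped to [-B, B]) becomes +1 with probability
    (1 + g_i / B) / 2 and -1 otherwise, so B * E[output_i] = g_i and the
    compressor is unbiased on [-B, B]. Here B is a coarse upper bound on
    the gradient's l_inf norm (the quantity Ada-StoSign tracks adaptively;
    a fixed constant is assumed in this sketch).
    """
    g = np.clip(grad, -B, B)
    prob_plus = (1.0 + g / B) / 2.0                 # P(emit +1) per coordinate
    return np.where(rng.random(g.shape) < prob_plus, 1.0, -1.0)


def aggregate_and_step(model, client_signs, B, lr):
    """Server step: average the clients' sign vectors, rescale by B, and take
    a gradient-descent step (a simplified sketch that ignores Byzantine
    filtering and partial client participation)."""
    avg_grad_estimate = B * np.mean(client_signs, axis=0)   # unbiased estimate
    return model - lr * avg_grad_estimate


# Tiny usage example with synthetic per-client gradients.
rng = np.random.default_rng(0)
model = np.zeros(5)
grads = [rng.normal(size=5) for _ in range(4)]              # 4 clients
signs = [stochastic_sign_compress(g, B=3.0, rng=rng) for g in grads]
model = aggregate_and_step(model, np.stack(signs), B=3.0, lr=0.1)
```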
