Paper Title
Buffered Asynchronous SGD for Byzantine Learning
Paper Authors
Paper Abstract
Distributed learning has become a hot research topic due to its wide application in cluster-based large-scale learning, federated learning, edge computing and so on. Most traditional distributed learning methods typically assume no failure or attack. However, many unexpected cases, such as communication failure and even malicious attack, may happen in real applications. Hence, Byzantine learning (BL), which refers to distributed learning with failure or attack, has recently attracted much attention. Most existing BL methods are synchronous, which is impractical in some applications with heterogeneous or offline workers. In these cases, asynchronous BL (ABL) is usually preferred. In this paper, we propose a novel method, called buffered asynchronous stochastic gradient descent (BASGD), for ABL. To the best of our knowledge, BASGD is the first ABL method that can resist non-omniscient attacks without storing any instances on the server. Furthermore, we also propose an improved variant of BASGD, called BASGD with momentum (BASGDm), by introducing momentum into BASGD. BASGDm can resist both non-omniscient and omniscient attacks. Compared with methods that need to store instances on the server, BASGD and BASGDm have a wider scope of application. Both BASGD and BASGDm are compatible with various aggregation rules. Moreover, both BASGD and BASGDm are proven to be convergent and able to resist failure or attack. Empirical results show that our methods significantly outperform existing ABL baselines when there are failures or attacks on workers.
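To make the buffering idea concrete, below is a minimal Python sketch of the server-side mechanism suggested by the abstract: asynchronously arriving gradients are accumulated into one of B buffers, and only when every buffer is non-empty does the server apply a robust aggregation rule over the buffer averages and take an SGD step. This is an illustrative sketch, not the authors' implementation; the names (`BufferedServer`, `receive`, `coordinate_wise_median`) and the worker-to-buffer assignment `worker_id % B` are assumptions made here for clarity.

```python
import numpy as np

def coordinate_wise_median(buffer_avgs):
    """One possible robust aggregation rule: per-coordinate median across buffer averages."""
    return np.median(np.stack(buffer_avgs), axis=0)

class BufferedServer:
    """Sketch of a server with B buffers for buffered asynchronous SGD.

    Each asynchronously arriving update from worker `s` is accumulated in
    buffer `s % B` (a simple assumed worker-to-buffer assignment). Once every
    buffer is non-empty, the buffer averages are combined by a robust
    aggregation rule, one SGD step is taken, and all buffers are reset.
    """

    def __init__(self, dim, num_buffers, lr, aggregate=coordinate_wise_median):
        self.w = np.zeros(dim)                               # model parameters
        self.lr = lr
        self.B = num_buffers
        self.aggregate = aggregate
        self.sums = [np.zeros(dim) for _ in range(self.B)]   # per-buffer update sums
        self.counts = [0] * self.B                           # per-buffer update counts

    def receive(self, worker_id, update):
        # For BASGD, `update` would be a stochastic gradient; for BASGDm it
        # would be the worker's local momentum vector (same server-side logic).
        b = worker_id % self.B
        self.sums[b] += update
        self.counts[b] += 1
        if all(c > 0 for c in self.counts):                  # every buffer non-empty
            avgs = [s / c for s, c in zip(self.sums, self.counts)]
            self.w -= self.lr * self.aggregate(avgs)         # one SGD step
            self.sums = [np.zeros_like(self.w) for _ in range(self.B)]
            self.counts = [0] * self.B
```

Note that the aggregation rule is a pluggable argument, which mirrors the abstract's claim that the methods are compatible with various aggregation rules; `coordinate_wise_median` is just one common choice.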