Paper Title
Neighborhood Gradient Clustering: An Efficient Decentralized Learning Method for Non-IID Data Distributions
Paper Authors
Paper Abstract
In decentralized learning over distributed datasets, the data distributions across agents can differ significantly. The current state-of-the-art decentralized algorithms mostly assume the data distributions to be Independent and Identically Distributed (IID). This paper focuses on improving decentralized learning over non-IID data. We propose \textit{Neighborhood Gradient Clustering (NGC)}, a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information. For a pair of neighboring agents, the cross-gradients are the derivatives of one agent's model parameters with respect to the other agent's dataset. In particular, the proposed method replaces the local gradients of the model with the weighted mean of the self-gradients, the model-variant cross-gradients (derivatives of the neighbors' parameters with respect to the local dataset), and the data-variant cross-gradients (derivatives of the local model with respect to its neighbors' datasets). The data-variant cross-gradients are aggregated through an additional communication round without violating privacy constraints. Further, we present \textit{CompNGC}, a compressed version of \textit{NGC} that reduces the communication overhead by $32\times$. We theoretically analyze the convergence rate of the proposed algorithm and demonstrate its efficiency on non-IID data sampled from various vision and language datasets. Our experiments demonstrate that \textit{NGC} and \textit{CompNGC} outperform the existing SoTA decentralized learning algorithms on non-IID data (by $0-6\%$) with significantly lower compute and memory requirements. Further, our experiments show that the model-variant cross-gradient information available locally at each agent can improve performance over non-IID data by $1-35\%$ without any additional communication cost.
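To make the update concrete, here is a minimal single-process sketch of the gradient-clustering step described in the abstract, using a least-squares loss and a ring topology. Everything beyond the abstract's description (the `grad` helper, the mixing matrix `W`, the uniform cluster weight `alpha`, and the learning rate) is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

# Minimal simulation of the NGC gradient update sketched in the abstract.
# The least-squares loss, ring topology, and all hyperparameters are
# illustrative placeholders chosen only to keep the sketch self-contained.

rng = np.random.default_rng(0)
n_agents, dim = 4, 5

# Non-IID local datasets: each agent's targets come from a shifted model.
datasets = []
for i in range(n_agents):
    A = rng.normal(size=(20, dim))
    b = A @ rng.normal(loc=i, size=dim)  # heterogeneous across agents
    datasets.append((A, b))

def grad(x, data):
    """Gradient of the local least-squares loss F(x; data)."""
    A, b = data
    return A.T @ (A @ x - b) / len(b)

# Ring topology with uniform mixing weights (a common doubly stochastic choice).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in (i - 1, i, i + 1):
        W[i, j % n_agents] = 1.0 / 3.0

x = [rng.normal(size=dim) for _ in range(n_agents)]
alpha, lr = 1.0 / 3.0, 0.05  # assumed uniform weight on each gradient cluster

for step in range(100):
    new_x = []
    for i in range(n_agents):
        nbrs = [(i - 1) % n_agents, (i + 1) % n_agents]
        # Self-gradient: local model on local data.
        g_self = grad(x[i], datasets[i])
        # Model-variant cross-gradients: neighbors' parameters (already
        # received for the gossip step) evaluated on the local dataset.
        g_mv = np.mean([grad(x[j], datasets[i]) for j in nbrs], axis=0)
        # Data-variant cross-gradients: local parameters evaluated on the
        # neighbors' datasets. In a real deployment each neighbor computes
        # grad(x_i, its own data) and returns it in the extra communication
        # round, so raw data never leaves an agent; here we just simulate it.
        g_dv = np.mean([grad(x[i], datasets[j]) for j in nbrs], axis=0)
        # NGC: replace the local gradient with a weighted mean of the three.
        g = alpha * g_self + alpha * g_mv + alpha * g_dv
        # Gossip-average the parameters, then take the modified gradient step.
        mix = sum(W[i, j] * x[j] for j in range(n_agents))
        new_x.append(mix - lr * g)
    x = new_x

print("parameter disagreement:", np.linalg.norm(x[0] - x[1]))
```

Note that the data-variant term is what costs the extra communication round: agent $i$ ships its parameters to each neighbor $j$, which returns the gradient computed on its own data, so only gradients (never raw data) cross the network. The $32\times$ saving claimed for \textit{CompNGC} is consistent with compressing 32-bit floating-point cross-gradients to 1-bit representations, though the abstract does not specify the exact quantizer.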