Paper Title

Federated learning with hierarchical clustering of local updates to improve training on non-IID data

Authors

Christopher Briggs, Zhong Fan, Peter Andras

Abstract

Federated learning (FL) is a well-established method for performing machine learning tasks over massively distributed data. However, in settings where data is distributed in a non-IID (not independent and identically distributed) fashion -- as is typical in real-world situations -- the joint model produced by FL suffers in terms of test set accuracy and/or communication costs compared to training on IID data. We show that learning a single joint model is often not optimal in the presence of certain types of non-IID data. In this work we present a modification to FL by introducing a hierarchical clustering step (FL+HC) to separate clusters of clients by the similarity of their local updates to the global joint model. Once separated, the clusters are trained independently and in parallel on specialised models. We present a robust empirical analysis of the hyperparameters for FL+HC for several IID and non-IID settings. We show how FL+HC allows model training to converge in fewer communication rounds (significantly so under some non-IID settings) compared to FL without clustering. Additionally, FL+HC allows for a greater percentage of clients to reach a target accuracy compared to standard FL. Finally, we make suggestions for good default hyperparameters to promote superior performing specialised models without modifying the underlying federated learning communication protocol.
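The core idea of FL+HC -- grouping clients by the similarity of their local model updates via agglomerative hierarchical clustering, then training a specialised model per cluster -- can be sketched as follows. This is a minimal illustration, not the authors' implementation: the simulated update vectors, the Ward/Euclidean choice, and the distance threshold are all assumptions standing in for the hyperparameters the paper analyses.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical local updates (weight deltas w.r.t. the global joint model)
# from 6 clients: clients 0-2 are drawn around one mode and clients 3-5
# around another, mimicking two groups with differing non-IID data.
updates = np.vstack([
    rng.normal(loc=1.0, scale=0.05, size=(3, 4)),
    rng.normal(loc=-1.0, scale=0.05, size=(3, 4)),
])

# Agglomerative (hierarchical) clustering on the flattened update vectors.
# The distance metric and linkage criterion are hyperparameters in FL+HC;
# Euclidean distance with Ward linkage is used here purely as an example.
Z = linkage(updates, method="ward", metric="euclidean")

# Cut the dendrogram at a distance threshold to obtain client clusters;
# each resulting cluster would then continue federated training
# independently, producing one specialised model per cluster.
cluster_ids = fcluster(Z, t=2.0, criterion="distance")
print(cluster_ids)
```

With these synthetic updates the cut yields two clusters separating clients 0-2 from clients 3-5; in FL+HC each cluster would then run standard federated averaging among only its own members.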
