Labeling Chaos to Learning Harmony: Federated Learning with Noisy Labels

Paper Title

Labeling Chaos to Learning Harmony: Federated Learning with Noisy Labels

Authors

Vasileios Tsouvalas, Aaqib Saeed, Tanir Ozcelebi, Nirvana Meratnia

Abstract

Federated Learning (FL) is a distributed machine learning paradigm that enables learning models from decentralized private datasets, where the labeling effort is entrusted to the clients. Most existing FL approaches assume that high-quality labels are readily available on users' devices; in reality, label noise can naturally occur in FL and is closely related to clients' characteristics. Due to the scarcity of available data and significant label noise variations among clients in FL, existing state-of-the-art centralized approaches exhibit unsatisfactory performance, while prior FL studies rely on excessive on-device computational schemes or additional clean data available on the server. Here, we propose FedLN, a framework to deal with label noise across different FL training stages, namely FL initialization, on-device model training, and server model aggregation, that is able to accommodate the diverse computational capabilities of devices in an FL system. Specifically, FedLN computes a per-client noise-level estimate in a single federated round and improves the models' performance by either correcting or mitigating the effect of noisy samples. Our evaluation on various publicly available vision and audio datasets demonstrates a 22% improvement on average compared to other existing methods for a label noise level of 60%. We further validate the efficiency of FedLN on human-annotated real-world noisy datasets and report a 4.8% increase on average in models' recognition performance, highlighting that FedLN can be useful for improving FL services provided to everyday users.
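
The abstract names server model aggregation as one stage where a per-client noise-level estimate can be used to mitigate noisy labels. As a rough illustration only, the sketch below shows one way such an estimate could enter a FedAvg-style weighted average. The weighting rule `num_samples * (1 - noise_level)` and the function name `noise_aware_average` are assumptions made for this example; the abstract does not specify FedLN's actual aggregation rule.

```python
from typing import List, Sequence


def noise_aware_average(
    client_weights: Sequence[Sequence[float]],
    num_samples: Sequence[int],
    noise_levels: Sequence[float],
) -> List[float]:
    """Average flat client parameter vectors, down-weighting noisy clients.

    Hypothetical rule (an assumption, not taken from the abstract): each
    client's contribution is proportional to num_samples * (1 - noise_level).
    With all noise levels equal to 0 this reduces to plain FedAvg.
    """
    if not client_weights:
        raise ValueError("at least one client is required")
    if not (len(client_weights) == len(num_samples) == len(noise_levels)):
        raise ValueError("inputs must have one entry per client")
    if any(e < 0.0 or e > 1.0 for e in noise_levels):
        raise ValueError("noise levels must lie in [0, 1]")

    # Unnormalized importance score of each client.
    scores = [n * (1.0 - e) for n, e in zip(num_samples, noise_levels)]
    total = sum(scores)
    if total <= 0:
        raise ValueError("all clients have zero weight")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, score in zip(client_weights, scores):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i in range(dim):
            averaged[i] += (score / total) * weights[i]
    return averaged


# Two clients with equal data sizes; the second has an estimated 60% label noise,
# so the average is pulled toward the first client's parameters.
global_params = noise_aware_average(
    client_weights=[[1.0, 1.0], [3.0, 3.0]],
    num_samples=[100, 100],
    noise_levels=[0.0, 0.6],
)
```

In this sketch a client estimated to be fully noisy (noise level 1.0) contributes nothing, which is a deliberately simple design choice; a real system might instead correct that client's labels, as the abstract also suggests.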
