Paper Title
Federated Hypergradient Descent
Paper Authors
Paper Abstract
In this work, we explore combining automatic hyperparameter tuning and optimization for federated learning (FL) in an online, one-shot procedure. We apply a principled approach to adapting the client learning rate, the number of local steps, and the batch size. In our federated learning applications, our primary motivation is to minimize the communication budget as well as the local computational resources used in the training pipeline. Conventional hyperparameter tuning methods involve at least some degree of trial and error, which is known to be sample inefficient. To address these motivations, we propose FATHOM (Federated AuTomatic Hyperparameter OptiMization), a one-shot online procedure. We investigate the challenges of, and solutions for, deriving analytical gradients with respect to the hyperparameters of interest. Our approach is inspired by the fact that, with the exception of the local data, we have full knowledge of all components involved in the training process, and our algorithm exploits this fact. We show that FATHOM is more communication efficient than Federated Averaging (FedAvg) with optimized, static hyperparameters, and is also more computationally efficient overall. As a communication-efficient, one-shot online procedure, FATHOM addresses the bottlenecks of costly communication and limited local computation by eliminating a potentially wasteful tuning process and by adapting the hyperparameters throughout training without trial and error. We demonstrate our numerical results through extensive empirical experiments on the Federated EMNIST-62 (FEMNIST) and Federated Stack Overflow (FSO) datasets, using FedJAX as our baseline framework.
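To make the core idea concrete, below is a minimal sketch of hypergradient descent on a single learning rate, written in JAX (the paper's experiments use FedJAX, but this snippet does not depend on it). The quadratic loss, the step sizes, and the single-hyperparameter, single-client setup are illustrative assumptions rather than the FATHOM algorithm itself; the sketch only shows how an inner optimization step can be differentiated analytically with respect to its own hyperparameter.

```python
# A minimal sketch of hypergradient descent on a scalar learning rate.
# Assumptions: a toy quadratic loss and hand-picked step sizes; this is
# not the FATHOM algorithm or its federated setting.
import jax
import jax.numpy as jnp


def loss(theta):
    # Hypothetical quadratic objective standing in for a client's local loss.
    return 0.5 * jnp.sum(theta ** 2)


grad_loss = jax.grad(loss)


def inner_step(theta, lr):
    # One SGD step; it is differentiable in lr, so d(loss after step)/d(lr) exists.
    return theta - lr * grad_loss(theta)


def hyper_objective(lr, theta):
    # Loss after one inner step, viewed as a function of the learning rate.
    return loss(inner_step(theta, lr))


# Analytical gradient with respect to the hyperparameter (the learning rate).
hyper_grad = jax.grad(hyper_objective)

theta = jnp.ones(10)
lr, hyper_lr = 0.01, 0.01
for _ in range(200):
    lr = lr - hyper_lr * hyper_grad(lr, theta)  # adapt the hyperparameter online
    theta = inner_step(theta, lr)               # then take the model step

print(f"adapted lr = {float(lr):.3f}, final loss = {float(loss(theta)):.6f}")
```

The mechanism shown here, differentiating through the training update instead of tuning by trial and error, is what enables a one-shot, online procedure; the paper's method extends this to the federated objective and to the number of local steps and batch size in addition to the client learning rate.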