Paper title
ReBoot: Distributed statistical learning via refitting bootstrap samples
Paper authors
Paper abstract
In this paper, we propose a one-shot distributed learning algorithm via refitting bootstrap samples, which we refer to as ReBoot. ReBoot refits a new model to mini-batches of bootstrap samples that are continuously drawn from each of the locally fitted models. It requires only one round of communication of model parameters and little additional memory. Theoretically, we analyze the statistical error rate of ReBoot for generalized linear models (GLM) and noisy phase retrieval, which represent convex and non-convex problems, respectively. In both cases, ReBoot provably achieves the full-sample statistical rate. In particular, we show that the systematic bias of ReBoot, that is, the error that is independent of the number of subsamples (i.e., the number of sites), is $O(n^{-2})$ in GLM, where $n$ is the subsample size (the sample size of each local site). This rate is sharper than that of model parameter averaging and its variants, implying that ReBoot tolerates a larger number of data splits while maintaining the full-sample rate. Our simulation study demonstrates the statistical advantage of ReBoot over competing methods. Finally, we propose FedReBoot, an iterative version of ReBoot, to aggregate convolutional neural networks for image classification. FedReBoot exhibits substantial superiority over Federated Averaging (FedAvg) in the early rounds of communication.
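To make the one-shot workflow described above concrete, here is a minimal sketch for logistic regression (a GLM). It assumes covariates can be simulated at the aggregation server from a standard Gaussian model; the function names `local_fit` and `reboot`, the covariate model, and the optimization hyperparameters are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_fit(X, y, lr=0.1, epochs=200):
    """Fit a logistic regression (a GLM) on one site's subsample by gradient descent."""
    beta = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ beta) - y) / len(y)
        beta -= lr * grad
    return beta  # only this parameter vector is communicated (one shot)

def reboot(local_betas, d, steps=2000, batch=64, lr=0.1, seed=0):
    """Refit a global model to mini-batches of bootstrap samples drawn
    continuously from each locally fitted model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(d)
    for _ in range(steps):
        grads = []
        for b_local in local_betas:
            Xb = rng.standard_normal((batch, d))         # assumed covariate model
            yb = rng.binomial(1, sigmoid(Xb @ b_local))  # bootstrap responses from the local fit
            grads.append(Xb.T @ (sigmoid(Xb @ beta) - yb) / batch)
        beta -= lr * np.mean(grads, axis=0)              # SGD step on the pooled mini-batches
    return beta

# Toy usage: 10 local sites, each with n = 500 observations from the same model.
rng = np.random.default_rng(1)
beta_true = np.array([1.0, -0.5, 0.25])
local_betas = []
for _ in range(10):
    X = rng.standard_normal((500, 3))
    y = rng.binomial(1, sigmoid(X @ beta_true))
    local_betas.append(local_fit(X, y))
beta_reboot = reboot(local_betas, d=3)
```

In this sketch, each SGD step draws a fresh mini-batch of bootstrap samples from every locally fitted model, mirroring the "continuously drawn" description in the abstract, and only the fitted parameter vectors in `local_betas` ever leave the sites.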