Paper Title
Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning
Paper Authors
Paper Abstract
In recent centralized nonconvex distributed learning and federated learning, local methods are one of the promising approaches to reduce communication time. However, existing work has mainly focused on studying first-order optimality guarantees. On the other hand, algorithms with second-order optimality guarantees, i.e., algorithms that escape saddle points, have been extensively studied in the non-distributed optimization literature. In this paper, we study a new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD), which combines the existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points in centralized nonconvex distributed optimization. BVR-L-PSGD enjoys second-order optimality with nearly the same communication complexity as the best known one of BVR-L-SGD for finding first-order optimality. In particular, the communication complexity is better than that of non-local methods when the local dataset heterogeneity is smaller than the smoothness of the local loss. In an extreme case, the communication complexity approaches $\widetilde{\Theta}(1)$ as the local dataset heterogeneity goes to zero. Numerical results validate our theoretical findings.
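To make the high-level description in the abstract concrete, the following is a minimal Python sketch of a local perturbed SGD loop with a SARAH-style bias-variance reduced estimator and periodic parameter perturbation. It is only an illustration of the general technique the abstract names; all names and parameters (`local_grad`, `n_workers`, `perturb_radius`, the round/step structure) are assumptions for illustration, not the paper's actual algorithm or interface.

```python
import numpy as np

def bvr_l_psgd_sketch(x0, local_grad, n_workers, rounds, local_steps,
                      lr=0.1, perturb_radius=1e-3, rng=None):
    """Illustrative sketch: local steps with a recursive (SARAH-style)
    bias-variance reduced gradient estimator, plus an isotropic
    perturbation at each communication round to help escape saddle points.

    local_grad(w, x) is assumed to return worker w's (stochastic)
    gradient at point x as a NumPy array.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(rounds):
        # Perturbation step: add small isotropic noise so iterates
        # can move away from strict saddle points.
        x = x + perturb_radius * rng.standard_normal(x.shape)
        # Snapshot gradient averaged over workers (one communication).
        g_snap = np.mean([local_grad(w, x) for w in range(n_workers)], axis=0)
        local_models = []
        for w in range(n_workers):
            y, v = x.copy(), g_snap.copy()
            for _ in range(local_steps):
                y_next = y - lr * v
                # Recursive bias-variance reduced update:
                # v <- v + grad(y_next) - grad(y), correcting the
                # snapshot gradient with local gradient differences.
                v = v + local_grad(w, y_next) - local_grad(w, y)
                y = y_next
            local_models.append(y)
        # Communication step: average the local models.
        x = np.mean(local_models, axis=0)
    return x
```

Under this sketch, communication happens only twice per round (broadcast plus averaging), which is the mechanism by which local methods trade extra local computation for fewer communication rounds.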