Paper Title
Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning
Paper Authors
Paper Abstract
In recent centralized nonconvex distributed learning and federated learning, local methods are one of the promising approaches to reduce communication time. However, existing work has mainly focused on studying first-order optimality guarantees. On the other hand, algorithms with second-order optimality guarantees, i.e., algorithms that escape saddle points, have been extensively studied in the non-distributed optimization literature. In this paper, we study a new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD), which combines the existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points in centralized nonconvex distributed optimization. BVR-L-PSGD enjoys second-order optimality with nearly the same communication complexity as the best known one of BVR-L-SGD for finding first-order optimality. In particular, the communication complexity is better than that of non-local methods when the local dataset heterogeneity is smaller than the smoothness of the local loss. In an extreme case, the communication complexity approaches $\widetilde{\Theta}(1)$ as the local dataset heterogeneity goes to zero. Numerical results validate our theoretical findings.
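To make the high-level description in the abstract concrete, the following is a minimal Python sketch of a local perturbed SGD loop with a SARAH-style bias-variance reduced estimator and periodic parameter perturbation. It is only an illustration of the general technique the abstract names; all names and parameters (`local_grad`, `n_workers`, `perturb_radius`, the round/step structure) are assumptions for illustration, not the paper's actual algorithm or interface.

```python
import numpy as np

def bvr_l_psgd_sketch(x0, local_grad, n_workers, rounds, local_steps,
                      lr=0.1, perturb_radius=1e-3, rng=None):
    """Illustrative sketch: local steps with a recursive (SARAH-style)
    bias-variance reduced gradient estimator, plus an isotropic
    perturbation at each communication round to help escape saddle points.

    local_grad(w, x) is assumed to return worker w's (stochastic)
    gradient at point x as a NumPy array.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(rounds):
        # Perturbation step: add small isotropic noise so iterates
        # can move away from strict saddle points.
        x = x + perturb_radius * rng.standard_normal(x.shape)
        # Snapshot gradient averaged over workers (one communication).
        g_snap = np.mean([local_grad(w, x) for w in range(n_workers)], axis=0)
        local_models = []
        for w in range(n_workers):
            y, v = x.copy(), g_snap.copy()
            for _ in range(local_steps):
                y_next = y - lr * v
                # Recursive bias-variance reduced update:
                # v <- v + grad(y_next) - grad(y), correcting the
                # snapshot gradient with local gradient differences.
                v = v + local_grad(w, y_next) - local_grad(w, y)
                y = y_next
            local_models.append(y)
        # Communication step: average the local models.
        x = np.mean(local_models, axis=0)
    return x
```

Under this sketch, communication happens only twice per round (broadcast plus averaging), which is the mechanism by which local methods trade extra local computation for fewer communication rounds.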