Escaping Saddle Points Efficiently with Occupation-Time-Adapted Perturbations

Paper Title

Escaping Saddle Points Efficiently with Occupation-Time-Adapted Perturbations

Authors

Xin Guo, Jiequn Han, Mahan Tajrobehkar, Wenpin Tang

Abstract

Motivated by the super-diffusivity of the self-repelling random walk, which has roots in statistical physics, this paper develops a new perturbation mechanism for optimization algorithms. In this mechanism, perturbations are adapted to the history of states via the notion of occupation time. After integrating this mechanism into the framework of perturbed gradient descent (PGD) and perturbed accelerated gradient descent (PAGD), two new algorithms are proposed: perturbed gradient descent adapted to occupation time (PGDOT) and its accelerated version (PAGDOT). PGDOT and PAGDOT are shown to converge to second-order stationary points at least as fast as PGD and PAGD, respectively, and thus they are guaranteed to avoid getting stuck at non-degenerate saddle points. The theoretical analysis is corroborated by empirical studies in which the new algorithms consistently escape saddle points and outperform not only their counterparts, PGD and PAGD, but also other popular alternatives including stochastic gradient descent, Adam, AMSGrad, and RMSProp.
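The abstract does not spell out how the perturbation depends on occupation time, so the following is only a minimal sketch under stated assumptions. It follows the usual PGD template: when the gradient norm drops below a threshold, add a perturbation drawn uniformly from a ball, and after a fixed number of steps return the pre-perturbation point if the objective did not decrease enough. Occupation time is approximated here by visit counts on a grid of cells. The toy objective `f`, the function `pgd_occupation_sketch`, the radius rule `r0 * sqrt(visits)`, the grid size `cell`, and all parameter values are hypothetical choices for illustration, not the PGDOT update from the paper.

```python
import numpy as np


def f(z):
    # Toy objective (assumption): f(x, y) = x^4/4 - x^2/2 + y^2/2.
    # Saddle at (0, 0); local minima at (+1, 0) and (-1, 0) with f = -1/4.
    x, y = z
    return x**4 / 4 - x**2 / 2 + y**2 / 2


def grad_f(z):
    x, y = z
    return np.array([x**3 - x, y])


def pgd_occupation_sketch(z0, eta=0.1, g_thresh=1e-3, r0=1e-2, t_thresh=100,
                          f_thresh=1e-4, cell=0.05, max_iter=5000, seed=0):
    """PGD-style loop whose perturbation radius follows a HYPOTHETICAL
    occupation-time rule; this is not the PGDOT update from the paper."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    visits = {}       # occupation time: iterations spent in each grid cell
    t_noise = None    # iteration of the pending perturbation, if any
    z_anchor = None   # iterate just before that perturbation
    for t in range(max_iter):
        cell_id = tuple(np.floor(z / cell).astype(int))
        visits[cell_id] = visits.get(cell_id, 0) + 1

        # t_thresh steps after a perturbation, check for sufficient decrease.
        if t_noise is not None and t - t_noise == t_thresh:
            if f(z) - f(z_anchor) > -f_thresh:
                return z_anchor  # no escape found: treat the anchor as a local minimum
            t_noise = None

        # Perturb when the gradient is small and no perturbation is pending.
        if t_noise is None and np.linalg.norm(grad_f(z)) < g_thresh:
            # Hypothetical rule: radius grows with sqrt(occupation time of this cell).
            r = r0 * np.sqrt(visits[cell_id])
            u = rng.standard_normal(z.shape)
            # Uniform sample from the ball of radius r in dimension z.size.
            xi = u / np.linalg.norm(u) * r * rng.uniform() ** (1 / z.size)
            z_anchor, t_noise = z.copy(), t
            z = z + xi

        z = z - eta * grad_f(z)
    return z


# Plain gradient descent from (0, 1) never leaves the line x = 0.
z_gd = np.array([0.0, 1.0])
for _ in range(2000):
    z_gd = z_gd - 0.1 * grad_f(z_gd)

z_pgd = pgd_occupation_sketch([0.0, 1.0])
print("GD:", z_gd, "PGD sketch:", z_pgd, "f:", f(z_pgd))
```

On this toy problem, plain gradient descent started at (0, 1) keeps x exactly at 0 and converges to the saddle, while the perturbed loop moves toward one of the minima (±1, 0). The square-root growth is an arbitrary stand-in meant only to loosely mimic the self-repelling idea: the longer the iterates linger in one cell, the larger the next perturbation.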
