Paper Title


How to trap a gradient flow

Authors

Sébastien Bubeck, Dan Mikulincer

Abstract


We consider the problem of finding an $\varepsilon$-approximate stationary point of a smooth function on a compact domain of $\mathbb{R}^d$. In contrast with dimension-free approaches such as gradient descent, we focus here on the case where $d$ is finite, and potentially small. This viewpoint was explored in 1993 by Vavasis, who proposed an algorithm which, for any fixed finite dimension $d$, improves upon the $O(1/\varepsilon^2)$ oracle complexity of gradient descent. For example, for $d=2$, Vavasis' approach obtains the complexity $O(1/\varepsilon)$. Moreover, for $d=2$ he also proved a lower bound of $\Omega(1/\sqrt{\varepsilon})$ for deterministic algorithms (we extend this result to randomized algorithms). Our main contribution is an algorithm, which we call gradient flow trapping (GFT), and the analysis of its oracle complexity. In dimension $d=2$, GFT closes the gap with Vavasis' lower bound (up to a logarithmic factor), as we show that it has complexity $O\left(\sqrt{\frac{\log(1/\varepsilon)}{\varepsilon}}\right)$. In dimension $d=3$, we show a complexity of $O\left(\frac{\log(1/\varepsilon)}{\varepsilon}\right)$, improving upon Vavasis' $O\left(1/\varepsilon^{1.2}\right)$. In higher dimensions, GFT has the remarkable property of being a logarithmic parallel depth strategy, in stark contrast with the polynomial depth of gradient descent or Vavasis' algorithm. In this higher dimensional regime, the total work of GFT improves quadratically upon the only other known polylogarithmic depth strategy for this problem, namely naive grid search. We augment this result with another algorithm, named \emph{cut and flow} (CF), which improves upon Vavasis' algorithm in any fixed dimension.
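For context on the baseline the abstract refers to, the following is a minimal sketch (not the paper's GFT or CF algorithms) of the dimension-free approach: running gradient descent until the gradient norm drops below $\varepsilon$, i.e., until an $\varepsilon$-approximate stationary point is reached. The function, starting point, and step size here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gradient_descent_stationary(grad, x0, eps=1e-3, step=0.1, max_iters=100_000):
    """Run gradient descent until ||grad(x)|| <= eps.

    Returns the eps-approximate stationary point found and the number of
    gradient-oracle calls used. In general this baseline needs on the
    order of 1/eps^2 oracle calls, which is the bound the paper's
    dimension-dependent algorithms improve upon for small fixed d.
    """
    x = np.array(x0, dtype=float)
    for t in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            return x, t
        x = x - step * g  # standard descent step
    return x, max_iters

# Illustrative smooth function on a box in R^2: f(x, y) = cos(x) + y^2 / 2,
# with gradient (-sin(x), y); its stationary points have sin(x) = 0, y = 0.
grad_f = lambda x: np.array([-np.sin(x[0]), x[1]])
x_star, calls = gradient_descent_stationary(grad_f, [1.0, 1.0], eps=1e-4)
```

Here the iterates converge to the minimizer $(\pi, 0)$, and `calls` records how many gradient evaluations the stopping rule consumed.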
