Paper Title
Optimal Algorithms for Stochastic Multi-Level Compositional Optimization
Paper Authors
Paper Abstract
In this paper, we investigate the problem of stochastic multi-level compositional optimization, where the objective function is a composition of multiple smooth but possibly non-convex functions. Existing methods for solving this problem either suffer from sub-optimal sample complexities or need a huge batch size. To address these limitations, we propose a Stochastic Multi-level Variance Reduction method (SMVR), which achieves the optimal sample complexity of $\mathcal{O}\left(1 / ε^{3}\right)$ to find an $ε$-stationary point for non-convex objectives. Furthermore, when the objective function satisfies the convexity or Polyak-Łojasiewicz (PL) condition, we propose a stage-wise variant of SMVR and improve the sample complexity to $\mathcal{O}\left(1 / ε^{2}\right)$ for convex functions or $\mathcal{O}\left(1 /\left(με\right)\right)$ for non-convex functions satisfying the $μ$-PL condition. The latter result implies the same complexity for $μ$-strongly convex functions. To make use of adaptive learning rates, we also develop Adaptive SMVR, which achieves the same complexities but converges faster in practice. All our complexities match the lower bounds not only in terms of $ε$ but also in terms of $μ$ (for PL or strongly convex functions), without using a large batch size in each iteration.
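For concreteness, the multi-level compositional problem referred to above is commonly written as follows; this is a sketch of the standard formulation in this literature and of the usual definitions of $ε$-stationarity and the PL condition, not a restatement of the paper's exact assumptions:
$$\min_{x \in \mathbb{R}^d} \; F(x) := f_K\left(f_{K-1}\left(\cdots f_1(x)\cdots\right)\right), \qquad f_i(\cdot) := \mathbb{E}_{\xi_i}\left[f_i(\cdot;\xi_i)\right],$$
where each layer $f_i$ is smooth but possibly non-convex and is only accessible through stochastic samples $\xi_i$. A point $x$ is called $ε$-stationary if $\|\nabla F(x)\| \le ε$, and $F$ satisfies the $μ$-PL condition if $\|\nabla F(x)\|^2 \ge 2μ\left(F(x)-F^{\star}\right)$ for all $x$, where $F^{\star}$ denotes the optimal value.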