Paper Title
Block majorization-minimization with diminishing radius for constrained nonsmooth nonconvex optimization
Paper Authors
Paper Abstract
Block majorization-minimization (BMM) is a simple iterative algorithm for constrained nonconvex optimization that sequentially minimizes majorizing surrogates of the objective function in each block while the others are held fixed. BMM entails a large class of optimization algorithms such as block coordinate descent and its proximal-point variant, expectation-maximization, and block projected gradient descent. We first establish that for general constrained nonsmooth nonconvex optimization, BMM with $\rho$-strongly convex and $L_g$-smooth surrogates can produce an $\varepsilon$-approximate first-order optimal point within $\widetilde{O}((1+L_g+\rho^{-1})\varepsilon^{-2})$ iterations and asymptotically converges to the set of first-order optimal points. Next, we show that BMM combined with trust-region methods with diminishing radius has an improved complexity of $\widetilde{O}((1+L_g)\varepsilon^{-2})$, independent of the inverse strong convexity parameter $\rho^{-1}$, allowing improved theoretical and practical performance with `flat' surrogates. Our results hold robustly even when the convex sub-problems are solved inexactly, as long as the optimality gaps are summable. Central to our analysis is a novel continuous first-order optimality measure, by which we bound the worst-case sub-optimality in each iteration by the first-order improvement the algorithm makes. We apply our general framework to obtain new results on various algorithms such as the celebrated multiplicative update algorithm for nonnegative matrix factorization by Lee and Seung, regularized nonnegative tensor decomposition, and the classical block projected gradient descent algorithm. Lastly, we numerically demonstrate that the additional use of diminishing radius can improve the convergence rate of BMM in many instances.
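To make the abstract's algorithmic idea concrete, below is a minimal sketch (not the authors' code) of BMM with a diminishing trust-region radius, illustrated on nonnegative matrix factorization X ≈ WH with the squared Frobenius loss. Each block update minimizes a prox-linear majorizing surrogate over the nonnegativity constraint intersected with an ℓ∞ trust region of radius r_n around the current iterate; the schedule r_n = c/√n and the step sizes are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def bmm_diminishing_radius_nmf(X, rank, n_iters=500, c=1.0, seed=0):
    """Illustrative BMM with diminishing trust-region radius for NMF (sketch only)."""
    rng = np.random.default_rng(seed)
    m, k = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, k))
    for n in range(1, n_iters + 1):
        r_n = c / np.sqrt(n)  # diminishing trust-region radius (assumed schedule)

        # --- Block 1: update W with H fixed ---
        G_W = (W @ H - X) @ H.T            # gradient of the loss in W
        L_W = np.linalg.norm(H @ H.T, 2)   # Lipschitz constant of the W-block gradient
        W_step = W - G_W / max(L_W, 1e-12) # unconstrained minimizer of the quadratic surrogate
        # project onto {W >= 0} ∩ {||W' - W||_inf <= r_n} (both are box constraints)
        W = np.clip(W_step, np.maximum(W - r_n, 0.0), W + r_n)

        # --- Block 2: update H with W fixed ---
        G_H = W.T @ (W @ H - X)
        L_H = np.linalg.norm(W.T @ W, 2)
        H_step = H - G_H / max(L_H, 1e-12)
        H = np.clip(H_step, np.maximum(H - r_n, 0.0), H + r_n)
    return W, H

if __name__ == "__main__":
    X = np.abs(np.random.default_rng(1).random((30, 20)))
    W, H = bmm_diminishing_radius_nmf(X, rank=5)
    print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```

Because the prox-linear surrogate and the trust region are both box-shaped for this problem, the constrained surrogate minimization reduces to an elementwise clip; for general constraint sets one would replace the clip with a projection or an inexact convex solver, consistent with the summable-optimality-gap condition in the abstract.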