Paper Title

Convergence Acceleration via Chebyshev Step: Plausible Interpretation of Deep-Unfolded Gradient Descent

Authors

Satoshi Takabe, Tadashi Wadayama

Abstract

Deep unfolding is a promising deep-learning technique whose network architecture is based on expanding the recursive structure of existing iterative algorithms. Although convergence acceleration is a remarkable advantage of deep unfolding, its theoretical aspects have not yet been revealed. The first half of this study details a theoretical analysis of the convergence acceleration in deep-unfolded gradient descent (DUGD), whose trainable parameters are step sizes. We propose a plausible interpretation of the learned step-size parameters in DUGD by introducing the principle of Chebyshev steps derived from Chebyshev polynomials. Using Chebyshev steps in gradient descent (GD) enables us to bound the spectral radius of the matrix governing the convergence speed of GD, leading to a tight upper bound on the convergence rate. The convergence rate of GD using Chebyshev steps is shown to be asymptotically optimal, even though it has no momentum terms. We also show that Chebyshev steps numerically explain the learned step-size parameters in DUGD well. In the second half of the study, Chebyshev-periodical successive over-relaxation (Chebyshev-PSOR) is proposed for accelerating linear/nonlinear fixed-point iterations. Theoretical analysis and numerical experiments indicate that Chebyshev-PSOR exhibits significantly faster convergence for various examples such as the Jacobi method and proximal gradient methods.
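The central construction described in the abstract, replacing a constant GD step size with a short schedule of "Chebyshev steps" obtained from the roots of a Chebyshev polynomial over the Hessian's eigenvalue range, can be illustrated with a minimal NumPy sketch on a quadratic objective f(x) = 0.5 xᵀAx − bᵀx. This is a sketch under standard assumptions, not the authors' code: the test matrix, function names, and the comparison against the best constant step size are illustrative choices.

```python
# Minimal sketch (not the paper's implementation): GD with a Chebyshev step-size
# schedule on a quadratic objective, compared with the best constant step size.
import numpy as np

def chebyshev_steps(lam_min, lam_max, T):
    """Step sizes 1/lambda_t, where lambda_t are the roots of the degree-T
    Chebyshev polynomial mapped to the eigenvalue interval [lam_min, lam_max]."""
    t = np.arange(T)
    nodes = 0.5 * (lam_max + lam_min) + 0.5 * (lam_max - lam_min) * np.cos(
        (2 * t + 1) * np.pi / (2 * T)
    )
    return 1.0 / nodes

def gd(A, b, steps, x0=None):
    """Plain gradient descent x <- x - gamma_t * (A x - b), one pass over `steps`."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for gamma in steps:
        x = x - gamma * (A @ x - b)
    return x

rng = np.random.default_rng(0)
n, T = 50, 16
M = rng.standard_normal((n, n))
A = M @ M.T / n + 0.1 * np.eye(n)             # positive-definite Hessian (toy example)
b = rng.standard_normal(n)
lam = np.linalg.eigvalsh(A)                   # ascending eigenvalues
x_star = np.linalg.solve(A, b)

cheb = chebyshev_steps(lam[0], lam[-1], T)
const = np.full(T, 2.0 / (lam[0] + lam[-1]))  # optimal constant step for GD

for name, steps in [("constant", const), ("Chebyshev", cheb)]:
    err = np.linalg.norm(gd(A, b, steps) - x_star) / np.linalg.norm(x_star)
    print(f"{name:9s} steps, T={T}: relative error {err:.3e}")
```

Because all factors (I − γ_t A) commute, the order of the steps within one period does not change the end-of-period error; on a problem like the one above, the Chebyshev schedule typically leaves a much smaller residual after T iterations than the best constant step, which is consistent with the accelerated convergence rate discussed in the abstract.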
