Paper Title
The Curse of Unrolling: Rate of Differentiating Through Optimization
Paper Authors
Paper Abstract
Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution using an iterative solver and differentiates it through the computational path. This work provides a non-asymptotic convergence-rate analysis of this approach on quadratic objectives for gradient descent and the Chebyshev method. We show that to ensure convergence of the Jacobian, we can either 1) choose a large learning rate, leading to fast asymptotic convergence but accepting that the algorithm may have an arbitrarily long burn-in phase, or 2) choose a smaller learning rate, leading to immediate but slower convergence. We refer to this phenomenon as the curse of unrolling. Finally, we discuss open problems related to this approach, such as deriving a practical update rule for the optimal unrolling strategy, and make novel connections with the field of Sobolev orthogonal polynomials.
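To make the setting concrete, below is a minimal sketch (not the authors' code) of unrolled differentiation through gradient descent on a quadratic objective, written in JAX. The specific quadratic f(x, theta) = 0.5 x^T A x - theta b^T x, the matrix A, the vector b, and the step sizes and iteration counts are illustrative assumptions; the point is only that the Jacobian of the unrolled iterate with respect to the hyperparameter theta is obtained by differentiating through the optimization path.

```python
# Minimal illustrative sketch of unrolled differentiation (assumed setup,
# not the paper's code): gradient descent on a quadratic objective,
# differentiated through the iterates with JAX.

import jax
import jax.numpy as jnp

A = jnp.array([[3.0, 0.0], [0.0, 1.0]])  # example quadratic curvature
b = jnp.array([1.0, 2.0])                # example linear term

def unrolled_solution(theta, step_size, num_steps):
    """Run num_steps of gradient descent on f(x, theta) and return x_T."""
    def f(x):
        # Hypothetical parameterized quadratic: 0.5 x^T A x - theta b^T x
        return 0.5 * x @ A @ x - (theta * b) @ x
    x = jnp.zeros_like(b)
    for _ in range(num_steps):
        x = x - step_size * jax.grad(f)(x)  # differentiable GD update
    return x

# Jacobian of the approximate solution with respect to theta, computed by
# backpropagating through the unrolled optimization path.
jac_fn = jax.jacobian(unrolled_solution, argnums=0)

# Larger vs. smaller learning rate, few vs. many unrolled steps (illustrative).
for step_size in (0.6, 0.3):
    for num_steps in (5, 50):
        J = jac_fn(1.0, step_size, num_steps)
        print(f"step_size={step_size}, num_steps={num_steps}, Jacobian={J}")
```

For this toy problem the exact solution is x*(theta) = theta * A^{-1} b, so the unrolled Jacobian should approach A^{-1} b as the number of steps grows; comparing the two step sizes across iteration counts gives a rough picture of the fast-but-delayed versus immediate-but-slower behavior described in the abstract.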