Paper title
Random coordinate descent methods for nonseparable composite optimization
Paper authors
Paper abstract
In this paper we consider large-scale composite optimization problems whose objective function is the sum of two (possibly nonconvex) terms: one with (block) coordinate-wise Lipschitz continuous gradient and the other differentiable but nonseparable. Under these general settings we derive and analyze two new coordinate descent methods. The first algorithm, referred to as the coordinate proximal gradient method, exploits the composite form of the objective function, while the second disregards the composite form and uses the partial gradient of the full objective, yielding a coordinate gradient descent scheme with novel adaptive stepsize rules. We prove that these new stepsize rules make the coordinate gradient scheme a descent method, provided that additional assumptions hold for the second term of the objective function. We present a complete worst-case complexity analysis for both new methods in the convex and nonconvex settings, when the (block) coordinates are chosen either randomly or cyclically. Preliminary numerical results confirm the efficiency of the two algorithms on practical problems.
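To make the setting concrete, the following is a minimal, hypothetical Python sketch of random coordinate gradient descent on a composite objective F(x) = f(x) + h(x), where f has (block) coordinate-wise Lipschitz continuous gradients and h is smooth but nonseparable, as in the abstract. It is not the paper's method: it uses a fixed coordinate stepsize 1/L_i in place of the paper's adaptive stepsize rules, and the problem data (A, b, mu) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((200, n))   # hypothetical problem data
b = rng.standard_normal(200)
mu = 0.1

def grad_F(x):
    # f(x) = 0.5*||Ax - b||^2  (coordinate-wise Lipschitz gradients),
    # h(x) = mu * log(sum(exp(x)))  -- smooth but nonseparable.
    z = np.exp(x - x.max())                     # stabilized softmax
    return A.T @ (A @ x - b) + mu * z / z.sum()

# Coordinate-wise Lipschitz constants of grad f: L_i = ||A[:, i]||^2,
# plus mu as a crude (assumed) bound for the log-sum-exp term.
L = np.sum(A**2, axis=0) + mu

x = np.zeros(n)
for k in range(20000):
    i = rng.integers(n)     # sample one coordinate uniformly at random
    g_i = grad_F(x)[i]      # partial gradient of the full objective
    x[i] -= g_i / L[i]      # coordinate gradient step with stepsize 1/L_i

print("final objective:",
      0.5 * np.linalg.norm(A @ x - b)**2 + mu * np.log(np.sum(np.exp(x))))

For clarity the sketch recomputes the full gradient and keeps only one entry; an efficient implementation would maintain the residual A @ x - b and update only the i-th partial gradient per iteration.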