Paper Title
Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization
Paper Authors
Paper Abstract
We consider linear prediction with a convex Lipschitz loss, or more generally, stochastic convex optimization problems of generalized linear form, i.e.~where each instantaneous loss is a scalar convex function of a linear function. We show that in this setting, early-stopped Gradient Descent (GD), without any explicit regularization or projection, ensures excess error at most $\epsilon$ (compared to the best possible with unit Euclidean norm) with an optimal, up to logarithmic factors, sample complexity of $\tilde{O}(1/\epsilon^2)$ and only $\tilde{O}(1/\epsilon^2)$ iterations. This contrasts with general stochastic convex optimization, where $\Omega(1/\epsilon^4)$ iterations are needed [Amir et al., 2021b]. The lower iteration complexity is ensured by leveraging uniform convergence rather than stability. But instead of uniform convergence in a norm ball, which we show can only guarantee suboptimal learning using $\Theta(1/\epsilon^4)$ samples, we rely on uniform convergence in a distribution-dependent ball.
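To make the setting concrete, below is a minimal sketch of the kind of procedure the abstract describes: plain full-batch gradient descent on an unregularized, unprojected empirical risk of generalized linear form, stopped early. The absolute loss, the synthetic data, the function name `early_stopped_gd`, the averaged output, and the choices of step size and iteration budget are illustrative assumptions of this sketch, not details taken from the paper.

```python
# Minimal sketch (see assumptions above): early-stopped, unprojected GD on the
# unregularized empirical risk of a generalized linear objective, using the
# absolute loss |<w, x> - y| as an example of a 1-Lipschitz convex scalar loss.
import numpy as np

def early_stopped_gd(X, y, num_iters, step_size):
    """Run full-batch gradient descent on F(w) = (1/n) * sum_i |<w, x_i> - y_i|
    with no projection or regularization; return the averaged iterate
    (one common output rule, assumed here for illustration)."""
    n, d = X.shape
    w = np.zeros(d)
    avg_w = np.zeros(d)
    for _ in range(num_iters):
        residuals = X @ w - y                    # linear predictions minus targets
        subgrad = X.T @ np.sign(residuals) / n   # subgradient of the empirical risk
        w = w - step_size * subgrad              # plain GD step, no projection
        avg_w += w
    return avg_w / num_iters

# Illustrative usage on synthetic data; taking T ~ n iterations with step size
# ~ 1/sqrt(T) is an assumption of this sketch, not the paper's exact tuning.
rng = np.random.default_rng(0)
n, d = 1000, 50
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)                 # unit-norm comparator
y = X @ w_star + 0.1 * rng.standard_normal(n)
w_hat = early_stopped_gd(X, y, num_iters=n, step_size=1.0 / np.sqrt(n))
```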