Paper Title
Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization
Paper Authors
Paper Abstract
We consider linear prediction with a convex Lipschitz loss, or more generally, stochastic convex optimization problems of generalized linear form, i.e.~where each instantaneous loss is a scalar convex function of a linear function. We show that in this setting, early-stopped Gradient Descent (GD), without any explicit regularization or projection, ensures excess error at most $\epsilon$ (compared to the best possible with unit Euclidean norm) with an optimal, up to logarithmic factors, sample complexity of $\tilde{O}(1/\epsilon^2)$ and only $\tilde{O}(1/\epsilon^2)$ iterations. This contrasts with general stochastic convex optimization, where $\Omega(1/\epsilon^4)$ iterations are needed [Amir et al., 2021b]. The lower iteration complexity is ensured by leveraging uniform convergence rather than stability. But instead of uniform convergence in a norm ball, which we show can only guarantee suboptimal learning using $\Theta(1/\epsilon^4)$ samples, we rely on uniform convergence in a distribution-dependent ball.
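To make the setting concrete, below is a minimal sketch of the kind of procedure the abstract describes: plain full-batch gradient descent on an unregularized, unprojected empirical risk of generalized linear form, stopped early. The absolute loss, the synthetic data, the function name `early_stopped_gd`, the averaged output, and the choices of step size and iteration budget are illustrative assumptions of this sketch, not details taken from the paper.

```python
# Minimal sketch (see assumptions above): early-stopped, unprojected GD on the
# unregularized empirical risk of a generalized linear objective, using the
# absolute loss |<w, x> - y| as an example of a 1-Lipschitz convex scalar loss.
import numpy as np

def early_stopped_gd(X, y, num_iters, step_size):
    """Run full-batch gradient descent on F(w) = (1/n) * sum_i |<w, x_i> - y_i|
    with no projection or regularization; return the averaged iterate
    (one common output rule, assumed here for illustration)."""
    n, d = X.shape
    w = np.zeros(d)
    avg_w = np.zeros(d)
    for _ in range(num_iters):
        residuals = X @ w - y                    # linear predictions minus targets
        subgrad = X.T @ np.sign(residuals) / n   # subgradient of the empirical risk
        w = w - step_size * subgrad              # plain GD step, no projection
        avg_w += w
    return avg_w / num_iters

# Illustrative usage on synthetic data; taking T ~ n iterations with step size
# ~ 1/sqrt(T) is an assumption of this sketch, not the paper's exact tuning.
rng = np.random.default_rng(0)
n, d = 1000, 50
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)                 # unit-norm comparator
y = X @ w_star + 0.1 * rng.standard_normal(n)
w_hat = early_stopped_gd(X, y, num_iters=n, step_size=1.0 / np.sqrt(n))
```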