Paper Title
Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the Optimization Landscape Around the True Solution
Paper Authors
Paper Abstract
This work characterizes the effect of depth on the optimization landscape of linear regression, showing that, despite their nonconvexity, deeper models have a more desirable optimization landscape. We consider a robust and over-parameterized setting, where a subset of the measurements is grossly corrupted with noise and the true linear model is captured via an $N$-layer linear neural network. On the negative side, we show that this problem \textit{does not} have a benign landscape: for any $N\geq 1$, with constant probability, there exists a solution corresponding to the ground truth that is neither a local nor a global minimum. However, on the positive side, we prove that, for any $N$-layer model with $N\geq 2$, a simple sub-gradient method is oblivious to such ``problematic'' solutions; instead, it converges to a balanced solution that is not only close to the ground truth but also enjoys a flat local landscape, thereby obviating the need for ``early stopping''. Lastly, we empirically verify that the desirable optimization landscape of deeper models extends to other robust learning tasks, including deep matrix recovery and deep ReLU networks with the $\ell_1$-loss.
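To make the setting concrete, below is a minimal PyTorch sketch of the kind of experiment the abstract describes: a ground-truth linear model observed through measurements, a fraction of which are grossly corrupted; an $N$-layer linear (product) parameterization of the predictor; and a plain sub-gradient method on the $\ell_1$-loss. All dimensions, the corruption fraction, the corruption magnitude, the initialization scale, the step size, and the iteration count are illustrative assumptions, not values taken from the paper.

```python
import torch

# Illustrative sketch (assumed setup, not the authors' code):
# robust linear regression with an N-layer linear parameterization
# theta(W) = W_N ... W_2 W_1, trained by a constant-step subgradient
# method on the l1-loss.

torch.manual_seed(0)
d, m, N = 20, 200, 3          # feature dim, #measurements, #layers (assumed)
corrupt_frac = 0.3            # fraction of grossly corrupted labels (assumed)

theta_star = torch.randn(d)                      # ground-truth linear model
X = torch.randn(m, d)
y = X @ theta_star
bad = torch.rand(m) < corrupt_frac
y[bad] += 10.0 * torch.randn(int(bad.sum()))     # gross sparse corruption

# Layer factors W_1, ..., W_N; their product is the effective predictor.
dims = [d] * N + [1]
W = [(0.1 * torch.randn(dims[i + 1], dims[i])).requires_grad_()
     for i in range(N)]

def end_to_end(W):
    # Collapse the factors into the effective linear map theta(W) in R^d.
    theta = W[0]
    for Wi in W[1:]:
        theta = Wi @ theta
    return theta.reshape(-1)

lr = 1e-3
for step in range(5000):
    theta = end_to_end(W)
    loss = (X @ theta - y).abs().mean()   # l1-loss; autograd yields a subgradient
    for Wi in W:
        if Wi.grad is not None:
            Wi.grad.zero_()
    loss.backward()
    with torch.no_grad():
        for Wi in W:
            Wi -= lr * Wi.grad            # plain (sub)gradient step

with torch.no_grad():
    err = (end_to_end(W) - theta_star).norm() / theta_star.norm()
print(f"relative recovery error: {err.item():.3f}")
```

Varying `N` in this sketch (e.g., comparing `N = 1` against `N >= 2`) is the natural way to probe the depth effect the abstract discusses; the claims themselves, of course, are established in the paper rather than by this toy script.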