Paper Title

Accelerating SGD for Highly Ill-Conditioned Huge-Scale Online Matrix Completion

Paper Authors

Gavin Zhang, Hong-Ming Chiu, Richard Y. Zhang

Paper Abstract

The matrix completion problem seeks to recover a $d\times d$ ground truth matrix of low rank $r\ll d$ from observations of its individual elements. Real-world matrix completion is often a huge-scale optimization problem, with $d$ so large that even the simplest full-dimension vector operations with $O(d)$ time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and it can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slow-down when the underlying ground truth is ill-conditioned; it requires at least $O(\kappa\log(1/\epsilon))$ iterations to get $\epsilon$-close to a ground truth matrix with condition number $\kappa$. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for huge-scale online optimization while also making it agnostic to $\kappa$. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with $\kappa=1$. In our experiments, we observe a similar acceleration for item-item collaborative filtering on the MovieLens25M dataset via a pairwise ranking loss, with 100 million training pairs and 10 million testing pairs. [See supporting code at https://github.com/Hong-Ming/ScaledSGD.]
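
The key object in the abstract is the preconditioner that makes SGD agnostic to $\kappa$. The sketch below illustrates the idea in the symmetric RMSE setting described above: factor the estimate as $XX^\top$ with $X \in \mathbb{R}^{d\times r}$, and right-multiply each sampled row gradient by the $r\times r$ matrix $P = (X^\top X)^{-1}$, in the spirit of the ScaledSGD code linked above. All names, hyperparameters, and the per-epoch refresh of $P$ are illustrative assumptions, not the paper's implementation; note that the huge-scale premise requires maintaining $P$ without $O(d)$ work per step, e.g. via an $O(r^2)$ rank-one (Sherman-Morrison) update after each row change, rather than the periodic recomputation used here for simplicity.

```python
import numpy as np

def scaled_sgd_step(X, P, i, j, m_ij, lr):
    """One preconditioned SGD step on an observed entry M[i, j].

    Sketch only: model M ~ X @ X.T, per-entry squared loss
    (x_i . x_j - M_ij)^2, ScaledGD-style preconditioner P = inv(X.T @ X).
    """
    residual = X[i] @ X[j] - m_ij   # scalar prediction error
    grad_i = residual * X[j]        # partial gradient w.r.t. row x_i
    grad_j = residual * X[i]        # partial gradient w.r.t. row x_j
    # Right-multiplying by P rescales the step so progress does not
    # degrade with the condition number of the ground truth.
    X[i] -= lr * (grad_i @ P)
    X[j] -= lr * (grad_j @ P)

# Illustrative driver on a synthetic rank-r symmetric ground truth.
d, r, lr = 200, 5, 0.5              # hyperparameters are illustrative
rng = np.random.default_rng(0)
Z = rng.standard_normal((d, r))
M = Z @ Z.T                         # synthetic ground truth
X = rng.standard_normal((d, r))     # random initialization

for epoch in range(100):
    # Refreshed once per epoch here for simplicity; a huge-scale
    # implementation must instead maintain this r x r inverse in O(r^2)
    # per update so that no step ever costs O(d).
    P = np.linalg.inv(X.T @ X)
    for _ in range(50 * d):         # stream of randomly observed entries
        i, j = rng.integers(0, d, size=2)
        scaled_sgd_step(X, P, i, j, M[i, j], lr)

print("relative fit error:", np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))
```

Because each step touches only two rows of $X$ and an $r\times r$ matrix, the per-iteration cost is independent of $d$, which is what lets the preconditioning coexist with the huge-scale, streaming setting the abstract emphasizes.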
