Paper Title

Accelerating SGD for Highly Ill-Conditioned Huge-Scale Online Matrix Completion

Paper Authors

Gavin Zhang, Hong-Ming Chiu, Richard Y. Zhang

Paper Abstract

The matrix completion problem seeks to recover a $d\times d$ ground truth matrix of low rank $r\ll d$ from observations of its individual elements. Real-world matrix completion is often a huge-scale optimization problem, with $d$ so large that even the simplest full-dimension vector operations with $O(d)$ time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and it can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slow-down when the underlying ground truth is ill-conditioned; it requires at least $O(\kappa\log(1/\epsilon))$ iterations to get $\epsilon$-close to a ground truth matrix with condition number $\kappa$. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for huge-scale online optimization while also making it agnostic to $\kappa$. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with $\kappa=1$. In our experiments, we observe a similar acceleration for item-item collaborative filtering on the MovieLens25M dataset via a pairwise ranking loss, with 100 million training pairs and 10 million testing pairs. [See supporting code at https://github.com/Hong-Ming/ScaledSGD.]
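
The key object in the abstract is the preconditioner that makes SGD agnostic to $\kappa$. The sketch below illustrates the idea in the symmetric RMSE setting described above: factor the estimate as $XX^\top$ with $X \in \mathbb{R}^{d\times r}$, and right-multiply each sampled row gradient by the $r\times r$ matrix $P = (X^\top X)^{-1}$, in the spirit of the ScaledSGD code linked above. All names, hyperparameters, and the per-epoch refresh of $P$ are illustrative assumptions, not the paper's implementation; note that the huge-scale premise requires maintaining $P$ without $O(d)$ work per step, e.g. via an $O(r^2)$ rank-one (Sherman-Morrison) update after each row change, rather than the periodic recomputation used here for simplicity.

```python
import numpy as np

def scaled_sgd_step(X, P, i, j, m_ij, lr):
    """One preconditioned SGD step on an observed entry M[i, j].

    Sketch only: model M ~ X @ X.T, per-entry squared loss
    (x_i . x_j - M_ij)^2, ScaledGD-style preconditioner P = inv(X.T @ X).
    """
    residual = X[i] @ X[j] - m_ij   # scalar prediction error
    grad_i = residual * X[j]        # partial gradient w.r.t. row x_i
    grad_j = residual * X[i]        # partial gradient w.r.t. row x_j
    # Right-multiplying by P rescales the step so progress does not
    # degrade with the condition number of the ground truth.
    X[i] -= lr * (grad_i @ P)
    X[j] -= lr * (grad_j @ P)

# Illustrative driver on a synthetic rank-r symmetric ground truth.
d, r, lr = 200, 5, 0.5              # hyperparameters are illustrative
rng = np.random.default_rng(0)
Z = rng.standard_normal((d, r))
M = Z @ Z.T                         # synthetic ground truth
X = rng.standard_normal((d, r))     # random initialization

for epoch in range(100):
    # Refreshed once per epoch here for simplicity; a huge-scale
    # implementation must instead maintain this r x r inverse in O(r^2)
    # per update so that no step ever costs O(d).
    P = np.linalg.inv(X.T @ X)
    for _ in range(50 * d):         # stream of randomly observed entries
        i, j = rng.integers(0, d, size=2)
        scaled_sgd_step(X, P, i, j, M[i, j], lr)

print("relative fit error:", np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))
```

Because each step touches only two rows of $X$ and an $r\times r$ matrix, the per-iteration cost is independent of $d$, which is what lets the preconditioning coexist with the huge-scale, streaming setting the abstract emphasizes.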
