论文标题

非对抗对称非负基质分解

Off-diagonal Symmetric Nonnegative Matrix Factorization

论文作者

Moutier, François, Vandaele, Arnaud, Gillis, Nicolas

论文摘要

对称非负矩阵分解(SYMNMF)是非负矩阵分解(NMF)的变体,它允许处理对称输入矩阵,并且已被证明非常适合聚类任务。在本文中,我们提出了一个新模型,称为非对角线符号(ODSYMNMF),该模型未考虑目标函数中输入矩阵的对角线条目。与SYMNMF相比,ODSYMNMF具有三个关键优势。首先,在理论上,ODSYMNMF的声音要多得多,因为总是存在$ \ nicefrac {n(n-1)} {2} $的确切大小的确切分解,其中$ n $是输入矩阵的维度。其次,在实践中,随着输入矩阵的对角线条目通常与项目和自身之间的相似性相对应,而不是带来太多信息,这在实践中更有意义。第三,它使优化问题更容易解决。特别是,它将允许我们根据坐标下降设计算法,从而最大程度地减少输入矩阵及其近似之间的组件$ \ ell_1 $规范。我们证明,该规范更适合在实践中经常遇到的二进制输入矩阵。我们还为组件的$ \ ell_2 $ norm推导了一种坐标下降方法,并将两种方法与合成数据集和文档数据集的SYMNMF进行比较。

Symmetric nonnegative matrix factorization (symNMF) is a variant of nonnegative matrix factorization (NMF) that allows to handle symmetric input matrices and has been shown to be particularly well suited for clustering tasks. In this paper, we present a new model, dubbed off-diagonal symNMF (ODsymNMF), that does not take into account the diagonal entries of the input matrix in the objective function. ODsymNMF has three key advantages compared to symNMF. First, ODsymNMF is theoretically much more sound as there always exists an exact factorization of size at most $\nicefrac{n(n-1)}{2}$ where $n$ is the dimension of the input matrix. Second, it makes more sense in practice as diagonal entries of the input matrix typically correspond to the similarity between an item and itself, not bringing much information. Third, it makes the optimization problem much easier to solve. In particular, it will allow us to design an algorithm based on coordinate descent that minimizes the component-wise $\ell_1$ norm between the input matrix and its approximation. We prove that this norm is much better suited for binary input matrices often encountered in practice. We also derive a coordinate descent method for the component-wise $\ell_2$ norm, and compare the two approaches with symNMF on synthetic and document data sets.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源