$\ell_0$-based Sparse Canonical Correlation Analysis

Paper Title

$\ell_0$-based Sparse Canonical Correlation Analysis

Authors

Ofir Lindenbaum, Moshe Salhov, Amir Averbuch, Yuval Kluger

Abstract

Canonical Correlation Analysis (CCA) models are powerful tools for studying the associations between two sets of variables. The canonically correlated representations, termed \textit{canonical variates}, are widely used in unsupervised learning to analyze unlabeled multi-modal registered datasets. Despite their success, CCA models may break (or overfit) if the number of variables in either of the modalities exceeds the number of samples. Moreover, a significant fraction of the variables often measures modality-specific information, so removing them is beneficial for identifying the \textit{canonically correlated variates}. Here, we propose $\ell_0$-CCA, a method for learning correlated representations based on sparse subsets of variables from two observed modalities. Sparsity is obtained by multiplying the input variables by stochastic gates, whose parameters are learned together with the CCA weights via an $\ell_0$-regularized correlation loss. We further propose $\ell_0$-Deep CCA for solving the problem of non-linear sparse CCA by modeling the correlated representations using deep nets. We demonstrate the efficacy of the method using several synthetic and real examples. Most notably, by gating nuisance input variables, our approach improves the extracted representations compared to other linear, non-linear, and sparse CCA-based models.
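
To make the gating idea in the abstract concrete, the following is a minimal sketch of how an $\ell_0$-regularized correlation loss for a single pair of canonical directions might be evaluated. It is not the authors' implementation. The abstract does not say how the stochastic gates are parameterized, so the clipped-Gaussian relaxation used here, the function names, and the parameters `mu_x`, `mu_y`, `sigma`, and `lam` are all assumptions made for illustration. Only the forward computation is shown; learning the gates and weights would additionally need a gradient-based optimizer.

```python
import math
import numpy as np


def sample_gates(mu, sigma, rng):
    """Sample relaxed gates in [0, 1].

    Assumed parameterization: a Gaussian perturbation of mu, clipped to [0, 1].
    """
    eps = rng.normal(0.0, sigma, size=mu.shape)
    return np.clip(mu + eps, 0.0, 1.0)


def expected_open_gates(mu, sigma):
    """Sum of P(mu_d + eps_d > 0) = Phi(mu_d / sigma) over all gates.

    This is a differentiable surrogate for the number of active variables.
    """
    return sum(0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0)))) for m in mu)


def gated_correlation_loss(X, Y, a, b, mu_x, mu_y, sigma=0.5, lam=0.1, rng=None):
    """Return (loss, rho) for one pair of canonical directions a and b.

    X has shape (n, dx) and Y has shape (n, dy); each variable (column) is
    multiplied by its gate before the linear projection.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    zx = sample_gates(mu_x, sigma, rng)
    zy = sample_gates(mu_y, sigma, rng)
    u = (X * zx) @ a  # gated projection of the first modality
    v = (Y * zy) @ b  # gated projection of the second modality
    u = u - u.mean()
    v = v - v.mean()
    rho = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    penalty = expected_open_gates(mu_x, sigma) + expected_open_gates(mu_y, sigma)
    # Minimizing this trades correlation against the expected number of open gates.
    return -rho + lam * penalty, rho


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, d = 200, 6
    shared = rng.normal(size=n)
    X = rng.normal(size=(n, d))
    Y = rng.normal(size=(n, d))
    X[:, 0] += 3.0 * shared  # only the first variable of each modality
    Y[:, 0] += 3.0 * shared  # carries the shared signal
    a = np.ones(d)
    b = np.ones(d)
    all_open = np.full(d, 5.0)
    first_only = np.array([5.0] + [-5.0] * (d - 1))
    print(gated_correlation_loss(X, Y, a, b, all_open, all_open, rng=rng))
    print(gated_correlation_loss(X, Y, a, b, first_only, first_only, rng=rng))
```

In this toy setup, closing the gates on the nuisance variables removes their noise from both projections and also lowers the penalty term, which is the effect the abstract attributes to gating nuisance inputs.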

