Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity

Title

Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity

Authors

Arber Qoku, Florian Buettner

Abstract

Many real-world systems are described not only by data from a single source but also by multiple data views. In genomic medicine, for instance, patients can be characterized by data from different molecular layers. Latent variable models with structured sparsity are a commonly used tool for disentangling variation within and across data views. However, interpreting them is cumbersome, since it requires domain experts to inspect and interpret each factor directly. Here, we propose MuVI, a novel multi-view latent variable model based on a modified horseshoe prior for modeling structured sparsity. This facilitates the incorporation of limited and noisy domain knowledge, thereby allowing multi-view data to be analyzed in an inherently explainable manner. We demonstrate that our model (i) outperforms state-of-the-art approaches for modeling structured sparsity in terms of reconstruction error and precision/recall, (ii) robustly integrates noisy domain expertise in the form of feature sets, (iii) promotes the identifiability of factors, and (iv) infers interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.
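
The abstract does not spell out how MuVI modifies the horseshoe prior. As a rough illustration only, the sketch below samples a loading matrix from a standard horseshoe prior, w_dk ~ N(0, tau_k^2 * lambda_dk^2) with half-Cauchy global scales tau_k and local scales lambda_dk, and encodes a hypothetical binary feature-set mask by giving features outside a factor's set a much smaller local scale. The function name, the `off_set_scale` parameter and this particular way of encoding the mask are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def sample_structured_horseshoe_weights(mask, tau_scale=1.0, off_set_scale=0.05, seed=0):
    """Draw a D x K loading matrix from a horseshoe-style prior (illustrative sketch).

    mask[d, k] is True if feature d belongs to the (hypothetical) prior feature
    set attached to factor k. Assumption: local scales of features outside the
    set are drawn from a half-Cauchy with the much smaller scale off_set_scale,
    so their loadings are shrunk towards zero a priori.
    """
    rng = np.random.default_rng(seed)
    mask = np.asarray(mask, dtype=bool)
    n_features, n_factors = mask.shape

    # Global scale per factor: tau_k ~ HalfCauchy(0, tau_scale)
    tau = tau_scale * np.abs(rng.standard_cauchy(size=n_factors))

    # Local scale per loading: lambda_dk ~ HalfCauchy(0, s_dk),
    # with s_dk = 1 inside the feature set and off_set_scale outside it
    local_scale = np.where(mask, 1.0, off_set_scale)
    lam = local_scale * np.abs(rng.standard_cauchy(size=(n_features, n_factors)))

    # Loadings: w_dk ~ Normal(0, (tau_k * lambda_dk)^2); tau broadcasts over rows
    return rng.normal(size=(n_features, n_factors)) * lam * tau


# Usage: three factors whose hypothetical feature sets cover the first 500 features
mask = np.zeros((1000, 3), dtype=bool)
mask[:500, :] = True
w = sample_structured_horseshoe_weights(mask, seed=1)
print(np.median(np.abs(w[:500])), np.median(np.abs(w[500:])))
```

Under these assumptions, loadings outside a factor's annotated set are strongly shrunk, while the heavy Cauchy tails still let a strong signal escape the shrinkage. This is one plausible way a prior could tolerate incomplete or noisy feature sets.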
