Visual Neural Decomposition to Explain Multivariate Data Sets

Paper Title

Visual Neural Decomposition to Explain Multivariate Data Sets

Authors

Johannes Knittel, Andres Lalama, Steffen Koch, Thomas Ertl

Abstract

Investigating relationships between variables in multi-dimensional data sets is a common task for data analysts and engineers. More specifically, it is often valuable to understand which ranges of which input variables lead to particular values of a given target variable. Unfortunately, with an increasing number of independent variables, this process may become cumbersome and time-consuming due to the many possible combinations that have to be explored. In this paper, we propose a novel approach to visualize correlations between input variables and a target output variable that scales to hundreds of variables. We developed a visual model based on neural networks that can be explored in a guided way to help analysts find and understand such correlations. First, we train a neural network to predict the target from the input variables. Then, we visualize the inner workings of the resulting model to help understand relations within the data set. We further introduce a new regularization term for the backpropagation algorithm that encourages the neural network to learn representations that are easier to interpret visually. We apply our method to artificial and real-world data sets to show its utility.
