Paper title
A Distance-preserving Matrix Sketch
Paper authors
Paper abstract
Visualizing very large matrices involves many formidable problems. Various popular solutions to these problems involve sampling, clustering, projection, or feature selection to reduce the size and complexity of the original task. An important aspect of these methods is how to preserve relative distances between points in the higher-dimensional space after reducing rows and columns to fit in a lower-dimensional space. This aspect is important because conclusions based on faulty visual reasoning can be harmful. Judging dissimilar points as similar or similar points as dissimilar on the basis of a visualization can lead to false conclusions. To ameliorate this bias and to make visualizations of very large datasets feasible, we introduce two new algorithms that select, respectively, a subset of the rows and a subset of the columns of a rectangular matrix. This selection is designed to preserve relative distances as closely as possible. We compare our matrix sketch to more traditional alternatives on a variety of artificial and real datasets.
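The abstract does not spell out the two selection algorithms, so the sketch below does not attempt to reproduce them. It only illustrates one plausible way to quantify the property the abstract cares about: how well the row-to-row distances survive when only some columns are kept. The function names (`pairwise_distances`, `pearson`, `distance_preservation`), the use of Euclidean distance, the Pearson correlation as a score, and the toy data are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def pairwise_distances(rows):
    """Euclidean distance for every unordered pair of rows."""
    return [math.dist(a, b) for a, b in combinations(rows, 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance_preservation(matrix, cols):
    """Correlate row-to-row distances before and after keeping only `cols`."""
    reduced = [[row[j] for j in cols] for row in matrix]
    return pearson(pairwise_distances(matrix), pairwise_distances(reduced))


# Hypothetical toy data: columns 0-1 carry almost all of the variance,
# columns 2-5 are low-amplitude noise.
rng = random.Random(0)
matrix = [
    [rng.gauss(0, 10), rng.gauss(0, 10)] + [rng.gauss(0, 0.1) for _ in range(4)]
    for _ in range(30)
]

score_informative = distance_preservation(matrix, [0, 1])  # keep the dominant columns
score_noise = distance_preservation(matrix, [2, 3])        # keep two noise columns
```

On data like this, keeping the dominant columns should yield a score close to 1, while keeping only noise columns should score much lower. A score of this kind could be used to compare any column-selection strategy against a baseline; whether the paper evaluates its sketch this way is not stated in the abstract.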