图形结构化数据的距离度量学习

论文标题

图形结构化数据的距离度量学习

Distance Metric Learning for Graph Structured Data

论文作者

Yoshida, Tomoki, Takeuchi, Ichiro, Karasuyama, Masayuki

论文摘要

图是用于表示结构化数据的多功能工具。结果，已经研究了各种机器学习方法以进行图形数据分析。尽管许多这样的学习方法取决于输入图之间差异的测量，但定义图形适当的距离度量仍然是一个有争议的问题。因此，我们为图形分类问题提出了一种监督的距离度量学习方法。我们的方法称为可解释的图度量学习（IGML），在基于子图的特征空间中学习了判别指标，该图表空间具有很强的图表能力。通过对每个子图的重量引入稀疏性的惩罚，IGML可以识别出少数重要的子图，这些子图可以提供对给定分类任务的见解。由于我们的公式具有大量的优化变量，因此还提出了一种有效的算法，该算法也提出了基于安全筛选和工作集选择方法的修剪技术。 IGML的重要特性是保证解决方案最佳性，因为该问题被称为凸问题，而我们的修剪策略仅丢弃了不必要的子图。此外，我们表明IGML也适用于其他结构化数据，例如项目集和序列数据，并且可以通过使用基于运输的子学功能来结合顶点标签的相似性。我们从经验上评估了IGML在几个基准数据集上的计算效率和分类性能，并提供了一些说明性示例，说明了IGML如何识别给定图形数据集中的重要子图。

Graphs are versatile tools for representing structured data. As a result, a variety of machine learning methods have been studied for graph data analysis. Although many such learning methods depend on the measurement of differences between input graphs, defining an appropriate distance metric for graphs remains a controversial issue. Hence, we propose a supervised distance metric learning method for the graph classification problem. Our method, named interpretable graph metric learning (IGML), learns discriminative metrics in a subgraph-based feature space, which has a strong graph representation capability. By introducing a sparsity-inducing penalty on the weight of each subgraph, IGML can identify a small number of important subgraphs that can provide insight into the given classification task. Because our formulation has a large number of optimization variables, an efficient algorithm that uses pruning techniques based on safe screening and working set selection methods is also proposed. An important property of IGML is that solution optimality is guaranteed because the problem is formulated as a convex problem and our pruning strategies only discard unnecessary subgraphs. Furthermore, we show that IGML is also applicable to other structured data such as itemset and sequence data, and that it can incorporate vertex-label similarity by using a transportation-based subgraph feature. We empirically evaluate the computational efficiency and classification performance of IGML on several benchmark datasets and provide some illustrative examples of how IGML identifies important subgraphs from a given graph dataset.

下载PDF全文

下载文献需遵守相关版权规定

论文标题