Title
M-Evolve: Structural-Mapping-Based Data Augmentation for Graph Classification
Authors
Abstract
Graph classification, which aims to identify the category labels of graphs, plays a significant role in drug classification, toxicity detection, protein analysis, etc. However, the small scale of the benchmark datasets makes graph classification models prone to over-fitting and under-generalization. To address this, we introduce data augmentation on graphs (i.e., graph augmentation) and present four methods (random mapping, vertex-similarity mapping, motif-random mapping and motif-similarity mapping) that generate additional weakly labeled data for small-scale benchmark datasets through heuristic transformations of graph structure. Furthermore, we propose a generic model evolution framework, named M-Evolve, which combines graph augmentation, data filtration and model retraining to optimize pre-trained graph classifiers. Experiments on six benchmark datasets demonstrate that the proposed framework helps existing graph classification models alleviate over-fitting and under-generalization when trained on small-scale benchmark datasets, yielding an average accuracy improvement of 3% to 13% on graph classification tasks.