Paper Title

Improved Aggregating and Accelerating Training Methods for Spatial Graph Neural Networks on Fraud Detection

Paper Authors

Yufan Zeng, Jiashan Tang

Paper Abstract

Graph neural networks (GNNs) have been widely applied in numerous fields. A recent work that combines a layered structure with residual connections proposed an improved deep architecture extending the CAmouflage-REsistant GNN (CARE-GNN) to a deep model named Residual Layered CARE-GNN (RLC-GNN), which forms a self-correcting and incremental learning mechanism and achieves significant performance improvements on the fraud detection task. However, we identify three issues with RLC-GNN: the usage of neighboring information reaches a limit; training is difficult, an inherent problem of deep models; and node features and external patterns are not considered comprehensively. In this work, we propose three approaches to solve these problems respectively. First, we suggest conducting the similarity measure via cosine distance so that both local features and external patterns are taken into consideration. Then, we combine the similarity measure module and the idea of adjacency-wise normalization with node-wise and batch-wise normalization, and propose partial neighborhood normalization methods to overcome the training difficulty while mitigating the impact of the excessive noise caused by the high density of the graph. Finally, we put forward an intermediate information supplement to solve the information limitation. Experiments are conducted on the Yelp and Amazon datasets, and the results show that our proposed methods effectively solve the three problems. After applying the three methods, we achieve improvements of 4.81%, 6.62% and 6.81% in recall, AUC and Macro-F1 respectively on the Yelp dataset, and improvements of 1.65% and 0.29% in recall and AUC respectively on the Amazon dataset.
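
To make the first approach more concrete, the PyTorch sketch below filters each center node's neighbors by the cosine similarity of their feature vectors, the general idea behind CARE-GNN-style neighbor selection via cosine distance. It is a minimal illustration under our own assumptions: the function name `cosine_neighbor_filter`, the `keep_ratio` hyperparameter, and the top-k selection rule are hypothetical and not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def cosine_neighbor_filter(x, edge_index, keep_ratio=0.5):
    """Keep, for each center node, only the neighbors whose features are
    most similar to it under cosine similarity.

    x          -- [num_nodes, num_features] node feature matrix
    edge_index -- [2, num_edges] tensor of (neighbor, center) pairs
    Returns a filtered edge_index containing the surviving edges.
    """
    src, dst = edge_index
    # Cosine similarity between each center node (dst) and its neighbor (src).
    sim = F.cosine_similarity(x[dst], x[src], dim=1)  # shape: [num_edges]

    kept = []
    for node in torch.unique(dst):
        edge_ids = (dst == node).nonzero(as_tuple=True)[0]
        # keep_ratio is an illustrative hyperparameter, not the paper's rule.
        k = max(1, int(keep_ratio * edge_ids.numel()))
        top = sim[edge_ids].topk(k).indices  # k most similar neighbors
        kept.append(edge_ids[top])
    return edge_index[:, torch.cat(kept)]

# Toy usage: 4 nodes with random 8-dim features.
x = torch.randn(4, 8)
edge_index = torch.tensor([[1, 2, 3, 0],   # neighbors
                           [0, 0, 0, 1]])  # center nodes
print(cosine_neighbor_filter(x, edge_index, keep_ratio=0.5))
```

Because cosine similarity compares feature directions rather than magnitudes, a filter of this kind can take both a node's local features and its relative pattern among neighbors into account, which is what motivates the cosine-distance measure in the abstract.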
