Paper Title

A Wasserstein distance-based spectral clustering method for transaction data analysis

Authors

Zhu, Yingqiu; Huang, Danyang; Zhang, Bo

Abstract

With the rapid development of online payment platforms, it is now possible to record massive transaction data. Clustering on transaction data significantly contributes to analyzing merchants' behavior patterns. This enables payment platforms to provide differentiated services or implement risk management strategies. However, traditional methods exploit transactions by generating low-dimensional features, leading to inevitable information loss. In this study, we use the empirical cumulative distribution of transactions to characterize merchants. We adopt Wasserstein distance to measure the dissimilarity between any two merchants and propose the Wasserstein-distance-based spectral clustering (WSC) approach. Based on the similarities between merchants' transaction distributions, a graph of merchants is generated. Thus, we treat the clustering of merchants as a graph-cut problem and solve it under the framework of spectral clustering. To ensure feasibility of the proposed method on large-scale datasets with limited computational resources, we propose a subsampling method for WSC (SubWSC). The associated theoretical properties are investigated to verify the efficiency of the proposed approach. The simulations and empirical study demonstrate that the proposed method outperforms feature-based methods in finding behavior patterns of merchants.
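
The pipeline described in the abstract (empirical distributions, pairwise Wasserstein distances, a similarity graph, then a spectral graph cut) can be illustrated with a minimal sketch. The abstract does not specify the Wasserstein order, the similarity kernel, the bandwidth, or the final assignment step, so everything below is an assumption made for illustration: the order-1 distance on one-dimensional transaction amounts, a Gaussian-type kernel with a median-distance bandwidth, and a two-way split on the sign of the second eigenvector of the normalized Laplacian. The function names `wasserstein_1d` and `two_way_wsc` are hypothetical, the data are synthetic, and the subsampling step (SubWSC) is not shown.

```python
import numpy as np


def wasserstein_1d(x, y):
    """Order-1 Wasserstein distance between two 1-D empirical distributions.

    Computed as the integral of |F_x - F_y|, where F_x and F_y are the
    empirical cumulative distribution functions of the two samples.
    """
    x, y = np.sort(x), np.sort(y)
    grid = np.sort(np.concatenate([x, y]))
    widths = np.diff(grid)
    # Empirical CDF values on each interval between pooled sample points.
    cdf_x = np.searchsorted(x, grid[:-1], side="right") / len(x)
    cdf_y = np.searchsorted(y, grid[:-1], side="right") / len(y)
    return float(np.sum(np.abs(cdf_x - cdf_y) * widths))


def two_way_wsc(samples):
    """Split merchants into two groups from their transaction samples.

    Assumed choices (not taken from the paper): kernel exp(-(d / sigma)^2),
    sigma set to the median pairwise distance, and a sign split on the
    second eigenvector instead of k-means on several eigenvectors.
    """
    n = len(samples)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = wasserstein_1d(samples[i], samples[j])

    sigma = np.median(dist[np.triu_indices(n, k=1)])  # assumed bandwidth rule
    sim = np.exp(-((dist / sigma) ** 2))              # merchant similarity graph
    np.fill_diagonal(sim, 0.0)                        # no self-loops

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(sim.sum(axis=1))
    lap = np.eye(n) - d_inv_sqrt[:, None] * sim * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, vecs = np.linalg.eigh(lap)
    return (vecs[:, 1] > 0).astype(int)


# Synthetic example: two groups of merchants with different amount scales.
rng = np.random.default_rng(0)
small = [rng.lognormal(3.0, 0.5, size=200) for _ in range(10)]
large = [rng.lognormal(5.0, 0.5, size=200) for _ in range(10)]
labels = two_way_wsc(small + large)
```

Because each merchant is compared through its whole empirical distribution, no summary features (mean, variance, and so on) are extracted first, which is the information loss the abstract attributes to feature-based methods. The full pairwise distance matrix costs a number of comparisons quadratic in the number of merchants, which is presumably what motivates the subsampling variant for large datasets.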
