Paper title
Credit card fraud detection using machine learning: A survey
Paper authors
Paper abstract
Credit card fraud has emerged as a major problem in the electronic payment sector. In this survey, we study the particularities of data-driven credit card fraud detection and several machine learning methods that address each of its intricate challenges, with the goal of identifying fraudulent transactions that have been issued illegitimately on behalf of the rightful card owner. In particular, we first characterize a typical credit card fraud detection task: the dataset and its attributes, the choice of metric, and some methods for handling such unbalanced datasets. These questions are the entry point of every credit card fraud detection problem. Then we focus on dataset shift (sometimes called concept drift), which refers to the fact that the underlying distribution generating the dataset evolves over time: for example, card holders may change their buying habits over seasons and fraudsters may adapt their strategies. This phenomenon may hinder the use of machine learning methods on real-world datasets such as credit card transaction datasets. Afterwards, we highlight different approaches used to capture the sequential properties of credit card transactions. These approaches range from feature engineering techniques (transaction aggregations, for example) to proper sequence modeling methods such as recurrent neural networks (LSTM) or graphical models (hidden Markov models).
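To make the idea of transaction aggregation concrete, the following is a minimal sketch of one possible aggregation feature, not the specific procedure of any method covered in the survey. It assumes each transaction is a hypothetical `(card_id, timestamp_seconds, amount)` tuple and uses an illustrative 24-hour trailing window. For every transaction it counts and sums the earlier transactions made with the same card inside that window.

```python
from collections import defaultdict, deque


def aggregate_transactions(transactions, window_seconds=24 * 3600):
    """Attach trailing-window aggregates to each transaction.

    `transactions` is an iterable of (card_id, timestamp, amount) tuples;
    this layout and the 24-hour default are assumptions made for the sketch.
    """
    history = defaultdict(deque)  # card_id -> deque of (timestamp, amount)
    features = []
    # Process in chronological order so "earlier" is well defined.
    for card_id, timestamp, amount in sorted(transactions, key=lambda t: t[1]):
        past = history[card_id]
        # Drop this card's transactions that fell out of the trailing window.
        while past and past[0][0] <= timestamp - window_seconds:
            past.popleft()
        features.append({
            "card_id": card_id,
            "timestamp": timestamp,
            "amount": amount,
            # Aggregates use only earlier transactions, never the current one.
            "count_in_window": len(past),
            "sum_in_window": sum((a for _, a in past), 0.0),
        })
        past.append((timestamp, amount))
    return features
```

The resulting `count_in_window` and `sum_in_window` values can be appended to the raw transaction attributes before training a classifier. Because only earlier transactions are used, the features do not leak information from the transaction being scored.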