Paper Title
How to train your draGAN: A task oriented solution to imbalanced classification
Paper Authors
Paper Abstract
The long-standing challenge of building effective classification models for small and imbalanced datasets has seen little improvement since the creation of the Synthetic Minority Over-sampling Technique (SMOTE) over 20 years ago. Though GAN-based models seem promising, there has been a lack of purpose-built architectures for solving the aforementioned problem, as most previous studies focus on applying already existing models. This paper proposes a unique, performance-oriented, data-generating strategy that utilizes a new architecture, coined draGAN, to generate both minority and majority samples. The samples are generated with the objective of optimizing the classification model's performance, rather than similarity to the real data. We benchmark our approach against state-of-the-art methods from the SMOTE family and competitive GAN-based approaches on 94 tabular datasets with varying degrees of imbalance and linearity. Empirically, we show the superiority of draGAN, but also highlight some of its shortcomings. All code is available at: https://github.com/LeonGuertler/draGAN.
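To make the performance-oriented objective concrete, below is a minimal, heavily simplified PyTorch sketch of the general idea: a generator produces labeled synthetic samples, a small classifier is fitted on them with differentiable updates, and the generator is optimized for that classifier's loss on the real, imbalanced data rather than for how realistic its samples look. All names, dimensions, and the inner logistic-regression classifier are illustrative assumptions, not the authors' draGAN architecture; see the linked repository for the actual implementation.

```python
import torch

# All dimensions, names and the tiny logistic-regression classifier below are
# hypothetical illustrations, not the architecture from the paper.
NOISE_DIM, FEAT_DIM, BATCH = 16, 10, 64

# Generator maps noise to a feature vector plus a (soft) class label,
# so it can emit both minority and majority samples.
gen = torch.nn.Sequential(
    torch.nn.Linear(NOISE_DIM, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, FEAT_DIM + 1),
)
gen_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

# Placeholder "real" imbalanced data (~10% minority class), for illustration only.
X_real = torch.randn(200, FEAT_DIM)
y_real = (torch.rand(200) < 0.1).float()

def train_on_synth_eval_on_real(synth_x, synth_y, steps=5, lr=0.1):
    """Fit a small classifier on the synthetic batch with differentiable
    (functional) SGD steps, then return its loss on the real data so the
    gradient can flow back into the generator."""
    w = torch.zeros(FEAT_DIM, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    for _ in range(steps):
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            synth_x @ w + b, synth_y)
        gw, gb = torch.autograd.grad(loss, (w, b), create_graph=True)
        w, b = w - lr * gw, b - lr * gb          # functional update keeps the graph
    return torch.nn.functional.binary_cross_entropy_with_logits(
        X_real @ w + b, y_real)

for step in range(100):
    z = torch.randn(BATCH, NOISE_DIM)
    out = gen(z)
    synth_x = out[:, :FEAT_DIM]
    synth_y = torch.sigmoid(out[:, FEAT_DIM])    # generated label in [0, 1]
    # The generator is scored on downstream classification loss,
    # not on how closely its samples resemble the real data.
    gen_loss = train_on_synth_eval_on_real(synth_x, synth_y)
    gen_opt.zero_grad()
    gen_loss.backward()
    gen_opt.step()
```

The key design choice this sketch illustrates is that the generator's gradient comes from backpropagating through the classifier's training on synthetic data and its evaluation on real data, so sample quality is judged purely by downstream classification performance.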