Paper title
Generating Artificial Outliers in the Absence of Genuine Ones -- a Survey
Paper authors
Paper abstract
By definition, outliers are rarely observed in reality, making them difficult to detect or analyse. Artificial outliers approximate such genuine outliers and can, for instance, help with the detection of genuine outliers or with benchmarking outlier-detection algorithms. The literature features different approaches to generate artificial outliers. However, a systematic comparison of these approaches remains absent. This article surveys and compares these approaches. We start by clarifying the terminology in the field, which varies from publication to publication, and we propose a general problem formulation. We frame the field of artificial outliers by describing how outlier generation connects to other research fields, such as experimental design or generative models. Along with a concise description of each approach, we group the approaches by their general concepts and by how they make use of genuine instances. An extensive experimental study reveals the differences between the generation approaches when they are ultimately used for outlier detection. This survey shows that the existing approaches already cover a wide range of concepts underlying the generation, but also that the field still has potential for further development. Our experimental study confirms the expectation that the quality of the generation approaches varies widely, for example, depending on the data set they are applied to. Ultimately, to guide the choice of the generation approach in a specific context, we propose an appropriate general decision process. In summary, this survey comprises, describes, and connects all relevant work regarding the generation of artificial outliers and may serve as a basis to guide further research in the field.