Differential testing for machine learning: an analysis for classification algorithms beyond deep learning

Paper title

Differential testing for machine learning: an analysis for classification algorithms beyond deep learning

Authors

Steffen Herbold, Steffen Tunkel

Abstract

Context: Differential testing is a useful approach to software testing that runs different implementations of the same algorithm and compares their results. In recent years, this approach was successfully used for test campaigns of deep learning frameworks. Objective: There is little knowledge on the application of differential testing beyond deep learning. Within this article, we want to close this gap for classification algorithms. Method: We conduct a case study using Scikit-learn, Weka, Spark MLlib, and Caret in which we identify the potential of differential testing by considering which algorithms are available in multiple frameworks, the feasibility by identifying pairs of algorithms that should exhibit the same behavior, and the effectiveness by executing tests for the identified pairs and analyzing the deviations. Results: While we found a large potential for popular algorithms, the feasibility seems limited because it is often not possible to determine configurations that are the same in other frameworks. The execution of the feasible tests revealed a large number of deviations in the scores and classes. Only a lenient approach based on statistical significance of classes does not lead to a large number of test failures. Conclusions: The potential of differential testing beyond deep learning seems limited for research into the quality of machine learning libraries. Practitioners may still use the approach if they have deep knowledge about implementations, especially if a coarse oracle that only considers significant differences of classes is sufficient.
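The abstract distinguishes three levels of comparison between two implementations: equal scores, equal classes, and the lenient check that only flags statistically significant differences in the predicted classes. The following is a minimal sketch of such oracles, not the authors' test harness. The function names, the tolerance `tol`, the significance level `alpha`, and the choice of a chi-squared test on a 2x2 table are all assumptions made for illustration, and the inputs are plain lists rather than the output of any real framework.

```python
import math
from collections import Counter


def scores_equal(scores_a, scores_b, tol=1e-9):
    """Strict oracle (assumed): every pair of scores matches within tol."""
    return len(scores_a) == len(scores_b) and all(
        abs(a - b) <= tol for a, b in zip(scores_a, scores_b)
    )


def classes_equal(classes_a, classes_b):
    """Strict oracle (assumed): every predicted class is identical."""
    return list(classes_a) == list(classes_b)


def classes_not_significantly_different(classes_a, classes_b, alpha=0.05):
    """Lenient oracle (assumed): chi-squared test on binary class counts.

    Passes when the class distributions of the two implementations do not
    differ significantly at level alpha. Only binary problems are handled.
    """
    if not classes_a or not classes_b:
        raise ValueError("both prediction lists must be non-empty")
    count_a, count_b = Counter(classes_a), Counter(classes_b)
    labels = sorted(set(count_a) | set(count_b))
    if len(labels) > 2:
        raise ValueError("this sketch only supports binary classification")
    if len(labels) < 2:
        return True  # both implementations predict one and the same class
    n_a, n_b = len(classes_a), len(classes_b)
    total = n_a + n_b
    chi2 = 0.0
    for label in labels:
        column_total = count_a[label] + count_b[label]
        for observed, n in ((count_a[label], n_a), (count_b[label], n_b)):
            expected = n * column_total / total
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p_value >= alpha
```

In this sketch, two implementations that disagree on a handful of instances fail `classes_equal` but can still pass the lenient check, which mirrors the coarse oracle the abstract describes as the only one that avoids a large number of test failures.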
