Fuzzing Deep-Learning Libraries via Automated Relational API Inference

Paper Title

Fuzzing Deep-Learning Libraries via Automated Relational API Inference

Authors

Yinlin Deng, Chenyuan Yang, Anjiang Wei, Lingming Zhang

Abstract

A growing body of research has been dedicated to DL model testing. However, there is still limited work on testing DL libraries, which serve as the foundations for building, training, and running DL models. Prior work on fuzzing DL libraries can only generate tests for APIs which have been invoked by documentation examples, developer tests, or DL models, leaving a large number of APIs untested. In this paper, we propose DeepREL, the first approach to automatically inferring relational APIs for more effective DL library fuzzing. Our basic hypothesis is that for a DL library under test, there may exist a number of APIs sharing similar input parameters and outputs; in this way, we can easily "borrow" test inputs from invoked APIs to test other relational APIs. Furthermore, we formalize the notion of value equivalence and status equivalence for relational APIs to serve as the oracle for effective bug finding. We have implemented DeepREL as a fully automated end-to-end relational API inference and fuzzing technique for DL libraries, which 1) automatically infers potential API relations based on API syntactic or semantic information, 2) synthesizes concrete test programs for invoking relational APIs, 3) validates the inferred relational APIs via representative test inputs, and finally 4) performs fuzzing on the verified relational APIs to find potential inconsistencies. Our evaluation on two of the most popular DL libraries, PyTorch and TensorFlow, demonstrates that DeepREL can cover 157% more APIs than state-of-the-art FreeFuzz. To date, DeepREL has detected 162 bugs in total, with 106 already confirmed by the developers as previously unknown bugs. Surprisingly, DeepREL has detected 13.5% of the high-priority bugs for the entire PyTorch issue-tracking system in a three-month period. Besides the 162 code bugs, we have also detected 14 documentation bugs (all confirmed).
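The two oracles named in the abstract, value equivalence and status equivalence, can be illustrated with a small sketch. This is not DeepREL's implementation: the helper names (`run`, `status_equivalent`, `value_equivalent`, `naive_hypot`) are hypothetical, and functions from Python's standard `math` module stand in for DL library APIs so that the example runs without PyTorch or TensorFlow. The assumed reading is that two APIs are value-equivalent if they return matching outputs on the same inputs, and status-equivalent if they agree on success versus failure.

```python
import math


def run(api, args):
    """Call api(*args); return ("ok", value) or ("error", exception name)."""
    try:
        return ("ok", api(*args))
    except Exception as exc:  # any exception counts as a failing status
        return ("error", type(exc).__name__)


def status_equivalent(api_a, api_b, inputs):
    """True if both APIs succeed, or both fail, on every input."""
    return all(run(api_a, a)[0] == run(api_b, a)[0] for a in inputs)


def value_equivalent(api_a, api_b, inputs, tol=1e-9):
    """True if both APIs succeed and return numerically close values on every input."""
    for args in inputs:
        status_a, value_a = run(api_a, args)
        status_b, value_b = run(api_b, args)
        if status_a != "ok" or status_b != "ok":
            return False
        if not math.isclose(value_a, value_b, rel_tol=tol, abs_tol=tol):
            return False
    return True


def naive_hypot(x, y):
    """Hypothetical stand-in for a second API that computes the same function as math.hypot."""
    return math.sqrt(x * x + y * y)


# Inputs "borrowed" from one API are reused to exercise its relational partner.
borrowed_inputs = [(3.0, 4.0), (1.0, 2.0)]
print(value_equivalent(math.hypot, naive_hypot, borrowed_inputs))

# math.sqrt and math.log return different values but agree on status here:
# both accept 4.0 and both reject -1.0.
print(status_equivalent(math.sqrt, math.log, [(4.0,), (-1.0,)]))

# On 0.0 the statuses diverge (sqrt succeeds, log raises), which a fuzzer
# would flag as an inconsistency if the pair had been validated as status-equivalent.
print(status_equivalent(math.sqrt, math.log, [(0.0,)]))
```

In a real DL setting the numeric comparison would presumably be done on tensors with a tolerance-based check rather than `math.isclose` on scalars; that substitution is an assumption of this sketch.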
