Paper Title
A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks
Paper Authors
Paper Abstract
Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL by implementing a DRL algorithm from scratch or by using a DRL framework. DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms that facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains, including software testing. However, to the best of our knowledge, no study has empirically evaluated the effectiveness and performance of the algorithms implemented in DRL frameworks. Moreover, the literature lacks guidelines that would help practitioners choose one DRL framework over another. In this paper, we empirically investigate the application of carefully selected DRL algorithms to two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game, using DRL algorithms to explore the game and detect bugs. Results show that some of the selected DRL frameworks, such as Tensorforce, outperform recent approaches from the literature. For test case prioritization, we run experiments on a CI environment in which DRL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between implemented algorithms is in some cases considerable, motivating further investigation.