Which bugs are missed in code reviews: An empirical study on SmartSHARK dataset

Paper title

Which bugs are missed in code reviews: An empirical study on SmartSHARK dataset

Authors

Khoshnoud, F., Nasab, A. Rezaei, Toudeji, Z., Sami, A.

Abstract

In pull-based development systems, code reviews and pull request comments play important roles in improving code quality. In such systems, reviewers attempt to check a piece of code carefully, using different unit tests. Unfortunately, they sometimes miss bugs when reviewing pull requests, which degrades the quality of the systems. In other words, disastrous consequences can occur when bugs are observed after the pull requests are merged. The lack of a concrete understanding of these bugs led us to investigate and categorize them. In this research, we try to identify missed bugs in pull requests of the SmartSHARK dataset projects. Our contribution is twofold. First, we hypothesized that merged pull requests that have code reviews, code review comments, or pull request comments after merging may contain bugs that were missed during code review. We treated these merged pull requests as candidates for having missed bugs. Based on this assumption, we obtained 3,261 candidate pull requests from 77 open-source GitHub projects. After two rounds of restrictive manual analysis, we found 187 missed bugs in 173 pull requests. In the first step, we found 224 buggy pull requests containing missed bugs after the pull requests were merged. Second, we defined and finalized a taxonomy appropriate for the bugs we found, and then determined the distribution of bug categories by analysing those pull requests again. The categories of missed bugs in pull requests and their shares are: semantic (51.34%), build (15.5%), analysis checks (9.09%), compatibility (7.49%), concurrency (4.28%), configuration (4.28%), GUI (2.14%), API (2.14%), security (2.14%), and memory (1.6%).
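
The candidate-selection hypothesis in the abstract can be illustrated with a minimal sketch. It is not the authors' extraction code and does not use the SmartSHARK schema: the `PullRequest` record, its field names, and the helper functions are hypothetical, and the sketch assumes that "after merging" means a review or comment timestamp strictly later than the merge timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class PullRequest:
    """Hypothetical, simplified pull request record (not the SmartSHARK schema)."""
    number: int
    merged_at: Optional[datetime]  # None if the pull request was never merged
    # Timestamps of code reviews, code review comments, and pull request comments.
    activity_times: List[datetime] = field(default_factory=list)


def is_candidate(pr: PullRequest) -> bool:
    """True if the pull request was merged and has review activity after the merge."""
    if pr.merged_at is None:
        return False
    return any(t > pr.merged_at for t in pr.activity_times)


def select_candidates(prs: List[PullRequest]) -> List[PullRequest]:
    """Keep only the pull requests that match the candidate hypothesis."""
    return [pr for pr in prs if is_candidate(pr)]
```

A filter of this kind only narrows the search space. According to the abstract, the candidates were then analysed manually in two rounds to confirm which ones actually contained missed bugs.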
