Using Pre-Trained Models to Boost Code Review Automation

Paper Title

Using Pre-Trained Models to Boost Code Review Automation

Authors

Rosalia Tufano, Simone Masiero, Antonio Mastropaolo, Luca Pascarella, Denys Poshyvanyk, Gabriele Bavota

Abstract

Code review is a practice widely adopted in open source and industrial projects. Given the non-negligible cost of such a process, researchers started investigating the possibility of automating specific code review tasks. We recently proposed Deep Learning (DL) models targeting the automation of two tasks: the first model takes as input a code submitted for review and implements in it changes likely to be recommended by a reviewer; the second takes as input the submitted code and a reviewer comment posted in natural language and automatically implements the change required by the reviewer. While the preliminary results we achieved are encouraging, both models had been tested in rather simple code review scenarios, substantially simplifying the targeted problem. This was also due to the choices we made when designing both the technique and the experiments. In this paper, we build on top of that work by demonstrating that a pre-trained Text-To-Text Transfer Transformer (T5) model can outperform previous DL models for automating code review tasks. Also, we conducted our experiments on a larger and more realistic (and challenging) dataset of code review activities.
