Paper Title
Do Pre-trained Language Models Indeed Understand Software Engineering Tasks?
Paper Authors
Paper Abstract
Artificial intelligence (AI) for software engineering (SE) tasks has recently achieved promising performance. In this paper, we investigate to what extent pre-trained language models truly understand SE tasks such as code search, code summarization, etc. We conduct a comprehensive empirical study on a broad set of AI for SE (AI4SE) tasks by feeding the models with varied inputs: 1) inputs with various masking rates and 2) inputs produced by the sufficient input subset method. The trained models are then evaluated on different SE tasks, including code search, code summarization, and duplicate bug report detection. Our experimental results show that pre-trained language models are insensitive to the given input, and thus achieve similar performance on these three SE tasks. We refer to this phenomenon as overinterpretation, where a model confidently makes a decision without salient features, or finds spurious relationships between the final decision and the dataset. Our study investigates two approaches to mitigating the overinterpretation phenomenon: the whole-word masking strategy and ensembling. To the best of our knowledge, we are the first to reveal this overinterpretation phenomenon to the AI4SE community. It serves as an important reminder for researchers when designing model inputs and calls for necessary future work in understanding and implementing AI4SE tasks.
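The first probe mentioned in the abstract, feeding models inputs with various masking rates, can be illustrated with a minimal sketch. The helper below is a hypothetical illustration (the names `mask_tokens`, `mask_rate`, and the `<mask>` placeholder are assumptions, not the paper's actual implementation): it randomly replaces a fraction of an input's tokens, so that a model truly relying on the masked content should degrade as the rate grows, while an overinterpreting model stays largely unaffected.

```python
import random


def mask_tokens(tokens, mask_rate, mask_token="<mask>", seed=0):
    """Randomly replace a fraction of tokens with a mask placeholder.

    Hypothetical sketch of the 'various masking rates' probe: higher
    mask_rate removes more of the input signal. The fixed seed keeps
    the sampled positions reproducible across runs.
    """
    rng = random.Random(seed)
    n_mask = int(len(tokens) * mask_rate)
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_token if i in positions else t
            for i, t in enumerate(tokens)]


# Example: probe a toy code snippet at increasing masking rates.
tokens = "def add ( a , b ) : return a + b".split()
for rate in (0.0, 0.5, 0.9):
    print(rate, " ".join(mask_tokens(tokens, rate)))
```

A model's downstream score (e.g., retrieval rank in code search) would then be recorded at each rate; insensitivity to even aggressive masking is the symptom the abstract calls overinterpretation.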