Paper Title
CRaDLe: Deep Code Retrieval Based on Semantic Dependency Learning
Authors
Abstract
Code retrieval is a common way for programmers to reuse existing code snippets from open-source repositories. Given a user query (i.e., a natural language description), code retrieval aims to find the most relevant snippets in a set of code snippets. The main challenge of effective code retrieval is bridging the semantic gap between natural language descriptions and code snippets. With the ever-increasing amount of available open-source code, recent studies have turned to neural networks to learn the semantic matching relationship between the two sources. Statement-level dependency information, which captures the dependency relations among program statements during execution, reflects the structural importance of each statement in the code. This information helps capture code semantics accurately, but it has never been explored for the code retrieval task. In this paper, we propose CRaDLe, a novel approach for Code Retrieval based on statement-level semantic Dependency Learning. Specifically, CRaDLe distills code representations by fusing dependency and semantic information at the statement level, and then learns a unified vector representation for each code-description pair to model their matching relationship. Comprehensive experiments and analysis on real-world datasets show that the proposed approach accurately retrieves code snippets for a given query and significantly outperforms state-of-the-art approaches to the task.
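The abstract describes two steps at a high level: fusing dependency and semantic information per statement, then scoring a unified representation of each code-description pair. Below is a minimal, hypothetical Python sketch of that data flow. It assumes precomputed statement vectors, a list of the statements each statement depends on, one round of mean propagation as the fusion rule, mean pooling into a code vector, and a fixed linear scorer over the concatenation [code; query; code * query]. None of these choices, nor the names `fuse_statements`, `mean_pool`, `pair_vector`, `match_score`, and `rank`, come from the paper. CRaDLe learns its representations and matching function with neural networks.

```python
"""Toy sketch of statement-level dependency fusion and pair scoring.

All names, the fusion rule, and the scoring weights are hypothetical
illustrations; they are not CRaDLe's learned model.
"""
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def fuse_statements(stmt_vecs: Sequence[Vector],
                    deps: Sequence[Sequence[int]]) -> List[Vector]:
    """Add to each statement's semantic vector the mean vector of the
    statements it depends on (one hypothetical round of propagation)."""
    fused = []
    for i, vec in enumerate(stmt_vecs):
        neighbours = deps[i]
        if neighbours:
            mean = [sum(stmt_vecs[j][k] for j in neighbours) / len(neighbours)
                    for k in range(len(vec))]
            fused.append([a + b for a, b in zip(vec, mean)])
        else:
            fused.append(list(vec))  # no dependencies: keep the vector as-is
    return fused


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average statement vectors into a single code vector."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def pair_vector(code: Vector, query: Vector) -> Vector:
    """Unified (code, description) representation: code, query, and
    their element-wise product, concatenated."""
    return list(code) + list(query) + [c * q for c, q in zip(code, query)]


def match_score(pair: Vector, weights: Vector) -> float:
    """Linear matching score; a real model would learn these weights."""
    return sum(p * w for p, w in zip(pair, weights))


def rank(query: Vector,
         snippets: Dict[str, Tuple[List[Vector], List[List[int]]]],
         weights: Vector) -> List[str]:
    """Return snippet names ordered from best to worst match."""
    scores = {}
    for name, (stmts, deps) in snippets.items():
        code = mean_pool(fuse_statements(stmts, deps))
        scores[name] = match_score(pair_vector(code, query), weights)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: 2-dimensional vectors; statement 1 depends on statement 0.
query = [1.0, 0.0]
snippets = {
    "A": ([[1.0, 0.0], [0.0, 1.0]], [[], [0]]),
    "B": ([[0.0, 1.0], [0.0, 1.0]], [[], [0]]),
}
# Only the element-wise product part of the pair vector is weighted.
weights = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]

print(rank(query, snippets, weights))  # ['A', 'B']
```

With these weights only the element-wise product contributes, so the score reduces to a dot product between the pooled code vector ([1.0, 0.5] for "A", [0.0, 1.5] for "B") and the query vector. In a trained model the fusion and the scorer would both be learned rather than fixed by hand.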