Multi-View Pre-Trained Model for Code Vulnerability Identification

Paper Title

Multi-View Pre-Trained Model for Code Vulnerability Identification

Authors

Jiang, Xuxiang; Xiao, Yinhao; Wang, Jun; Zhang, Wei

Abstract

Vulnerability identification is crucial for cyber security in the software-related industry. Early identification methods require significant manual effort to craft features or annotate vulnerable code. Although recent pre-trained models alleviate this issue, they overlook the rich, multi-type structural information contained in the code itself. In this paper, we propose a novel Multi-View Pre-Trained Model (MV-PTM) that encodes both the sequential and the multi-type structural information of source code and uses contrastive learning to enhance code representations. Experiments conducted on two public datasets demonstrate the superiority of MV-PTM. In particular, MV-PTM improves on GraphCodeBERT by 3.36% on average in terms of F1 score.
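
The abstract does not say which contrastive objective MV-PTM uses. As a rough illustration only, the sketch below assumes an InfoNCE-style loss between two views of the same code samples, for instance a sequence embedding and a structure embedding. The function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    Assumption: view_a[i] and view_b[i] embed the same code sample
    (the positive pair); every other row of view_b is a negative.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the positive pair.
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): two samples, two dimensions.
seq_view = [[1.0, 0.0], [0.0, 1.0]]
graph_aligned = [[0.9, 0.1], [0.1, 0.9]]   # views agree per sample
graph_swapped = [[0.1, 0.9], [0.9, 0.1]]   # views disagree per sample

loss_aligned = info_nce(seq_view, graph_aligned)
loss_swapped = info_nce(seq_view, graph_swapped)
```

With these toy inputs the loss is lower when the two views of each sample agree than when they are swapped, which is the behavior a contrastive objective is meant to reward.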
