VUDENC: Vulnerability Detection with Deep Learning on a Natural Codebase for Python

Paper title

VUDENC: Vulnerability Detection with Deep Learning on a Natural Codebase for Python

Authors

Laura Wartschinski, Yannic Noller, Thomas Vogel, Timo Kehrer, Lars Grunske

Abstract

Context: Identifying potential vulnerable code is important to improve the security of our software systems. However, the manual detection of software vulnerabilities requires expert knowledge and is time-consuming, and must be supported by automated techniques. Objective: Such automated vulnerability detection techniques should achieve a high accuracy, point developers directly to the vulnerable code fragments, scale to real-world software, generalize across the boundaries of a specific software project, and require no or only moderate setup or configuration effort. Method: In this article, we present VUDENC (Vulnerability Detection with Deep Learning on a Natural Codebase), a deep learning-based vulnerability detection tool that automatically learns features of vulnerable code from a large and real-world Python codebase. VUDENC applies a word2vec model to identify semantically similar code tokens and to provide a vector representation. A network of long-short-term memory cells (LSTM) is then used to classify vulnerable code token sequences at a fine-grained level, highlight the specific areas in the source code that are likely to contain vulnerabilities, and provide confidence levels for its predictions. Results: To evaluate VUDENC, we used 1,009 vulnerability-fixing commits from different GitHub repositories that contain seven different types of vulnerabilities (SQL injection, XSS, Command injection, XSRF, Remote code execution, Path disclosure, Open redirect) for training. In the experimental evaluation, VUDENC achieves a recall of 78%-87%, a precision of 82%-96%, and an F1 score of 80%-90%. VUDENC's code, the datasets for the vulnerabilities, and the Python corpus for the word2vec model are available for reproduction. Conclusions: Our experimental results suggest...
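The abstract reports recall, precision, and F1 as ranges. As a reminder of how the three relate, the F1 score is the harmonic mean of precision and recall. The sketch below is a minimal illustration, not code from VUDENC. The function name `f1_score` is hypothetical, and pairing the lower endpoints of the reported ranges (precision 82%, recall 78%) is an assumption made only for the example, since the abstract does not say which values belong to the same vulnerability type.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        # Avoid division by zero when the classifier gets nothing right.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed pairing of the lower range endpoints from the abstract;
# the source does not state that these two values occur together.
example_f1 = f1_score(precision=0.82, recall=0.78)
print(f"F1 for precision=0.82, recall=0.78: {example_f1:.4f}")
```

Under this assumed pairing the result lands close to the lower end of the reported F1 range. Because the harmonic mean is pulled toward the smaller of the two inputs, a model cannot reach a high F1 by maximizing only precision or only recall.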
