On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models

Paper Title

On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models

Authors

Le, Triet H. M.; Babar, M. Ali

Abstract

Many studies have developed Machine Learning (ML) approaches to detect Software Vulnerabilities (SVs) in functions and fine-grained code statements that cause such SVs. However, there is little work on leveraging such detection outputs for data-driven SV assessment to provide information about the exploitability, impact, and severity of SVs. This information is important for understanding SVs and prioritizing their fixing. Using large-scale data from 1,782 functions of 429 SVs in 200 real-world projects, we investigate ML models for automating function-level SV assessment tasks, i.e., predicting seven Common Vulnerability Scoring System (CVSS) metrics. We particularly study the value and use of vulnerable statements as inputs for developing the assessment models because SVs in functions originate in these statements. We show that vulnerable statements are 5.8 times smaller in size, yet exhibit 7.5-114.5% stronger assessment performance (Matthews Correlation Coefficient (MCC)) than non-vulnerable statements. Incorporating the context of vulnerable statements further increases the performance by up to 8.9% (0.64 MCC and 0.75 F1-Score). Overall, we provide initial yet promising ML-based baselines for function-level SV assessment, paving the way for further research in this direction.
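
The abstract reports assessment performance as MCC. Because several CVSS metrics take more than two values (for example, CVSS v2 Access Complexity is Low, Medium, or High), the multi-class generalization of MCC (Gorodkin's R_K) is the relevant form. The Python sketch below is an illustration only, not the authors' evaluation code: the function name `multiclass_mcc` and the sample labels are hypothetical and do not come from the paper's dataset. It returns 0.0 when the denominator is zero, which matches the convention used by scikit-learn's `matthews_corrcoef`.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Hypothetical helper: MCC for K classes (Gorodkin's R_K).

    Reduces to the usual binary MCC when there are exactly two classes.
    Returns 0.0 if the denominator is zero (e.g. a constant predictor).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # how often each class truly occurs
    p_k = Counter(y_pred)                             # how often each class is predicted
    classes = set(t_k) | set(p_k)
    cov_tp = c * s - sum(p_k[k] * t_k[k] for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    denom = sqrt(cov_pp * cov_tt)
    return cov_tp / denom if denom else 0.0


# Made-up labels for a three-valued CVSS metric, purely for illustration.
y_true = ["LOW", "LOW", "MEDIUM", "HIGH", "MEDIUM", "LOW"]
y_pred = ["LOW", "MEDIUM", "MEDIUM", "HIGH", "LOW", "LOW"]
print(round(multiclass_mcc(y_true, y_pred), 3))  # 0.455
```

Unlike accuracy, MCC accounts for all cells of the confusion matrix, so a model that always predicts the majority class scores 0 rather than looking deceptively strong on imbalanced CVSS labels.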
