Revisiting the Impact of Dependency Network Metrics on Software Defect Prediction

Paper Title

Revisiting the Impact of Dependency Network Metrics on Software Defect Prediction

Authors

Gong, Lina; Rajbahadur, Gopi Krishnan; Hassan, Ahmed E.; Jiang, Shujuan

Abstract

Software dependency network metrics, extracted from the dependency graph of software modules by applying Social Network Analysis (SNA metrics), have been shown to improve the performance of Software Defect Prediction (SDP) models. However, the relative effectiveness of these SNA metrics over code metrics in improving the performance of SDP models has been widely debated, with no clear consensus. Furthermore, some common SDP scenarios, such as predicting the number of defects in a module (Defect-count) in Cross-version and Cross-project SDP contexts, remain unexplored. This lack of a clear directive on the effectiveness of SNA metrics compared to the widely used code metrics prevents us from potentially building better-performing SDP models. Therefore, through a case study of 9 open source software projects across 30 versions, we study the relative effectiveness of SNA metrics compared to code metrics across 3 commonly used SDP contexts (Within-project, Cross-version and Cross-project) and 3 scenarios (Defect-count, Defect-classification (classifying whether a module is defective) and Effort-aware (ranking the defective modules w.r.t. the involved effort)). We find that SNA metrics, by themselves or along with code metrics, improve the performance of SDP models over using code metrics alone in 5 out of the 9 studied SDP scenarios (three SDP scenarios across three SDP contexts). However, we note that in some cases the improvements afforded by considering SNA metrics over or alongside code metrics might only be marginal, whereas in other cases the improvements could be potentially large. Based on these findings, we suggest that future work should: consider SNA metrics alongside code metrics in SDP models; and consider Ego metrics and Global metrics, the two different types of SNA metrics, separately when training SDP models, as they behave differently.
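
To make the distinction between Ego metrics and Global metrics concrete, the following is a minimal sketch, not the authors' tooling. The toy dependency graph, the module names, and the choice of ego-network size, ego-network ties, and closeness centrality as representative metrics are all assumptions made for illustration; the abstract does not list the exact metric set used in the study.

```python
from collections import deque

# Hypothetical toy dependency graph: module -> modules it depends on.
DEPENDS_ON = {
    "ui": {"core", "utils"},
    "core": {"utils", "db"},
    "db": {"utils"},
    "utils": set(),
    "cli": {"core"},
}


def undirected_neighbors(graph):
    """Treat each dependency as an undirected tie between two modules."""
    nbrs = {module: set() for module in graph}
    for src, targets in graph.items():
        for dst in targets:
            nbrs.setdefault(dst, set()).add(src)
            nbrs[src].add(dst)
    return nbrs


def ego_metrics(nbrs, module):
    """Ego metrics: computed only on the module and its direct neighbors."""
    ego = nbrs[module] | {module}
    # Count each undirected tie inside the ego network exactly once.
    ties = sum(1 for a in ego for b in nbrs[a] if b in ego and a < b)
    return {"size": len(ego), "ties": ties}


def closeness(nbrs, module):
    """A global metric: closeness centrality over all reachable modules."""
    dist = {module: 0}
    queue = deque([module])
    while queue:  # breadth-first search gives shortest path lengths
        cur = queue.popleft()
        for nxt in nbrs[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


NEIGHBORS = undirected_neighbors(DEPENDS_ON)

# One feature row per module, combining ego and global metrics.
FEATURES = {
    module: {**ego_metrics(NEIGHBORS, module),
             "closeness": closeness(NEIGHBORS, module)}
    for module in NEIGHBORS
}

for module, row in sorted(FEATURES.items()):
    print(module, row)
```

In a setup like this, each row of `FEATURES` could be joined with code metrics for the same module and passed to an SDP model. Keeping the ego columns and the global column separable makes it possible to train with each group on its own, in line with the suggestion to consider the two types separately.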
