Paper title
Representing LLVM-IR in a Code Property Graph
Paper authors
Abstract
In recent years, a number of static application security testing tools have been proposed that make use of so-called code property graphs (CPGs), a graph model that keeps rich information about the source code while enabling its users to write language-agnostic analyses. However, these tools suffer from several shortcomings. They work mostly on source code and exclude third-party dependencies from the analysis if those are only available as compiled binaries. Furthermore, their analysis is limited to the programming languages that each tool supports. While support for well-established languages such as C/C++ or Java is often included, languages that are still heavily evolving, such as Rust, are not considered because of the constant changes in the language design. To overcome these limitations, we extend an open source implementation of a code property graph to support LLVM-IR, which many compilers and binary lifters can produce as output. In this paper, we discuss how we address the challenges that arise when mapping concepts of an intermediate representation to a CPG. At the same time, we optimize the resulting graph to be minimal and close to the representation of equivalent source code. Our evaluation indicates that existing analyses can be reused without modification and that the performance requirements are comparable to those of operating on source code. This makes the approach suitable for the analysis of large-scale projects.