Pop Quiz! Can a Large Language Model Help With Reverse Engineering?

Paper Title

Pop Quiz! Can a Large Language Model Help With Reverse Engineering?

Authors

Hammond Pearce, Benjamin Tan, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Brendan Dolan-Gavitt

Abstract

Large language models (such as OpenAI's Codex) have demonstrated impressive zero-shot multi-task capabilities in the software domain, including code explanation. In this work, we examine if this ability can be used to help with reverse engineering. Specifically, we investigate prompting Codex to identify the purpose, capabilities, and important variable names or values from code, even when the code is produced through decompilation. Alongside an examination of the model's responses in answering open-ended questions, we devise a true/false quiz framework to characterize the performance of the language model. We present an extensive quantitative analysis of the measured performance of the language model on a set of program purpose identification and information extraction tasks: of the 136,260 questions we posed, it answered 72,754 correctly. A key takeaway is that while promising, LLMs are not yet ready for zero-shot reverse engineering.
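To put the headline figures in perspective, the short sketch below works out the accuracy implied by the two counts quoted in the abstract and compares it with uniform random guessing on a true/false question, which is correct half the time. The function and constant names are illustrative only and do not come from the paper; the comparison also treats every question as true/false, although the abstract mentions open-ended questions as well.

```python
# Counts quoted in the abstract.
TOTAL_QUESTIONS = 136_260
CORRECT_ANSWERS = 72_754

# Expected accuracy of uniform random guessing on a true/false question.
CHANCE_ACCURACY = 0.5


def accuracy(correct: int, total: int) -> float:
    """Return the fraction of questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


acc = accuracy(CORRECT_ANSWERS, TOTAL_QUESTIONS)
margin = acc - CHANCE_ACCURACY

print(f"accuracy: {acc:.1%}")               # accuracy: 53.4%
print(f"margin over chance: {margin:+.1%}")  # margin over chance: +3.4%
```

An overall accuracy of roughly 53.4% sits only a few points above chance, which is consistent with the abstract's conclusion that LLMs are not yet ready for zero-shot reverse engineering.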
