Paper Title
Compressing Pre-trained Models of Code into 3 MB
Paper Authors
Paper Abstract
Although large pre-trained models of code have delivered significant advancements in various code processing tasks, there is an impediment to the widespread and smooth adoption of these powerful models in software developers' daily workflow: these large models consume hundreds of megabytes of memory and run slowly on personal devices, which causes problems in model deployment and greatly degrades the user experience. This motivates us to propose Compressor, a novel approach that compresses pre-trained models of code into extremely small models with negligible performance sacrifice. Our method formulates the design of a tiny model as simplifying the pre-trained model architecture: searching for a significantly smaller model whose architectural design is similar to that of the original pre-trained model. Compressor uses a genetic algorithm (GA)-based strategy to guide the simplification process. Prior studies found that a model with a higher computational cost tends to be more powerful. Inspired by this insight, the GA is designed to maximize a model's Giga floating-point operations (GFLOPs), an indicator of its computational cost, subject to the constraint of the target model size. Then, we use the knowledge distillation technique to train the small model: unlabelled data is fed into the large model, and its outputs are used as labels to train the small model. We evaluate Compressor with two state-of-the-art pre-trained models, i.e., CodeBERT and GraphCodeBERT, on two important tasks, i.e., vulnerability prediction and clone detection. We use our method to compress the pre-trained models to 3 MB, which is 160$\times$ smaller than their original size. The results show that compressed CodeBERT and GraphCodeBERT are 4.31$\times$ and 4.15$\times$ faster than the original models at inference, respectively. More importantly, ...
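The GA-based simplification described in the abstract can be pictured with a minimal sketch (not the authors' implementation): candidate architectures are hyper-parameter tuples (number of layers, hidden size, FFN size), the fitness is a roughly estimated GFLOPs value, and any candidate whose estimated size exceeds the 3 MB budget is rejected. The parameter/FLOPs estimators, the search space, and all constants below are illustrative assumptions.

```python
# Sketch of a GA that maximizes estimated GFLOPs under a 3 MB size constraint.
import random

TARGET_SIZE_MB = 3.0          # target model size from the abstract
VOCAB, SEQ_LEN = 1000, 128    # assumed (reduced) vocabulary and sequence length

def estimate_params(layers, hidden, ffn):
    """Rough parameter count of a BERT-like encoder (embeddings + blocks)."""
    embed = VOCAB * hidden
    per_layer = 4 * hidden * hidden + 2 * hidden * ffn   # attention + FFN weights
    return embed + layers * per_layer

def estimate_gflops(layers, hidden, ffn):
    """Rough forward-pass GFLOPs for one sequence (~2 FLOPs per weight use)."""
    return 2 * estimate_params(layers, hidden, ffn) * SEQ_LEN / 1e9

def size_mb(params):
    return params * 4 / (1024 ** 2)   # 4 bytes per float32 parameter

def random_candidate():
    return (random.choice([1, 2, 3, 4]),           # layers
            random.choice([16, 32, 64, 96, 128]),  # hidden size
            random.choice([64, 128, 256, 512]))    # FFN size

def fitness(cand):
    # Maximize GFLOPs, but reject candidates that break the size budget.
    if size_mb(estimate_params(*cand)) > TARGET_SIZE_MB:
        return -1.0
    return estimate_gflops(*cand)

def mutate(cand):
    new = list(cand)
    i = random.randrange(3)
    new[i] = random_candidate()[i]   # resample one hyper-parameter
    return tuple(new)

def genetic_search(pop_size=20, generations=30):
    population = [random_candidate() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(random.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = genetic_search()
print(best, round(estimate_gflops(*best), 3), "GFLOPs",
      round(size_mb(estimate_params(*best)), 2), "MB")
```

The constraint handling here is the simplest possible choice (oversized candidates get a negative fitness); the paper's actual search space and cost model may differ.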
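The knowledge distillation step can likewise be sketched in a few lines, assuming placeholder `teacher`, `student`, and `unlabelled_loader` objects: the large (teacher) model is frozen, its softmax outputs on unlabelled inputs serve as soft labels, and the small (student) model is trained to match them. This is an illustrative sketch under those assumptions, not the paper's training code.

```python
# Sketch of soft-label knowledge distillation from a large model to a small one.
import torch
import torch.nn.functional as F

def distill(teacher, student, unlabelled_loader, epochs=3, lr=5e-4):
    teacher.eval()                                # the large model is only used for inference
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs in unlabelled_loader:          # no ground-truth labels needed
            with torch.no_grad():
                soft_labels = F.softmax(teacher(inputs), dim=-1)   # teacher outputs as labels
            student_logits = student(inputs)
            # Cross-entropy between the student's prediction and the teacher's soft labels.
            loss = -(soft_labels * F.log_softmax(student_logits, dim=-1)).sum(dim=-1).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```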