Paper Title

Explicit Knowledge Transfer for Weakly-Supervised Code Generation

Paper Authors

Zhangir Azerbayev, Ansong Ni, Hailey Schoelkopf, Dragomir Radev

Paper Abstract

Large language models (LLMs) can acquire strong code-generation capabilities through few-shot learning. In contrast, supervised fine-tuning is still needed for smaller models to achieve good performance. Such fine-tuning demands a large number of task-specific NL-code pairs, which are expensive to obtain. In this paper, we attempt to transfer the code generation ability of an LLM to a smaller model with the aid of weakly-supervised data. More specifically, we propose explicit knowledge transfer (EKT), which uses the few-shot capabilities of a teacher LLM to create NL-code pairs that we then filter for correctness and fine-tune the student on. We evaluate EKT on the task of generating code solutions to math word problems from the GSM8k dataset. We find that EKT not only yields better performance than training with expert iteration, but also outperforms knowledge distillation, another form of knowledge transfer. A GPT-Neo 1.3B model trained using EKT with a GPT-J teacher achieves a 12.4% pass@100 on GSM8k, while the same student and teacher trained with knowledge distillation yield only a 3.7% pass@100. We also show that it is possible for a student model to outperform the teacher using EKT.
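
To make the EKT procedure concrete, below is a minimal Python sketch of the data-generation step described in the abstract: sample candidate code solutions from the teacher with a few-shot prompt, execute them, keep only the ones whose answer matches the gold label, and use the surviving NL-code pairs to fine-tune the student. The helpers `teacher_generate` and `run_code`, the prompt format, the per-problem sample budget, and the numeric tolerance are hypothetical stand-ins, not the authors' implementation.

```python
# Sketch of EKT data generation: teacher few-shot sampling + correctness filtering.
# `teacher_generate` and `run_code` are placeholders to be filled in with a real
# model/decoder and a sandboxed executor.

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Problem:
    question: str       # natural-language math word problem (e.g. from GSM8k)
    gold_answer: float  # reference numeric answer, used only for filtering


def teacher_generate(prompt: str, num_samples: int) -> List[str]:
    """Sample candidate code solutions from the teacher LLM (e.g. GPT-J)
    conditioned on a few-shot prompt. Placeholder for an actual model call."""
    raise NotImplementedError


def run_code(code: str) -> Optional[float]:
    """Execute a candidate program in a sandbox and return the numeric answer
    it produces, or None if it fails. Placeholder for a real executor."""
    raise NotImplementedError


def build_ekt_dataset(problems: List[Problem],
                      few_shot_prefix: str,
                      samples_per_problem: int = 20) -> List[Tuple[str, str]]:
    """Create weakly-supervised NL-code pairs: sample from the teacher and
    keep only programs whose executed answer matches the gold answer."""
    pairs = []
    for prob in problems:
        prompt = few_shot_prefix + f"\n\nQuestion: {prob.question}\nSolution:\n"
        for code in teacher_generate(prompt, samples_per_problem):
            answer = run_code(code)
            if answer is not None and abs(answer - prob.gold_answer) < 1e-6:
                pairs.append((prob.question, code))
                break  # one verified solution per problem is enough here
    return pairs
```

The verified (question, code) pairs produced this way are what the student model (e.g. GPT-Neo 1.3B) is fine-tuned on with a standard language-modeling objective; the gold numeric answers serve only as weak supervision for filtering, so no human-written code solutions are needed.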
