Paper Title
Towards Neural Programming Interfaces
Paper Authors
Abstract
It is notoriously difficult to control the behavior of artificial neural networks such as generative neural language models. We recast the problem of controlling natural language generation as that of learning to interface with a pretrained language model, just as Application Programming Interfaces (APIs) control the behavior of programs by altering hyperparameters. In this new paradigm, a specialized neural network (called a Neural Programming Interface or NPI) learns to interface with a pretrained language model by manipulating the hidden activations of the pretrained model to produce desired outputs. Importantly, no permanent changes are made to the weights of the original model, allowing us to re-purpose pretrained models for new tasks without overwriting any aspect of the language model. We also contribute a new data set construction algorithm and a GAN-inspired loss function that allow us to train NPI models to control outputs of autoregressive transformers. In experiments against other state-of-the-art approaches, we demonstrate the efficacy of our methods using OpenAI's GPT-2 model, successfully controlling noun selection, topic aversion, offensive speech filtering, and other aspects of language while largely maintaining the controlled model's fluency under deterministic settings.
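The core mechanism described above — a separate network that perturbs a frozen model's hidden activations without ever touching its weights — can be illustrated with a minimal sketch. Everything here is hypothetical: a toy two-layer "language model" stands in for GPT-2, and `npi` is a stand-in for a trained NPI network; the paper's actual architecture and GAN-inspired training loss are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy frozen "pretrained model": two linear layers with a hidden activation h.
# (Stand-in for a transformer layer whose activations the NPI manipulates.)
W1 = rng.normal(size=(8, 8))
W2 = rng.normal(size=(8, 8))

def lm_forward(x, npi=None):
    """Forward pass; optionally let an NPI perturb the hidden activation."""
    h = np.tanh(x @ W1)      # hidden activation exposed to the controller
    if npi is not None:
        h = h + npi(h)       # NPI adds an offset; the LM weights are untouched
    return h @ W2

# Hypothetical NPI: a small map from the hidden state to a perturbation.
# In the paper this would be a trained network; here it is a fixed random map.
W_npi = rng.normal(scale=0.1, size=(8, 8))
def npi(h):
    return np.tanh(h @ W_npi)

x = rng.normal(size=(1, 8))
W1_before = W1.copy()

y_plain = lm_forward(x)            # uncontrolled output
y_ctrl = lm_forward(x, npi=npi)    # NPI-controlled output

# The pretrained weights are never modified; only activations differ,
# so the same frozen model can be re-purposed by swapping in a new NPI.
assert np.allclose(W1, W1_before)
```

Because the intervention is purely additive on activations, removing the NPI recovers the original model exactly, which is what lets a single pretrained model serve many control tasks.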