Paper Title
Parameter-Efficient Tuning by Manipulating Hidden States of Pretrained Language Models For Classification Tasks
Paper Authors
Paper Abstract
Parameter-efficient tuning aims to distill knowledge for downstream tasks by optimizing a few introduced parameters while freezing the pretrained language models (PLMs). Continuous prompt tuning, which prepends a few trainable vectors to the input embeddings, is one of these methods and has drawn much attention due to its effectiveness and efficiency. This family of methods can be viewed as exerting nonlinear transformations on the hidden states inside PLMs. However, a natural question has been overlooked: can the hidden states be used directly for classification, without changing them? In this paper, we aim to answer this question by proposing a simple tuning method that introduces only three trainable vectors. Firstly, we integrate the hidden states of all layers using the introduced vectors. Then we feed the integrated hidden states into a task-specific linear classifier to predict categories. This scheme is similar to the way ELMo utilises hidden states, except that ELMo feeds the hidden states to LSTM-based models. Although our proposed tuning scheme is simple, it achieves performance comparable to prompt tuning methods such as P-tuning and P-tuning v2, verifying that the original hidden states do contain useful information for classification tasks. Moreover, our method has an advantage over prompt tuning in terms of training time and the number of parameters.
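A minimal sketch of the idea described in the abstract, assuming a frozen PLM that exposes per-layer hidden states: a few trainable vectors mix the layers and rescale the pooled representation, and a task-specific linear head produces class logits. The class name `HiddenStateIntegrator`, the specific roles assigned to the three vectors (layer-mixing weights, a feature gate, and a feature bias), and mean pooling over tokens are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class HiddenStateIntegrator(nn.Module):
    """Hedged sketch: combine all layers' hidden states of a frozen PLM with a
    few trainable vectors, then classify with a task-specific linear head.
    The exact roles of the three vectors are assumed, not taken from the paper."""

    def __init__(self, num_layers: int, hidden_size: int, num_classes: int):
        super().__init__()
        # Three introduced trainable vectors (assumed roles):
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))  # mix the layers
        self.gate = nn.Parameter(torch.ones(hidden_size))           # rescale features
        self.bias = nn.Parameter(torch.zeros(hidden_size))          # shift features
        # Task-specific linear classifier on top of the integrated hidden state.
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, all_hidden_states: torch.Tensor) -> torch.Tensor:
        # all_hidden_states: (num_layers, batch, seq_len, hidden_size),
        # produced by the frozen PLM (no gradients flow into the PLM itself).
        w = torch.softmax(self.layer_weights, dim=0)                 # normalized layer mix
        mixed = torch.einsum("l,lbsh->bsh", w, all_hidden_states)    # weighted sum over layers
        pooled = mixed.mean(dim=1)                                   # simple pooling over tokens
        features = pooled * self.gate + self.bias                    # apply gate and shift
        return self.classifier(features)                             # class logits
```

In a typical setup, `all_hidden_states` would come from a Hugging Face model called with `output_hidden_states=True`, with the backbone kept in eval mode and its parameters excluded from the optimizer so that only the three vectors and the linear head are trained.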