Paper Title
FL-Tuning: Layer Tuning for Feed-Forward Network in Transformer
Paper Authors
Paper Abstract
Prompt tuning is an emerging way of adapting pre-trained language models to downstream tasks. However, existing studies mainly add prompts to the input sequence. This approach may not work as expected, because the intermediate multi-head self-attention and feed-forward network computations make model optimization less smooth. Hence, we propose a novel tuning method called layer tuning, which adds learnable parameters inside Transformer layers. Specifically, we focus on layer tuning for the feed-forward network in the Transformer, namely FL-tuning. It introduces additional units into the hidden layer of each feed-forward network. We conduct extensive experiments on the public CLUE benchmark. The results show that: 1) Our FL-tuning outperforms prompt tuning methods under both full-data and few-shot settings in almost all cases. In particular, it improves accuracy by 17.93% (full-data setting) on WSC 1.0 and F1 by 16.142% (few-shot setting) on CLUENER over P-tuning v2. 2) Our FL-tuning is more stable and converges about 1.17 times faster than P-tuning v2. 3) With only about 3% of the Transformer's parameters to be trained, FL-tuning is comparable to fine-tuning on most datasets, and significantly outperforms fine-tuning on several datasets (e.g., accuracy improves by 12.9% on WSC 1.1). The source code is available at https://github.com/genggui001/FL-Tuning.
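To make the idea concrete, the sketch below shows one way extra hidden units could be wired into a single Transformer FFN block in PyTorch: the pre-trained FFN projections are frozen, and a small number of new hidden units are added in parallel so that only their parameters are trained. This is a minimal illustration under our own assumptions (names such as `FLFeedForward` and `extra_units` are hypothetical), not the authors' released implementation; see the linked repository for the official code.

```python
import torch
import torch.nn as nn


class FLFeedForward(nn.Module):
    """Minimal sketch of FL-tuning for one Transformer FFN block (illustrative only).

    The pre-trained FFN weights are frozen; `extra_units` new hidden units are
    appended to the FFN hidden layer, and only their parameters are trained.
    """

    def __init__(self, frozen_ffn_in: nn.Linear, frozen_ffn_out: nn.Linear,
                 extra_units: int = 64):
        super().__init__()
        d_model = frozen_ffn_in.in_features
        # Frozen pre-trained FFN projections (d_model -> d_ff -> d_model).
        self.ffn_in = frozen_ffn_in
        self.ffn_out = frozen_ffn_out
        for p in list(self.ffn_in.parameters()) + list(self.ffn_out.parameters()):
            p.requires_grad = False
        # New trainable units added to the FFN hidden layer.
        self.extra_in = nn.Linear(d_model, extra_units)
        self.extra_out = nn.Linear(extra_units, d_model, bias=False)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenating extra units in the hidden dimension and projecting back
        # is equivalent to summing the frozen FFN output with the new units' output.
        frozen = self.ffn_out(self.act(self.ffn_in(x)))
        added = self.extra_out(self.act(self.extra_in(x)))
        return frozen + added
```

In this sketch only `extra_in` and `extra_out` receive gradients, which is consistent with the abstract's claim that roughly 3% of the Transformer's parameters are trained when the number of added units is small relative to the original FFN hidden size.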