Paper Title
Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters
Paper Authors
Paper Abstract
Adapter-tuning is a paradigm that transfers a pretrained language model to downstream tasks by adding and tuning a small number of new parameters. Previously proposed adapter architectures are all feed-forward neural networks. In this paper, we investigate the effectiveness of using tiny-attention -- i.e., attention with extremely small per-head dimensionality -- as adapters. Our tiny-attention adapter learns to modify the hidden states at each position directly conditioned on the hidden states at all the other positions, which is missed by the previously proposed adapters. Moreover, we view its multiple attention heads as a mixture of experts and propose to average their weights during deployment, which further reduces its inference computation cost. On the GLUE benchmark, our tiny-attention adapter outperforms the other parameter-efficient transfer learning methods as well as full fine-tuning while only updating 0.05% of the parameters. On the FewGLUE benchmark, its performance is comparable to that of GPT-3 and PET.
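To make the abstract's description concrete, below is a minimal PyTorch sketch of a tiny-attention adapter: multi-head attention with a very small per-head dimension, applied as a residual update to a frozen transformer layer's hidden states so that each position is modified conditioned on all other positions. The class name TinyAttentionAdapter and every hyperparameter choice here (e.g. head_dim = 1, num_heads = 4) are illustrative assumptions, not the authors' released implementation, and the deployment-time averaging of head weights (the mixture-of-experts view) is omitted.

```python
import torch
import torch.nn as nn


class TinyAttentionAdapter(nn.Module):
    """Sketch of a tiny-attention adapter (illustrative, not the official code).

    Multi-head attention with an extremely small per-head dimensionality,
    added residually on top of a frozen pretrained model's hidden states.
    """

    def __init__(self, hidden_dim: int, num_heads: int = 4, head_dim: int = 1):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = head_dim
        inner = num_heads * head_dim  # e.g. 4 heads x 1 dim = 4, hence "tiny"
        # Projections into the tiny attention space and back to the model width.
        self.q_proj = nn.Linear(hidden_dim, inner)
        self.k_proj = nn.Linear(hidden_dim, inner)
        self.v_proj = nn.Linear(hidden_dim, inner)
        self.out_proj = nn.Linear(inner, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) from the frozen backbone.
        b, t, _ = hidden_states.shape

        def split_heads(x: torch.Tensor) -> torch.Tensor:
            # (b, t, inner) -> (b, num_heads, t, head_dim)
            return x.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)

        q = split_heads(self.q_proj(hidden_states))
        k = split_heads(self.k_proj(hidden_states))
        v = split_heads(self.v_proj(hidden_states))

        # Every position attends to all other positions in the tiny space.
        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        attn = torch.softmax(scores, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)

        # Residual update: only the adapter parameters would be trained.
        return hidden_states + self.out_proj(ctx)


if __name__ == "__main__":
    # Usage sketch with RoBERTa-base-sized hidden states (assumed width 768).
    adapter = TinyAttentionAdapter(hidden_dim=768, num_heads=4, head_dim=1)
    h = torch.randn(2, 16, 768)
    out = adapter(h)
    print(out.shape)  # torch.Size([2, 16, 768])
```

Per the abstract, the multiple heads can further be viewed as a mixture of experts whose weights are averaged at deployment, collapsing them into a single head and reducing inference cost; that merging step is not shown in the sketch above.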