Paper Title
Revisiting Few-sample BERT Fine-tuning
Paper Authors
Paper Abstract
This paper is a study of fine-tuning of BERT contextual representations, with a focus on the instabilities commonly observed in few-sample scenarios. We identify several factors that cause this instability: the common use of a non-standard optimization method with biased gradient estimation; the limited applicability of significant parts of the BERT network to downstream tasks; and the prevalent practice of using a pre-determined but small number of training iterations. We empirically test the impact of these factors and identify alternative practices that resolve the commonly observed instability of the process. In light of these observations, we revisit recently proposed methods for improving few-sample fine-tuning with BERT and re-evaluate their effectiveness. Generally, we observe that the impact of these methods diminishes significantly with our modified process.
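As context for the "non-standard optimization method with biased gradient estimation" mentioned above, the sketch below contrasts a standard Adam update (with bias correction of the moment estimates) against a variant that skips the correction. This is only an illustrative single-parameter sketch; the function name and hyper-parameter values are assumptions for the example and are not taken from the paper or any specific library.

```python
import math

def adam_update(param, grad, m, v, step, lr=2e-5,
                beta1=0.9, beta2=0.999, eps=1e-6,
                bias_correction=True):
    """One Adam step for a single scalar parameter (step starts at 1).

    With bias_correction=False, the update uses the raw exponential
    moving averages m and v, which are biased toward zero early in
    training because they are initialized at zero.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    if bias_correction:
        # Standard Adam: correct the zero-initialization bias of m and v.
        m_hat = m / (1 - beta1 ** step)
        v_hat = v / (1 - beta2 ** step)
    else:
        # Non-standard variant: use the biased moment estimates directly.
        m_hat, v_hat = m, v
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```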