Paper Title
Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning
Paper Authors
Paper Abstract
Contrastive learning has been demonstrated to be effective in enhancing pre-trained language models (PLMs) to derive superior universal sentence embeddings. However, existing contrastive methods still have two limitations. Firstly, previous works may perform poorly under domain shift settings, thus hindering the application of sentence representations in practice. We attribute this low performance to the over-parameterization of PLMs with millions of parameters. To alleviate this, we propose PromCSE (Prompt-based Contrastive Learning for Sentence Embeddings), which only trains a small-scale \emph{Soft Prompt} (i.e., a set of trainable vectors) while keeping the PLM fixed. Secondly, the commonly used NT-Xent loss function of contrastive learning does not fully exploit hard negatives in supervised learning settings. To this end, we propose to integrate an Energy-based Hinge loss to enhance the pairwise discriminative power, inspired by the connection between the NT-Xent loss and the Energy-based Learning paradigm. Empirical results on seven standard semantic textual similarity (STS) tasks and a domain-shifted STS task both show the effectiveness of our method compared with the current state-of-the-art sentence embedding models. Our code is publicly available at https://github.com/YJiangcm/PromCSE.
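
The following PyTorch snippet is a minimal sketch of the Soft Prompt idea described in the abstract: a small set of trainable vectors is prepended to the input embeddings while every PLM parameter stays frozen. The names SoftPromptEncoder and prompt_len are illustrative assumptions, not identifiers from the PromCSE codebase (which uses deeper per-layer prompts); see the linked repository for the actual model.

import torch
import torch.nn as nn
from transformers import AutoModel

class SoftPromptEncoder(nn.Module):
    """Hypothetical sketch: only `soft_prompt` is trainable; the PLM is frozen."""
    def __init__(self, model_name="bert-base-uncased", prompt_len=10):
        super().__init__()
        self.plm = AutoModel.from_pretrained(model_name)
        for p in self.plm.parameters():
            p.requires_grad = False          # keep the PLM fixed
        hidden = self.plm.config.hidden_size
        # the only trainable parameters: a small set of prompt vectors
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)

    def forward(self, input_ids, attention_mask):
        embeds = self.plm.get_input_embeddings()(input_ids)         # (B, L, H)
        batch = embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        inputs = torch.cat([prompt, embeds], dim=1)                 # prepend the prompt
        prompt_mask = torch.ones(
            batch, prompt.size(1),
            dtype=attention_mask.dtype, device=attention_mask.device)
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        out = self.plm(inputs_embeds=inputs, attention_mask=mask)
        # take the representation at the original [CLS] position as the embedding
        return out.last_hidden_state[:, prompt.size(1)]

Because the PLM's millions of parameters receive no gradient updates, only the prompt_len x hidden prompt matrix is optimized, which is the property the abstract credits for better robustness under domain shift.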
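Below is a hedged sketch of the training objective: the supervised NT-Xent loss over (anchor, positive, hard negative) triples, plus a hinge term that, viewing negative cosine similarity as an energy, requires each paired positive to beat its paired hard negative by a margin. The hyperparameter names (tau, margin, lam) and the exact weighting are assumptions for illustration; consult the repository for the paper's precise formulation.

import torch
import torch.nn.functional as F

def promcse_style_loss(h, h_pos, h_neg, tau=0.05, margin=0.1, lam=1.0):
    """h, h_pos, h_neg: (B, H) embeddings of anchors, positives, hard negatives."""
    h, h_pos, h_neg = (F.normalize(t, dim=-1) for t in (h, h_pos, h_neg))
    sim_pos = h @ h_pos.t() / tau            # (B, B): anchors vs. all positives
    sim_neg = h @ h_neg.t() / tau            # (B, B): anchors vs. all hard negatives
    logits = torch.cat([sim_pos, sim_neg], dim=1)   # (B, 2B)
    labels = torch.arange(h.size(0), device=h.device)
    nt_xent = F.cross_entropy(logits, labels)       # supervised NT-Xent
    # energy-based hinge: the paired positive's cosine must exceed the
    # paired hard negative's by at least `margin`
    pos = sim_pos.diagonal() * tau           # undo temperature -> raw cosine
    neg = sim_neg.diagonal() * tau
    hinge = F.relu(margin - pos + neg).mean()
    return nt_xent + lam * hinge

The hinge term acts only on each anchor's own hard negative, which is how, per the abstract, the energy-based view adds pairwise discriminative pressure that plain NT-Xent does not fully provide.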