Paper Title
Beyond Prompting: Making Pre-trained Language Models Better Zero-shot Learners by Clustering Representations
Paper Authors
Paper Abstract
Recent work has demonstrated that pre-trained language models (PLMs) are zero-shot learners. However, most existing zero-shot methods involve heavy human engineering or complicated self-training pipelines, hindering their application to new situations. In this work, we show that zero-shot text classification can be improved simply by clustering texts in the embedding spaces of PLMs. Specifically, we fit the unlabeled texts with a Bayesian Gaussian Mixture Model after initializing cluster positions and shapes using class names. Despite its simplicity, this approach achieves superior or comparable performance on both topic and sentiment classification datasets and outperforms prior works significantly on unbalanced datasets. We further explore the applicability of our clustering approach by evaluating it on 14 datasets with more diverse topics, text lengths, and numbers of classes. Our approach achieves an average of 20% absolute improvement over prompt-based zero-shot learning. Finally, we compare different PLM embedding spaces and find that texts are well-clustered by topics even if the PLM is not explicitly pre-trained to generate meaningful sentence embeddings. This work indicates that PLM embeddings can categorize texts without task-specific fine-tuning, thus providing a new way to analyze and utilize their knowledge and zero-shot learning ability.
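The following is a minimal sketch of the clustering idea described in the abstract, assuming a sentence-transformer encoder and scikit-learn. The paper fits a Bayesian Gaussian Mixture Model initialized from class-name embeddings; scikit-learn's BayesianGaussianMixture does not expose custom mean initialization, so this sketch approximates that step with GaussianMixture(means_init=...). The model name, class names, and example texts are illustrative placeholders, not the paper's actual setup.

```python
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture

# Assumed encoder; the paper may use a different PLM embedding space.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

class_names = ["sports", "business", "technology"]  # hypothetical label set
texts = [                                            # toy unlabeled corpus
    "The team clinched the title with a late goal.",
    "The striker was sold for a record transfer fee.",
    "Shares fell sharply after the earnings report.",
    "The central bank held interest rates steady.",
    "The new chip doubles on-device inference speed.",
    "Researchers released an open-source language model.",
]

# Embed class names and unlabeled texts in the same PLM embedding space.
class_emb = encoder.encode(class_names, normalize_embeddings=True)
text_emb = encoder.encode(texts, normalize_embeddings=True)

# Cluster the unlabeled texts, initializing one mixture component per class
# at the class-name embedding (an approximation of the paper's Bayesian GMM
# initialization of cluster positions).
gmm = GaussianMixture(
    n_components=len(class_names),
    covariance_type="diag",  # diagonal covariances keep the fit tractable in high dimensions
    means_init=class_emb,
    random_state=0,
)
gmm.fit(text_emb)

# Because component i was initialized at class i, cluster ids map back to class names.
pred = gmm.predict(text_emb)
for text, label in zip(texts, pred):
    print(f"{class_names[label]:>10s} | {text}")
```

Because the mixture is fit only on unlabeled embeddings, no task-specific fine-tuning or labeled data is involved; the class names serve purely to anchor the initial cluster positions so that the learned components remain aligned with the intended labels.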