Paper Title
Modelling Commonsense Properties using Pre-Trained Bi-Encoders
Paper Authors
Paper Abstract
Grasping the commonsense properties of everyday concepts is an important prerequisite to language understanding. While contextualised language models are reportedly capable of predicting such commonsense properties with human-level accuracy, we argue that such results have been inflated because of the high similarity between training and test concepts. This means that models which capture concept similarity can perform well, even if they do not capture any knowledge of the commonsense properties themselves. In settings where there is no overlap between the properties that are considered during training and testing, we find that the empirical performance of standard language models drops dramatically. To address this, we study the possibility of fine-tuning language models to explicitly model concepts and their properties. In particular, we train separate concept and property encoders on two types of readily available data: extracted hyponym-hypernym pairs and generic sentences. Our experimental results show that the resulting encoders allow us to predict commonsense properties with much higher accuracy than is possible by directly fine-tuning language models. We also present experimental results for the related task of unsupervised hypernym discovery.
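To make the bi-encoder setup described in the abstract concrete, below is a minimal sketch of how separate concept and property encoders could be combined to score concept–property pairs. It assumes Hugging Face Transformers with a BERT-base backbone; the class name `BiEncoder`, the use of the [CLS] vector as the embedding, and the dot-product scoring function are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of a concept/property bi-encoder; names and the
# scoring function (dot product over [CLS] vectors) are assumptions,
# not the paper's actual code.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class BiEncoder(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        # Separate encoders for concepts and properties, as in the abstract.
        self.concept_encoder = AutoModel.from_pretrained(model_name)
        self.property_encoder = AutoModel.from_pretrained(model_name)

    @staticmethod
    def encode(encoder, texts, tokenizer):
        batch = tokenizer(texts, padding=True, truncation=True,
                          return_tensors="pt")
        # Use the [CLS] token's hidden state as the text embedding.
        return encoder(**batch).last_hidden_state[:, 0]

    def forward(self, concepts, properties, tokenizer):
        c = self.encode(self.concept_encoder, concepts, tokenizer)
        p = self.encode(self.property_encoder, properties, tokenizer)
        # Dot-product score: higher when the concept plausibly has
        # the property.
        return (c * p).sum(dim=-1)


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BiEncoder()
scores = model(["banana", "strawberry"], ["is yellow", "is red"], tokenizer)
print(scores)
```

Keeping the two encoders separate means property embeddings can be precomputed once and reused to score many concepts, which also makes the same representations usable for related ranking tasks such as the unsupervised hypernym discovery experiments the abstract mentions.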