Title
When classifying grammatical role, BERT doesn't care about word order... except when it matters
Authors
Abstract
Because meaning can often be inferred from lexical semantics alone, word order is often a redundant cue in natural language. For example, the words chopped, chef, and onion are more likely to convey "The chef chopped the onion" than "The onion chopped the chef." Recent work has shown large language models to be surprisingly word order invariant, but crucially has largely considered natural prototypical inputs, where compositional meaning mostly matches lexical expectations. To overcome this confound, we probe grammatical role representation in English BERT and GPT-2 on instances where lexical expectations are not sufficient, and word order knowledge is necessary for correct classification. Such non-prototypical instances are naturally occurring English sentences with inanimate subjects or animate objects, or sentences where we systematically swap the arguments to make sentences like "The onion chopped the chef." We find that, while early-layer embeddings are largely lexical, word order is in fact crucial in defining the later-layer representations of words in semantically non-prototypical positions. Our experiments isolate the effect of word order on the contextualization process, and highlight how models use context in the uncommon, but critical, instances where it matters.
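The argument-swapping step the abstract describes can be sketched as a simple string transformation. This is a minimal illustration, not the authors' actual pipeline; the function name and the assumption of a plain "subject verb object." sentence shape are hypothetical.

```python
def swap_arguments(sentence: str, verb: str) -> str:
    """Swap the subject and object noun phrases around `verb` in a
    simple transitive sentence of the form '<subject> <verb> <object>.'
    Illustrative sketch only; real data construction would need parsing."""
    body = sentence.rstrip(".")
    subject, obj = (part.strip() for part in body.split(f" {verb} "))
    # Re-capitalize: the old object becomes the new sentence-initial subject.
    new_subject = obj[0].upper() + obj[1:]
    new_object = subject[0].lower() + subject[1:]
    return f"{new_subject} {verb} {new_object}."

print(swap_arguments("The chef chopped the onion.", "chopped"))
# -> The onion chopped the chef.
```

Applying this transform leaves the lexical content untouched while inverting the grammatical roles, which is exactly what lets the probing experiments separate word-order knowledge from lexical expectations.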