Paper Title
Emergent Linguistic Structures in Neural Networks are Fragile
Paper Authors
Paper Abstract
Large Language Models (LLMs) have been reported to achieve strong performance on natural language processing tasks. However, performance metrics such as accuracy do not measure the quality of a model in terms of its ability to robustly represent complex linguistic structures. In this paper, focusing on the ability of language models to represent syntax, we propose a framework to assess the consistency and robustness of linguistic representations. To this end, we introduce measures of robustness for neural network models that leverage recent advances in extracting linguistic constructs from LLMs via probing tasks, i.e., simple tasks used to extract meaningful information about a single facet of a language model, such as syntax reconstruction and root identification. Empirically, we evaluate four LLMs across six different corpora on the proposed robustness measures, analysing their performance and robustness with respect to syntax-preserving perturbations. We provide evidence that context-free representations (e.g., GloVe) are in some cases competitive with context-dependent representations from modern LLMs (e.g., BERT), yet equally brittle to syntax-preserving perturbations. Our key observation is that emergent syntactic representations in neural networks are brittle. We make the code, trained models, and logs available to the community as a contribution to the debate about the capabilities of LLMs.
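To make the evaluation setup concrete, below is a minimal sketch (not the authors' released code) of the kind of measurement the abstract describes: embed an original sentence and a syntax-preserving perturbation of it with a pretrained encoder, apply a probe to both representations, and record how much the probe's prediction shifts. The `bert-base-uncased` checkpoint, the hand-built synonym swap, and the untrained linear probe are all illustrative assumptions; the paper's actual probes, perturbations, and robustness measures may differ.

```python
# A rough sketch of probing robustness under a syntax-preserving
# perturbation. All modelling choices here are illustrative assumptions,
# not the paper's exact method.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence: str) -> torch.Tensor:
    """Mean-pooled last-layer token embeddings for a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

# A syntax-preserving perturbation: substitute a synonym so the parse
# tree is unchanged (hand-built here for illustration).
original = "The quick fox jumped over the fence."
perturbed = "The fast fox jumped over the fence."

# A linear probe would normally be trained to predict a syntactic
# property (e.g., parse-tree depth) from the representation; an
# untrained layer stands in for its weights in this sketch.
probe = torch.nn.Linear(model.config.hidden_size, 1)

h_orig, h_pert = embed(original), embed(perturbed)
# Robustness proxy: the shift in the probe's prediction caused by a
# perturbation that leaves the syntax intact.
shift = (probe(h_orig) - probe(h_pert)).abs().item()
print(f"probe prediction shift under syntax-preserving perturbation: {shift:.4f}")
```

In a framework of this kind, a representation counts as robust when such shifts stay small across a corpus of perturbed sentences; aggregating the shifts over many sentence pairs would yield a corpus-level robustness score of the sort the abstract proposes.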