Paper Title

Cross-lingual Similarity of Multilingual Representations Revisited

Paper Authors

Maksym Del, Mark Fishel

Paper Abstract

Related works used indexes like CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models. In this paper, we argue that the assumptions of CKA/CCA align poorly with one of the motivating goals of cross-lingual learning analysis, i.e., explaining zero-shot cross-lingual transfer. We highlight the valuable aspects of cross-lingual similarity that these indexes fail to capture and provide a motivating case study demonstrating the problem empirically. We then introduce Average Neuron-Wise Correlation (ANC) as a straightforward alternative that is free of the difficulties of CKA/CCA and is well suited to the cross-lingual setting. Finally, we use ANC to build evidence that the previously introduced "first align, then predict" pattern takes place not only in masked language models (MLMs) but also in multilingual models with causal language modeling objectives (CLMs). Moreover, we show that the pattern extends to the scaled versions of the MLMs and CLMs (up to 85x the size of the original mBERT). Our code is publicly available at https://github.com/TartuNLP/xsim.
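
As a rough illustration only (not the authors' implementation; see the repository above for that), the name suggests ANC can be read as the mean of per-neuron Pearson correlations between row-aligned activation matrices from two languages. The sketch below assumes that reading; the function name `average_neuron_wise_correlation` and the matrices `X`/`Y` (hidden states for parallel sentences at some layer) are hypothetical.

```python
import numpy as np

def average_neuron_wise_correlation(X: np.ndarray, Y: np.ndarray) -> float:
    """Hypothetical sketch of ANC under the assumption described above.

    X, Y: (n_examples, n_neurons) activation matrices for two languages,
    row-aligned via parallel sentences. Returns the mean over neurons of
    the Pearson correlation between each neuron's activations in X and Y.
    """
    # Center each neuron (column) to zero mean.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Per-neuron Pearson r = covariance / (std_x * std_y).
    cov = (Xc * Yc).mean(axis=0)
    denom = Xc.std(axis=0) * Yc.std(axis=0) + 1e-12  # guard against dead neurons
    return float((cov / denom).mean())
```

Under this reading, one would compute the score per layer (e.g., English vs. its translations) and inspect how it changes with depth, which is the kind of layer-wise view the "first align, then predict" pattern refers to.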
