Paper Title

Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers

Authors

William Held, Diyi Yang

Abstract

Multilingual transformer-based models demonstrate remarkable zero and few-shot transfer across languages by learning and reusing language-agnostic features. However, as a fixed-size model acquires more languages, its performance across all languages degrades, a phenomenon termed interference. Often attributed to limited model capacity, interference is commonly addressed by adding additional parameters despite evidence that transformer-based models are overparameterized. In this work, we show that it is possible to reduce interference by instead identifying and pruning language-specific parameters. First, we use Shapley Values, a credit allocation metric from coalitional game theory, to identify attention heads that introduce interference. Then, we show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction, seeing gains as large as 24.7%. Finally, we provide insights on language-agnostic and language-specific attention heads using attention visualization.
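
The abstract describes scoring each attention head with a Shapley Value and then pruning heads that hurt a target language. Below is a minimal Python sketch of that general idea, not the authors' implementation: it approximates per-head Shapley values via Monte Carlo permutation sampling and flags heads whose estimated contribution is negative. The `evaluate` function (mapping a set of active head indices to a target-language validation metric) and all other names here are hypothetical placeholders.

```python
# Sketch only: Monte Carlo Shapley estimation for attention heads,
# assuming a user-supplied `evaluate(active_heads)` that returns a
# target-language validation metric (higher is better) for a model
# in which only the given heads are active.
import random

def shapley_head_values(num_heads, evaluate, num_permutations=50):
    """Approximate one Shapley value per attention head."""
    values = [0.0] * num_heads
    for _ in range(num_permutations):
        order = list(range(num_heads))
        random.shuffle(order)          # random player (head) ordering
        active = set()
        prev_score = evaluate(active)  # value of the empty coalition
        for head in order:
            active.add(head)
            score = evaluate(active)
            values[head] += score - prev_score  # marginal contribution
            prev_score = score
    return [v / num_permutations for v in values]

def heads_to_prune(values):
    # Heads with negative estimated Shapley value are treated as
    # sources of interference and removed from the fixed model.
    return [h for h, v in enumerate(values) if v < 0]
```

In practice such estimates are computed per target language, so a head can be kept for one language while being pruned for another, which is consistent with the paper's framing of language-specific versus language-agnostic heads.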
