Paper Title


First is Better Than Last for Language Data Influence

Authors

Chih-Kuan Yeh, Ankur Taly, Mukund Sundararajan, Frederick Liu, Pradeep Ravikumar

Abstract


The ability to identify influential training examples enables us to debug training data and explain model behavior. Existing techniques to do so are based on the flow of training data influence through the model parameters. For large models in NLP applications, it is often computationally infeasible to study this flow through all model parameters, so techniques usually pick the last layer of weights. However, we observe that since the activations connected to the last layer of weights contain "shared logic", the data influence calculated via the last-layer weights is prone to a "cancellation effect", where the data influences of different examples have large magnitudes that contradict each other. The cancellation effect lowers the discriminative power of the influence score, and deleting influential examples according to this measure often does not change the model's behavior by much. To mitigate this, we propose a technique called TracIn-WE that modifies a method called TracIn to operate on the word embedding layer instead of the last layer, where the cancellation effect is less severe. One potential concern is that influence based on the word embedding layer may not encode sufficient high-level information. However, we find that gradients (unlike embeddings) do not suffer from this, possibly because they chain through higher layers. We show that TracIn-WE significantly outperforms other data influence methods applied on the last layer on the case deletion evaluation on three language classification tasks for different models. In addition, TracIn-WE can produce scores not only at the level of the overall training input, but also at the level of words within the training input, a further aid in debugging.
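To make the idea concrete, here is a minimal, single-checkpoint sketch of a TracIn-style influence score computed at the word embedding layer: the influence of a training example on a test example reduces to dot products of per-word embedding gradients, summed over the words the two examples share. The toy bag-of-embeddings logistic classifier, the mean pooling, and all names here are illustrative assumptions for the sketch, not the paper's actual models or implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

V, d = 5, 3                  # vocab size, embedding dimension (toy values)
E = rng.normal(size=(V, d))  # word embedding table
w = rng.normal(size=d)       # linear classifier on mean-pooled embeddings

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def embedding_grads(word_ids, label):
    """Gradient of the logistic loss w.r.t. each embedding row used
    by the example (assumes unique word ids and mean pooling)."""
    x = E[word_ids].mean(axis=0)
    p = sigmoid(w @ x)
    # With mean pooling, dL/dE_i = (p - y) * w / n for every word i present.
    g = (p - label) * w / len(word_ids)
    return {i: g for i in word_ids}

def tracin_we(train_ex, test_ex, lr=0.1):
    """Single-checkpoint TracIn-WE sketch: learning rate times the sum of
    embedding-gradient dot products over words shared by both examples.
    (The full method sums this quantity over training checkpoints.)"""
    g_train = embedding_grads(*train_ex)
    g_test = embedding_grads(*test_ex)
    shared = set(g_train) & set(g_test)
    return lr * sum(g_train[i] @ g_test[i] for i in shared)

train = ([0, 2, 3], 1)  # (word ids, label)
test = ([2, 4], 1)
score = tracin_we(train, test)  # nonzero: word 2 is shared
```

Because the score decomposes as a sum over shared words, each word's term is a word-level influence score, which is what enables the per-word debugging mentioned above; examples with no overlapping words get exactly zero influence in this sketch.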
