Paper Title
A Song of (Dis)agreement: Evaluating the Evaluation of Explainable Artificial Intelligence in Natural Language Processing
Paper Authors
Paper Abstract
There has been significant debate in the NLP community about whether or not attention weights can be used as an explanation - a mechanism for interpreting how important each input token is for a particular prediction. The validity of "attention as explanation" has so far been evaluated by computing the rank correlation between attention-based explanations and existing feature attribution explanations using LSTM-based models. In our work, we (i) compare the rank correlation between five more recent feature attribution methods and two attention-based methods, on two types of NLP tasks, and (ii) extend this analysis to also include transformer-based models. We find that attention-based explanations do not correlate strongly with any recent feature attribution methods, regardless of the model or task. Furthermore, we find that none of the tested explanations correlate strongly with one another for the transformer-based model, leading us to question the underlying assumption that we should measure the validity of attention-based explanations based on how well they correlate with existing feature attribution explanation methods. After conducting experiments on five datasets using two different models, we argue that the community should stop using rank correlation as an evaluation metric for attention-based explanations. We suggest that researchers and practitioners should instead test various explanation methods and employ a human-in-the-loop process to determine if the explanations align with human intuition for the particular use case at hand.
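The core evaluation the abstract refers to is computing a rank correlation between the per-token importance scores produced by an attention-based explanation and those produced by a feature attribution method. The sketch below is only illustrative and is not the authors' code: it assumes two hypothetical score vectors for the same six-token input and uses Kendall's tau (one common rank correlation; the paper may use a different statistic) via scipy.

```python
# Minimal sketch (not from the paper): rank correlation between two
# explanation methods' per-token importance scores for one input sentence.
# The score values below are hypothetical placeholders.
from scipy.stats import kendalltau

# Per-token importance scores for the same 6-token input, from two methods.
attention_scores = [0.05, 0.40, 0.10, 0.25, 0.15, 0.05]    # e.g., attention weights
attribution_scores = [0.30, 0.10, 0.05, 0.35, 0.15, 0.05]  # e.g., a gradient-based attribution

# Kendall's tau compares the token rankings induced by the two methods;
# a value near 1 means the methods largely agree on which tokens matter most,
# while a value near 0 means the rankings are essentially unrelated.
tau, p_value = kendalltau(attention_scores, attribution_scores)
print(f"Kendall tau: {tau:.3f} (p={p_value:.3f})")
```

In practice such a correlation would be computed per example and then aggregated over a dataset; the paper's argument is that low agreement between methods undermines using this aggregate correlation as a validity test for attention-based explanations.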