Paper Title

CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior

Paper Authors

Abraham, Eldar David, D'Oosterlinck, Karel, Feder, Amir, Gat, Yair Ori, Geiger, Atticus, Potts, Christopher, Reichart, Roi, Wu, Zhengxuan

Paper Abstract

The increasing size and complexity of modern ML systems has improved their predictive capabilities but made their behavior harder to explain. Many techniques for model explanation have been developed in response, but we lack clear criteria for assessing these techniques. In this paper, we cast model explanation as the causal inference problem of estimating causal effects of real-world concepts on the output behavior of ML models given actual input data. We introduce CEBaB, a new benchmark dataset for assessing concept-based explanation methods in Natural Language Processing (NLP). CEBaB consists of short restaurant reviews with human-generated counterfactual reviews in which an aspect (food, noise, ambiance, service) of the dining experience was modified. Original and counterfactual reviews are annotated with multiply-validated sentiment ratings at the aspect-level and review-level. The rich structure of CEBaB allows us to go beyond input features to study the effects of abstract, real-world concepts on model behavior. We use CEBaB to compare the quality of a range of concept-based explanation methods covering different assumptions and conceptions of the problem, and we seek to establish natural metrics for comparative assessments of these methods.
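The core estimation idea the abstract describes can be illustrated in miniature: given pairs of original and human-edited counterfactual reviews that differ only in one aspect, the aspect's causal effect on a model is the average change in the model's output across those pairs. The sketch below is illustrative only; `estimate_causal_effect`, `toy_sentiment`, and the example pairs are hypothetical stand-ins, not the paper's actual estimators, metrics, or data.

```python
# Hedged sketch: estimating the causal effect of editing one aspect
# (e.g. food quality) on a model's output, using original/counterfactual
# review pairs. The model and data below are toy stand-ins.

def estimate_causal_effect(model, pairs):
    """Average change in model output when the target aspect is edited.

    `pairs` is a list of (original_text, counterfactual_text) tuples,
    where each counterfactual differs only in the aspect of interest.
    """
    diffs = [model(cf) - model(orig) for orig, cf in pairs]
    return sum(diffs) / len(diffs)

def toy_sentiment(text):
    """Toy sentiment 'model': positive minus negative cue-word counts."""
    text = text.lower()
    pos = sum(text.count(w) for w in ("great", "delicious", "friendly"))
    neg = sum(text.count(w) for w in ("bad", "bland", "rude"))
    return pos - neg

# Hypothetical pairs where only the food aspect is edited.
pairs = [
    ("The food was delicious and the staff friendly.",
     "The food was bland and the staff friendly."),
    ("Great food, but noisy.",
     "Bad food, but noisy."),
]
effect = estimate_causal_effect(toy_sentiment, pairs)
# Negative effect: downgrading the food aspect lowers predicted sentiment.
```

Explanation methods benchmarked on a dataset like this can then be scored by how closely their predicted effects track effects measured this way on the human-written counterfactuals.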
