Causal Effects of Linguistic Properties

Paper title

Causal Effects of Linguistic Properties

Authors

Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, Dhanya Sridhar

Abstract

We consider the problem of using observational data to estimate the causal effects of linguistic properties. For example, does writing a complaint politely lead to a faster response time? How much will a positive product review increase sales? This paper addresses two technical challenges related to the problem before developing a practical method. First, we formalize the causal quantity of interest as the effect of a writer's intent, and establish the assumptions necessary to identify this from observational data. Second, in practice, we only have access to noisy proxies for the linguistic properties of interest -- e.g., predictions from classifiers and lexicons. We propose an estimator for this setting and prove that its bias is bounded when we perform an adjustment for the text. Based on these results, we introduce TextCause, an algorithm for estimating causal effects of linguistic properties. The method leverages (1) distant supervision to improve the quality of noisy proxies, and (2) a pre-trained language model (BERT) to adjust for the text. We show that the proposed method outperforms related approaches when estimating the effect of Amazon review sentiment on semi-simulated sales figures. Finally, we present an applied case study investigating the effects of complaint politeness on bureaucratic response times.
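
To make the idea of "adjusting" concrete, the following is a minimal sketch of a stratified (backdoor) adjustment that uses a noisy proxy label in place of the true linguistic property. It is not the TextCause algorithm: the abstract describes adjusting for the text itself with a pre-trained language model, whereas this toy version assumes a single discrete confounder as a stand-in. The function names, the record layout, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def naive_difference(records):
    """Unadjusted difference in mean outcome between proxy-treated and untreated."""
    treated = [y for t_hat, _, y in records if t_hat == 1]
    control = [y for t_hat, _, y in records if t_hat == 0]
    return mean(treated) - mean(control)


def adjusted_effect(records):
    """Stratified adjustment over a discrete confounder.

    records: iterable of (proxy_treatment, confounder, outcome) tuples,
    where proxy_treatment is 0 or 1 (e.g. a classifier's sentiment label).
    Assumption: the confounder is a discrete stand-in for the text.
    """
    # Group outcomes by confounder stratum and proxy-treatment arm.
    strata = defaultdict(lambda: {0: [], 1: []})
    for t_hat, c, y in records:
        strata[c][t_hat].append(y)

    weighted_sum = 0.0
    used = 0
    for arms in strata.values():
        if not arms[0] or not arms[1]:
            continue  # no overlap in this stratum, so it cannot be compared
        n = len(arms[0]) + len(arms[1])
        weighted_sum += n * (mean(arms[1]) - mean(arms[0]))
        used += n
    if used == 0:
        raise ValueError("no stratum contains both treatment arms")
    # Average the within-stratum differences, weighted by stratum size.
    return weighted_sum / used


# Hypothetical toy data: (proxy sentiment label, product category, sales).
toy_records = [
    (1, "electronics", 10), (1, "electronics", 12),
    (0, "electronics", 8), (0, "electronics", 6),
    (1, "books", 5),
    (0, "books", 4), (0, "books", 4), (0, "books", 4),
]

print(naive_difference(toy_records))
print(adjusted_effect(toy_records))
```

On this toy data the unadjusted difference is about 3.8 while the stratified estimate is 2.5, because positive labels are concentrated in the category with higher sales. Errors in the proxy label would bias either number further, which is the problem the abstract's bias bound and distant supervision step address.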
