Paper Title

Marrying Fairness and Explainability in Supervised Learning

Paper Authors

Przemyslaw Grabowicz, Nicholas Perello, Aarshee Mishra

Abstract

Machine learning algorithms that aid human decision-making may inadvertently discriminate against certain protected groups. We formalize direct discrimination as a direct causal effect of the protected attributes on the decisions, and induced discrimination as a change in the causal influence of non-protected features associated with the protected attributes. Measurements of the marginal direct effect (MDE) and SHapley Additive exPlanations (SHAP) reveal that state-of-the-art fair learning methods can induce discrimination via association or reverse discrimination in synthetic and real-world datasets. To inhibit discrimination in algorithmic systems, we propose to nullify the influence of the protected attribute on the output of the system, while preserving the influence of the remaining features. We introduce and study post-processing methods achieving such objectives, finding that they yield relatively high model accuracy, prevent direct discrimination, and diminish various disparity measures, e.g., demographic disparity.
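The SHAP attributions and the "nullify the protected attribute's influence" idea mentioned in the abstract can be illustrated with a toy exact-Shapley computation. This is a minimal sketch, not the paper's method: the linear scorer `f`, its coefficients, the feature values, and the all-zero baseline are illustrative assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for a single prediction of f at x.
    'Missing' features in a coalition are set to their baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for s in combinations(others, r):
                # Classic Shapley weight |S|!(n-|S|-1)!/n!
                w = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += w * (value(set(s) | {i}) - value(set(s)))
    return phi

# Hypothetical linear scorer; feature 0 plays the protected attribute.
def f(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[2]

x = [1.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0]

# A crude post-processing in the spirit of the abstract: evaluate the
# model with the protected attribute pinned to its baseline value.
def f_fair(z):
    return f([baseline[0]] + list(z[1:]))

print(shapley_values(f, x, baseline))       # protected attribute has nonzero attribution
print(shapley_values(f_fair, x, baseline))  # its attribution is now zero; others unchanged
```

For a linear model, feature i's exact Shapley value reduces to coef_i * (x_i - baseline_i), so pinning the protected attribute to its baseline zeroes its attribution while leaving the attributions of the remaining features intact, which is the shape of the objective the abstract describes.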
