Paper Title

Wasserstein-based fairness interpretability framework for machine learning models

Paper Authors

Alexey Miroshnikov, Konstandinos Kotsiopoulos, Ryan Franks, Arjun Ravi Kannan

Paper Abstract

The objective of this article is to introduce a fairness interpretability framework for measuring and explaining the bias in classification and regression models at the level of a distribution. In our work, we measure the model bias across sub-population distributions in the model output using the Wasserstein metric. To properly quantify the contributions of predictors, we take into account the favorability of both the model and predictors with respect to the non-protected class. The quantification is accomplished by the use of transport theory, which gives rise to the decomposition of the model bias and bias explanations to positive and negative contributions. To gain more insight into the role of favorability and allow for additivity of bias explanations, we adapt techniques from cooperative game theory.
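The central quantity in the abstract, the model bias measured as the Wasserstein distance between sub-population distributions of the model output, can be illustrated with a short sketch. The example below is a minimal illustration under assumed inputs, not the authors' implementation: the toy data, the logistic-regression model, and the protected-class indicator are hypothetical, and scipy.stats.wasserstein_distance is used to estimate the 1-Wasserstein distance between the two score distributions.

```python
# Minimal sketch: estimate the distribution-level model bias as the
# 1-Wasserstein distance between the model's score distributions on the
# protected and non-protected sub-populations.
# All data, the model, and variable names below are illustrative assumptions.
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: predictor x1 is correlated with the protected attribute, x2 is not.
n = 5000
protected = rng.integers(0, 2, size=n)            # 1 = protected class
x1 = rng.normal(loc=-0.6 * protected, scale=1.0, size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2])
y = (x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n) > 0).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]             # model output f(X)

# W1 distance between the sub-population distributions of the model output;
# this is the distribution-level bias measure described in the abstract.
bias = wasserstein_distance(scores[protected == 0], scores[protected == 1])
print(f"Wasserstein-1 model bias: {bias:.4f}")
```

The paper goes further than this sketch: it attributes the bias to individual predictors via transport theory, splitting it into positive and negative contributions, and uses cooperative game theory to make those bias explanations additive.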
