Paper Title
Prisoners of Their Own Devices: How Models Induce Data Bias in Performative Prediction
Paper Authors
Paper Abstract
The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate the biases embedded within it. A biased model can then make decisions that disproportionately harm certain groups in society. Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction ones, in which most real-world use cases operate. In the latter, the predictive model itself plays a pivotal role in shaping the distribution of the data. However, little attention has been paid to relating unfairness to these interactions. Thus, to further the understanding of unfairness in these settings, we propose a taxonomy to characterize bias in the data, and study cases where it is shaped by model behaviour. Using a real-world account-opening fraud detection case study as an example, we examine the dangers to both performance and fairness of two typical biases in performative prediction: distribution shifts and the problem of selective labels.
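To make the selective-labels feedback loop mentioned in the abstract concrete, below is a minimal, self-contained sketch (not taken from the paper): a fraud model's accept/decline decisions determine which applications ever receive labels, so the data observed in the next training round is shaped by the model itself. The data-generating process, the 0.5 decision threshold, and all function and variable names are illustrative assumptions.

```python
# Minimal sketch of a selective-labels loop in performative prediction.
# Toy data and names are assumptions, not the paper's experimental setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_applications(n):
    """Draw synthetic account-opening applications: one feature and a fraud label."""
    x = rng.normal(size=(n, 1))
    # Higher feature value -> higher fraud probability (toy data-generating process).
    y = (rng.random(n) < 1 / (1 + np.exp(-2 * x[:, 0]))).astype(int)
    return x, y

# Round 0: train on a fully labeled historical sample.
X_hist, y_hist = sample_applications(5_000)
model = LogisticRegression().fit(X_hist, y_hist)

for t in range(3):
    X_new, y_new = sample_applications(5_000)
    scores = model.predict_proba(X_new)[:, 1]
    accepted = scores < 0.5          # declined applications never receive a label
    # Selective labels: the next training set contains only accepted applications,
    # so the observed fraud rate is induced by the model's own decisions.
    X_obs, y_obs = X_new[accepted], y_new[accepted]
    print(f"round {t}: observed fraud rate = {y_obs.mean():.3f}, "
          f"true fraud rate = {y_new.mean():.3f}")
    model = LogisticRegression().fit(X_obs, y_obs)
```

Running the loop shows the observed fraud rate drifting away from the population rate across rounds, which is one way the distribution shift and unfairness risks discussed in the paper can arise.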