Paper Title

Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness

Authors

Vincent Jeanselme, Maria De-Arteaga, Zhe Zhang, Jessica Barrett, Brian Tom

Abstract

Machine learning risks reinforcing biases present in data and, as we argue in this work, in what is absent from data. In healthcare, societal and decision biases shape patterns in missing data, yet the algorithmic fairness implications of group-specific missingness are poorly understood. The way we address missingness in healthcare can have detrimental impacts on downstream algorithmic fairness. Our work questions current recommendations and practices aimed at handling missing data with a focus on their effect on algorithmic fairness, and offers a path forward. Specifically, we consider the theoretical underpinnings of existing recommendations as well as their empirical predictive performance and corresponding algorithmic fairness measured through subgroup performance. Our results show that current practices for handling missingness lack principled foundations, are disconnected from the realities of missingness mechanisms in healthcare, and can be counterproductive. For example, we show that favouring group-specific imputation strategies can be misguided and exacerbate prediction disparities. We then build on our findings to propose a framework for empirically guiding imputation choices, and an accompanying reporting framework. Our work constitutes an important contribution to recent efforts by regulators and practitioners to grapple with the realities of real-world data, and to foster the responsible and transparent deployment of machine learning systems. We demonstrate the practical utility of the proposed framework through experimentation on widely used datasets, where we show how the proposed framework can guide the selection of imputation strategies, allowing us to choose among strategies that yield equal overall predictive performance but present different algorithmic fairness properties.
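As a concrete illustration of the kind of comparison the abstract describes, below is a minimal sketch, not the authors' code: it contrasts a pooled imputer with group-specific imputers on synthetic data and reports overall and per-group AUROC. The dataset, missingness pattern, mean-imputation choice, and model are all illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative sketch (not the authors' method): compare pooled vs.
# group-specific mean imputation by overall and subgroup AUROC.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: two groups with different missingness rates in feature 0,
# mimicking group-specific "clinical presence" patterns.
n = 4000
group = rng.integers(0, 2, size=n)            # protected attribute (0/1)
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
miss_rate = np.where(group == 1, 0.6, 0.1)    # group 1 is observed less often
X[rng.random(n) < miss_rate, 0] = np.nan

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0)

def fit_and_evaluate(X_tr_imp, X_te_imp):
    """Train a classifier on imputed data; return overall and per-group AUROC."""
    clf = LogisticRegression().fit(X_tr_imp, y_tr)
    scores = clf.predict_proba(X_te_imp)[:, 1]
    return {
        "overall": roc_auc_score(y_te, scores),
        "group 0": roc_auc_score(y_te[g_te == 0], scores[g_te == 0]),
        "group 1": roc_auc_score(y_te[g_te == 1], scores[g_te == 1]),
    }

# Strategy A: one imputer fit on the pooled training data.
imp = SimpleImputer(strategy="mean").fit(X_tr)
results_pooled = fit_and_evaluate(imp.transform(X_tr), imp.transform(X_te))

# Strategy B: a separate imputer per group (group-specific imputation).
X_tr_gs, X_te_gs = X_tr.copy(), X_te.copy()
for g in (0, 1):
    imp_g = SimpleImputer(strategy="mean").fit(X_tr[g_tr == g])
    X_tr_gs[g_tr == g] = imp_g.transform(X_tr[g_tr == g])
    X_te_gs[g_te == g] = imp_g.transform(X_te[g_te == g])
results_group = fit_and_evaluate(X_tr_gs, X_te_gs)

print("pooled imputation:        ", results_pooled)
print("group-specific imputation:", results_group)
```

On data like this, the two strategies can land on similar overall AUROC while diverging on the group-level scores, which is exactly the pattern a subgroup-level report of the kind the abstract proposes is meant to surface.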
