Paper Title
Towards Assessing Data Bias in Clinical Trials
Authors
Abstract
Algorithms and technologies are essential tools that pervade all aspects of our daily lives. In recent decades, health care research has benefited from new computer-based recruiting methods, federated architectures for data storage, innovative analyses of datasets, and so on. Nevertheless, health care datasets can still be affected by data bias. A biased dataset provides a distorted view of reality, leading to wrong analysis results and, consequently, wrong decisions. For example, in a clinical trial that studied the risk of cardiovascular diseases, predictions were wrong due to the lack of data on ethnic minorities. It is therefore of paramount importance for researchers to acknowledge the data bias that may be present in the datasets they use, possibly adopt techniques to mitigate it, and check whether and how analysis results are affected. This paper proposes a method to address bias in datasets that: (i) defines the types of data bias that may be present in a dataset, (ii) characterizes and quantifies data bias with adequate metrics, and (iii) provides guidelines to identify, measure, and mitigate data bias for different data sources. The proposed method is applicable to both prospective and retrospective clinical trials. We evaluate our proposal through theoretical considerations and through interviews with researchers in the health care environment.