Paper title
DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems
Paper authors
Abstract
While there have been a number of remarkable breakthroughs in machine learning (ML), much of the focus has been placed on model development. However, to truly realize the potential of machine learning in real-world settings, additional aspects must be considered across the ML pipeline. Data-centric AI is emerging as a unifying paradigm that could enable such reliable end-to-end pipelines. However, this remains a nascent area with no standardized framework to guide practitioners to the necessary data-centric considerations or to communicate the design of data-centric driven ML systems. To address this gap, we propose DC-Check, an actionable checklist-style framework to elicit data-centric considerations at different stages of the ML pipeline: Data, Training, Testing, and Deployment. This data-centric lens on development aims to promote thoughtfulness and transparency prior to system development. Additionally, we highlight specific data-centric AI challenges and research opportunities. DC-Check is aimed at both practitioners and researchers to guide day-to-day development. As such, to easily engage with and use DC-Check and associated resources, we provide a DC-Check companion website (https://www.vanderschaar-lab.com/dc-check/). The website will also serve as an updated resource as methods and tooling evolve over time.