DiffML: End-to-end Differentiable ML Pipelines

Paper Title

DiffML: End-to-end Differentiable ML Pipelines

Authors

Benjamin Hilprecht, Christian Hammacher, Eduardo Reis, Mohamed Abdelaal, Carsten Binnig

Abstract

In this paper, we present our vision of differentiable ML pipelines, called DiffML, to automate the construction of ML pipelines in an end-to-end fashion. DiffML makes it possible to jointly train not just the ML model itself but the entire pipeline, including data preprocessing steps such as data cleaning and feature selection. Our core idea is to formulate all pipeline steps in a differentiable way so that the entire pipeline can be trained using backpropagation. However, this is a non-trivial problem and opens up many new research questions. To show the feasibility of this direction, we demonstrate initial ideas and a general principle for formulating typical preprocessing steps, such as data cleaning, feature selection, and dataset selection, as differentiable programs that are learned jointly with the ML model. Moreover, we discuss a research roadmap and the core challenges that have to be tackled systematically to enable fully differentiable ML pipelines.
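
To make the idea of a differentiable preprocessing step concrete, the following is a minimal sketch of feature selection learned jointly with a model. It is not the formulation from the paper: the sigmoid gate per feature, the L1-style penalty `lam`, the linear model, the toy data, and the names `make_toy_data` and `fit_gated_regression` are all assumptions chosen for illustration. Gradients are written out by hand so the chain rule through the preprocessing step is visible.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def make_toy_data(n=200, seed=0):
    """Hypothetical toy data: four features, only the first two drive the target."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 4))
    y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.1 * rng.normal(size=n)
    return X, y


def fit_gated_regression(X, y, lam=0.05, lr=0.1, steps=500):
    """Jointly learn soft feature gates and linear weights by gradient descent.

    Loss = mean(((X * g) @ w - y) ** 2) + lam * sum(g), with g = sigmoid(a).
    The gate g plays the role of a differentiable feature-selection step
    (an assumption of this sketch, not the paper's method).
    """
    n, d = X.shape
    a = np.zeros(d)  # gate logits: parameters of the preprocessing step
    w = np.zeros(d)  # weights of the downstream linear model
    for _ in range(steps):
        g = sigmoid(a)
        residual = (X * g) @ w - y
        grad_pred = 2.0 * residual / n        # d loss / d prediction
        grad_w = (X * g).T @ grad_pred        # backprop into the model weights
        grad_g = (X * w).T @ grad_pred + lam  # backprop into the gates, plus sparsity penalty
        grad_a = grad_g * g * (1.0 - g)       # derivative of the sigmoid
        w -= lr * grad_w
        a -= lr * grad_a
    return sigmoid(a), w
```

Calling `gates, w = fit_gated_regression(*make_toy_data())` trains both parameter sets from the same loss. On this toy data the gates of the two informative features should end up larger than those of the two uninformative ones, while the effective coefficients `gates * w` approximate the generating coefficients.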
