A Survey on Semantics in Automated Data Science

Paper Title

A Survey on Semantics in Automated Data Science

Authors

Udayan Khurana, Kavitha Srinivas, Horst Samulowitz

Abstract

Data scientists leverage common-sense reasoning and domain knowledge to understand and enrich data for building predictive models. In recent years, we have witnessed a surge in tools and techniques for automated machine learning. While data scientists can employ various such tools to help with model building, many other aspects, such as feature engineering, that require semantic understanding of concepts remain manual to a large extent. In this paper, we discuss important shortcomings of current automated data science solutions and machine learning. We discuss how leveraging basic semantic reasoning on data, in combination with novel tools for data science automation, can help with consistent and explainable data augmentation and transformation. Moreover, semantics can assist data scientists in a new manner by helping with challenges related to trust, bias, and explainability.
