Paper Title

Toward Efficient Automated Feature Engineering

Authors

Kafeng Wang, Pengyang Wang, Chengzhong Xu

Abstract

Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks, and it has achieved great success in real-world applications. Current AFE methods mainly focus on improving the effectiveness of the produced features but ignore the low efficiency that hinders large-scale deployment. In this work, we therefore propose a generic framework to improve the efficiency of AFE. Specifically, we construct the AFE pipeline in a reinforcement learning setting, where each feature is assigned an agent that performs feature transformation and selection, and the evaluation score of the produced features on the downstream task serves as the reward for updating the policy. We improve the efficiency of AFE from two perspectives. On the one hand, we develop a Feature Pre-Evaluation (FPE) model to reduce the sample size and feature size, the two main factors that undermine the efficiency of feature evaluation. On the other hand, we devise a two-stage policy training strategy that runs FPE on the pre-evaluation task to initialize the policy, which avoids training the policy from scratch. We conduct comprehensive experiments on 36 datasets covering both classification and regression tasks. The results show $2.9\%$ higher performance on average and 2x higher computational efficiency compared to state-of-the-art AFE methods.
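
The per-feature agent loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the action set `TRANSFORMS`, the epsilon-greedy bandit `FeatureAgent`, the correlation-based score `abs_corr`, and the helpers `make_toy_data` and `train` are all hypothetical stand-ins chosen for illustration. Scoring each action on a small random subsample only loosely mirrors the idea of cutting evaluation cost by reducing sample size; it is not the FPE model.

```python
import math
import random

# Hypothetical action set; the abstract does not list the real transformations.
TRANSFORMS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
    "log1p_abs": lambda x: math.log1p(abs(x)),
    "drop": None,  # selection step: the feature is discarded
}


def abs_corr(xs, ys):
    """Absolute Pearson correlation, used as a stand-in evaluation score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(sxy) / math.sqrt(sxx * syy)


class FeatureAgent:
    """Epsilon-greedy bandit that chooses one action for a single feature."""

    def __init__(self, rng, epsilon=0.2):
        self.rng = rng
        self.epsilon = epsilon
        self.value = {a: 0.0 for a in TRANSFORMS}
        self.count = {a: 0 for a in TRANSFORMS}

    def act(self):
        # Try every action once before exploiting.
        untried = [a for a in TRANSFORMS if self.count[a] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(TRANSFORMS))
        return self.best_action()

    def update(self, action, reward):
        # Incremental mean of the rewards observed for this action.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]

    def best_action(self):
        return max(self.value, key=self.value.get)


def make_toy_data(n=300, seed=1):
    """Toy regression data where the target depends on the squared feature."""
    rng = random.Random(seed)
    column = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    target = [x * x + rng.gauss(0.0, 0.1) for x in column]
    return column, target


def train(column, target, steps=200, subsample=50, seed=0):
    """Run the agent, scoring each chosen action on a small random subsample."""
    rng = random.Random(seed)
    agent = FeatureAgent(rng)
    for _ in range(steps):
        idx = rng.sample(range(len(column)), min(subsample, len(column)))
        action = agent.act()
        fn = TRANSFORMS[action]
        if fn is None:
            reward = 0.0  # a dropped feature contributes no score in this toy setup
        else:
            reward = abs_corr([fn(column[i]) for i in idx], [target[i] for i in idx])
        agent.update(action, reward)
    return agent


if __name__ == "__main__":
    column, target = make_toy_data()
    agent = train(column, target)
    print(agent.best_action(), agent.value)
```

In this toy setup the target is built from the squared feature, so the agent's value estimate for `square` ends up highest. A real pipeline would replace `abs_corr` with the score of a downstream model and would coordinate many agents, one per feature.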
