Paper title
New Properties of the Data Distillation Method When Working With Tabular Data
Paper authors
Paper abstract
Data distillation is the problem of reducing the volume of training data while keeping only the necessary information. In this paper, we explore in greater depth a new data distillation algorithm previously designed for image data. Our experiments with tabular data show that a model trained on distilled samples can outperform a model trained on the original dataset. One problem with the considered algorithm is that the data it produces generalizes poorly to models with different hyperparameters. We show that using multiple architectures during distillation can help overcome this problem.