Paper Title
GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning
Paper Authors
Paper Abstract
Large-scale machine learning and deep models are extremely data-hungry. Unfortunately, obtaining large amounts of labeled data is expensive, and training state-of-the-art models (with hyperparameter tuning) requires significant computing resources and time. Moreover, real-world data is noisy and imbalanced. As a result, several recent papers try to make the training process more efficient and robust. However, most existing work focuses on either robustness or efficiency, but not both. In this work, we introduce Glister, a GeneraLIzation based data Subset selecTion for Efficient and Robust learning framework. We formulate Glister as a mixed discrete-continuous bi-level optimization problem to select a subset of the training data that maximizes the log-likelihood on a held-out validation set. Next, we propose an iterative online algorithm, Glister-Online, which performs data selection iteratively along with the parameter updates and can be applied to any loss-based learning algorithm. We then show that for a rich class of loss functions, including cross-entropy, hinge, squared, and logistic losses, the inner discrete data selection is an instance of (weakly) submodular optimization, and we analyze conditions under which Glister-Online reduces the validation loss and converges. Finally, we propose Glister-Active, an extension to batch active learning, and we empirically demonstrate the performance of Glister on a wide range of tasks, including (a) data selection to reduce training time, (b) robust learning under label noise and imbalance settings, and (c) batch-active learning with several deep and shallow models. We show that our framework improves upon the state of the art in both efficiency and accuracy (in cases (a) and (c)) and is more efficient than other state-of-the-art robust learning algorithms in case (b).
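
To make the formulation concrete, the bi-level objective described in the abstract can be written as the following nested optimization. The notation here (training set $\mathcal{U}$, held-out validation set $\mathcal{V}$, subset budget $k$, and log-likelihoods $LL_T$, $LL_V$ on the training subset and the validation set) is assumed from the description above rather than quoted from the paper:

$$
S^{*} \in \underset{S \subseteq \mathcal{U},\; |S| \le k}{\operatorname{argmax}} \;\; LL_{V}\Big(\, \underset{\theta}{\operatorname{argmax}} \; LL_{T}(\theta, S),\; \mathcal{V} \Big)
$$

The inner (continuous) problem fits model parameters $\theta$ on a candidate subset $S$, while the outer (discrete) problem scores each subset by the resulting validation log-likelihood; it is this outer problem that the abstract identifies as (weakly) submodular for the listed losses.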
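
As a reading aid for the inner loop of Glister-Online, below is a minimal, hypothetical sketch of one greedy selection round. It uses the standard first-order (Taylor) shortcut: the marginal gain of a training example is approximated by the alignment between its training-loss gradient and the current validation-loss gradient. The function name, its arguments, and the precomputed per-example gradients are illustrative assumptions, not the paper's API.

```python
import numpy as np

def greedy_data_selection(train_grads, val_grad, budget, lr):
    """Sketch of one greedy round of validation-driven subset selection.

    train_grads : (n, d) array of per-example training-loss gradients
                  at the current parameters (assumed precomputed).
    val_grad    : (d,) gradient of the validation loss at the same point.
    budget      : size k of the subset to select.
    lr          : learning rate used in the one-step Taylor approximation.
    """
    n = train_grads.shape[0]
    selected, remaining = [], set(range(n))
    v = val_grad.copy()
    for _ in range(budget):
        # Marginal gain of example e: first-order estimate of the drop
        # in validation loss after one gradient step on e alone.
        best_e, best_gain = None, -np.inf
        for e in remaining:
            gain = lr * float(v @ train_grads[e])
            if gain > best_gain:
                best_e, best_gain = e, gain
        selected.append(best_e)
        remaining.remove(best_e)
        # A fuller implementation would refresh v here by re-evaluating
        # the validation gradient at the simulated updated parameters;
        # this sketch keeps v fixed for brevity.
    return selected
```

Note that with a fixed validation gradient this greedy scan degenerates to a top-k selection by gradient alignment; re-evaluating the validation gradient after each simulated parameter update is what makes the selection genuinely sequential, and the paper's analysis of (weak) submodularity is what justifies the greedy strategy in the first place.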