Paper Title

A Closer Look at Rehearsal-Free Continual Learning

Paper Authors

James Seale Smith, Junjiao Tian, Shaunak Halbe, Yen-Chang Hsu, Zsolt Kira

Paper Abstract

Continual learning is a setting where machine learning models learn novel concepts from continuously shifting training data, while simultaneously avoiding degradation of knowledge on previously seen classes which may disappear from the training data for extended periods of time (a phenomenon known as the catastrophic forgetting problem). Current approaches for continual learning of a single expanding task (a.k.a. class-incremental continual learning) require extensive rehearsal of previously seen data to avoid this degradation of knowledge. Unfortunately, rehearsal comes at a cost to memory, and it may also violate data privacy. Instead, we explore combining knowledge distillation and parameter regularization in new ways to achieve strong continual learning performance without rehearsal. Specifically, we take a deep dive into common continual learning techniques: prediction distillation, feature distillation, L2 parameter regularization, and EWC parameter regularization. We first disprove the common assumption that parameter regularization techniques fail for rehearsal-free continual learning of a single, expanding task. Next, we explore how to leverage knowledge from a pre-trained model in rehearsal-free continual learning and find that vanilla L2 parameter regularization outperforms EWC parameter regularization and feature distillation. Finally, we explore the recently popular ImageNet-R benchmark, and show that L2 parameter regularization implemented in self-attention blocks of a ViT transformer outperforms recent popular prompting-based continual learning methods.
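
The abstract compares four rehearsal-free loss terms: prediction distillation, feature distillation, L2 parameter regularization, and EWC parameter regularization. Below is a minimal PyTorch-style sketch of how each term is commonly written; it is illustrative only, not the authors' implementation, and the names `model`, `old_params`, `fisher`, `feats_*`, and `logits_*` are assumed placeholders supplied by a surrounding training loop.

```python
# Minimal sketch (not the paper's code) of the four rehearsal-free loss terms
# named in the abstract, written in PyTorch style.
import torch.nn.functional as F


def snapshot_params(model):
    """Frozen copy of the weights at the end of the previous task."""
    return {n: p.detach().clone() for n, p in model.named_parameters()}


def l2_param_reg(model, old_params):
    """Vanilla L2 parameter regularization: penalize drift from the old weights."""
    return sum(((p - old_params[n]) ** 2).sum()
               for n, p in model.named_parameters())


def ewc_param_reg(model, old_params, fisher):
    """EWC: the same drift penalty, weighted by a diagonal Fisher importance estimate."""
    return sum((fisher[n] * (p - old_params[n]) ** 2).sum()
               for n, p in model.named_parameters())


def feature_distillation(feats_new, feats_old):
    """Feature distillation: match features produced by the frozen previous model."""
    return F.mse_loss(feats_new, feats_old.detach())


def prediction_distillation(logits_new, logits_old, T=2.0):
    """Prediction distillation: match softened predictions of the previous model."""
    log_p_new = F.log_softmax(logits_new / T, dim=1)
    p_old = F.softmax(logits_old.detach() / T, dim=1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * (T ** 2)
```

In training, each term would typically be added to the new-task classification loss with a tuning weight. The abstract's strongest result, L2 parameter regularization applied in the self-attention blocks of a pre-trained ViT, corresponds to filtering `named_parameters()` by name so that only the attention-block weights are penalized.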
