Paper Title
Challenging Common Assumptions about Catastrophic Forgetting
Paper Authors
Paper Abstract
Building learning agents that can progressively learn and accumulate knowledge is the core goal of the continual learning (CL) research field. Unfortunately, training a model on new data usually compromises its performance on past data. In the CL literature, this effect is referred to as catastrophic forgetting (CF). CF has been extensively studied, and a plethora of methods have been proposed to address it on short sequences of non-overlapping tasks. In such setups, CF always leads to a quick and significant drop in performance on past tasks. Nevertheless, despite CF, recent work showed that SGD training of linear models accumulates knowledge in a CL regression setup. This phenomenon becomes especially visible when tasks reoccur. We might then wonder whether DNNs trained with SGD, or any standard gradient-based optimization, accumulate knowledge in the same way. Such a phenomenon would have interesting consequences for applying DNNs to real continual scenarios. Indeed, standard gradient-based optimization methods are significantly less computationally expensive than existing CL algorithms. In this paper, we study progressive knowledge accumulation (KA) in DNNs trained with gradient-based algorithms on long sequences of tasks with data re-occurrence. We propose a new framework, SCoLe (Scaling Continual Learning), to investigate KA and discover that catastrophic forgetting has a limited effect on DNNs trained with SGD. When DNNs are trained on long sequences in which data re-occurs sparsely, their overall accuracy improves over time, which may seem counter-intuitive given the CF phenomenon. We empirically investigate KA in DNNs under various data occurrence frequencies and propose simple and scalable strategies to increase knowledge accumulation in DNNs.
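To make the described setup concrete, the following is a minimal sketch of a SCoLe-style training loop in PyTorch: a plain classifier is trained with vanilla SGD (no CL-specific mechanism) over a long sequence of short tasks, each task drawn from a small random subset of classes so that every class re-occurs sparsely, while overall accuracy on all classes is tracked over time. The choice of MNIST as the base dataset, the two-classes-per-task split, the helper names (make_task_loader, overall_accuracy), and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

# Minimal sketch of a SCoLe-style protocol (illustrative; dataset choice,
# helper names, and hyperparameters are assumptions, not the paper's settings).
import random
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
tfm = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=tfm)
test_set = datasets.MNIST("data", train=False, download=True, transform=tfm)
test_loader = DataLoader(test_set, batch_size=256)

# Plain classifier trained with vanilla SGD -- no replay, regularization, or
# other CL-specific mechanism.
model = nn.Sequential(
    nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)
).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Pre-compute the indices of each class once.
targets = train_set.targets
class_indices = {c: (targets == c).nonzero(as_tuple=True)[0].tolist() for c in range(10)}

def make_task_loader(classes, samples_per_class=256):
    """Build a loader for one task: a small random subset of a few classes."""
    idx = [i for c in classes for i in random.sample(class_indices[c], samples_per_class)]
    return DataLoader(Subset(train_set, idx), batch_size=64, shuffle=True)

@torch.no_grad()
def overall_accuracy():
    """Accuracy over all 10 classes, used to track knowledge accumulation."""
    model.eval()
    correct = total = 0
    for x, y in test_loader:
        pred = model(x.to(device)).argmax(dim=1).cpu()
        correct += (pred == y).sum().item()
        total += y.numel()
    model.train()
    return correct / total

# Long sequence of short tasks; each task exposes only 2 of the 10 classes,
# so every class re-occurs sparsely over the sequence.
for task_id in range(500):
    classes = random.sample(range(10), k=2)
    for x, y in make_task_loader(classes):
        opt.zero_grad()
        loss_fn(model(x.to(device)), y.to(device)).backward()
        opt.step()
    if task_id % 50 == 0:
        print(f"task {task_id:4d}  classes={classes}  overall acc={overall_accuracy():.3f}")

Although each individual task overwrites part of what was learned before (the per-task forgetting the abstract mentions), tracking overall_accuracy across the long sequence is what makes any gradual knowledge accumulation visible.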