Paper Title

Meta-learning the Learning Trends Shared Across Tasks

Paper Authors

Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

Paper Abstract

Meta-learning stands for 'learning to learn' such that generalization to new tasks is achieved. Among these methods, gradient-based meta-learning algorithms form a specific sub-class that excels at quick adaptation to new tasks with limited data. This demonstrates their ability to acquire transferable knowledge, a capability that is central to human learning. However, existing meta-learning approaches depend only on the current task information during adaptation and do not share meta-knowledge of how similar tasks have been adapted before. To address this gap, we propose a 'Path-aware' model-agnostic meta-learning approach. Specifically, our approach not only learns a good initialization for adaptation, but also learns an optimal way to adapt these parameters to a set of task-specific parameters, with learnable update directions, learning rates and, most importantly, the way updates evolve over different time-steps. Compared to existing meta-learning methods, our approach offers: (a) the ability to learn gradient preconditioning at different time-steps of the inner loop, thereby modeling the dynamic learning behavior shared across tasks, and (b) the capability of aggregating the learning context through direct gradient-skip connections from old time-steps, thus avoiding overfitting and improving generalization. In essence, our approach not only learns a transferable initialization, but also models the optimal update directions, learning rates, and task-specific learning trends. Specifically, in terms of learning trends, our approach determines how update directions take shape as task-specific learning progresses and how the previous update history contributes to the current update. Our approach is simple to implement and demonstrates faster convergence. We report significant performance improvements on a number of few-shot learning (FSL) datasets.
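To make the two ingredients in (a) and (b) concrete, the sketch below gives one plausible reading of the abstract's inner loop: at step t the gradient is first transformed by a step-specific, meta-learned preconditioner, and gradients from earlier steps are mixed back in through learnable skip weights, roughly theta_{t+1} = theta_t - alpha_t * (P_t * grad_t + sum_{s<t} w_{t,s} * grad_s). This is a minimal PyTorch sketch under those assumptions; the names (PathAwareInnerLoop, precond, skip) and the diagonal-preconditioner parameterization are illustrative, not the authors' implementation.

```python
# A minimal sketch of a path-aware inner loop, assuming a diagonal
# per-step preconditioner and scalar skip weights (illustrative names;
# not the authors' code).

import torch
import torch.nn as nn
import torch.nn.functional as F


class PathAwareInnerLoop(nn.Module):
    def __init__(self, dim: int, n_steps: int):
        super().__init__()
        self.n_steps = n_steps
        # Meta-learned initialization (the usual MAML-style component).
        self.theta0 = nn.Parameter(torch.zeros(dim))
        # One learnable (diagonal) preconditioner and learning rate per
        # time-step, so the update direction can evolve along the path.
        self.precond = nn.Parameter(torch.ones(n_steps, dim))
        self.lr = nn.Parameter(torch.full((n_steps,), 0.01))
        # Skip weights: how strongly each earlier step's gradient
        # contributes to the current update (the shared learning trend).
        self.skip = nn.Parameter(torch.zeros(n_steps, n_steps))

    def adapt(self, loss_fn):
        """Run the inner loop; keeps the graph so the outer loop can
        back-propagate through the whole adaptation path."""
        theta = self.theta0
        past_grads = []
        for t in range(self.n_steps):
            g = torch.autograd.grad(loss_fn(theta), theta,
                                    create_graph=True)[0]
            past_grads.append(g)
            # Precondition the current gradient, then add gradient-skip
            # connections from all earlier time-steps.
            update = self.precond[t] * g
            for s in range(t):
                update = update + self.skip[t, s] * past_grads[s]
            theta = theta - self.lr[t] * update
        return theta


# Usage sketch: adapt to a toy regression task, then back-propagate the
# post-adaptation loss to all meta-parameters (the outer-loop update).
inner = PathAwareInnerLoop(dim=8, n_steps=5)
target = torch.randn(8)
theta_task = inner.adapt(lambda th: F.mse_loss(th, target))
F.mse_loss(theta_task, target).backward()
```

Because the whole adaptation path stays in the autograd graph, the outer update can shape not just the initialization theta0 but also how the update directions evolve across steps, which is the behavior the abstract describes.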
