Paper Title
Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error
Paper Authors
Paper Abstract
The ability to learn new concepts with small amounts of data is a crucial aspect of intelligence that has proven challenging for deep learning methods. Meta-learning for few-shot learning offers a potential solution to this problem: by learning to learn across data from many previous tasks, few-shot learning algorithms can discover the structure among tasks that enables fast learning of new tasks. However, a critical challenge in few-shot learning is task ambiguity: even when a powerful prior can be meta-learned from a large number of prior tasks, a small dataset for a new task may simply be too ambiguous to identify a single model for that task. Bayesian meta-learning models can naturally resolve this problem by placing a sophisticated prior distribution over models and letting the posterior be well regularized through Bayesian decision theory. However, currently known Bayesian meta-learning procedures such as VERSA suffer from the so-called {\it information preference problem}: the posterior distribution degenerates to a single point and is far from the exact posterior. To address this challenge, we design a novel meta-regularization objective using a {\it cyclical annealing schedule} and the {\it maximum mean discrepancy} (MMD) criterion. The cyclical annealing schedule is quite effective at avoiding such degenerate solutions. The procedure involves a KL-divergence term that is difficult to estimate, but we resolve this issue by employing the MMD instead of the KL-divergence. The experimental results show that our approach substantially outperforms standard meta-learning algorithms.
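To make the two ingredients of the objective concrete, the following is a minimal NumPy sketch of (i) a cyclical annealing schedule for the regularization weight and (ii) a biased squared-MMD estimate with a Gaussian kernel. The function names, the number of cycles, the ramp ratio, and the kernel bandwidth are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def cyclical_beta(step, total_steps, n_cycles=4, ramp_ratio=0.5):
    """Cyclical annealing weight: within each cycle, beta ramps linearly from 0
    to 1 over the first `ramp_ratio` fraction of the cycle, then stays at 1."""
    cycle_len = total_steps / n_cycles
    pos = (step % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    return min(pos / ramp_ratio, 1.0)

def gaussian_kernel(x, y, bandwidth=1.0):
    """RBF kernel matrix between two sample sets of shapes (n, d) and (m, d)."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x ~ q and y ~ p."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Toy usage: weight the MMD regularizer by the cyclical schedule at step t.
# `posterior_samples` stands in for draws from an amortized task posterior and
# `prior_samples` for draws from the prior; both are placeholders here.
rng = np.random.default_rng(0)
posterior_samples = rng.normal(0.5, 1.0, size=(128, 8))
prior_samples = rng.normal(0.0, 1.0, size=(128, 8))
t, T = 1200, 10000
beta = cyclical_beta(t, T)
regularizer = beta * mmd2(posterior_samples, prior_samples)
```

In this sketch the schedule periodically resets the regularization weight to zero, which is the mechanism the abstract credits with avoiding posterior collapse, while the MMD term plays the role of the otherwise hard-to-estimate KL-divergence.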