Paper Title

Out-distribution aware Self-training in an Open World Setting

Authors

Maximilian Augustin, Matthias Hein

Abstract

Deep Learning heavily depends on large labeled datasets which limits further improvements. While unlabeled data is available in large amounts, in particular in image recognition, it does not fulfill the closed world assumption of semi-supervised learning that all unlabeled data are task-related. The goal of this paper is to leverage unlabeled data in an open world setting to further improve prediction performance. For this purpose, we introduce out-distribution aware self-training, which includes a careful sample selection strategy based on the confidence of the classifier. While normal self-training deteriorates prediction performance, our iterative scheme improves using up to 15 times the amount of originally labeled data. Moreover, our classifiers are by design out-distribution aware and can thus distinguish task-related inputs from unrelated ones.
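The confidence-based sample selection the abstract describes can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: the function name, the single global threshold, and the toy probabilities are all invented for clarity. The idea is to pseudo-label only those unlabeled samples on which the classifier is confident, leaving low-confidence (potentially out-distribution) samples out of the next training round.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Confidence-based sample selection for self-training (illustrative sketch).

    probs: (N, C) array of predicted class probabilities on unlabeled data.
    Returns the indices and pseudo-labels of samples whose top-class
    confidence meets the threshold; ambiguous samples are rejected.
    """
    confidence = probs.max(axis=1)   # top-class probability per sample
    labels = probs.argmax(axis=1)    # candidate pseudo-labels
    keep = confidence >= threshold   # keep only confident samples
    return np.flatnonzero(keep), labels[keep]

# Toy example: three unlabeled samples, two classes.
probs = np.array([[0.95, 0.05],    # confident -> kept, pseudo-label 0
                  [0.60, 0.40],    # ambiguous -> rejected
                  [0.10, 0.90]])   # confident -> kept, pseudo-label 1
idx, pseudo = select_pseudo_labels(probs, threshold=0.9)
print(idx, pseudo)  # [0 2] [0 1]
```

In the paper's iterative scheme, the selected samples would be added to the training set with their pseudo-labels and the classifier retrained; the rejection of low-confidence inputs is what makes the scheme tolerant of task-unrelated data in the open-world pool.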
