HIRL: A General Framework for Hierarchical Image Representation Learning

Paper Title

HIRL: A General Framework for Hierarchical Image Representation Learning

Authors

Minghao Xu, Yuanfan Guo, Xuanyu Zhu, Jiawen Li, Zhenbang Sun, Jian Tang, Yi Xu, Bingbing Ni

Abstract

Learning self-supervised image representations has been broadly studied to boost various visual understanding tasks. Existing methods typically learn a single level of image semantics like pairwise semantic similarity or image clustering patterns. However, these methods can hardly capture multiple levels of semantic information that naturally exist in an image dataset, e.g., the semantic hierarchy of "Persian cat to cat to mammal" encoded in an image database for species. It is thus unknown whether an arbitrary image self-supervised learning (SSL) approach can benefit from learning such hierarchical semantics. To answer this question, we propose a general framework for Hierarchical Image Representation Learning (HIRL). This framework aims to learn multiple semantic representations for each image, and these representations are structured to encode image semantics from fine-grained to coarse-grained. Based on a probabilistic factorization, HIRL learns the most fine-grained semantics by an off-the-shelf image SSL approach and learns multiple coarse-grained semantics by a novel semantic path discrimination scheme. We adopt six representative image SSL methods as baselines and study how they perform under HIRL. By rigorous fair comparison, performance gain is observed on all six methods for diverse downstream tasks, which, for the first time, verifies the general effectiveness of learning hierarchical image semantics. All source code and model weights are available at https://github.com/hirl-team/HIRL.

Paper Title