Paper Title
Transformer Based Multi-Grained Features for Unsupervised Person Re-Identification
Paper Authors
Paper Abstract
Multi-grained features extracted from convolutional neural networks (CNNs) have demonstrated strong discrimination ability in supervised person re-identification (Re-ID) tasks. Inspired by them, this work investigates how to extract multi-grained features from a pure transformer network to address the unsupervised Re-ID problem, which is label-free but much more challenging. To this end, we build a dual-branch network architecture based upon a modified Vision Transformer (ViT). The local tokens output from each branch are reshaped and then uniformly partitioned into multiple stripes to generate part-level features, while the global tokens of the two branches are averaged to produce a global feature. Further, based upon offline-online associated camera-aware proxies (O2CAP), a top-performing unsupervised Re-ID method, we define offline and online contrastive learning losses with respect to both global and part-level features to conduct unsupervised learning. Extensive experiments on three person Re-ID datasets show that the proposed method outperforms state-of-the-art unsupervised methods by a considerable margin, greatly narrowing the gap to supervised counterparts. Code will be available soon at https://github.com/RikoLi/WACV23-workshop-TMGF.
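The feature-extraction step described in the abstract (reshape each branch's local patch tokens back into a spatial grid, uniformly partition the grid into horizontal stripes for part-level features, and average the two branches' global tokens) can be sketched as below. This is a minimal illustrative sketch with NumPy, not the authors' implementation; the patch-grid size, embedding dimension, stripe count, and stripe pooling operation are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed, illustrative shapes (not the paper's exact configuration):
GRID_HW = (16, 8)  # ViT patch-token grid: 16 rows x 8 columns
DIM = 768          # token embedding dimension
N_STRIPES = 4      # horizontal stripes per branch

def multi_grained_features(local_a, local_b, global_a, global_b,
                           grid_hw=GRID_HW, n_stripes=N_STRIPES):
    """Sketch of dual-branch multi-grained feature extraction.

    local_a, local_b: (H*W, DIM) local patch tokens from each branch.
    global_a, global_b: (DIM,) global (class) tokens from each branch.
    Returns the averaged global feature and a list of part-level features.
    """
    h, w = grid_hw
    # Global feature: average the two branches' global tokens.
    global_feat = (global_a + global_b) / 2.0

    part_feats = []
    for tokens in (local_a, local_b):
        grid = tokens.reshape(h, w, -1)              # restore spatial layout
        stripes = np.split(grid, n_stripes, axis=0)  # uniform horizontal stripes
        # Pool each stripe into a single part-level vector (mean pooling
        # is an assumption here; the paper may pool differently).
        part_feats.extend(s.mean(axis=(0, 1)) for s in stripes)
    return global_feat, part_feats

h, w = GRID_HW
g, parts = multi_grained_features(
    np.random.randn(h * w, DIM), np.random.randn(h * w, DIM),
    np.random.randn(DIM), np.random.randn(DIM))
print(g.shape, len(parts), parts[0].shape)  # (768,) 8 (768,)
```

With two branches and four stripes each, this yields one global feature plus eight part-level features per image, each of which would receive its own contrastive loss in the training scheme the abstract describes.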