Paper Title
Transformer Based Multi-Grained Features for Unsupervised Person Re-Identification
Paper Authors
Paper Abstract
Multi-grained features extracted from convolutional neural networks (CNNs) have demonstrated strong discrimination ability in supervised person re-identification (Re-ID) tasks. Inspired by them, this work investigates how to extract multi-grained features from a pure transformer network to address the unsupervised Re-ID problem, which is label-free but much more challenging. To this end, we build a dual-branch network architecture based upon a modified Vision Transformer (ViT). The local tokens output from each branch are reshaped and then uniformly partitioned into multiple stripes to generate part-level features, while the global tokens of the two branches are averaged to produce a global feature. Further, based upon offline-online associated camera-aware proxies (O2CAP), a top-performing unsupervised Re-ID method, we define offline and online contrastive learning losses with respect to both global and part-level features to conduct unsupervised learning. Extensive experiments on three person Re-ID datasets show that the proposed method outperforms state-of-the-art unsupervised methods by a considerable margin, greatly narrowing the gap to supervised counterparts. Code will be available soon at https://github.com/RikoLi/WACV23-workshop-TMGF.
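The feature-extraction step described in the abstract (reshape each branch's local patch tokens back into a spatial grid, uniformly partition the grid into horizontal stripes for part-level features, and average the two branches' global tokens) can be sketched as below. This is a minimal illustrative sketch with NumPy, not the authors' implementation; the patch-grid size, embedding dimension, stripe count, and stripe pooling operation are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed, illustrative shapes (not the paper's exact configuration):
GRID_HW = (16, 8)  # ViT patch-token grid: 16 rows x 8 columns
DIM = 768          # token embedding dimension
N_STRIPES = 4      # horizontal stripes per branch

def multi_grained_features(local_a, local_b, global_a, global_b,
                           grid_hw=GRID_HW, n_stripes=N_STRIPES):
    """Sketch of dual-branch multi-grained feature extraction.

    local_a, local_b: (H*W, DIM) local patch tokens from each branch.
    global_a, global_b: (DIM,) global (class) tokens from each branch.
    Returns the averaged global feature and a list of part-level features.
    """
    h, w = grid_hw
    # Global feature: average the two branches' global tokens.
    global_feat = (global_a + global_b) / 2.0

    part_feats = []
    for tokens in (local_a, local_b):
        grid = tokens.reshape(h, w, -1)              # restore spatial layout
        stripes = np.split(grid, n_stripes, axis=0)  # uniform horizontal stripes
        # Pool each stripe into a single part-level vector (mean pooling
        # is an assumption here; the paper may pool differently).
        part_feats.extend(s.mean(axis=(0, 1)) for s in stripes)
    return global_feat, part_feats

h, w = GRID_HW
g, parts = multi_grained_features(
    np.random.randn(h * w, DIM), np.random.randn(h * w, DIM),
    np.random.randn(DIM), np.random.randn(DIM))
print(g.shape, len(parts), parts[0].shape)  # (768,) 8 (768,)
```

With two branches and four stripes each, this yields one global feature plus eight part-level features per image, each of which would receive its own contrastive loss in the training scheme the abstract describes.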