Paper Title
Learning Prototype via Placeholder for Zero-shot Recognition
Paper Authors
Paper Abstract
Zero-shot learning (ZSL) aims to recognize unseen classes by exploiting semantic descriptions shared between seen and unseen classes. Current methods show that it is effective to learn visual-semantic alignment by projecting semantic embeddings into the visual space as class prototypes. However, such a projection function is only concerned with seen classes. When applied to unseen classes, the prototypes often perform suboptimally due to domain shift. In this paper, we propose to learn prototypes via placeholders, termed LPL, to eliminate the domain shift between seen and unseen classes. Specifically, we combine seen classes to hallucinate new classes which act as placeholders for the unseen classes in the visual and semantic spaces. Placed between seen classes, the placeholders encourage the prototypes of seen classes to be highly dispersed, and more space is thereby spared for the insertion of well-separated unseen ones. Empirically, well-separated prototypes help counteract the visual-semantic misalignment caused by domain shift. Furthermore, we exploit a novel semantic-oriented fine-tuning to guarantee the semantic reliability of the placeholders. Extensive experiments on five benchmark datasets demonstrate significant performance gains of LPL over state-of-the-art methods. Code is available at https://github.com/zaiquanyang/LPL.
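The core idea of the abstract, hallucinating placeholder classes by combining seen classes and then projecting semantic embeddings into the visual space as prototypes, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the mixup-style convex interpolation, the random pairing, and the linear projection `W` are all simplifying assumptions standing in for the learned components described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 5 seen classes, each with a 10-dim semantic embedding
# (e.g. an attribute vector); prototypes live in a 16-dim visual space.
num_seen, sem_dim, vis_dim = 5, 10, 16
semantic = rng.normal(size=(num_seen, sem_dim))

# Hypothetical linear projection standing in for the learned
# semantic-to-visual mapping that produces class prototypes.
W = rng.normal(size=(sem_dim, vis_dim))

def hallucinate_placeholders(semantic, num_placeholders, alpha=0.5):
    """Combine random pairs of seen-class embeddings into placeholder
    classes via convex interpolation (a mixup-style rule, assumed here;
    the paper's exact combination scheme may differ)."""
    placeholders = []
    for _ in range(num_placeholders):
        i, j = rng.choice(len(semantic), size=2, replace=False)
        lam = rng.beta(alpha, alpha)  # interpolation weight in (0, 1)
        placeholders.append(lam * semantic[i] + (1 - lam) * semantic[j])
    return np.stack(placeholders)

# Placeholders sit "between" seen classes in the semantic space.
sem_placeholders = hallucinate_placeholders(semantic, num_placeholders=3)

# Project seen + placeholder embeddings into the visual space; training
# against both encourages the seen-class prototypes to stay dispersed.
all_prototypes = np.concatenate([semantic, sem_placeholders]) @ W
print(all_prototypes.shape)  # 5 seen + 3 placeholder prototypes, 16-dim
```

In this sketch the placeholders are convex combinations of seen-class embeddings, so they occupy the regions between seen classes where unseen-class prototypes may later be inserted.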