Paper Title

Graph-based Visual-Semantic Entanglement Network for Zero-shot Image Recognition

Authors

Hu, Yang, Wen, Guihua, Chapman, Adriane, Yang, Pei, Luo, Mingnan, Xu, Yingxue, Dai, Dan, Hall, Wendy

Abstract

Zero-shot learning (ZSL) uses semantic attributes to connect the search space of unseen objects. In recent years, although deep convolutional networks have brought powerful visual modeling capabilities to the ZSL task, their visual features exhibit severe pattern inertia and lack representation of semantic relationships, leading to severe bias and ambiguity. In response, we propose the Graph-based Visual-Semantic Entanglement Network, which conducts graph modeling of visual features and maps them to semantic attributes using a knowledge graph. It contains several novel designs: 1. it establishes a multi-path entangled network with a convolutional neural network (CNN) and a graph convolutional network (GCN), which feeds visual features from the CNN into the GCN to model implicit semantic relations, and the GCN then feeds the graph-modeled information back to the CNN features; 2. it uses attribute word vectors as the target of the GCN's graph semantic modeling, which forms a self-consistent regression for graph modeling and supervises the GCN to learn more personalized attribute relations; 3. it fuses and supplements the hierarchical visual-semantic features refined by graph modeling into the visual embedding. By promoting the semantic-linkage modeling of visual features, our method outperforms state-of-the-art approaches on multiple representative ZSL datasets: AwA2, CUB, and SUN.
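The CNN-to-GCN path with feedback described in design point 1 can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration only: the region count, feature dimension, dot-product graph construction, single GCN layer, and residual feedback fusion are hypothetical choices, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): a 7x7 CNN feature map
# flattened into 49 graph nodes, each with a 32-dim feature vector.
num_nodes, feat_dim = 49, 32
cnn_features = rng.standard_normal((num_nodes, feat_dim))

# Build an affinity graph over the visual regions (a simple thresholded
# dot-product similarity; the paper's graph construction may differ).
affinity = cnn_features @ cnn_features.T
adjacency = (affinity > 0).astype(float)
np.fill_diagonal(adjacency, 1.0)  # self-loops guarantee degree >= 1

# Symmetric normalization: A_hat = D^{-1/2} A D^{-1/2}
deg = adjacency.sum(axis=1)
d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
a_hat = d_inv_sqrt @ adjacency @ d_inv_sqrt

# One standard GCN propagation step: H' = ReLU(A_hat @ H @ W)
w = rng.standard_normal((feat_dim, feat_dim)) * 0.1
gcn_out = np.maximum(a_hat @ cnn_features @ w, 0.0)

# "Feedback" of graph-modeled information into the CNN features;
# residual addition is an assumption (the paper may use a learned fusion).
fused = cnn_features + gcn_out

print(fused.shape)
```

In a trained model, `w` would be learned and the fused features would feed the visual-semantic embedding; here the weights are random purely to show the data flow.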
