SGRAM: Improving Scene Graph Parsing via Abstract Meaning Representation

Paper Title

SGRAM: Improving Scene Graph Parsing via Abstract Meaning Representation

Authors

Choi, Woo Suk; Heo, Yu-Jung; Zhang, Byoung-Tak

Abstract

A scene graph is a structured semantic representation that can be modeled as a graph derived from images and texts. Image-based scene graph generation has been actively studied until recently, whereas text-based scene graph generation has not. In this paper, we focus on the problem of parsing a scene graph from a textual description of a visual scene. The core idea is to use abstract meaning representation (AMR) instead of the dependency parsing mainly used in previous studies. AMR is a graph-based semantic formalism for natural language that abstracts the concepts of the words in a sentence, in contrast to dependency parsing, which considers dependency relationships over all words in a sentence. To this end, we design a simple yet effective two-stage scene graph parsing framework utilizing abstract meaning representation, SGRAM (Scene GRaph parsing via Abstract Meaning representation): 1) transforming a textual description of an image into an AMR graph (Text-to-AMR) and 2) encoding the AMR graph into a Transformer-based language model to generate a scene graph (AMR-to-SG). Experimental results show that the scene graphs generated by our framework outperform the dependency parsing-based model by 11.61% and the previous state-of-the-art model, which uses a pre-trained Transformer language model, by 3.78%. Furthermore, we apply SGRAM to the image retrieval task, one of the downstream tasks for scene graphs, and confirm the effectiveness of the scene graphs generated by our framework.
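To make the two intermediate representations concrete, the following is a minimal sketch of what the input and output of the AMR-to-SG stage might look like. Everything in it is an assumption made for illustration: the example sentence, the hand-written AMR, the `linearize_amr` helper, and the `SceneGraph` container are hypothetical and are not taken from the paper, whose actual serialization format is not described in the abstract.

```python
from dataclasses import dataclass, field

# Hand-written toy AMR for "A young boy rides a horse" (an assumption,
# not the output of any parser).
# instances: variable -> concept; edges: (source variable, role, target variable).
AMR_INSTANCES = {"r": "ride-01", "b": "boy", "y": "young", "h": "horse"}
AMR_EDGES = [("r", ":ARG0", "b"), ("b", ":mod", "y"), ("r", ":ARG1", "h")]


def linearize_amr(root, instances, edges):
    """Depth-first, PENMAN-style linearization of a tree-shaped AMR.

    Hypothetical helper: it only shows how a graph could be flattened
    into a token sequence that a sequence-to-sequence model can read.
    """
    parts = [f"( {root} / {instances[root]}"]
    for source, role, target in edges:
        if source == root:
            parts.append(f"{role} {linearize_amr(target, instances, edges)}")
    return " ".join(parts) + " )"


@dataclass
class SceneGraph:
    """Illustrative scene graph container (not the paper's data format)."""
    objects: set = field(default_factory=set)
    attributes: set = field(default_factory=set)  # (object, attribute)
    relations: set = field(default_factory=set)   # (subject, predicate, object)


# Model input: the linearized AMR produced by a Text-to-AMR stage.
amr_text = linearize_amr("r", AMR_INSTANCES, AMR_EDGES)

# Model target: the scene graph an AMR-to-SG stage would be trained to emit.
target = SceneGraph(
    objects={"boy", "horse"},
    attributes={("boy", "young")},
    relations={("boy", "ride", "horse")},
)

print(amr_text)
print(target)
```

In this toy case the AMR already abstracts away function words such as "a", so the concepts `boy`, `horse`, and `young` line up closely with the objects and attributes of the scene graph, which is the intuition behind preferring AMR over a dependency parse.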
