Collaborative Transformers for Grounded Situation Recognition

Paper Title

Collaborative Transformers for Grounded Situation Recognition

Authors

Junhyeong Cho, Youngseok Yoon, Suha Kwak

Abstract

Grounded situation recognition is the task of predicting the main activity, entities playing certain roles within the activity, and bounding-box groundings of the entities in the given image. To effectively deal with this challenging task, we introduce a novel approach where the two processes for activity classification and entity estimation are interactive and complementary. To implement this idea, we propose Collaborative Glance-Gaze TransFormer (CoFormer) that consists of two modules: Glance transformer for activity classification and Gaze transformer for entity estimation. Glance transformer predicts the main activity with the help of Gaze transformer that analyzes entities and their relations, while Gaze transformer estimates the grounded entities by focusing only on the entities relevant to the activity predicted by Glance transformer. Our CoFormer achieves the state of the art in all evaluation metrics on the SWiG dataset. Training code and model weights are available at https://github.com/jhcho99/CoFormer.
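To make the task's output concrete, here is a minimal Python sketch of the prediction structure the abstract describes: one main activity, plus an entity and an optional bounding box for each role tied to that activity. It is an illustration under stated assumptions, not the CoFormer implementation. The names `GroundedEntity`, `GroundedSituation`, `ROLES_BY_ACTIVITY`, and `recognize`, the role table, and the stub predictors are all hypothetical. The sketch only shows entity estimation being restricted to the roles of the predicted activity; it omits the other direction mentioned in the abstract, where entity analysis helps activity classification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (x_min, y_min, x_max, y_max); None when the entity has no grounding.
Box = Optional[Tuple[float, float, float, float]]


@dataclass
class GroundedEntity:
    noun: str
    box: Box


@dataclass
class GroundedSituation:
    activity: str
    entities: Dict[str, GroundedEntity]  # role name -> predicted entity


# Hypothetical activity-to-role table, made up for this illustration.
ROLES_BY_ACTIVITY: Dict[str, List[str]] = {
    "feeding": ["agent", "food", "eater", "place"],
    "jumping": ["agent", "obstacle", "place"],
}


def recognize(
    image: object,
    classify_activity: Callable[[object], str],
    estimate_entity: Callable[[object, str, str], GroundedEntity],
) -> GroundedSituation:
    # Step 1: predict the main activity for the whole image.
    activity = classify_activity(image)
    # Step 2: estimate entities only for the roles of that activity.
    entities = {
        role: estimate_entity(image, activity, role)
        for role in ROLES_BY_ACTIVITY[activity]
    }
    return GroundedSituation(activity, entities)


# Stub predictors standing in for trained models (assumptions, not real outputs).
def stub_activity(image: object) -> str:
    return "jumping"


def stub_entity(image: object, activity: str, role: str) -> GroundedEntity:
    if role == "agent":
        return GroundedEntity("horse", (10.0, 20.0, 110.0, 220.0))
    return GroundedEntity("unknown", None)


result = recognize(None, stub_activity, stub_entity)
print(result.activity, list(result.entities))
```

Keeping the role table outside `recognize` means the set of roles to fill is decided entirely by the predicted activity, which mirrors the abstract's statement that entity estimation focuses only on the entities relevant to that activity.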
