Paper Title

Visual Named Entity Linking: A New Dataset and A Baseline

Authors

Wenxiang Sun, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng

Abstract

Visual Entity Linking (VEL) is the task of linking regions of images to their corresponding entities in Knowledge Bases (KBs), which benefits many computer vision tasks such as image retrieval, image captioning, and visual question answering. Existing tasks in VEL either rely on textual data to complement multi-modal linking or only link objects to general entities, and thus fail to perform named entity linking over large amounts of image data. In this paper, we consider a purely Visual-based Named Entity Linking (VNEL) task, where the input consists only of an image. The task is to identify objects of interest (i.e., visual entity mentions) in images and link them to the corresponding named entities in KBs. Since each entity often contains rich visual and textual information in KBs, we propose three different sub-tasks, i.e., visual to visual entity linking (V2VEL), visual to textual entity linking (V2TEL), and visual to visual-textual entity linking (V2VTEL). In addition, we present a high-quality human-annotated visual person linking dataset, named WIKIPerson. Based on WIKIPerson, we establish a series of baseline algorithms for each sub-task, and conduct experiments to verify the quality of the proposed dataset and the effectiveness of the baseline methods. We envision this work to be helpful for soliciting more work on VNEL in the future. The code and datasets are publicly available at https://github.com/ict-bigdatalab/VNEL.
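The V2TEL sub-task described above can, for instance, be approached in a zero-shot fashion by scoring a cropped visual entity mention against candidate entity descriptions with a dual-encoder model such as CLIP. The snippet below is a minimal illustrative sketch of that idea, not the authors' released baseline; the model checkpoint, image path, and candidate entities are placeholder assumptions.

```python
# Minimal sketch of a zero-shot V2TEL-style ranker (illustrative only):
# encode a cropped visual entity mention with CLIP's image encoder, encode each
# candidate entity's KB description with CLIP's text encoder, and rank
# candidates by cosine similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Cropped region containing the visual entity mention (hypothetical path).
mention_crop = Image.open("mention_crop.jpg")

# Candidate named entities with short textual descriptions (placeholders,
# not WIKIPerson data).
candidates = {
    "Q_example_1": "Alice Example, a British physicist known for work on optics.",
    "Q_example_2": "Bob Sample, an American basketball player and coach.",
}

with torch.no_grad():
    img_inputs = processor(images=mention_crop, return_tensors="pt")
    img_emb = model.get_image_features(**img_inputs)
    txt_inputs = processor(text=list(candidates.values()),
                           return_tensors="pt", padding=True, truncation=True)
    txt_emb = model.get_text_features(**txt_inputs)

# Cosine similarity between the mention embedding and each entity description.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (img_emb @ txt_emb.T).squeeze(0)

predicted = list(candidates.keys())[scores.argmax().item()]
print(f"Predicted entity: {predicted}, scores: {scores.tolist()}")
```

The same dual-encoder idea extends to V2VEL (compare the mention crop against entity images) and V2VTEL (fuse both similarity scores), which is the general shape of the linking sub-tasks the paper defines.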
