Paper Title
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
Paper Authors
Paper Abstract
Visual storytelling is a creative and challenging task that aims to automatically generate a story-like description for a sequence of images. Descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical visual storytelling framework which separately models sentence-level and word-level semantics. We use the transformer-based BERT to obtain embeddings for sentences and words. We then employ a hierarchical LSTM network: the bottom LSTM takes the sentence vector representations from BERT as input and learns the dependencies between the sentences corresponding to the images, while the top LSTM, taking input from the bottom LSTM, is responsible for generating the corresponding word vector representations. Experimental results demonstrate that our model outperforms the most closely related baselines on the automatic evaluation metrics BLEU and CIDEr, and a human evaluation further confirms the effectiveness of our method.
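Below is a minimal sketch of the hierarchical architecture the abstract describes, assuming PyTorch and the Hugging Face transformers library. The module name `HierarchicalStoryDecoder`, all dimensions, and the way the sentence state is fed to the word decoder are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class HierarchicalStoryDecoder(nn.Module):
    """Sketch: bottom LSTM over BERT sentence vectors, top LSTM over words."""

    def __init__(self, vocab_size, sent_dim=768, hidden_dim=512, word_dim=300):
        super().__init__()
        # Bottom LSTM: consumes one sentence embedding per image and
        # models dependencies across the sentences of a story.
        self.sent_lstm = nn.LSTM(sent_dim, hidden_dim, batch_first=True)
        # Top LSTM: generates the word sequence of each sentence,
        # conditioned on the bottom LSTM's state for that sentence.
        self.word_lstm = nn.LSTM(word_dim + hidden_dim, hidden_dim,
                                 batch_first=True)
        self.word_embed = nn.Embedding(vocab_size, word_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, sent_embs, word_ids):
        # sent_embs: (batch, n_sents, sent_dim) BERT sentence vectors
        # word_ids:  (batch, n_sents, max_len) target word indices
        sent_states, _ = self.sent_lstm(sent_embs)        # (B, S, H)
        logits = []
        for s in range(sent_embs.size(1)):
            words = self.word_embed(word_ids[:, s])       # (B, T, word_dim)
            # Broadcast the sentence-level state to every word step.
            ctx = sent_states[:, s:s + 1].expand(-1, words.size(1), -1)
            h, _ = self.word_lstm(torch.cat([words, ctx], dim=-1))
            logits.append(self.out(h))                    # (B, T, V)
        return torch.stack(logits, dim=1)                 # (B, S, T, V)

# Sentence embeddings from BERT (here: the [CLS] vector of each sentence).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
sents = ["The family arrived at the park.", "The kids ran to the swings."]
enc = tokenizer(sents, padding=True, return_tensors="pt")
with torch.no_grad():
    cls_vecs = bert(**enc).last_hidden_state[:, 0]        # (2, 768)

# Toy usage: two stories of five sentences, ten tokens each.
model = HierarchicalStoryDecoder(vocab_size=30522)
sent_embs = torch.randn(2, 5, 768)              # stand-in for BERT vectors
word_ids = torch.randint(0, 30522, (2, 5, 10))
print(model(sent_embs, word_ids).shape)         # (2, 5, 10, 30522)
```

Concatenating the bottom LSTM's sentence state onto every word step is one straightforward way to realize the sentence-to-word hierarchy; the paper may instead pass it, for example, as the top LSTM's initial hidden state.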