Paper Title
AutoCaption: Image Captioning with Neural Architecture Search
Paper Authors
Paper Abstract
Image captioning transforms complex visual information into abstract natural language for representation, which can help computers understand the world quickly. However, due to the complexity of the real environment, the task requires identifying key objects, recognizing their relationships, and then generating natural language. The whole process involves a visual understanding module and a language generation module, which brings more challenges to the design of deep neural networks than other tasks. Neural Architecture Search (NAS) has shown its importance in a variety of image recognition tasks. Besides, the RNN plays an essential role in the image captioning task. We introduce an AutoCaption method to better design the decoder module of image captioning, in which we use NAS to automatically design a decoder module called AutoRNN. We use a reinforcement learning method based on shared parameters to design AutoRNN efficiently. The search space of AutoCaption includes both the connections between layers and the operations within layers, which allows AutoRNN to express more architectures. In particular, the standard RNN is equivalent to a subset of our search space. Experiments on the MSCOCO dataset show that our AutoCaption model achieves better performance than traditional hand-designed methods.
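The abstract describes a search space covering both the connections between layers and the operations within layers, with the vanilla RNN as one point in that space. The following sketch illustrates the general idea in the style of shared-parameter NAS for RNN cells: each node in a candidate cell picks one earlier node to connect to and one activation operation. This is an illustrative assumption, not the paper's actual code; the operation set, node count, and function names are hypothetical.

```python
import random

# Illustrative sketch (not the paper's implementation): a search space for an
# RNN-cell DAG, where each node chooses (a) which earlier node feeds it
# (the "connection") and (b) which activation to apply (the "operation").
OPERATIONS = ["tanh", "relu", "sigmoid", "identity"]  # assumed operation set

def sample_cell(num_nodes, rng=random):
    """Sample one candidate cell architecture from the search space.

    Returns a list of (previous_node_index, operation) pairs, one per
    node after the input node 0.
    """
    arch = []
    for node in range(1, num_nodes):
        prev = rng.randrange(node)   # connection: any earlier node may feed this one
        op = rng.choice(OPERATIONS)  # operation applied at this node
        arch.append((prev, op))
    return arch

def chain_rnn_cell(num_nodes):
    """The single point in this space equivalent to a vanilla tanh RNN:
    a plain chain where every node applies tanh to its predecessor."""
    return [(node - 1, "tanh") for node in range(1, num_nodes)]

if __name__ == "__main__":
    print(sample_cell(num_nodes=5))
    print(chain_rnn_cell(5))  # [(0, 'tanh'), (1, 'tanh'), (2, 'tanh'), (3, 'tanh')]
```

In a shared-parameter (ENAS-style) setup, a controller trained with reinforcement learning would replace the random sampler, and all sampled cells would reuse one set of weights so that each candidate can be evaluated without training from scratch.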