Paper Title
Learning Program Representations for Food Images and Cooking Recipes
Paper Authors
Paper Abstract
In this paper, we are interested in modeling a how-to instructional procedure, such as a cooking recipe, with a meaningful and rich high-level representation. Specifically, we propose to represent cooking recipes and food images as cooking programs. Programs provide a structured representation of the task, capturing cooking semantics and sequential relationships of actions in the form of a graph. This allows them to be easily manipulated by users and executed by agents. To this end, we build a model that is trained to learn a joint embedding between recipes and food images via self-supervision and jointly generate a program from this embedding as a sequence. To validate our idea, we crowdsource programs for cooking recipes and show that: (a) projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results; (b) generating programs from images leads to better recognition results compared to predicting raw cooking instructions; and (c) we can generate food images by manipulating programs via optimizing the latent code of a GAN. Code, data, and models are available online.
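To make the described setup more concrete, below is a minimal sketch (not the authors' code) of the two components the abstract mentions: a joint image-recipe embedding trained with a contrastive retrieval loss, and an autoregressive decoder that generates program tokens from that embedding. All module names, feature dimensions, and the specific triplet-style loss are assumptions chosen for illustration; the paper's actual self-supervised objective and program grammar may differ.

```python
# Illustrative sketch only (assumed architecture, not the paper's implementation):
# a joint image-recipe embedding plus a program decoder conditioned on it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, embed_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)   # projects image features
        self.txt_proj = nn.Linear(txt_dim, embed_dim)   # projects recipe-text features

    def forward(self, img_feat, txt_feat):
        # L2-normalize so cosine similarity reduces to a dot product
        return (F.normalize(self.img_proj(img_feat), dim=-1),
                F.normalize(self.txt_proj(txt_feat), dim=-1))

class ProgramDecoder(nn.Module):
    """Autoregressive decoder that emits program tokens from the shared embedding."""
    def __init__(self, embed_dim=512, vocab_size=1000):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, z, tokens):
        h0 = z.unsqueeze(0)                       # embedding initializes the decoder state
        out, _ = self.gru(self.token_emb(tokens), h0)
        return self.out(out)                      # logits over the program-token vocabulary

def training_step(encoder, decoder, img_feat, txt_feat, prog_in, prog_target, margin=0.3):
    z_img, z_txt = encoder(img_feat, txt_feat)
    # Retrieval loss: matched image-recipe pairs should score higher than hardest mismatch
    sim = z_img @ z_txt.t()
    pos = sim.diag()
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    neg = sim.masked_fill(mask, float('-inf')).max(dim=1).values
    retrieval_loss = F.relu(margin - pos + neg).mean()
    # Program-generation loss: decode program tokens from the image-side embedding
    logits = decoder(z_img, prog_in)
    gen_loss = F.cross_entropy(logits.flatten(0, 1), prog_target.flatten())
    return retrieval_loss + gen_loss
```

Under this reading, the two losses are optimized jointly, which is what lets the embedding serve both cross-modal retrieval (claim a) and program generation from images (claim b); the GAN-based image manipulation in claim (c) is a separate inference-time procedure not sketched here.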