Paper Title
Learning Flexible Translation between Robot Actions and Language Descriptions
Paper Authors
Abstract
Handling various robot action-language translation tasks flexibly is an essential requirement for natural interaction between a robot and a human. Previous approaches require changes to the model architecture configuration for each task during inference, which undermines the premise of multi-task learning. In this work, we propose the paired gated autoencoders (PGAE) for flexible translation between robot actions and language descriptions in a tabletop object manipulation scenario. We train our model end-to-end by pairing each action with an appropriate description that contains a signal indicating the translation direction. During inference, our model can flexibly translate from action to language and vice versa according to the given language signal. Moreover, with the option to use a pretrained language model as the language encoder, our model has the potential to recognise unseen natural language input. Our model can also recognise and imitate the actions of another agent by utilising robot demonstrations. The experimental results highlight the flexible bidirectional translation capabilities of our approach, along with its ability to generalise to the actions of an opposite-sitting agent.
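The abstract describes training on action-description pairs where a signal embedded in the language input fixes the translation direction. A minimal sketch of how such paired, direction-signalled training examples might be assembled is shown below; the signal tokens "describe" and "execute" and the dictionary layout are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch (not the PGAE authors' code): building training pairs
# where a language signal token determines the translation direction.
# Token names "describe"/"execute" are assumptions for illustration.

def make_training_pairs(actions, descriptions):
    """Pair each action with its description in both translation directions."""
    pairs = []
    for action, description in zip(actions, descriptions):
        # Action -> language: given the action, produce the description.
        pairs.append({"language_in": "describe",
                      "action_in": action,
                      "target": description})
        # Language -> action: given the signalled description, produce the action.
        pairs.append({"language_in": "execute " + description,
                      "action_in": None,
                      "target": action})
    return pairs

pairs = make_training_pairs(
    ["push-left cube", "lift ball"],
    ["push the blue cube to the left", "lift the red ball"],
)
print(len(pairs))  # two directions per action-description pair -> 4
```

At inference time, only the signal token in the language input needs to change to switch the translation direction, which is what lets a single fixed architecture serve both tasks.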