Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations

Paper Title

Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations

Authors

Yilun Hao, Ruinan Wang, Zhangjie Cao, Zihan Wang, Yuchen Cui, Dorsa Sadigh

Abstract

Multimodal demonstrations provide robots with an abundance of information to make sense of the world. However, such abundance may not always lead to good performance when it comes to learning sensorimotor control policies from human demonstrations. Extraneous data modalities can lead to state over-specification, where the state contains modalities that are not only useless for decision-making but also can change data distribution across environments. State over-specification leads to issues such as the learned policy not generalizing outside of the training data distribution. In this work, we propose Masked Imitation Learning (MIL) to address state over-specification by selectively using informative modalities. Specifically, we design a masked policy network with a binary mask to block certain modalities. We develop a bi-level optimization algorithm that learns this mask to accurately filter over-specified modalities. We demonstrate empirically that MIL outperforms baseline algorithms in simulated domains including MuJoCo and a robot arm environment using the Robomimic dataset, and effectively recovers the environment-invariant modalities on a multimodal dataset collected on a real robot. Our project website presents supplemental details and videos of our results at: https://tinyurl.com/masked-il
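
To make the idea of a masked policy concrete, here is a toy sketch in plain Python. It is not the authors' algorithm: the abstract describes a bi-level optimization that learns the mask, while this sketch simply enumerates every binary mask (outer level) and fits a linear policy by gradient descent for each one (inner level). The data, the three scalar "modalities", and all function names are hypothetical. Modality 2 is deliberately over-specified: it tracks the expert action in the training environment but flips sign in the held-out environment.

```python
from itertools import product

# Hypothetical toy data: three scalar "modalities" per state.
GRID = [-1.0, -0.5, 0.0, 0.5, 1.0]

def expert_action(m0, m1):
    # Assumed expert: depends only on modalities 0 and 1.
    return 2.0 * m0 - m1

def make_env(spurious_sign):
    """Return (states, actions). Modality 2 equals the action times an
    environment-specific sign, so its distribution shifts across environments."""
    states, actions = [], []
    for m0 in GRID:
        for m1 in GRID:
            a = expert_action(m0, m1)
            states.append((m0, m1, spurious_sign * a))
            actions.append(a)
    return states, actions

def predict(weights, mask, state):
    # Masked linear policy: a 0 in the mask blocks that modality.
    return sum(w * k * x for w, k, x in zip(weights, mask, state))

def mse(weights, mask, states, actions):
    errs = [(predict(weights, mask, s) - a) ** 2 for s, a in zip(states, actions)]
    return sum(errs) / len(errs)

def fit_policy(mask, states, actions, lr=0.1, steps=1000):
    """Inner level: behavior cloning by gradient descent on the masked input."""
    weights = [0.0] * len(mask)
    n = len(states)
    for _ in range(steps):
        grads = [0.0] * len(mask)
        for s, a in zip(states, actions):
            err = predict(weights, mask, s) - a
            for i in range(len(mask)):
                grads[i] += 2.0 * err * mask[i] * s[i] / n
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights

def search_mask(train, val):
    """Outer level (exhaustive stand-in): keep the mask whose trained policy
    has the lowest imitation loss in a held-out environment."""
    best = None
    for mask in product((0, 1), repeat=3):
        weights = fit_policy(mask, *train)
        loss = mse(weights, mask, *val)
        if best is None or loss < best[1]:
            best = (mask, loss)
    return best

train_env = make_env(+1.0)   # modality 2 agrees with the action
val_env = make_env(-1.0)     # modality 2 flips sign in the new environment
best_mask, best_loss = search_mask(train_env, val_env)
print(best_mask, best_loss)
```

In this toy setup the search selects the mask (1, 1, 0), which blocks the over-specified modality; any mask that keeps modality 2 fits the training data but fails in the held-out environment. Exhaustive enumeration is only feasible for a handful of modalities, which is one reason a learned mask is attractive.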
