Learning from Imperfect Demonstrations via Adversarial Confidence Transfer

Paper title

Learning from Imperfect Demonstrations via Adversarial Confidence Transfer

Authors

Cao, Zhangjie; Wang, Zihan; Sadigh, Dorsa

Abstract

Existing learning-from-demonstration algorithms usually assume access to expert demonstrations. However, this assumption is limiting in many real-world applications, since the collected demonstrations may be suboptimal or may even consist of failure cases. We therefore study the problem of learning from imperfect demonstrations by learning a confidence predictor. Specifically, we rely on demonstrations, along with their confidence values, from a different correspondent environment (the source environment) to learn a confidence predictor for the environment in which we aim to learn a policy (the target environment, where we only have unlabeled demonstrations). We learn a common latent space through adversarial distribution matching of multi-length partial trajectories to enable the transfer of confidence across the source and target environments. The learned confidence reweights the demonstrations so that the policy learns more from informative demonstrations and discards the irrelevant ones. Our experiments in three simulated environments and on a real robot reaching task demonstrate that our approach learns a policy with the highest expected return.
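The abstract gives no implementation details, so the following is only a toy sketch of the reweighting idea, not the authors' method. It assumes a one-dimensional linear policy a = w * s fitted by confidence-weighted least squares; the function name `fit_weighted_policy`, the toy data, and the hand-set confidence values are all hypothetical stand-ins for the output of a learned confidence predictor.

```python
def fit_weighted_policy(demos, confidences):
    """Fit a 1-D linear policy a = w * s by confidence-weighted least squares.

    demos: list of trajectories, each a list of (state, action) pairs.
    confidences: one weight in [0, 1] per trajectory. These are set by hand
        here (an assumption); in the paper they would come from the learned
        confidence predictor.
    """
    num = 0.0
    den = 0.0
    for traj, c in zip(demos, confidences):
        for s, a in traj:
            # Each state-action pair contributes in proportion to the
            # confidence of the trajectory it came from.
            num += c * s * a
            den += c * s * s
    if den == 0.0:
        raise ValueError("all demonstrations have zero weight or zero states")
    return num / den


# Toy data: a good demonstrator acts as a = 2 * s, a poor one as a = -1 * s.
good = [(1.0, 2.0), (2.0, 4.0)]
bad = [(1.0, -1.0), (2.0, -2.0)]

# Treating both demonstrations as equally trustworthy blends the two behaviors.
w_uniform = fit_weighted_policy([good, bad], [1.0, 1.0])

# Giving the poor demonstration zero confidence recovers the good behavior.
w_weighted = fit_weighted_policy([good, bad], [1.0, 0.0])
```

With uniform weights the fitted slope is pulled halfway toward the poor demonstrator, while zero confidence on the poor trajectory removes its influence entirely. Intermediate confidence values would interpolate between these two cases.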
