TAMPC: A Controller for Escaping Traps

Paper Title

TAMPC: A Controller for Escaping Traps in Novel Environments

Authors

Sheng Zhong, Zhenyuan Zhang, Nima Fazeli, Dmitry Berenson

Abstract

We propose an approach to online model adaptation and control in the challenging case of hybrid and discontinuous dynamics where, under a given controller, actions may lead to difficult-to-escape "trap" states. We first learn dynamics for a system without traps from a randomly collected training set (since we do not know what traps will be encountered online). These "nominal" dynamics allow us to perform tasks in scenarios where the dynamics match the training data, but when unexpected traps arise in execution, we must find a way to adapt our dynamics and control strategy and continue attempting the task. Our approach, Trap-Aware Model Predictive Control (TAMPC), is a two-level hierarchical control algorithm that reasons about traps and non-nominal dynamics to decide between goal-seeking and recovery policies. An important requirement of our method is the ability to recognize nominal dynamics even when we encounter data that is out-of-distribution with respect to the training data. We achieve this by learning a representation for dynamics that exploits invariance in the nominal environment, thus allowing better generalization. We evaluate our method on simulated planar pushing and peg-in-hole as well as real robot peg-in-hole problems against adaptive control, reinforcement learning, and trap-handling baselines, where traps arise due to unexpected obstacles that we only observe through contact. Our results show that our method outperforms the baselines on difficult tasks and is comparable to prior trap-handling methods on easier tasks.
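
The abstract describes a high-level decision between goal-seeking and recovery that depends on whether the observed dynamics look nominal and whether the system appears trapped. The following is a minimal illustrative sketch of that kind of two-level switch, not the authors' algorithm: the function names `is_nominal` and `choose_mode`, the max-error test, and the thresholds `error_tol` and `min_progress` are all assumptions introduced here for illustration.

```python
from typing import Sequence


def is_nominal(predicted: Sequence[float],
               observed: Sequence[float],
               error_tol: float) -> bool:
    """Hypothetical test: a transition counts as nominal when the nominal
    model's predicted next state is within error_tol of the observed one
    in every state dimension."""
    err = max(abs(p - o) for p, o in zip(predicted, observed))
    return err <= error_tol


def choose_mode(nominal_flags: Sequence[bool],
                distances_to_goal: Sequence[float],
                min_progress: float) -> str:
    """Hypothetical high-level switch over a recent window of steps.

    Returns "recovery" when no recent transition looked nominal AND the
    distance to the goal shrank by less than min_progress over the window
    (a crude stand-in for "stuck in a trap"); otherwise "goal_seeking".
    """
    progress = distances_to_goal[0] - distances_to_goal[-1]
    non_nominal = not any(nominal_flags)
    if non_nominal and progress < min_progress:
        return "recovery"
    return "goal_seeking"
```

In this sketch a low-level controller (for example, a model predictive controller) would be run with whichever objective the returned mode selects. How TAMPC itself detects traps and designs its recovery policy is specified in the paper, not here.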
