Robot Learning with Crash Constraints

Paper title

Robot Learning with Crash Constraints

Authors

Alonso Marco, Dominik Baumann, Majid Khadiv, Philipp Hennig, Ludovic Righetti, Sebastian Trimpe

Abstract

In the past decade, numerous machine learning algorithms have been shown to successfully learn optimal policies for controlling real robotic systems. However, it is common to encounter failing behaviors as the learning loop progresses. Specifically, in robot applications where failing is undesired but not catastrophic, many algorithms struggle to leverage data obtained from failures. This is usually caused by (i) the failed experiment ending prematurely, or (ii) the acquired data being scarce or corrupted. Both complicate the design of proper reward functions to penalize failures. In this paper, we propose a framework that addresses those issues. We consider failing behaviors as those that violate a constraint and address the problem of learning with crash constraints, where no data is obtained upon constraint violation. The no-data case is addressed by a novel Gaussian process (GP) model for the constraint (GPCR) that combines discrete events (failure/success) with continuous observations (only obtained upon success). We demonstrate the effectiveness of our framework on simulated benchmarks and on a real jumping quadruped, where the constraint threshold is unknown a priori. Experimental data is collected directly on the real robot by means of constrained Bayesian optimization. Our results outperform manual tuning, and GPCR proves useful for estimating the constraint threshold.
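The core idea of the abstract, that a crashed experiment yields only a binary label while a successful one also yields continuous measurements, can be illustrated with a small stdlib-only Python sketch. It is not the paper's GPCR model. Instead it assumes (hypothetically) that a Gaussian belief N(mu, sigma^2) about the constraint value g is already available at each tested parameter, that a crash happens exactly when g > 0, and that g is measured noise-free on success. Under these assumptions the mixed data enter a censored (Tobit-style) likelihood, and candidate parameters can be ranked with the standard expected-improvement-with-constraints acquisition used in constrained Bayesian optimization. The names `Outcome`, `censored_loglik`, `incumbent` and `constrained_ei` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


@dataclass
class Outcome:
    """One experiment on the robot (hypothetical record layout).

    On a crash only the binary label is known; f and g stay None.
    """
    crashed: bool
    f: Optional[float] = None  # objective (cost to minimize), success only
    g: Optional[float] = None  # constraint value, success only


def censored_loglik(outcomes: Sequence[Outcome],
                    mus: Sequence[float],
                    sigmas: Sequence[float]) -> float:
    """Log-likelihood of mixed discrete/continuous constraint data.

    Assumption (not the paper's GPCR likelihood): g_i ~ N(mus[i], sigmas[i]**2),
    a crash occurs exactly when g_i > 0, and g_i is observed noise-free on success.
    """
    if not (len(outcomes) == len(mus) == len(sigmas)):
        raise ValueError("outcomes, mus and sigmas must have equal length")
    ll = 0.0
    for o, mu, s in zip(outcomes, mus, sigmas):
        if o.crashed:
            # Discrete event: only P(g > 0) = Phi(mu / s) is informative.
            ll += math.log(norm_cdf(mu / s))
        else:
            # Continuous observation: Gaussian density of the measured g.
            ll += math.log(norm_pdf((o.g - mu) / s) / s)
    return ll


def incumbent(outcomes: Sequence[Outcome]) -> float:
    """Lowest cost among successful runs; crashes carry no cost value."""
    costs = [o.f for o in outcomes if not o.crashed]
    return min(costs) if costs else math.inf


def expected_improvement(mu_f: float, sigma_f: float, best_f: float) -> float:
    """Expected improvement over best_f for a minimization problem."""
    if sigma_f <= 0.0:
        return max(best_f - mu_f, 0.0)
    z = (best_f - mu_f) / sigma_f
    return (best_f - mu_f) * norm_cdf(z) + sigma_f * norm_pdf(z)


def constrained_ei(mu_f: float, sigma_f: float,
                   mu_g: float, sigma_g: float, best_f: float) -> float:
    """EI weighted by the probability that the constraint g <= 0 holds."""
    p_feasible = norm_cdf(-mu_g / sigma_g)
    return expected_improvement(mu_f, sigma_f, best_f) * p_feasible


# Usage: one crashed run and two successful runs.
history = [Outcome(crashed=True),
           Outcome(crashed=False, f=3.0, g=-1.0),
           Outcome(crashed=False, f=2.0, g=-0.5)]
best = incumbent(history)  # 2.0: the crashed run contributes no cost
score = constrained_ei(mu_f=1.5, sigma_f=0.5, mu_g=-0.2, sigma_g=0.3, best_f=best)
```

In this sketch crashed runs are excluded from the incumbent because they produce no cost value; their only information is the binary event, which enters through the constraint likelihood. Making that event informative for the constraint model is the gap the GPCR model is designed to fill.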