Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies for Robot Manipulation

Paper Title

Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies for Robot Manipulation

Authors

Hanbit Oh, Hikaru Sasaki, Brendan Michael, Takamitsu Matsubara

Abstract

Humans demonstrate a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demonstrations, often fails to capture such behavior. Specifically, commonly used learning algorithms embody inherent contradictions between the learning assumptions (e.g., single optimal action) and actual human behavior (e.g., multiple optimal actions), thereby limiting robot generalizability, applicability, and demonstration feasibility. To address this, this paper proposes designing imitation learning algorithms with a focus on utilizing human behavioral characteristics, thereby embodying principles for capturing and exploiting actual demonstrator behavioral characteristics. This paper presents the first imitation learning framework, Bayesian Disturbance Injection (BDI), that typifies human behavioral characteristics by incorporating model flexibility, robustification, and risk sensitivity. Bayesian inference is used to learn flexible non-parametric multi-action policies, while simultaneously robustifying policies by injecting risk-sensitive disturbances to induce human recovery action and ensuring demonstration feasibility. Our method is evaluated through risk-sensitive simulations and real-robot experiments (e.g., table-sweep task, shaft-reach task and shaft-insertion task) using the UR5e 6-DOF robotic arm, to demonstrate the improved characterisation of behavior. Results show significant improvement in task performance, through improved flexibility, robustness as well as demonstration feasibility.
