Learning from Human Directional Corrections

Paper Title

Learning from Human Directional Corrections

Authors

Wanxin Jin, Todd D. Murphey, Zehui Lu, Shaoshuai Mou

Abstract

This paper proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; because a human needs to carefully choose the magnitude of each correction, those methods can easily lead to over-corrections and learning inefficiency. The proposed method only requires human directional corrections -- corrections that indicate the direction of an input change without indicating its magnitude. We only assume that each correction, regardless of its magnitude, points in a direction that improves the robot's current motion relative to an unknown objective function. The allowable corrections satisfying this assumption account for half of the input space, as opposed to magnitude corrections, which have to lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function based on a cutting plane method, which has a geometric interpretation. We have established theoretical results to show the convergence of the learning process. The proposed method has been tested in numerical examples, a user study on two human-robot games, and a real-world quadrotor experiment. The results confirm the convergence of the proposed method and further show that the method is significantly more effective (higher success rate), efficient/effortless (fewer human corrections needed), and potentially more accessible (fewer early wasted trials) than the state-of-the-art robot learning frameworks.
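To make the geometric idea in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy objective J(u, theta) = 0.5 * ||u - theta||^2 with an unknown 2-D target theta, so the robot's motion under the current estimate is simply u = theta_k. A directional correction a then satisfies a . (theta_true - theta_k) > 0, which is a half-space cut through the current estimate. The update shown is the textbook central-cut ellipsoid method, swapped in as one simple cutting plane scheme; the abstract does not specify the paper's actual update rule. All function names and the simulated human are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(P, v):
    return [dot(row, v) for row in P]


def ellipsoid_cut(c, P, g):
    """Central-cut ellipsoid update (textbook method, an assumption here).

    The ellipsoid is {theta : (theta - c)^T P^{-1} (theta - c) <= 1}.
    Keeps the half where g . (theta - c) <= 0 and returns the smallest
    ellipsoid covering that half.
    """
    n = len(c)
    Pg = mat_vec(P, g)
    scale = math.sqrt(dot(g, Pg))
    gt = [x / scale for x in Pg]
    c_new = [c[i] - gt[i] / (n + 1) for i in range(n)]
    f = n * n / (n * n - 1.0)
    P_new = [[f * (P[i][j] - 2.0 / (n + 1) * gt[i] * gt[j]) for j in range(n)]
             for i in range(n)]
    return c_new, P_new


def simulated_direction(u, theta_true, rng):
    """Hypothetical human: any direction within 80 degrees of the improving
    one, with an arbitrary magnitude that the learner never uses."""
    angle = math.atan2(theta_true[1] - u[1], theta_true[0] - u[0])
    angle += math.radians(rng.uniform(-80.0, 80.0))
    mag = rng.uniform(0.1, 10.0)
    return [mag * math.cos(angle), mag * math.sin(angle)]


def learn(theta_true, steps=40, radius=5.0, seed=0):
    """Shrink an ellipsoid of candidate parameters, one correction per step."""
    rng = random.Random(seed)
    c = [0.0, 0.0]                                  # current estimate theta_k
    P = [[radius ** 2, 0.0], [0.0, radius ** 2]]    # initial search ball
    for _ in range(steps):
        u = c[:]  # robot motion under theta_k: argmin_u 0.5 * ||u - theta_k||^2
        if math.dist(u, theta_true) < 1e-9:
            break  # no improving direction is left
        a = simulated_direction(u, theta_true, rng)
        g = [-x for x in a]  # a.(theta* - c) > 0  <=>  g.(theta* - c) < 0
        c, P = ellipsoid_cut(c, P, g)
    return c, P


if __name__ == "__main__":
    target = [2.0, -1.5]
    estimate, shape = learn(target)
    print("estimate:", estimate, "error:", math.dist(estimate, target))
```

In this sketch only the sign information of each correction enters the update: rescaling `a` leaves `ellipsoid_cut` unchanged, which mirrors the abstract's point that the magnitude of a correction does not need to be chosen carefully.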
