Learning from Sparse Demonstrations

Title

Learning from Sparse Demonstrations

Authors

Jin, Wanxin, Murphey, Todd D., Kulić, Dana, Ezer, Neta, Mou, Shaoshuai

Abstract

This paper develops the method of Continuous Pontryagin Differentiable Programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can differ from the time of the robot's actual execution. The method jointly finds an objective function and a time-warping function such that the robot's resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. Continuous PDP minimizes the discrepancy loss using projected gradient descent, efficiently solving for the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of objective learning to unseen motion conditions.
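The joint search described in the abstract (fitting objective parameters and a time-warping function so that the trajectory passes near time-stamped keyframes, using projected gradient descent on a discrepancy loss) can be illustrated on a toy problem. This is a minimal sketch under our own assumptions, not the authors' Continuous PDP implementation. It assumes a scalar system x' = u with the hypothetical infinite-horizon cost ∫ (q(x − g)² + u²) dt, whose optimal trajectory has the closed form x(t) = g + (x0 − g)·exp(−√q·t); this lets the trajectory gradient with respect to (q, g) be written analytically, standing in for the gradients the paper obtains with Continuous PDP. It also assumes a linear time-warping function t = β·τ that maps keyframe time stamps τ to execution time t. The function names (trajectory, loss_and_grad, project, fit), the parameter bounds, and all numbers are illustrative. The step size is chosen by Armijo backtracking, and the projection keeps q > 0 and β within [0.1, 10].

```python
import numpy as np

# Hypothetical toy model (an assumption, not the paper's setup): scalar system
# x' = u with infinite-horizon cost  integral of q*(x - g)**2 + u**2 dt.
# LQR gives u = -sqrt(q)*(x - g), so x(t) = g + (x0 - g) * exp(-sqrt(q) * t).
X0 = 0.0  # initial state


def trajectory(q, g, t):
    """Closed-form optimal trajectory of the toy problem at times t."""
    return g + (X0 - g) * np.exp(-np.sqrt(q) * t)


def loss_and_grad(params, taus, keyframes):
    """Discrepancy loss sum_i (x(beta * tau_i) - y_i)**2 and its gradient.

    params = (q, g, beta): cost weight, cost goal, and slope of the assumed
    linear time-warping function t = beta * tau.
    """
    q, g, beta = params
    k = np.sqrt(q)
    t = beta * taus                    # warp keyframe time stamps to execution time
    decay = np.exp(-k * t)
    resid = g + (X0 - g) * decay - keyframes
    # Analytic sensitivities dx/dparam; in the paper these trajectory
    # gradients come from Continuous PDP instead of a closed form.
    dx_dq = -(X0 - g) * t * decay / (2.0 * k)
    dx_dg = 1.0 - decay
    dx_dbeta = -(X0 - g) * k * taus * decay
    grad = 2.0 * np.array([resid @ dx_dq, resid @ dx_dg, resid @ dx_dbeta])
    return float(resid @ resid), grad


# Feasible set: q >= 1e-3 keeps sqrt(q) > 0; beta in [0.1, 10] keeps the warp increasing.
LOWER = np.array([1e-3, -np.inf, 0.1])
UPPER = np.array([np.inf, np.inf, 10.0])


def project(params):
    """Euclidean projection onto the box LOWER <= params <= UPPER."""
    return np.clip(params, LOWER, UPPER)


def fit(params, taus, keyframes, iters=5000):
    """Projected gradient descent with Armijo backtracking on the step size."""
    params = project(np.asarray(params, dtype=float))
    loss, grad = loss_and_grad(params, taus, keyframes)
    for _ in range(iters):
        if np.linalg.norm(grad) < 1e-10:
            break
        step = 1.0
        while True:
            candidate = project(params - step * grad)
            new_loss, new_grad = loss_and_grad(candidate, taus, keyframes)
            # Sufficient-decrease test for a projected gradient step.
            if new_loss <= loss - 1e-4 * grad @ (params - candidate) or step < 1e-12:
                break
            step *= 0.5
        params, loss, grad = candidate, new_loss, new_grad
    return params, loss


# Usage: keyframes sampled from a "true" toy model, then fitted from a poor guess.
taus = np.array([0.5, 1.0, 2.0, 3.0])         # keyframe time stamps
keyframes = trajectory(4.0, 1.0, 0.5 * taus)  # true q = 4, g = 1, beta = 0.5
params, final_loss = fit([1.0, 0.5, 2.0], taus, keyframes)
print(params, final_loss)
```

In this toy the keyframe fit depends on q and β only through the product √q·β, so the fit can pin down g and that product but not q and β individually. The richer objective and time-warping parameterizations used in the paper are not reproduced here.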
