Model-Free Prediction of Partially Observable Spatiotemporal Chaotic Systems

Paper Title

Robot Policy Learning from Demonstration Using Advantage Weighting and Early Termination

Authors

Abdalkarim Mohtasib, Gerhard Neumann, Heriberto Cuayahuitl

Abstract

Learning robotic tasks in the real world is still highly challenging, and effective practical solutions remain to be found. Traditional methods used in this area are imitation learning and reinforcement learning, but both have limitations when applied to real robots. Combining reinforcement learning with pre-collected demonstrations is a promising approach that can help in learning control policies to solve robotic tasks. In this paper, we propose an algorithm that uses novel techniques to leverage offline expert data through offline and online training to obtain faster convergence and improved performance. The proposed algorithm (AWET) weights the critic losses with a novel agent advantage weight to improve over the expert data. In addition, AWET uses an automatic early termination technique to stop and discard policy rollouts that are not similar to expert trajectories, which prevents the policy from drifting far from the expert data. In an ablation study, AWET showed improved and promising performance compared to state-of-the-art baselines on four standard robotic tasks.
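
The early termination idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how similarity to expert trajectories is measured, so the per-step Euclidean distance, the fixed `threshold`, and every name below (`rollout_with_early_termination`, `step_fn`, `expert_states`) are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Callable, List, Optional, Sequence

State = Sequence[float]


def euclidean(a: State, b: State) -> float:
    """Euclidean distance between two equally sized state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rollout_with_early_termination(
    step_fn: Callable[[State], State],
    start: State,
    expert_states: Sequence[State],
    threshold: float,
) -> Optional[List[List[float]]]:
    """Roll out `step_fn` alongside one expert trajectory.

    Returns the visited states, or None when the rollout is stopped and
    discarded because a state lies more than `threshold` away from the
    expert state at the same time step (an assumed similarity test).
    """
    state = list(start)
    visited = [state]
    for expert_state in expert_states[1:]:
        state = list(step_fn(state))
        if euclidean(state, expert_state) > threshold:
            return None  # terminate early and discard this rollout
        visited.append(state)
    return visited


# Hypothetical expert trajectory moving along the x-axis.
expert = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]

# A policy that tracks the expert is kept in full.
kept = rollout_with_early_termination(
    lambda s: [s[0] + 1.0, s[1]], [0.0, 0.0], expert, threshold=0.5
)

# A policy that drifts in y is stopped at the first step and discarded.
discarded = rollout_with_early_termination(
    lambda s: [s[0] + 1.0, s[1] + 1.0], [0.0, 0.0], expert, threshold=0.5
)
```

In this sketch a discarded rollout contributes no transitions to training, which is one simple way to keep the collected data close to the demonstrations. A real system would need a similarity measure suited to its state space.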
