Paper Title
Weighted Maximum Entropy Inverse Reinforcement Learning
Paper Authors
Paper Abstract
We study inverse reinforcement learning (IRL) and imitation learning (IM), the problems of recovering a reward or policy function from an expert's demonstrated trajectories. We propose a new way to improve the learning process by adding a weight function to the maximum entropy framework, motivated by the goal of learning and recovering the stochasticity (or bounded rationality) of the expert policy. Our framework and algorithms allow us to learn both a reward (or policy) function and the structure of the entropy terms added to the Markov decision process, thus enhancing the learning procedure. Our numerical experiments on discrete and continuous IRL/IM tasks, using both human and simulated demonstrations, show that our approach outperforms prior algorithms.
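To illustrate how a weight function might enter the framework, the sketch below inserts a hypothetical state-dependent weight \sigma(s) into the standard maximum-entropy soft Bellman recursion. The recursion with \sigma(s) \equiv 1 is the well-known maximum-entropy formulation; the state-dependent weight and this particular parameterization are illustrative assumptions, not necessarily the paper's exact formulation.

% Soft Bellman recursion with an assumed state-dependent entropy weight \sigma(s);
% setting \sigma(s) \equiv 1 recovers standard maximum-entropy IRL.
\begin{align*}
  Q(s,a)        &= r(s,a) + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[V(s')\big] \\
  V(s)          &= \sigma(s) \log \sum_{a} \exp\!\big(Q(s,a)/\sigma(s)\big) \\
  \pi(a \mid s) &= \exp\!\big((Q(s,a) - V(s))/\sigma(s)\big)
\end{align*}

Under this reading, a small \sigma(s) makes the recovered policy nearly deterministic in state s, while a large \sigma(s) permits more stochastic (boundedly rational) behavior, so learning \sigma alongside the reward r would correspond to recovering the structure of the entropy terms described in the abstract.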