IV-Posterior: Inverse Value Estimation for Interpretable Policy Certificates

Paper title

IV-Posterior: Inverse Value Estimation for Interpretable Policy Certificates

Authors

Tatiana Lopez-Guevara, Michael Burke, Nicholas K. Taylor, Kartic Subr

Abstract

Model-free reinforcement learning (RL) is a powerful tool to learn a broad range of robot skills and policies. However, a lack of policy interpretability can inhibit their successful deployment in downstream applications, particularly when differences in environmental conditions may result in unpredictable behaviour or generalisation failures. As a result, there has been a growing emphasis in machine learning on the inclusion of stronger inductive biases in models to improve generalisation. This paper proposes an alternative strategy, inverse value estimation for interpretable policy certificates (IV-Posterior), which seeks to identify the inductive biases or idealised conditions of operation already held by pre-trained policies, and then use this information to guide their deployment. IV-Posterior uses Masked Autoregressive Flows to fit distributions over the set of conditions or environmental parameters in which a policy is likely to be effective. This distribution can then be used as a policy certificate in downstream applications. We illustrate the use of IV-Posterior across two environments, and show that substantial performance gains can be obtained when policy selection incorporates knowledge of the inductive biases that these policies hold.
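
As a rough illustration of how a certificate could drive policy selection, the sketch below fits one density per policy over a single environment parameter and then picks the policy whose density is highest at the observed parameter. It is not the authors' implementation: a return-weighted 1-D Gaussian stands in for the Masked Autoregressive Flow, and the function names (fit_certificate, log_density, select_policy), the policy names, and the rollout data are all hypothetical.

```python
import math


def fit_certificate(thetas, returns):
    """Fit a return-weighted 1-D Gaussian over environment parameters.

    Assumption: a Gaussian replaces the Masked Autoregressive Flow used in
    the paper. Returns must be non-negative with a positive sum.
    """
    total = sum(returns)
    mean = sum(w * t for t, w in zip(thetas, returns)) / total
    var = sum(w * (t - mean) ** 2 for t, w in zip(thetas, returns)) / total
    return mean, max(var, 1e-6)  # floor the variance to avoid division by zero


def log_density(cert, theta):
    """Log-density of the Gaussian certificate at parameter theta."""
    mean, var = cert
    return -0.5 * (math.log(2 * math.pi * var) + (theta - mean) ** 2 / var)


def select_policy(certs, theta):
    """Pick the policy whose certificate assigns theta the highest density."""
    return max(certs, key=lambda name: log_density(certs[name], theta))


# Hypothetical rollouts: the same parameter grid evaluated under two policies.
thetas = [0.0, 0.25, 0.5, 0.75, 1.0]
certs = {
    "policy_a": fit_certificate(thetas, [1.0, 0.8, 0.3, 0.1, 0.0]),
    "policy_b": fit_certificate(thetas, [0.0, 0.1, 0.3, 0.8, 1.0]),
}

print(select_policy(certs, 0.1))
print(select_policy(certs, 0.9))
```

In this toy setup policy_a performs well at low parameter values and policy_b at high ones, so the selection follows whichever certificate places more mass near the observed parameter. A flow-based density would play the same role for multi-dimensional or multi-modal parameter sets.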
