Paper Title
Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach
Paper Authors
Paper Abstract
In order for reinforcement learning techniques to be useful in real-world decision making processes, they must be able to produce robust performance from limited data. Deep policy optimization methods have achieved impressive results on complex tasks, but their real-world adoption remains limited because they often require significant amounts of data to succeed. When combined with small sample sizes, these methods can result in unstable learning due to their reliance on high-dimensional sample-based estimates. In this work, we develop techniques to control the uncertainty introduced by these estimates. We leverage these techniques to propose a deep policy optimization approach designed to produce stable performance even when data is scarce. The resulting algorithm, Uncertainty-Aware Trust Region Policy Optimization, generates robust policy updates that adapt to the level of uncertainty present throughout the learning process.
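To make the core idea concrete, here is a minimal sketch of how a trust-region radius might be shrunk when sample-based estimates are uncertain. This is an illustrative assumption, not the paper's actual update rule: the function name `adaptive_trust_region_size`, the bootstrap uncertainty estimator, and the scaling formula are all hypothetical stand-ins for whatever mechanism the algorithm uses.

```python
import numpy as np

def adaptive_trust_region_size(advantages, base_delta=0.01, n_boot=100, seed=0):
    """Shrink a KL trust-region radius as estimate uncertainty grows.

    Hypothetical illustration of an uncertainty-adaptive trust region:
    advantages : per-sample advantage estimates from a (possibly small) batch
    base_delta : trust-region size used when uncertainty is negligible
    """
    rng = np.random.default_rng(seed)
    advantages = np.asarray(advantages)
    n = len(advantages)
    # Bootstrap the mean advantage to gauge how noisy the
    # policy-gradient signal is at this sample size.
    boot_means = np.array([
        rng.choice(advantages, size=n, replace=True).mean()
        for _ in range(n_boot)
    ])
    uncertainty = boot_means.std()
    # Scale the trust region down as the relative uncertainty grows,
    # so policy updates stay conservative when data is scarce.
    scale = 1.0 / (1.0 + uncertainty / (np.abs(boot_means.mean()) + 1e-8))
    return base_delta * scale
```

Under this sketch, a small batch of high-variance advantages yields a radius well below `base_delta`, while a large, low-variance batch leaves it nearly unchanged, matching the abstract's description of updates that adapt to the level of uncertainty throughout learning.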