Paper Title
Evolution of Q Values for Deep Q Learning in Stable Baselines
Paper Authors
Paper Abstract
We investigate the evolution of the Q values for the implementation of Deep Q Learning (DQL) in the Stable Baselines library. Stable Baselines incorporates the latest Reinforcement Learning techniques and achieves superhuman performance in many game environments. However, for some simple non-game environments, the DQL in Stable Baselines can struggle to find the correct actions. In this paper, we aim to understand the types of environments where this suboptimal behavior can happen, and we also investigate the corresponding evolution of the Q values for individual states. We compare a smart TrafficLight environment (where performance is poor) with the AI Gym FrozenLake environment (where performance is perfect). We observe that DQL struggles with TrafficLight because actions are reversible and hence the Q values in a given state are closer together than in FrozenLake. We then investigate the evolution of the Q values using a recent decomposition technique of Achiam et al. We observe that for TrafficLight, the function approximation error and the complex relationships between the states lead to a situation where some Q values meander far from their optimal values.
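The abstract describes training a Stable Baselines DQL agent on FrozenLake and tracking the Q values of individual states. The snippet below is a minimal sketch of that kind of setup, written against the stable-baselines3 API (a successor to the Stable Baselines library named in the paper); the environment options, timestep budget, and the choice of state 0 are illustrative assumptions, not the paper's actual experimental settings.

```python
# Minimal sketch: train DQN on FrozenLake and read out the Q values of one state.
# Assumes stable-baselines3 (>=2.0, gymnasium-based); the original Stable Baselines
# library used in the paper exposes a different interface.
import gymnasium as gym
import torch as th
from stable_baselines3 import DQN

env = gym.make("FrozenLake-v1", is_slippery=False)  # environment kwargs chosen for illustration
model = DQN("MlpPolicy", env, learning_rate=1e-3, verbose=0)
model.learn(total_timesteps=50_000)  # training budget chosen for illustration

# Inspect the learned Q values of a single state (here the start state, index 0),
# the per-state quantity whose evolution the paper studies during training.
state = th.tensor([0], device=model.device)  # discrete observation index
with th.no_grad():
    q_values = model.q_net(state)  # one Q value per action (left, down, right, up)
print(q_values)
```

Logging these per-state Q values at regular intervals during `model.learn` (for example from a callback) would give the kind of Q-value trajectories the paper compares between TrafficLight and FrozenLake.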