Paper Title
Interpretable Option Discovery using Deep Q-Learning and Variational Autoencoders
Paper Authors
Paper Abstract
Deep Reinforcement Learning (RL) is unquestionably a robust framework for training autonomous agents in a wide variety of disciplines. However, traditional deep and shallow model-free RL algorithms suffer from low sample efficiency and inadequate generalization for sparse state spaces. The options framework with temporal abstractions is perhaps the most promising method to solve these problems, but it still has noticeable shortcomings. It only guarantees local convergence, and it is challenging to automate initiation and termination conditions, which in practice are commonly hand-crafted. Our proposal, the Deep Variational Q-Network (DVQN), combines deep generative and reinforcement learning. The algorithm finds good policies from a Gaussian-distributed latent space, which is especially useful for defining options. The DVQN algorithm uses MSE with KL-divergence as regularization, combined with traditional Q-Learning updates. The algorithm learns a latent space that represents good policies with state clusters for options. We show that the DVQN algorithm is a promising approach for identifying initiation and termination conditions for option-based reinforcement learning. Experiments show that the DVQN algorithm, with automatic initiation and termination, has comparable performance to Rainbow and can maintain stability when trained for extended periods after convergence.
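The abstract describes a loss that combines a VAE-style objective (MSE reconstruction with KL-divergence regularization) with a standard Q-learning update. Below is a minimal sketch of how such a combined loss could look; it is not the authors' implementation, and the network sizes, latent dimension, and loss weight `beta` are illustrative assumptions.

```python
# Sketch of a DVQN-style objective: Q-values are read off a Gaussian latent
# space learned by a VAE, and the TD loss is combined with reconstruction
# (MSE) and KL regularization. All hyperparameters here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DVQNSketch(nn.Module):
    def __init__(self, state_dim, n_actions, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)       # mean of latent Gaussian
        self.logvar = nn.Linear(128, latent_dim)   # log-variance of latent Gaussian
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, state_dim))
        self.q_head = nn.Linear(latent_dim, n_actions)  # Q-values from the latent space

    def forward(self, state):
        h = self.encoder(state)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.q_head(z), self.decoder(z), mu, logvar

def dvqn_loss(model, target_model, batch, gamma=0.99, beta=1e-3):
    state, action, reward, next_state, done = batch
    q, recon, mu, logvar = model(state)
    with torch.no_grad():  # standard Q-learning target from a frozen target network
        next_q, _, _, _ = target_model(next_state)
        target = reward + gamma * (1 - done) * next_q.max(dim=1).values
    td_loss = F.mse_loss(q.gather(1, action.unsqueeze(1)).squeeze(1), target)
    recon_loss = F.mse_loss(recon, state)                           # VAE reconstruction (MSE)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())   # KL-divergence regularization
    return td_loss + recon_loss + beta * kl
```

Under this reading, the learned latent space could then be clustered (e.g. with a standard clustering method) so that cluster membership supplies the initiation and termination conditions for options, as the abstract suggests.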