Paper Title

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Paper Authors

Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin F. Yang

Paper Abstract

Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems (Tang et al., 2017; Bellemare et al., 2016), we investigate when this paradigm is provably efficient. We study episodic Markov decision processes with rich observations generated from a small number of latent states. We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm. Theoretically, we prove that as long as the unsupervised learning algorithm enjoys a polynomial sample complexity guarantee, we can find a near-optimal policy with sample complexity polynomial in the number of latent states, which is significantly smaller than the number of observations. Empirically, we instantiate our framework on a class of hard exploration problems to demonstrate the practicality of our theory.
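
To make the two-component framework described in the abstract concrete, here is a minimal illustrative sketch, not the authors' implementation. It assumes, for illustration only, that K-means clustering stands in for the unsupervised learning algorithm, that optimistic tabular Q-learning stands in for the no-regret tabular RL algorithm, and that `env` follows a gym-style `reset()`/`step()` interface returning rich observations as vectors; all class and function names (`ObservationDecoder`, `OptimisticTabularQ`, `run_framework`) are hypothetical.

```python
# Illustrative sketch of the two-component framework (not the paper's code).
# Assumptions: a gym-style environment `env` with reset()/step(); K-means
# stands in for any unsupervised learner with a polynomial sample-complexity
# guarantee; optimistic Q-learning stands in for any no-regret tabular method.

import numpy as np
from sklearn.cluster import KMeans


class ObservationDecoder:
    """Unsupervised component: maps rich observations to latent-state indices."""

    def __init__(self, num_latent_states, seed=0):
        self.kmeans = KMeans(n_clusters=num_latent_states, n_init=10, random_state=seed)

    def fit(self, observations):
        # observations: array of shape (n_samples, obs_dim) gathered during exploration
        self.kmeans.fit(observations)
        return self

    def decode(self, observation):
        # Map a single rich observation to its inferred latent-state index.
        return int(self.kmeans.predict(np.asarray(observation).reshape(1, -1))[0])


class OptimisticTabularQ:
    """Tabular component: Q-learning with a count-based exploration bonus,
    run on decoded latent states rather than raw observations."""

    def __init__(self, num_states, num_actions, horizon, bonus_scale=1.0):
        # Optimistic initialization encourages visiting under-explored pairs.
        self.Q = np.full((horizon, num_states, num_actions), float(horizon))
        self.N = np.zeros((horizon, num_states, num_actions))
        self.H = horizon
        self.bonus_scale = bonus_scale

    def act(self, h, s):
        return int(np.argmax(self.Q[h, s]))

    def update(self, h, s, a, reward, s_next):
        self.N[h, s, a] += 1
        n = self.N[h, s, a]
        lr = (self.H + 1) / (self.H + n)             # stage-dependent learning rate
        bonus = self.bonus_scale * np.sqrt(1.0 / n)  # optimism bonus for exploration
        v_next = 0.0 if h + 1 >= self.H else self.Q[h + 1, s_next].max()
        target = reward + v_next + bonus
        self.Q[h, s, a] += lr * (target - self.Q[h, s, a])
        self.Q[h, s, a] = min(self.Q[h, s, a], self.H)  # values are bounded by H


def run_framework(env, num_latent_states, num_actions, horizon,
                  warmup_episodes, train_episodes):
    """Phase 1: collect observations and fit the decoder.
    Phase 2: run tabular RL on the decoded latent MDP."""
    obs_buffer = []
    for _ in range(warmup_episodes):
        obs = env.reset()
        for _ in range(horizon):
            obs_buffer.append(np.asarray(obs).ravel())
            obs, _, done, _ = env.step(np.random.randint(num_actions))
            if done:
                break
    decoder = ObservationDecoder(num_latent_states).fit(np.stack(obs_buffer))

    agent = OptimisticTabularQ(num_latent_states, num_actions, horizon)
    for _ in range(train_episodes):
        obs = env.reset()
        s = decoder.decode(obs)
        for h in range(horizon):
            a = agent.act(h, s)
            obs, r, done, _ = env.step(a)
            s_next = decoder.decode(obs)
            agent.update(h, s, a, r, s_next)
            s = s_next
            if done:
                break
    return decoder, agent
```

The simple random warm-up phase here is a simplification: it only serves to show how the decoded latent states feed the tabular learner, whereas the paper's framework couples the unsupervised component and the exploration process with the care needed for the stated sample-complexity guarantees.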
