Paper Title
$q$-Munchausen Reinforcement Learning
Paper Authors
Paper Abstract
The recently successful Munchausen Reinforcement Learning (M-RL) features implicit Kullback-Leibler (KL) regularization by augmenting the reward function with the logarithm of the current stochastic policy. Though significant improvement has been shown with the Boltzmann softmax policy, when the Tsallis sparsemax policy is considered, the augmentation leads to a flat learning curve on almost every problem considered. We show that this is due to the mismatch between the conventional logarithm and the non-logarithmic (generalized) nature of Tsallis entropy. Drawing inspiration from the Tsallis statistics literature, we propose to correct this mismatch in M-RL with the help of $q$-logarithm/exponential functions. The proposed formulation leads to implicit Tsallis KL regularization under the maximum Tsallis entropy framework. We show that this formulation of M-RL again achieves superior performance on benchmark problems and sheds light on a more general M-RL with various entropic indices $q$.
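To make the correction concrete, the following is a minimal sketch in standard Tsallis-statistics notation; the precise scaling, sign, and entropic-index convention used in the paper may differ, so the augmented-reward form shown here is an illustrative assumption rather than the paper's exact definition.

% q-logarithm and q-exponential (standard Tsallis-statistics definitions),
% recovering the ordinary ln and exp in the limit q -> 1:
\[
  \ln_q(x) \;=\; \frac{x^{\,1-q} - 1}{1-q}, \qquad
  \exp_q(x) \;=\; \bigl[\,1 + (1-q)\,x\,\bigr]_{+}^{\frac{1}{1-q}}, \qquad q \neq 1 .
\]
% Original Munchausen augmentation (implicit KL regularization), with
% temperature \tau and Munchausen coefficient \alpha:
\[
  \tilde{r}(s,a) \;=\; r(s,a) \;+\; \alpha\,\tau\,\ln \pi(a \mid s).
\]
% Hypothetical q-Munchausen form suggested by the abstract: the conventional
% logarithm is replaced by the q-logarithm so that the bonus matches the
% generalized (non-logarithmic) Tsallis entropy, yielding implicit Tsallis KL
% regularization:
\[
  \tilde{r}_q(s,a) \;=\; r(s,a) \;+\; \alpha\,\tau\,\ln_q \pi(a \mid s).
\]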