Paper Title
Towards a Unified Structure for Reinforcement Learning: An Optimization Approach
Paper Authors
Paper Abstract
Both the optimal value function and the optimal policy can be used to model an optimal controller, based on the duality established by the Bellman equation. Even with this duality, no parametric model has so far been able to output both the policy and the value function from a common parameter set. In this paper, a unified structure built on a parametric optimization problem is proposed. The policy and the value function modelled by this structure share all parameters, which enables seamless switching among reinforcement learning algorithms while learning continues. Q-learning and policy gradient algorithms based on the proposed structure are detailed. An actor-critic algorithm based on this structure, whose actor and critic are modelled by the same parameters, is validated on both linear and nonlinear control tasks.
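To make the shared-parameter idea concrete, here is a minimal sketch in the spirit of the abstract, not the paper's actual construction: a quadratic-in-action Q-function (similar in flavor to normalized advantage functions) for which the policy is the maximizer of a parametric optimization problem and the value function is its optimal value, so both are read off one parameter set. All names here (`SharedParamController`, the parameterization `L`, `W`, `b`) are illustrative assumptions.

```python
import numpy as np

# Sketch only (assumed parameterization, not the paper's):
#   Q_theta(s, a) = -0.5 a^T P a + a^T (W s) + b^T s,  theta = (L, W, b),
# with P = L L^T + eps*I positive definite. The same theta then yields
#   policy: pi(s) = argmax_a Q_theta(s, a) = P^{-1} W s
#   value:  V(s)  = max_a    Q_theta(s, a) = 0.5 (W s)^T P^{-1} (W s) + b^T s
# so actor and critic share every parameter.

class SharedParamController:
    def __init__(self, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.L = rng.normal(size=(action_dim, action_dim))
        self.W = rng.normal(size=(action_dim, state_dim))
        self.b = rng.normal(size=state_dim)
        self.eps = 1e-3

    def _P(self):
        # Positive-definite curvature so the inner maximization is well posed.
        return self.L @ self.L.T + self.eps * np.eye(self.L.shape[0])

    def q_value(self, s, a):
        # The parametric optimization objective, evaluated at (s, a).
        return -0.5 * a @ self._P() @ a + a @ (self.W @ s) + self.b @ s

    def policy(self, s):
        # Maximizer of the concave quadratic: closed-form argmax.
        return np.linalg.solve(self._P(), self.W @ s)

    def value(self, s):
        # Optimal value of the same optimization problem.
        return self.q_value(s, self.policy(s))

ctrl = SharedParamController(state_dim=3, action_dim=2)
s = np.array([0.5, -1.0, 2.0])
a = ctrl.policy(s)   # actor output
v = ctrl.value(s)    # critic output, same parameters
assert np.isclose(v, ctrl.q_value(s, a))
```

Because the actor (`policy`) and the critic (`value`, `q_value`) are different queries against a single parameter set, a learner could alternate between value-based and policy-based updates without re-parameterizing, which is the seamless-switching property the abstract highlights.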