Title
An exact solution in Markov decision process with multiplicative rewards as a general framework
Authors
Abstract
We develop an exactly solvable framework for Markov decision processes with a finite horizon and continuous state and action spaces. We first review the exact solution of the conventional linear quadratic regulator with linear transitions and Gaussian noise. Its optimal policy does not depend on the Gaussian noise, which is undesirable when the noise is significant. This motivates us to investigate exact solutions that do depend on the noise. To this end, we generalize reward accumulation to an arbitrary binary operation that is commutative and associative. With a new multiplicative accumulation, we obtain an exact solution of the optimization problem for linear transitions with Gaussian noise, and, in contrast to the additive accumulation, the optimal policy depends on the noise. Furthermore, we show that the multiplicative scheme is a general framework that covers the additive one to arbitrary precision, which is a model-independent principle.
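The contrast between a noise-independent additive policy and a noise-dependent multiplicative one can be checked on a scalar toy model. The sketch below assumes a system x_{t+1} = a*x_t + b*u_t + w_t with w_t ~ N(0, W) and quadratic stage cost q*x^2 + r*u^2. The function names and parameters are hypothetical. For the multiplicative case it uses the classical risk-sensitive (exponential-of-cost) criterion E[exp(sigma/2 * sum_t c_t)], whose per-step factors are accumulated by multiplication. This is one well-known multiplicative accumulation and is not necessarily the construction used in the paper.

```python
"""Finite-horizon scalar LQR: additive vs. exponential (multiplicative) cost.

Assumed model (illustrative only, not taken from the paper):
    x_{t+1} = a*x_t + b*u_t + w_t,   w_t ~ N(0, W)
    stage cost q*x^2 + r*u^2, terminal cost qf*x_N^2
Policies have the form u_t = -K_t * x_t.
"""


def lqr_gains(a, b, q, r, qf, horizon):
    """Riccati recursion for the additive criterion E[sum_t c_t].

    The noise variance W never enters the gains (certainty equivalence);
    it only shifts the value function by a constant.
    """
    p = qf  # value-function coefficient at the final time
    gains = []
    for _ in range(horizon):
        k = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    return gains[::-1]  # the recursion runs backwards in time


def risk_sensitive_gains(a, b, q, r, qf, horizon, W, sigma):
    """Riccati recursion for minimizing E[exp(sigma/2 * sum_t c_t)], sigma > 0.

    exp(sigma/2 * sum_t c_t) = prod_t exp(sigma/2 * c_t), so stage factors
    are accumulated multiplicatively. Averaging over the Gaussian noise
    inflates p to p / (1 - sigma*p*W), which makes the gains depend on W.
    """
    p = qf
    gains = []
    for _ in range(horizon):
        denom = 1.0 - sigma * p * W
        if denom <= 0:
            raise ValueError("expectation diverges: need sigma * p * W < 1")
        p_tilde = p / denom  # noise-dependent effective cost-to-go
        k = a * b * p_tilde / (r + b * b * p_tilde)
        p = q + a * a * p_tilde - a * b * p_tilde * k
        gains.append(k)
    return gains[::-1]


if __name__ == "__main__":
    print(lqr_gains(1.0, 1.0, 1.0, 1.0, 1.0, 5))
    print(risk_sensitive_gains(1.0, 1.0, 1.0, 1.0, 1.0, 5, W=0.5, sigma=0.5))
```

With a = b = q = r = qf = 1, increasing W raises every risk-sensitive gain above the LQR gain, while `lqr_gains` has no way to react to W at all. As sigma tends to 0 the inflation factor 1/(1 - sigma*p*W) tends to 1 and the additive gains are recovered, which loosely mirrors the claim that the multiplicative scheme covers the additive one to arbitrary precision.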