Paper title
Policy iteration using Q-functions: Linear dynamics with multiplicative noise
Paper authors
Paper abstract
This paper presents a novel model-free and fully data-driven policy iteration scheme for quadratic regulation of linear dynamics with state- and input-multiplicative noise. The implementation is similar to the least-squares temporal difference scheme for Markov decision processes, estimating Q-functions by solving a least-squares problem with instrumental variables. The scheme is compared with a model-based system identification scheme and natural policy gradient in numerical experiments.
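
As a rough illustration of the idea in the abstract, the following minimal sketch runs Q-function policy iteration on a scalar system with state- and input-multiplicative noise. It is not the authors' implementation. The system parameters, the sampling of independent one-step transitions from a uniform box, and all function names are assumptions made for this example. The Q-function is estimated by a least-squares temporal-difference solve in which the quadratic features serve as the instrument, and the result is compared against the gain from the generalized Riccati recursion, which does use the model and is included only as a check.

```python
import numpy as np

# Hypothetical scalar example (all values are assumptions for illustration):
#   x' = (a + alpha*w) x + (b + beta*v) u,  w, v independent standard normal,
#   stage cost q x^2 + r u^2, policy u = -k x.
A_NOM, B_NOM, ALPHA, BETA, Q_COST, R_COST = 0.9, 0.5, 0.2, 0.2, 1.0, 1.0

def features(x, u):
    """Quadratic features so that Q(x, u) = theta . [x^2, x*u, u^2]."""
    return np.stack([x * x, x * u, u * u], axis=-1)

def lstd_q(k, n, rng):
    """Estimate the Q-function weights of the policy u = -k x from n transitions."""
    # Data collection: independent (x, u) pairs and one simulated step each.
    x = rng.uniform(-1.0, 1.0, n)
    u = rng.uniform(-1.0, 1.0, n)
    w = rng.standard_normal(n)
    v = rng.standard_normal(n)
    x_next = (A_NOM + ALPHA * w) * x + (B_NOM + BETA * v) * u

    phi = features(x, u)
    phi_next = features(x_next, -k * x_next)  # next action follows the policy
    cost = Q_COST * x * x + R_COST * u * u

    # Instrumental-variable normal equations with phi as the instrument:
    #   mean(phi (phi - phi_next)^T) theta = mean(phi * cost)
    lhs = phi.T @ (phi - phi_next) / n
    rhs = phi.T @ cost / n
    return np.linalg.solve(lhs, rhs)

def policy_iteration(k0=0.0, iters=8, n=200_000, seed=0):
    """Alternate Q-function estimation and greedy improvement; returns the gain."""
    rng = np.random.default_rng(seed)
    k = k0  # must be mean-square stabilizing; k0 = 0 is for the values above
    for _ in range(iters):
        theta = lstd_q(k, n, rng)
        # Minimizing theta0 x^2 + theta1 x u + theta2 u^2 over u gives u = -k x.
        k = theta[1] / (2.0 * theta[2])
    return k

def riccati_gain(iters=1000):
    """Model-based reference gain from the generalized Riccati recursion."""
    p = Q_COST
    for _ in range(iters):
        gain = A_NOM * B_NOM * p / (R_COST + (B_NOM**2 + BETA**2) * p)
        p = Q_COST + (A_NOM**2 + ALPHA**2) * p - A_NOM * B_NOM * p * gain
    return A_NOM * B_NOM * p / (R_COST + (B_NOM**2 + BETA**2) * p)
```

Calling `policy_iteration()` and `riccati_gain()` and comparing the two gains gives a quick sanity check. The estimator itself only touches sampled transitions and costs; the nominal parameters appear in `lstd_q` solely to simulate data.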