Paper Title

Reducing Learning Difficulties: One-Step Two-Critic Deep Reinforcement Learning for Inverter-based Volt-Var Control

Authors

Qiong Liu, Ye Guo, Lirong Deng, Haotian Liu, Dongyu Li, Hongbin Sun, Wenqi Huang

Abstract

A one-step two-critic deep reinforcement learning (OSTC-DRL) approach for inverter-based volt-var control (IB-VVC) in active distribution networks is proposed in this paper. First, considering that IB-VVC can be formulated as a single-period optimization problem, we formulate IB-VVC as a one-step Markov decision process rather than a standard Markov decision process, which simplifies the DRL learning task. Then we design a one-step actor-critic DRL scheme, a simplified version of recent DRL algorithms that avoids the issue of Q-value overestimation. Furthermore, considering the two objectives of VVC, minimizing power loss and eliminating voltage violations, we utilize two critics to approximate the rewards of the two objectives separately. This simplifies the approximation task of each critic and avoids interaction effects between the two objectives during critic learning. The OSTC-DRL approach integrates the one-step actor-critic DRL scheme and the two-critic technique. Based on OSTC-DRL, we design two centralized DRL algorithms. Further, we extend OSTC-DRL to multi-agent OSTC-DRL for decentralized IB-VVC and design two multi-agent DRL algorithms. Simulations demonstrate that the proposed OSTC-DRL has a faster convergence rate and better control performance, and that multi-agent OSTC-DRL works well for decentralized IB-VVC problems.
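The core idea in the abstract can be illustrated with a toy sketch: because the one-step MDP has no next state, each critic's regression target is simply the immediate reward for its own objective (no bootstrapped value, hence no overestimation chain), and the two objectives are learned by two separate critics. The reward functions, linear-feature critics, and grid-search "actor" below are hypothetical stand-ins for the paper's neural networks and power-system rewards, chosen only to make the scheme runnable.

```python
import random

# Hedged toy sketch of the one-step two-critic scheme (NOT the paper's code).
random.seed(0)

def features(s, a):
    # Quadratic features in state s and action a (illustrative choice).
    return [1.0, s, a, s * s, a * a, s * a]

def predict(w, s, a):
    return sum(wi * xi for wi, xi in zip(w, features(s, a)))

def sgd_step(w, s, a, target, lr=0.05):
    # One stochastic gradient step on the squared error to the one-step target.
    err = predict(w, s, a) - target
    return [wi - lr * err * xi for wi, xi in zip(w, features(s, a))]

# Two hypothetical reward components standing in for the VVC objectives.
def r_loss(s, a):            # "power loss" term
    return -(a - s) ** 2

def r_volt(s, a):            # "voltage violation" term
    return -0.1 * a * a

w_loss = [0.0] * 6           # critic for the power-loss reward
w_volt = [0.0] * 6           # critic for the voltage-violation reward

for _ in range(20000):
    s = random.uniform(-1.0, 1.0)
    a = random.uniform(-1.0, 1.0)
    # One-step targets: the immediate rewards themselves, no next-state value.
    w_loss = sgd_step(w_loss, s, a, r_loss(s, a))
    w_volt = sgd_step(w_volt, s, a, r_volt(s, a))

def act(s):
    # "Actor": pick the action maximizing the summed critic values (grid search).
    grid = [i / 100.0 for i in range(-100, 101)]
    return max(grid, key=lambda a: predict(w_loss, s, a) + predict(w_volt, s, a))

# Analytic optimum of r_loss + r_volt at s = 0.8 is 0.8 / 1.1, about 0.727.
print(act(0.8))
```

Training each critic on only its own reward keeps the two regression targets simple and independent, which is the abstract's stated motivation for the two-critic design.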
