Paper Title
An Optimization Method-Assisted Ensemble Deep Reinforcement Learning Algorithm to Solve Unit Commitment Problems
Authors
Abstract
Unit commitment (UC) is a fundamental problem in the day-ahead electricity market, and solving it efficiently is critical. Mathematical optimization techniques such as dynamic programming, Lagrangian relaxation, and mixed-integer quadratic programming (MIQP) are commonly adopted for UC problems. However, the computation time of these methods grows exponentially with the number of generators and energy resources, which remains the main bottleneck in industry. Recent advances in artificial intelligence have demonstrated the capability of reinforcement learning (RL) to solve UC problems. Unfortunately, existing RL-based approaches to UC suffer from the curse of dimensionality as the problem size grows. To address these problems, we propose an optimization method-assisted ensemble deep reinforcement learning algorithm, in which UC problems are formulated as a Markov decision process (MDP) and solved by multi-step deep Q-learning in an ensemble framework. The proposed algorithm establishes a candidate action set by solving tailored optimization problems, which ensures relatively high performance and the satisfaction of operational constraints. Numerical studies on the IEEE 118-bus and 300-bus systems show that our algorithm outperforms the baseline RL algorithm and MIQP. Furthermore, the proposed algorithm shows strong generalization capacity under unforeseen operational conditions.
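As a rough illustration of two ingredients named in the abstract, the sketch below shows a generic multi-step Q-learning target and a choice among candidate actions scored by an ensemble of Q-functions. It is not the authors' implementation. The names `n_step_target` and `ensemble_select`, the tuple encoding of generator on/off decisions, and the use of a simple mean to combine ensemble members are all assumptions made for illustration; the abstract does not specify how the ensemble is aggregated or how the candidate set is encoded.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical encoding: one on/off flag per generator.
Action = Tuple[int, ...]
# Hypothetical Q-function signature: (state, action) -> estimated return.
QFunction = Callable[[Sequence[float], Action], float]


def n_step_target(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Generic multi-step target: sum_k gamma^k * r_k + gamma^n * bootstrap_value."""
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


def ensemble_select(
    state: Sequence[float],
    candidates: Sequence[Action],
    q_functions: Sequence[QFunction],
) -> Action:
    """Return the candidate with the highest mean Q-value (mean is an assumption)."""
    def mean_q(action: Action) -> float:
        return sum(q(state, action) for q in q_functions) / len(q_functions)

    return max(candidates, key=mean_q)
```

In the setting the abstract describes, `candidates` would come from the tailored optimization problems, so every action being scored already satisfies the operational constraints; here they are simply passed in as a list.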