Title
Computationally efficient joint coordination of multiple electric vehicle charging points using reinforcement learning
Authors
Abstract
A major challenge in today's power grid is to manage the increasing load from electric vehicle (EV) charging. Demand response (DR) solutions aim to exploit flexibility therein, i.e., the ability to shift EV charging in time and thus avoid excessive peaks or achieve better balancing. Whereas the majority of existing research works either focus on control strategies for a single EV charger, or use a multi-step approach (e.g., a first high-level aggregate control decision step, followed by individual EV control decisions), we rather propose a single-step solution that jointly coordinates multiple charging points at once. In this paper, we further refine an initial proposal using reinforcement learning (RL), specifically addressing computational challenges that would limit its deployment in practice. More precisely, we design a new Markov decision process (MDP) formulation of the EV charging coordination process, exhibiting only linear space and time complexity (as opposed to the earlier quadratic space complexity). We thus improve upon the earlier state of the art, demonstrating a 30% reduction in training time in our case study using real-world EV charging session data. Yet, we do not sacrifice the resulting performance in meeting the DR objectives: our new RL solutions still improve the performance of charging demand coordination by 40-50% compared to a business-as-usual policy (which charges EVs fully upon arrival) and 20-30% compared to a heuristic policy (which uniformly spreads individual EV charging over time).
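The abstract's central technical claim is the reduction from a quadratic to a linear state representation in the MDP. The following toy sketch is purely illustrative and is not the paper's actual formulation: it assumes each charging session can be summarized by two integers (remaining slots until departure, remaining slots of charging needed), and contrasts a 2-D occupancy grid over those pairs (size grows as D × E) with separate per-dimension histograms (size grows as D + E). All names and horizon sizes here are hypothetical.

```python
import numpy as np

# Illustrative sketch only -- not the paper's MDP formulation.
# A session is (d, e): d = slots until departure, e = slots of charging still needed.

D, E = 12, 12  # hypothetical horizon sizes (assumption, not from the paper)

def quadratic_state(sessions):
    """2-D grid: grid[d, e] counts sessions with that (d, e) pair -> O(D*E) entries."""
    grid = np.zeros((D, E), dtype=int)
    for d, e in sessions:
        grid[d, e] += 1
    return grid

def linear_state(sessions):
    """Concatenated 1-D histograms over d and e separately -> O(D+E) entries."""
    hist_d = np.zeros(D, dtype=int)
    hist_e = np.zeros(E, dtype=int)
    for d, e in sessions:
        hist_d[d] += 1
        hist_e[e] += 1
    return np.concatenate([hist_d, hist_e])

sessions = [(5, 3), (5, 3), (8, 2)]
print(quadratic_state(sessions).size)  # 144 entries: quadratic in the horizon
print(linear_state(sessions).size)     # 24 entries: linear in the horizon
```

Both states summarize the same population of sessions, but the linear form discards the joint (d, e) distribution; the paper's contribution is a formulation for which this cheaper representation still suffices to meet the DR objectives.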