Kullback-Leibler-Quadratic Optimal Control

Paper title

Kullback-Leibler-Quadratic Optimal Control

Authors

Neil Cammardella, Ana Bušić, Sean Meyn

Abstract

This paper presents approaches to mean-field control, motivated by distributed control of multi-agent systems. Control solutions are based on a convex optimization problem whose domain is a convex set of probability mass functions (pmfs). The main contributions follow: 1. Kullback-Leibler-Quadratic (KLQ) optimal control is a special case in which the objective function is composed of a control cost, in the form of the Kullback-Leibler divergence between a candidate pmf and the nominal pmf, plus a quadratic cost on the sequence of marginals. The theory in this paper extends prior work on deterministic control systems, establishing that the optimal solution is an exponential tilting of the nominal pmf. Transform techniques are introduced to reduce the complexity of the KLQ solution, motivated by the need to consider time horizons that are much longer than the inter-sampling times required for reliable control. 2. Infinite-horizon KLQ leads to a state feedback control solution with attractive properties. It can be expressed either as state feedback, in which the state is the sequence of marginal pmfs, or as an open-loop solution that is more easily computed. 3. Numerical experiments are surveyed in an application of distributed control of residential loads to provide grid services, similar to utility-scale battery storage. The results show that KLQ optimal control enables the aggregate power consumption of a collection of flexible loads to track a time-varying reference signal, while simultaneously ensuring that each individual load satisfies its own quality-of-service constraints. Keywords: Mean field games, distributed control, Markov decision processes, Demand Dispatch.
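
The exponential-tilting structure mentioned in the abstract can be illustrated on a much simpler problem than the one the paper treats. The sketch below is not the authors' algorithm: it assumes a single time step, a finite state space, a strictly positive nominal pmf `p0`, and a scalar tracking cost (kappa/2) * (<p, y> - r)^2, where `y`, `r`, and `kappa` are hypothetical names for a per-state output (for example, power consumption), a reference value, and a penalty weight. Under those assumptions, the minimizer of KL(p || p0) plus the quadratic cost has the form p(x) proportional to p0(x) * exp(beta * y(x)), and the scalar beta satisfies beta = kappa * (r - <p, y>), which can be found by bisection.

```python
import numpy as np


def tilt(p0, y, beta):
    """Exponentially tilt the nominal pmf p0 by beta * y and renormalize."""
    w = np.log(p0) + beta * y   # assumes p0 > 0 everywhere
    w -= w.max()                # shift for numerical stability
    p = np.exp(w)
    return p / p.sum()


def klq_single_step(p0, y, r, kappa, tol=1e-12, max_iter=200):
    """Minimize KL(p || p0) + (kappa / 2) * (<p, y> - r)**2 over pmfs p.

    Illustrative single-step problem only (an assumption of this sketch).
    The optimality condition is p = tilt(p0, y, beta) with
    beta = kappa * (r - <p, y>); g(beta) below is increasing in beta,
    so bisection on a bracketing interval finds its unique root.
    """
    # The tilted mean lies in [min(y), max(y)], so the root is bracketed by:
    lo, hi = kappa * (r - y.max()), kappa * (r - y.min())
    for _ in range(max_iter):
        beta = 0.5 * (lo + hi)
        g = tilt(p0, y, beta) @ y + beta / kappa - r
        if g > 0:
            hi = beta
        else:
            lo = beta
        if hi - lo < tol:
            break
    beta = 0.5 * (lo + hi)
    return tilt(p0, y, beta), beta


# Hypothetical example: four states with outputs 0..3 and a uniform nominal pmf.
p0 = np.full(4, 0.25)
y = np.array([0.0, 1.0, 2.0, 3.0])
p_opt, beta_opt = klq_single_step(p0, y, r=2.0, kappa=10.0)
mean_opt = float(p_opt @ y)
```

With a finite penalty weight, the tilted mean moves from the nominal mean toward the reference without reaching it, which reflects the trade-off between control cost and tracking error. The paper's setting replaces the single pmf with a sequence of marginals over a time horizon.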

Paper title