Optimal Control with Learning on the Fly: System with Unknown Drift

Paper Title

Optimal Control with Learning on the Fly: System with Unknown Drift

Authors

Gurevich, Daniel; Goswami, Debdipta; Fefferman, Charles L.; Rowley, Clarence W.

Abstract

This paper derives an optimal control strategy for a simple stochastic dynamical system with constant drift and an additive control input. Motivated by the example of a physical system with an unexpected change in its dynamics, we take the drift parameter to be unknown, so that it must be learned while controlling the system. The state of the system is observed through a linear observation model with Gaussian noise. In contrast to most previous work, which focuses on a controller's asymptotic performance over an infinite time horizon, we minimize a quadratic cost function over a finite time horizon. The performance of our control strategy is quantified by comparing its cost with the cost incurred by an optimal controller that has full knowledge of the parameters. This approach gives rise to several notions of "regret." We derive a set of control strategies that provably minimize the worst-case regret; these arise from Bayesian strategies that assume a specific fixed prior on the drift parameter. This work suggests that examining Bayesian strategies may lead to optimal or near-optimal control strategies for a much larger class of realistic dynamical models with unknown parameters.
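To make the notion of regret concrete, here is a minimal sketch under several loudly stated assumptions, none of which come from the paper. It uses a hypothetical discrete-time scalar model x_{k+1} = x_k + (b + u_k)Δt + σ√Δt·w_k. The state is observed exactly, which is a simplification of the paper's noisy linear observation model. The prior on the drift b is Gaussian, N(m0, s0²), and the cost is J = Σ (q x_k² + r u_k²)Δt + q_f x_N². The learning controller is plain certainty equivalence: it plugs the Gaussian posterior mean of b into the known-drift LQ law. This is a swapped-in heuristic, not the paper's minimax-regret Bayesian strategy. Regret is estimated by Monte Carlo as the cost difference against the known-drift optimal controller, which sees the same noise path. All function names (`riccati`, `control`, `run`, `mean_regret`) are illustrative only.

```python
import numpy as np


def riccati(n_steps, dt, q, r, q_final):
    # Backward recursion for the known-drift value function
    # V_k(x) = P[k] x^2 + 2 b H[k] x + const  (ASSUMED discrete-time model).
    # The affine term is linear in b, so H is computed once for all drifts.
    P = np.empty(n_steps + 1)
    H = np.empty(n_steps + 1)
    P[-1], H[-1] = q_final, 0.0
    for k in range(n_steps - 1, -1, -1):
        D = r + P[k + 1] * dt
        P[k] = q * dt + r * P[k + 1] / D
        H[k] = (r / D) * (P[k + 1] * dt + H[k + 1])
    return P, H


def control(k, x, b, P, H, dt, r):
    # LQ-optimal input at step k if the drift were exactly b.
    D = r + P[k + 1] * dt
    return -(P[k + 1] * (x + b * dt) + b * H[k + 1]) / D


def run(b_true, noise, dt, q, r, q_final, sigma, prior=None):
    # One path from x_0 = 0. prior=None is the oracle that knows b_true;
    # prior=(m0, s0) is a certainty-equivalence learner (HYPOTHETICAL,
    # not the paper's strategy) using the conjugate Gaussian posterior on b.
    n_steps = len(noise)
    P, H = riccati(n_steps, dt, q, r, q_final)
    x, cost = 0.0, 0.0
    if prior is not None:
        m0, s0 = prior
        prec, num = 1.0 / s0**2, m0 / s0**2  # posterior precision, precision * mean
    for k in range(n_steps):
        b_hat = b_true if prior is None else num / prec
        u = control(k, x, b_hat, P, H, dt, r)
        cost += (q * x**2 + r * u**2) * dt
        x_next = x + (b_true + u) * dt + sigma * np.sqrt(dt) * noise[k]
        if prior is not None:
            z = x_next - x - u * dt  # z ~ N(b dt, sigma^2 dt) given b
            prec += dt / sigma**2
            num += z / sigma**2
        x = x_next
    return cost + q_final * x**2


def mean_regret(b_true, prior, n_paths=1000, n_steps=100, dt=0.01,
                q=1.0, r=0.1, q_final=1.0, sigma=1.0, seed=0):
    # Monte Carlo estimate of E[cost(learner) - cost(oracle)], using common
    # random numbers so both controllers face the same noise on each path.
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_paths):
        noise = rng.standard_normal(n_steps)
        learner = run(b_true, noise, dt, q, r, q_final, sigma, prior)
        oracle = run(b_true, noise, dt, q, r, q_final, sigma)
        diffs.append(learner - oracle)
    return float(np.mean(diffs))
```

For example, `mean_regret(2.0, (0.0, 1.0))` estimates the expected regret incurred when the prior is centered at zero but the true drift is 2. Sweeping `b_true` for a fixed prior and taking the maximum gives a crude empirical version of the worst-case regret that the paper minimizes.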
