论文标题
用于弱耦合随机动态程序的整数编程以及部分信息
Integer programming for weakly coupled stochastic dynamic programs with partial information
论文作者
论文摘要
本文介绍了有关决策者必须控制由多个组件组成的系统的问题,并且只能访问每个组件状态的部分信息。由于部分观察结果,并且由于成分数量增加时出现的维度诅咒,因此这些问题很困难。引入了部分可观察到的马尔可夫决策过程(POMDP)来应对第一个挑战,而弱耦合的随机动态程序则解决了第二个挑战。从文献的这两个分支中汲取灵感,我们介绍了弱耦合POMDP的概念。目的是找到一项政策,最大程度地提高有限视野的预期奖励。我们的算法依赖两种成分。第一个可以独立使用,是用于计算最佳无内存策略的通用POMDP的混合整数线性公式。基于对随机变量之间依赖性的概率解释的有效削减的有效削减,其线性放松可以加强该公式,其线性放松实际上在最佳依赖历史依赖性策略的价值上提供了一个紧密的上限。第二个是数学编程公式和算法的集合,这些制定和算法为弱耦合的POMDP提供了可拖动的策略和上限。拉格朗日放松,流体近似以及几乎确定的约束放松,可以打破维度的诅咒。我们在基准实例上测试了通用POMDPS公式,形成了文献,并且在维护问题上我们弱耦合的POMDP算法。数值实验显示了我们方法的效率。
This paper introduces algorithms for problems where a decision maker has to control a system composed of several components and has access to only partial information on the state of each component. Such problems are difficult because of the partial observations, and because of the curse of dimensionality that appears when the number of components increases. Partially observable Markov decision processes (POMDPs) have been introduced to deal with the first challenge, while weakly coupled stochastic dynamic programs address the second. Drawing from these two branches of the literature, we introduce the notion of weakly coupled POMDPs. The objective is to find a policy maximizing the total expected reward over a finite horizon. Our algorithms rely on two ingredients. The first, which can be used independently, is a mixed integer linear formulation for generic POMDPs that computes an optimal memoryless policy. The formulation is strengthened with valid cuts based on a probabilistic interpretation of the dependence between random variables, and its linear relaxation provide a practically tight upper bound on the value of an optimal history-dependent policies. The second is a collection of mathematical programming formulations and algorithms which provide tractable policies and upper bounds for weakly coupled POMDPs. Lagrangian relaxations, fluid approximations, and almost sure constraints relaxations enable to break the curse of dimensionality. We test our generic POMDPs formulations on benchmark instance forms the literature, and our weakly coupled POMDP algorithms on a maintenance problem. Numerical experiments show the efficiency of our approach.