Paper Title

Exponential Convergence Rate for the Asymptotic Optimality of Whittle Index Policy

Authors

Nicolas Gast, Bruno Gaujal, Chen Yan

Abstract

We evaluate the performance of the Whittle index policy for restless Markovian bandits as the number of bandits grows. It is proven in [30] that this performance is asymptotically optimal if the bandits are indexable and the associated deterministic system has a fixed point that is a global attractor. In this paper we show that, under the same conditions, the convergence rate is exponential in the number of bandits, unless the fixed point is singular (to be defined later). Our proof is based on the nature of the deterministic equation governing the stochastic system: we show that it is a piecewise affine continuous dynamical system inside the simplex of the empirical measure of the bandits. Using simulations and numerical solvers, we also investigate the cases where the conditions for the exponential rate theorem are violated, notably when attracting limit cycles appear or when the fixed point is singular. We illustrate our theorem on a Markovian fading channel model, which has been well studied in the literature. Finally, we extend our results from the synchronous model to the asynchronous model.
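
The claim that the deterministic system is piecewise affine and continuous can be made concrete with a small sketch. This is not the authors' code: it assumes states are labeled 0..S-1 in decreasing order of Whittle index, that a fixed fraction alpha of bandits is activated at each step, and that the passive and active transition matrices P0 and P1 are given (all names here are hypothetical). The active mass in state i is y_i = min(m_i, max(0, alpha - sum_{j<i} m_j)), a min/max of affine functions of the empirical measure m, so the update m' = y P1 + (m - y) P0 is continuous and piecewise affine on the simplex. A stochastic step for N bandits under the same priority rule is included for comparison; rounding alpha*N to the nearest integer is an assumption of this sketch.

```python
import numpy as np


def active_mass(m, alpha):
    """Mass activated in each state when the alpha fraction of bandits with the
    highest priority is activated. States are assumed sorted by decreasing
    Whittle index (state 0 has the highest index)."""
    m = np.asarray(m, dtype=float)
    mass_before = np.concatenate(([0.0], np.cumsum(m)[:-1]))  # mass in higher-priority states
    # y_i = min(m_i, max(0, alpha - sum_{j<i} m_j)): a min/max of affine functions of m
    return np.clip(alpha - mass_before, 0.0, m)


def fluid_step(m, P0, P1, alpha):
    """Deterministic (mean-field) update of the empirical measure m."""
    m = np.asarray(m, dtype=float)
    y = active_mass(m, alpha)
    # Active mass moves with P1, passive mass moves with P0.
    return y @ P1 + (m - y) @ P0


def bandits_step(states, P0, P1, alpha, rng):
    """One step of N stochastic bandits under the same priority rule."""
    states = np.asarray(states)
    n = len(states)
    k = int(round(alpha * n))                  # number of bandits activated (assumed rounding)
    order = np.argsort(states, kind="stable")  # lower state label = higher priority
    action = np.zeros(n, dtype=int)
    action[order[:k]] = 1
    new_states = np.empty(n, dtype=int)
    for b in range(n):
        P = P1 if action[b] == 1 else P0
        new_states[b] = rng.choice(P.shape[0], p=P[states[b]])
    return new_states


def empirical_measure(states, num_states):
    """Fraction of bandits in each state."""
    return np.bincount(states, minlength=num_states) / len(states)
```

Iterating `fluid_step` from an initial measure gives the deterministic trajectory; running `bandits_step` for increasing N and comparing `empirical_measure` with that trajectory is one way to observe numerically how the stochastic system approaches the deterministic one.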