无线边缘缓存具有线性函数近似的无线边缘缓存的基于Whittle索引的学习

论文标题

无线边缘缓存具有线性函数近似的无线边缘缓存的基于Whittle索引的学习

Whittle Index based Q-Learning for Wireless Edge Caching with Linear Function Approximation

论文作者

Xiong, Guojun, Wang, Shufan, Li, Jian, Singh, Rahul

论文摘要

我们考虑在无线边缘的内容缓存的问题，可以通过不可靠的无线通道为一组最终用户服务，以最大程度地减少由于无线边缘高速缓存能力的约束而最终用户所经历的平均潜伏期。我们将这个问题提出为马尔可夫决策过程，或者更具体地说是一个不安的多军强盗问题，事实证明这很难解决。我们首先调查折扣价，并证明它承认阈值类型的最佳政策。然后，我们证明此结果也适用于平均延迟问题。利用这种结构性结果，我们确定了问题的索引性，并采用了Whittle指数政策来最大程度地减少平均延迟。由于诸如内容请求率和无线通道条件之类的系统参数通常是未知的，并且时间变化，因此我们进一步开发了一种称为Q^{+}的无模型增强学习算法 - 依赖于Whittle Index策略的Whittle。但是，Q^{+} - Whittle需要为所有状态行动对存储Q功能值，而无线边缘缓存的数量可能非常大。为此，我们通过具有较小尺寸的参数化函数类近似Q功能，并进一步设计了具有线性函数近似值的Q^{+} - Whittle算法，称为q^{+} - whittle-lfa。我们在q^{+} -whittle-lfa的均方误差上提供有限的时间限制。使用真实痕迹的仿真结果表明Q^{+} - Whittle-LFA产生了出色的经验性能。

We consider the problem of content caching at the wireless edge to serve a set of end users via unreliable wireless channels so as to minimize the average latency experienced by end users due to the constrained wireless edge cache capacity. We formulate this problem as a Markov decision process, or more specifically a restless multi-armed bandit problem, which is provably hard to solve. We begin by investigating a discounted counterpart, and prove that it admits an optimal policy of the threshold-type. We then show that this result also holds for average latency problem. Using this structural result, we establish the indexability of our problem, and employ the Whittle index policy to minimize average latency. Since system parameters such as content request rates and wireless channel conditions are often unknown and time-varying, we further develop a model-free reinforcement learning algorithm dubbed as Q^{+}-Whittle that relies on Whittle index policy. However, Q^{+}-Whittle requires to store the Q-function values for all state-action pairs, the number of which can be extremely large for wireless edge caching. To this end, we approximate the Q-function by a parameterized function class with a much smaller dimension, and further design a Q^{+}-Whittle algorithm with linear function approximation, which is called Q^{+}-Whittle-LFA. We provide a finite-time bound on the mean-square error of Q^{+}-Whittle-LFA. Simulation results using real traces demonstrate that Q^{+}-Whittle-LFA yields excellent empirical performance.

下载PDF全文

下载文献需遵守相关版权规定

论文标题