Paper Title


Federated Stochastic Approximation under Markov Noise and Heterogeneity: Applications in Reinforcement Learning

Paper Authors

Sajad Khodadadian, Pranay Sharma, Gauri Joshi, Siva Theja Maguluri

Paper Abstract


Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central location can be prohibitively expensive in terms of communication cost, and it can also compromise the privacy of each agent's local behavior policy. Federated reinforcement learning is a framework in which $N$ agents collaboratively learn a global model, without sharing their individual data and policies. This global model is the unique fixed point of the average of $N$ local operators, corresponding to the $N$ agents. Each agent maintains a local copy of the global model and updates it using locally sampled data. In this paper, we show that through careful collaboration of the agents in solving this joint fixed point problem, we can find the global model $N$ times faster, a phenomenon known as linear speedup. We first propose a general framework for federated stochastic approximation with Markovian noise and heterogeneity, showing linear speedup in convergence. We then apply this framework to federated reinforcement learning algorithms, examining the convergence of federated on-policy TD, off-policy TD, and $Q$-learning.
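To make the setup concrete: the global model $\theta^*$ is the unique solution of the joint fixed-point equation $\frac{1}{N}\sum_{i=1}^{N} T_i(\theta^*) = \theta^*$, where $T_i$ is agent $i$'s local operator. The sketch below is an illustrative, hedged example of this pattern for a federated TD(0)-style update with linear function approximation and periodic server-side averaging; it is not the paper's exact algorithm, and the names (`federated_td`, `local_td_update`, `env.sample_transition`, `local_steps`, `step_size`) are hypothetical interfaces assumed for the example.

```python
# Minimal sketch (assumed interfaces, not the authors' implementation):
# N agents each run a few local TD(0) updates on their own Markovian data,
# then a server averages the local models. Infrequent averaging of this kind
# is the regime in which the paper establishes linear speedup.
import numpy as np

def local_td_update(theta, phi_s, phi_s_next, reward, gamma, step_size):
    """One TD(0) step on a single transition with linear features."""
    td_error = reward + gamma * (phi_s_next @ theta) - (phi_s @ theta)
    return theta + step_size * td_error * phi_s

def federated_td(envs, feature_dim, num_rounds=100, local_steps=10,
                 step_size=0.05, gamma=0.99):
    """envs: one environment / behavior policy per agent (heterogeneity).
    Each env is assumed to expose sample_transition() -> (phi_s, reward, phi_s_next)."""
    theta_global = np.zeros(feature_dim)
    for _ in range(num_rounds):
        local_models = []
        for env in envs:
            theta = theta_global.copy()          # start from the shared model
            for _ in range(local_steps):         # local updates on local data only
                phi_s, reward, phi_s_next = env.sample_transition()
                theta = local_td_update(theta, phi_s, phi_s_next,
                                        reward, gamma, step_size)
            local_models.append(theta)
        theta_global = np.mean(local_models, axis=0)  # server-side averaging
    return theta_global
```

Only the averaged model crosses the network; raw transitions and behavior policies stay with each agent, which is the communication- and privacy-motivated design the abstract describes.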
