Title

Local Differential Privacy for Regret Minimization in Reinforcement Learning

Authors

Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta

Abstract

Reinforcement learning algorithms are widely used in domains where it is desirable to provide a personalized service. In these domains it is common that user data contains sensitive information that needs to be protected from third parties. Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side. We formulate this notion of privacy for RL by leveraging the local differential privacy (LDP) framework. We establish a lower bound for regret minimization in finite-horizon MDPs with LDP guarantees which shows that guaranteeing privacy has a multiplicative effect on the regret. This result shows that while LDP is an appealing notion of privacy, it makes the learning problem significantly more complex. Finally, we present an optimistic algorithm that simultaneously satisfies $\varepsilon$-LDP requirements, and achieves $\sqrt{K}/\varepsilon$ regret in any finite-horizon MDP after $K$ episodes, matching the lower bound dependency on the number of episodes $K$.
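
For context, here is a minimal sketch of the local differential privacy requirement placed on the user side; this is the standard $\varepsilon$-LDP definition, and the paper's exact formulation for trajectories may differ. A randomizer $M$ satisfies $\varepsilon$-LDP if, for any two user inputs $x$, $x'$ and any output $o$,

$$\Pr[M(x) = o] \le e^{\varepsilon} \, \Pr[M(x') = o].$$

Under this constraint, the regret stated in the abstract scales as $\sqrt{K}/\varepsilon$ over $K$ episodes (other problem-dependent factors are not specified here), so the privacy level $\varepsilon$ enters multiplicatively, consistent with the lower bound.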
