Paper Title

Provable General Function Class Representation Learning in Multitask Bandits and MDPs

Authors

Rui Lu, Andrew Zhao, Simon S. Du, Gao Huang

Abstract

While multitask representation learning has become a popular approach in reinforcement learning (RL) to boost sample efficiency, the theoretical understanding of why and how it works is still limited. Most previous analytical works could only assume that the representation function is already known to the agent or drawn from a linear function class, since analyzing general function class representations encounters non-trivial technical obstacles, such as establishing generalization guarantees and formulating confidence bounds in abstract function spaces. However, linear-case analysis relies heavily on the particularity of the linear function class, while real-world practice usually adopts general non-linear representation functions such as neural networks, which significantly reduces its applicability. In this work, we extend the analysis to general function class representations. Specifically, we consider an agent playing $M$ contextual bandits (or MDPs) concurrently and extracting a shared representation function $\phi$ from a given function class $\Phi$ using our proposed Generalized Functional Upper Confidence Bound (GFUCB) algorithm. We theoretically validate, for the first time, the benefit of multitask representation learning within a general function class for bandits and linear MDPs. Lastly, we conduct experiments demonstrating the effectiveness of our algorithm with neural-net representations.
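To make the setup concrete, the sketch below illustrates the optimism-plus-shared-representation idea the abstract describes: $M$ contextual bandits played concurrently, all reward estimates built on one shared feature map, with per-task confidence bonuses driving exploration. This is not the paper's GFUCB (which constructs confidence sets directly in function space); it is a simplified LinUCB-style bonus on top of a fixed stand-in representation, and every name and constant here ($M$, the ReLU feature map, $\lambda$, $\beta$) is an illustrative assumption.

```python
# Minimal sketch (not the paper's GFUCB): M contextual bandits share one
# representation phi; each task keeps its own least-squares head and an
# elliptical confidence bonus. The fixed random-ReLU feature map and the
# LinUCB-style bonus are stand-in assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
M, d_in, d_rep, n_actions, T = 4, 8, 16, 10, 500

# Stand-in shared representation: a fixed random ReLU feature map.
W_shared = rng.normal(size=(d_rep, d_in)) / np.sqrt(d_in)
def phi(x):                       # x: (d_in,) -> features: (d_rep,)
    return np.maximum(W_shared @ x, 0.0)

# Unknown per-task reward parameters (used by the simulator only).
theta_true = [rng.normal(size=d_rep) / np.sqrt(d_rep) for _ in range(M)]

# Per-task ridge-regression statistics.
lam, beta = 1.0, 1.0
A = [lam * np.eye(d_rep) for _ in range(M)]   # Gram matrices
b = [np.zeros(d_rep) for _ in range(M)]       # feature-weighted rewards

for t in range(T):
    for m in range(M):                        # play all M tasks concurrently
        contexts = rng.normal(size=(n_actions, d_in))
        feats = np.array([phi(x) for x in contexts])
        A_inv = np.linalg.inv(A[m])
        w_hat = A_inv @ b[m]                  # least-squares head for task m
        # Optimism: mean estimate plus an elliptical confidence bonus.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", feats, A_inv, feats))
        a = int(np.argmax(feats @ w_hat + beta * bonus))
        r = feats[a] @ theta_true[m] + 0.1 * rng.normal()
        A[m] += np.outer(feats[a], feats[a])  # update task-m statistics
        b[m] += r * feats[a]
```

The multitask benefit the paper analyzes comes from learning $\phi$ jointly across all $M$ tasks rather than fixing it as above; the shared structure lets each task's effective hypothesis space shrink as the other tasks' data accumulates.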
