A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments

Paper title

A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments

Author

Padakandla, Sindhu

Abstract

Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing and robotics. The real-world complications of many tasks arising in these domains make them difficult to solve under the basic assumptions underlying classical RL algorithms. RL agents in these applications often need to react and adapt to changing operating conditions. A significant part of research on single-agent RL techniques focuses on developing algorithms for the case where the underlying assumption of a stationary environment model is relaxed. This paper provides a survey of RL methods developed for handling dynamically varying environment models. The goal of methods not limited by the stationarity assumption is to help autonomous agents adapt to varying operating conditions. This is possible either by minimizing the rewards lost by the RL agent during learning or by finding a suitable policy for the RL agent that leads to efficient operation of the underlying system. A representative collection of these algorithms is discussed in detail in this work, along with their categorization and their relative merits and demerits. We also review works that are tailored to specific application domains. Finally, we discuss future enhancements for this field.
