Paper Title
Demystifying Reinforcement Learning in Time-Varying Systems
Paper Authors
Paper Abstract
Recent research has turned to Reinforcement Learning (RL) to solve challenging decision problems, as an alternative to hand-tuned heuristics. RL can learn good policies without the need for modeling the environment's dynamics. Despite this promise, RL remains an impractical solution for many real-world systems problems. A particularly challenging case occurs when the environment changes over time, i.e., it exhibits non-stationarity. In this work, we characterize the challenges introduced by non-stationarity, shed light on the range of approaches to them, and develop a robust framework for addressing them when training RL agents in live systems. Such agents must explore and learn new environments, without hurting the system's performance, and remember them over time. To this end, our framework (i) identifies different environments encountered by the live system, (ii) triggers exploration when necessary, (iii) takes precautions to retain knowledge from prior environments, and (iv) employs safeguards to protect the system's performance when the RL agent makes mistakes. We apply our framework to two systems problems, straggler mitigation and adaptive video streaming, and evaluate it against a variety of alternative approaches using real-world and synthetic data. We show that all components of the framework are necessary to cope with non-stationarity and provide guidance on alternative design choices for each component.
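The abstract's four components suggest an agent-side control loop. Below is a minimal, hypothetical Python sketch of such a loop; the class name NonStationaryAgent, the reward-drift detector, the epsilon-greedy exploration rule, and the stop-exploring heuristic are all illustrative assumptions, not the authors' design. It flags environment changes from reward drift (i), re-enables exploration when a change is flagged (ii), keeps a separate policy per identified environment so earlier knowledge is retained (iii), and falls back to a hand-tuned safe policy when the learned policy has nothing to offer (iv).

```python
# Toy sketch of the four framework components described in the abstract.
# All thresholds and rules here are illustrative assumptions.
import random
from collections import defaultdict, deque


class NonStationaryAgent:
    """Illustrative agent for a non-stationary environment; not the paper's algorithm."""

    def __init__(self, safe_policy, window=50, drift_threshold=0.3):
        self.safe_policy = safe_policy            # (iv) hand-tuned fallback heuristic
        self.window = window
        self.drift_threshold = drift_threshold
        self.reward_history = deque(maxlen=window)
        self.current_env = 0
        self.policies = defaultdict(dict)         # (iii) per-environment knowledge
        self.exploring = True                     # (ii) explore while the env is new

    def _environment_changed(self):
        """(i) Crude drift detector: compare recent vs. older mean reward."""
        if len(self.reward_history) < self.window:
            return False
        half = self.window // 2
        rewards = list(self.reward_history)
        drift = abs(sum(rewards[half:]) / half - sum(rewards[:half]) / half)
        return drift > self.drift_threshold

    def act(self, state, actions):
        entry = self.policies[self.current_env].get(state)
        if self.exploring and random.random() < 0.2:
            return random.choice(actions)         # (ii) exploration step
        if entry is None:
            return self.safe_policy(state)        # (iv) safeguard fallback
        return entry[0]                           # best known action for this state

    def observe(self, state, action, reward):
        self.reward_history.append(reward)
        if self._environment_changed():
            # (i) new environment identified: switch to a fresh policy and explore again,
            # while self.policies still retains everything learned before (iii).
            self.current_env += 1
            self.reward_history.clear()
            self.exploring = True
            return
        best = self.policies[self.current_env].get(state, (None, float("-inf")))
        if reward > best[1]:
            self.policies[self.current_env][state] = (action, reward)
        # Illustrative rule: stop exploring once enough states are covered.
        self.exploring = len(self.policies[self.current_env]) < 10
```

A real deployment would replace the tabular policies with learned function approximators and the reward-drift check with a more principled change detector, but the loop structure mirrors the four components the abstract lists.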