Data-Driven Evaluation of Training Action Space for Reinforcement Learning

Paper Title

Data-Driven Evaluation of Training Action Space for Reinforcement Learning

Authors

Ghosh, Rajat; Dutta, Debojyoti

Abstract

Training action space selection for reinforcement learning (RL) is conflict-prone due to complex state-action relationships. To address this challenge, this paper proposes a Shapley-inspired methodology for training action space categorization and ranking. To avoid the exponential-time cost of exact Shapley computation, the methodology uses a Monte Carlo simulation that skips unnecessary explorations. The effectiveness of the methodology is illustrated with a cloud infrastructure resource tuning case study, in which it reduces the search space by 80% and categorizes the training action sets into dispensable and indispensable groups. Additionally, it ranks the individual training actions to facilitate high-performance yet cost-efficient RL model design. The proposed data-driven methodology is extensible to different domains, use cases, and reinforcement learning algorithms.
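
The abstract does not spell out the algorithm, so the following is only a minimal, generic sketch of Monte Carlo (permutation-sampling) Shapley estimation applied to training actions. Exact Shapley values need the value of all 2^n subsets of n actions; sampling random orderings and averaging each action's marginal contribution avoids that enumeration. The value function, the action names (`scale_cpu`, `scale_memory`, `no_op`), their weights, and the dispensability threshold are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical value function: maps a subset (coalition) of training actions to a
# performance score, e.g. the reward of an RL model trained with only those actions.
ValueFn = Callable[[FrozenSet[str]], float]


def monte_carlo_shapley(actions: Sequence[str], value: ValueFn,
                        num_permutations: int = 200, seed: int = 0) -> Dict[str, float]:
    """Approximate Shapley values by averaging marginal contributions
    over randomly sampled orderings of the actions (permutation sampling)."""
    rng = random.Random(seed)
    cache: Dict[FrozenSet[str], float] = {}  # each evaluation may cost a training run

    def v(coalition: FrozenSet[str]) -> float:
        if coalition not in cache:
            cache[coalition] = value(coalition)
        return cache[coalition]

    totals = {a: 0.0 for a in actions}
    order = list(actions)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        prev = v(coalition)
        for a in order:
            coalition = coalition | {a}
            cur = v(coalition)
            totals[a] += cur - prev  # marginal contribution of `a` in this ordering
            prev = cur
    return {a: totals[a] / num_permutations for a in actions}


def categorize_and_rank(phi: Dict[str, float], threshold: float = 1e-6
                        ) -> Tuple[List[str], List[str], List[str]]:
    """Split actions into indispensable / dispensable using a hypothetical threshold
    and rank all actions by estimated Shapley value, highest first."""
    indispensable = [a for a, s in phi.items() if s > threshold]
    dispensable = [a for a, s in phi.items() if s <= threshold]
    ranking = sorted(phi, key=phi.get, reverse=True)
    return indispensable, dispensable, ranking


if __name__ == "__main__":
    # Toy additive value function with made-up weights (not from the paper).
    weights = {"scale_cpu": 0.5, "scale_memory": 0.3, "no_op": 0.0}
    phi = monte_carlo_shapley(list(weights), lambda s: sum(weights[a] for a in s))
    print(categorize_and_rank(phi))
```

The toy value function is additive, so each action's Shapley value equals its weight and the estimate can be checked by hand; in a real setting the value function would be expensive and non-additive, which is why the sketch caches coalition evaluations.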