论文标题

马尔可夫决策过程中的隐藏对称和模型减少:解释并应用于多周期新闻供应商问题

Hidden Symmetries and Model Reduction in Markov Decision Processes: Explained and Applied to the Multi-period Newsvendor Problem

论文作者

Joosten, Tobias, Küfer, Karl-Heinz

论文摘要

对称破坏是模型减少马尔可夫决策过程(MDP)的常见方法。该方法仅使用直接访问的对称性(例如几何对称性)。对于某些MDP,可以等效地转换它们以使对称性变得可以访问 - 我们称这种类型的对称性隐藏的对称性。对于这些MDP,与直接可访问的对称性相比,隐藏的对称性可以大大降低模型。主要思想是通过改变奖励结构,然后通过形成商MDP来揭示隐藏的对称性。商MDP是减少的MDP,因为它足以解决商MDP而不是原始的MDP。在本文中,我们介绍了隐藏的对称性和模型还原的相关概念。我们在多周期新闻供应商问题上证明了这个概念,这是新闻供应商的问题在无限的天数中考虑的问题。通过这种方式,我们表明隐藏的对称性可以减少直接访问对称性无法访问的问题,并提出了在多期问题中揭示隐藏对称性的基本思想。提出的方法可以扩展到更复杂的问题。

Symmetry breaking is a common approach for model reduction of Markov decision processes (MDPs). This approach only uses directly accessible symmetries such as geometric symmetries. For some MDPs, it is possible to transform them equivalently such that symmetries become accessible -- we call this type of symmetries hidden symmetries. For these MDPs, hidden symmetries allow substantially better model reduction compared to directly accessible symmetries. The main idea is to reveal a hidden symmetry by altering the reward structure and then exploit the revealed symmetry by forming a quotient MDP. The quotient MDP is the reduced MDP, since it is sufficient to solve the quotient MDP instead of the original one. In this paper, we introduce hidden symmetries and the associated concept of model reduction. We demonstrate this concept on the multi-period newsvendor problem, which is the newsvendor problem considered over an infinite number of days. In this way, we show that hidden symmetries can reduce problems that directly accessible symmetries cannot, and present a basic idea of revealing hidden symmetries in multi-period problems. The presented approach can be extended to more sophisticated problems.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源