Paper Title

Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

Paper Authors

Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Jenna Reinen, Irina Rish

Paper Abstract

Artificial behavioral agents are often evaluated on the consistency of their behavior and their performance when taking sequential actions in an environment to maximize some notion of cumulative reward. However, human decision making in real life usually involves different strategies and behavioral trajectories that lead to the same empirical outcome. Motivated by the clinical literature on a wide range of neurological and psychiatric disorders, we propose here a more general and flexible parametric framework for sequential decision making that involves a two-stream reward processing mechanism. We demonstrate that this framework is flexible and unified enough to incorporate a family of problems spanning multi-armed bandits (MAB), contextual bandits (CB) and reinforcement learning (RL), which decompose the sequential decision-making process at different levels. Inspired by the known reward processing abnormalities of many mental disorders, our clinically inspired agents exhibit distinctive behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
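The abstract does not spell out the update rules, so the following is only a minimal sketch of what a two-stream reward-processing agent could look like in the bandit case, not the paper's exact formulation. Every name and parameter here (TwoStreamBanditAgent, lambda_pos, lambda_neg, w_pos, w_neg, lr, epsilon) is an illustrative assumption: positive and negative rewards are tracked in separate value estimates and recombined with stream-specific weights when choosing an arm.

```python
import numpy as np

class TwoStreamBanditAgent:
    """Illustrative sketch (not the paper's algorithm): a bandit agent that
    processes positive and negative reward streams separately."""

    def __init__(self, n_arms, lambda_pos=1.0, lambda_neg=1.0,
                 w_pos=1.0, w_neg=1.0, lr=0.1, epsilon=0.1):
        self.q_pos = np.zeros(n_arms)   # per-arm estimate of the positive reward stream
        self.q_neg = np.zeros(n_arms)   # per-arm estimate of the negative reward stream
        self.lambda_pos, self.lambda_neg = lambda_pos, lambda_neg  # assumed stream decay weights
        self.w_pos, self.w_neg = w_pos, w_neg                      # assumed stream mixing weights
        self.lr, self.epsilon = lr, epsilon

    def act(self):
        # Epsilon-greedy choice over the combined two-stream value.
        if np.random.rand() < self.epsilon:
            return int(np.random.randint(len(self.q_pos)))
        value = self.w_pos * self.q_pos + self.w_neg * self.q_neg
        return int(np.argmax(value))

    def update(self, arm, reward):
        # Split the observed reward into its positive and negative parts and
        # update each stream's estimate with its own decay weight.
        r_pos, r_neg = max(reward, 0.0), min(reward, 0.0)
        self.q_pos[arm] = self.lambda_pos * self.q_pos[arm] + self.lr * (r_pos - self.q_pos[arm])
        self.q_neg[arm] = self.lambda_neg * self.q_neg[arm] + self.lr * (r_neg - self.q_neg[arm])
```

Under this reading, tilting w_neg upward would produce a loss-dominated agent while tilting w_pos upward would produce a reward-seeking one, which is one plausible way such a parametric family could span the varied reward-processing profiles the abstract alludes to; the same split-and-reweight idea could be carried over to contextual bandits and RL value updates.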
