Paper Title

Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies

Paper Authors

Chan, Alex J., Curth, Alicia, van der Schaar, Mihaela

Paper Abstract

Human decision making is well known to be imperfect and the ability to analyse such processes individually is crucial when attempting to aid or improve a decision-maker's ability to perform a task, e.g. to alert them to potential biases or oversights on their part. To do so, it is necessary to develop interpretable representations of how agents make decisions and how this process changes over time as the agent learns online in reaction to the accrued experience. To then understand the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem. By interpreting actions within a potential outcomes framework, we introduce a meaningful mapping based on agents choosing an action they believe to have the greatest treatment effect. We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them, using a novel architecture built upon an expressive family of deep state-space models. Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
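The abstract's core modelling assumption is that, at each step, the agent chooses the action it currently believes has the greatest treatment effect and then updates those beliefs online from the observed outcome. The following is a minimal, hypothetical sketch of that decision loop only. It is not the paper's actual algorithm (which uses deep state-space models to infer perceived effects retrospectively); all names (`perceived`, `true_effects`, the learning rate, the greedy update rule) are illustrative assumptions. Notably, even this simple greedy learner can settle on a suboptimal action, mirroring the abstract's point that such decision processes are imperfect.

```python
# Hypothetical sketch of an agent acting on perceived treatment effects
# and updating them online. Not the paper's algorithm; purely illustrative.
K = 3                            # number of candidate actions (treatments)
true_effects = [0.2, 0.5, 0.1]   # ground-truth effects, unknown to the agent
perceived = [0.0, 0.0, 0.0]      # agent's current perceived treatment effects
lr = 0.1                         # online learning rate

history = []
for t in range(500):
    # Choose the action believed to have the greatest treatment effect.
    a = max(range(K), key=lambda i: perceived[i])
    outcome = true_effects[a]    # observe the (noise-free, for clarity) outcome
    # Move the perceived effect of the chosen action toward the observation.
    perceived[a] += lr * (outcome - perceived[a])
    history.append(a)
```

Because the agent is purely greedy, it locks onto action 0 after the first positive observation and never discovers that action 1 is better; a retrospective analysis of `history` and `perceived` is exactly the kind of object the paper's inverse online learning setup aims to recover from observed trajectories.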
