Paper Title
Optimal Policies for the Homogeneous Selective Labels Problem
Authors
Abstract
Selective labels are a common feature of consequential decision-making applications, referring to the lack of observed outcomes under one of the possible decisions. This paper reports work in progress on learning decision policies in the face of selective labels. The setting considered is both a simplified homogeneous one, disregarding individuals' features to facilitate determination of optimal policies, and an online one, to balance costs incurred in learning with future utility. For maximizing discounted total reward, the optimal policy is shown to be a threshold policy, and the problem is one of optimal stopping. In contrast, for undiscounted infinite-horizon average reward, optimal policies have positive acceptance probability in all states. Future work stemming from these results is discussed.
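The discounted-reward claim can be illustrated with a small backward-induction sketch. The abstract does not spell out the model, so the sketch assumes a Bayesian setting chosen only for illustration: an unknown success probability p with a Beta(alpha, beta) prior, a reward of Y - c when an individual is accepted and the outcome Y ~ Bernoulli(p) is observed, and a reward of 0 with no observation when the individual is rejected. The names solve_discounted and thresholds, the cost c, the requirement gamma < 1, and the truncation horizon (beyond which the posterior mean is treated as the true p) are hypothetical choices, not the paper's formulation. Because rejecting earns nothing and leaves the posterior unchanged, rejecting once means rejecting forever. The value therefore becomes max(0, value of accepting), which is an optimal stopping problem. For each observation count n, the accept set then contains every success count s above some threshold.

```python
from typing import List, Optional, Tuple


def solve_discounted(alpha: float = 1.0, beta: float = 1.0, c: float = 0.5,
                     gamma: float = 0.9, horizon: int = 200
                     ) -> Tuple[List[List[bool]], float]:
    """Backward induction for a hypothetical Beta-Bernoulli selective labels model.

    State (n, s): n outcomes observed so far, s of them successes.
    Returns accept[n][s] (True if accepting is optimal) and the value at (0, 0).
    Assumes 0 <= gamma < 1.
    """
    def mean(n: int, s: int) -> float:
        # Posterior mean of p under a Beta(alpha + s, beta + n - s) posterior.
        return (alpha + s) / (alpha + beta + n)

    # Truncation assumption: at n = horizon, treat the posterior mean as the true p,
    # so the best policy is to accept forever if mean > c and reject forever otherwise.
    v_next = [max(0.0, (mean(horizon, s) - c) / (1.0 - gamma))
              for s in range(horizon + 1)]
    accept: List[List[bool]] = [[] for _ in range(horizon)]
    for n in range(horizon - 1, -1, -1):
        v_cur = []
        for s in range(n + 1):
            mu = mean(n, s)
            # Accept: expected immediate reward mu - c, then move to (n+1, s+1)
            # on success or to (n+1, s) on failure.
            q_accept = mu - c + gamma * (mu * v_next[s + 1] + (1.0 - mu) * v_next[s])
            # Reject: reward 0 and the state is unchanged, so rejecting once means
            # rejecting forever and its value is 0 (optimal stopping).
            v_cur.append(max(0.0, q_accept))
            accept[n].append(q_accept > 0.0)
        v_next = v_cur
    return accept, v_next[0]


def thresholds(accept: List[List[bool]]) -> List[Optional[int]]:
    # Smallest success count s at which accepting is optimal after n observations
    # (None if accepting is never optimal at that n).
    return [next((s for s, a in enumerate(row) if a), None) for row in accept]


if __name__ == "__main__":
    accept, v0 = solve_discounted(c=0.5, gamma=0.9, horizon=200)
    print("accept at prior:", accept[0][0], "value:", round(v0, 4))
    print("thresholds for n = 0..9:", thresholds(accept)[:10])
```

With gamma = 0 the rule reduces to the myopic one (accept only if the posterior mean exceeds c). With gamma = 0.9 and c = 0.5, the sketch accepts under the uniform prior even though the expected immediate reward there is zero, because acceptance is the only way to learn about p.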