Online Batch Decision-Making with High-Dimensional Covariates

Paper Title

Online Batch Decision-Making with High-Dimensional Covariates

Authors

Chi-Hua Wang, Guang Cheng

Abstract

We propose and investigate a class of new algorithms for sequential decision making that interact with a batch of users simultaneously, rather than a single user, at each decision epoch. This type of batch model is motivated by interactive marketing and clinical trials, where a group of people is treated simultaneously and the outcomes of the whole group are collected before the next stage of decision making. In this setting, our goal is to allocate a batch of treatments so as to maximize treatment efficacy based on observed high-dimensional user covariates. We deliver a solution, named the Teamwork LASSO Bandit algorithm, that resolves a batch version of the explore-exploit dilemma by switching between a teamwork stage and a selfish stage throughout the decision process. This is made possible by the statistical properties of the LASSO estimate of treatment efficacy, which adapts to a sequence of batch observations. In general, a rate of optimal allocation condition is proposed to delineate the exploration-exploitation trade-off in the data collection scheme; this condition is sufficient for LASSO to identify the optimal treatment for the observed user covariates. An upper bound on the expected cumulative regret of the proposed algorithm is provided.
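To make the batch setting concrete, here is a minimal Python sketch of a batched LASSO bandit. It is not the paper's algorithm: the abstract does not give the switching rule, so everything here is an assumption made for illustration. Those assumptions include the fixed schedule `teamwork_every`, the reading of the "teamwork" stage as the batch splitting evenly across treatments and of the "selfish" stage as each user taking the currently best-looking treatment, the two-treatment simulated data, and the helper names `lasso_cd` and `simulate`. The one feature taken directly from the abstract is that outcomes are only observed, and the LASSO estimates only refitted, after the whole batch has been treated.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=30):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def simulate(n_batches=15, batch_size=10, d=20, lam=0.05, teamwork_every=4, seed=0):
    """Run the illustrative batched bandit; return (cumulative regret, estimates)."""
    rng = random.Random(seed)
    # Hypothetical sparse ground truth: each treatment depends on two covariates.
    true_beta = [[0.0] * d for _ in range(2)]
    true_beta[0][0], true_beta[0][1] = 1.0, -1.0
    true_beta[1][0], true_beta[1][2] = -1.0, 1.0
    data = [([], []) for _ in range(2)]  # per-treatment (covariates, outcomes)
    est = [[0.0] * d for _ in range(2)]
    cum_regret, total = [], 0.0
    for t in range(n_batches):
        batch = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(batch_size)]
        teamwork = (t % teamwork_every == 0)  # assumed schedule, not from the paper
        for i, x in enumerate(batch):
            means = [dot(b, x) for b in true_beta]
            if teamwork:
                arm = i % 2  # the batch splits evenly across treatments
            else:
                arm = 0 if dot(est[0], x) >= dot(est[1], x) else 1  # greedy choice
            data[arm][0].append(x)
            data[arm][1].append(means[arm] + rng.gauss(0, 0.1))
            total += max(means) - means[arm]
        # Outcomes arrive only after the whole batch is treated: refit once per batch.
        for a in range(2):
            if data[a][1]:
                est[a] = lasso_cd(data[a][0], data[a][1], lam)
        cum_regret.append(total)
    return cum_regret, est
```

Calling `simulate()` returns the cumulative regret after each batch together with the per-treatment LASSO estimates. Varying `teamwork_every` gives a rough feel for the exploration-exploitation trade-off the abstract describes, though this fixed schedule is only a stand-in for the paper's allocation condition.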
