Paper Title
Robust Contextual Linear Bandits
Paper Authors
Paper Abstract
Model misspecification is a major consideration in applications of statistical methods and machine learning. However, it is often neglected in contextual bandits. This paper studies a common form of misspecification, an inter-arm heterogeneity that is not captured by context. To address this issue, we assume that the heterogeneity arises due to arm-specific random variables, which can be learned. We call this setting a robust contextual bandit. The arm-specific variables explain the unknown inter-arm heterogeneity, and we incorporate them in the robust contextual estimator of the mean reward and its uncertainty. We develop two efficient bandit algorithms for our setting: a UCB algorithm called RoLinUCB and a posterior-sampling algorithm called RoLinTS. We analyze both algorithms and bound their $n$-round Bayes regret. Our experiments show that RoLinTS is comparably statistically efficient to the classic methods when the misspecification is low, more robust when the misspecification is high, and significantly more computationally efficient than its naive implementation.
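The setting described above can be illustrated with a toy sketch: rewards follow a shared linear model plus an arm-specific offset that the context does not capture, the offsets are learned jointly with the shared parameter via Bayesian linear regression over augmented features, and actions are chosen by posterior (Thompson) sampling. This is a generic illustration under simplifying Gaussian assumptions, not the paper's RoLinTS algorithm; all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, n = 3, 5, 2000                  # context dim, number of arms, rounds
theta = rng.normal(size=d)            # shared parameter (unknown to the learner)
bias = rng.normal(scale=0.5, size=K)  # arm-specific offsets: the inter-arm
                                      # heterogeneity not captured by context
noise_sd = 0.1

p = d + K              # augmented feature: [context, one-hot(arm)]
A = np.eye(p)          # posterior precision (unit-variance Gaussian prior)
b = np.zeros(p)        # precision-weighted accumulator for the posterior mean

def phi(x, a):
    """Augment the context so the arm offset is part of the linear model."""
    f = np.zeros(p)
    f[:d] = x
    f[d + a] = 1.0
    return f

regret = 0.0
for _ in range(n):
    X = rng.normal(size=(K, d))       # one context vector per arm this round
    cov = np.linalg.inv(A)
    w = rng.multivariate_normal(cov @ b, cov)  # Thompson sample of parameters
    a = int(np.argmax([phi(X[k], k) @ w for k in range(K)]))
    r = X[a] @ theta + bias[a] + rng.normal(scale=noise_sd)
    f = phi(X[a], a)
    A += np.outer(f, f) / noise_sd**2          # Gaussian posterior update
    b += f * r / noise_sd**2
    regret += max(X[k] @ theta + bias[k] for k in range(K)) \
        - (X[a] @ theta + bias[a])
```

Because the arm offsets sit inside the linear model, the agent's uncertainty about them shrinks as each arm is pulled, so cumulative regret grows sublinearly; ignoring the offsets (fitting only the shared `theta`) would instead incur regret linear in the number of rounds whenever the offsets differ.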