论文标题
匪徒的匪徒
Bandits with adversarial scaling
论文作者
论文摘要
我们研究了“对抗缩放”,这是一种多臂匪徒模型,其中奖励具有随机性和对抗性成分。我们的模型捕获了显示广告,其中“点击率率”可以分解为(跨时间)的ARM质量组件和非策略的用户 - 权力组件(固定在臂上)。尽管我们的模型具有相对的随机性,但我们演示了大多数强盗算法遭受的两个设置。从积极的一面来看,我们证明了两种算法,一种是消除动作的算法,另一种是从镜子下降家族中的自适应,足以对对抗性缩放。我们的结果阐明了随机匪徒自适应参数选择的鲁棒性,这可能是独立的。
We study "adversarial scaling", a multi-armed bandit model where rewards have a stochastic and an adversarial component. Our model captures display advertising where the "click-through-rate" can be decomposed to a (fixed across time) arm-quality component and a non-stochastic user-relevance component (fixed across arms). Despite the relative stochasticity of our model, we demonstrate two settings where most bandit algorithms suffer. On the positive side, we show that two algorithms, one from the action elimination and one from the mirror descent family are adaptive enough to be robust to adversarial scaling. Our results shed light on the robustness of adaptive parameter selection in stochastic bandits, which may be of independent interest.