Paper Title

Quantile Bandits for Best Arms Identification

Authors

Zhang, Mengyan; Ong, Cheng Soon

Abstract

We consider a variant of the best arm identification task in stochastic multi-armed bandits. Motivated by risk-averse decision-making problems, our goal is to identify a set of $m$ arms with the highest $τ$-quantile values within a fixed budget. We prove asymmetric two-sided concentration inequalities for order statistics and quantiles of random variables that have non-decreasing hazard rate, which may be of independent interest. With these inequalities, we analyse a quantile version of Successive Accepts and Rejects (Q-SAR). We derive an upper bound for the probability of arm misidentification, the first justification of a quantile-based algorithm for fixed-budget identification of multiple best arms. We show illustrative experiments for best arm identification.
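To make the task concrete, the following is a minimal Python sketch of a successive accepts-and-rejects loop driven by empirical $τ$-quantiles. It is not the authors' implementation. The function name `q_sar_sketch`, the arm interface, the phase schedule (borrowed from the classical mean-based SAR algorithm), and the use of `np.quantile` as the quantile estimator are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
import numpy as np


def q_sar_sketch(arms, m, tau, budget, rng):
    """Illustrative accept/reject loop on empirical tau-quantiles (assumed design).

    arms:   list of callables; arms[i](rng, size) returns `size` reward samples.
    m:      number of arms to identify.
    budget: total sampling budget (assumed to be much larger than len(arms)).
    Returns the set of accepted arm indices.
    """
    K = len(arms)
    # Normalising constant of the classical SAR phase schedule (an assumption here).
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    samples = {i: np.empty(0) for i in range(K)}
    active = list(range(K))
    accepted = set()
    remaining = m  # number of arms still to be accepted
    n_prev = 0
    for phase in range(1, K):
        if remaining == 0 or remaining == len(active):
            break  # the outcome for every active arm is already determined
        # Cumulative number of samples each active arm should have after this phase.
        n_k = math.ceil((budget - K) / (log_bar * (K + 1 - phase)))
        for i in active:
            new = arms[i](rng, n_k - n_prev)
            samples[i] = np.concatenate([samples[i], new])
        n_prev = n_k
        # Empirical tau-quantile of each active arm, sorted from highest to lowest.
        q = {i: float(np.quantile(samples[i], tau)) for i in active}
        order = sorted(active, key=q.get, reverse=True)
        # Gap of the best arm to the first arm outside the current top set,
        # and gap of the worst arm to the last arm inside the current top set.
        top_gap = q[order[0]] - q[order[remaining]]
        bottom_gap = q[order[remaining - 1]] - q[order[-1]]
        if top_gap >= bottom_gap:
            accepted.add(order[0])  # accept the empirically best arm
            active.remove(order[0])
            remaining -= 1
        else:
            active.remove(order[-1])  # reject the empirically worst arm
    if remaining > 0:
        accepted.update(active)  # all arms left must be accepted to reach m
    return accepted


# Hypothetical usage: five exponential arms, which have a constant
# (hence non-decreasing) hazard rate; a larger scale gives a larger median.
scales = [1.0, 1.5, 2.0, 5.0, 6.0]
arms = [lambda r, size, s=s: r.exponential(s, size) for s in scales]
print(q_sar_sketch(arms, m=2, tau=0.5, budget=10000, rng=np.random.default_rng(0)))
```

Each phase removes exactly one arm from consideration, either by accepting the arm with the highest empirical quantile or by rejecting the one with the lowest, whichever decision is supported by the larger empirical gap.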
