多武器匪徒探索中的资源分配：通过自适应平行性克服sublinear缩放

论文标题

多武器匪徒探索中的资源分配：通过自适应平行性克服sublinear缩放

Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism

论文作者

Thananjeyan, Brijen, Kandasamy, Kirthevasan, Stoica, Ion, Jordan, Michael I., Goldberg, Ken, Gonzalez, Joseph E.

论文摘要

当我们可以访问可分配的资源时，我们可以研究随机多臂匪徒的探索，该资源可以分配给手臂拉动。我们特别关注分布式计算资源的分配，我们可以通过每次拉力分配更多资源来更快地获得结果，但由于非线性缩放，我们可能会减少吞吐量。例如，在基于模拟的科学研究中，可以通过在多个核心上运行昂贵的模拟来加快仿真。然而，这种加速的部分被核之间的通信所抵消，这导致吞吐量要比每次试验分配较少的核心以并行进行更多试验的情况下的吞吐量较低。在本文中，我们在两个环境中探讨了这些权衡。首先，在固定的置信环境中，我们需要尽快找到具有给定目标成功概率的最佳臂。我们提出了一种算法，该算法在信息积累和吞吐量之间进行了交易，并表明所花费的时间可以由动态程序的解决方案上限，该程序的输入是亚最佳和最佳臂之间的差距。我们还证明了匹配的硬度结果。其次，我们为固定的截止日期设置提供了一种算法，在该设置中，我们将获得时间截止日期，并需要最大程度地找到最佳的ARM的可能性。我们通过模拟实验证实了我们的理论见解，这些实验表明算法在各种问题实例上始终匹配或胜过基线算法。

We study exploration in stochastic multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls. We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull, but might have reduced throughput due to nonlinear scaling. For example, in simulation-based scientific studies, an expensive simulation can be sped up by running it on multiple cores. This speed-up however, is partly offset by the communication among cores, which results in lower throughput than if fewer cores were allocated per trial to run more trials in parallel. In this paper, we explore these trade-offs in two settings. First, in a fixed confidence setting, we need to find the best arm with a given target success probability as quickly as possible. We propose an algorithm which trades off between information accumulation and throughput and show that the time taken can be upper bounded by the solution of a dynamic program whose inputs are the gaps between the sub-optimal and optimal arms. We also prove a matching hardness result. Second, we present an algorithm for a fixed deadline setting, where we are given a time deadline and need to maximize the probability of finding the best arm. We corroborate our theoretical insights with simulation experiments that show that the algorithms consistently match or outperform baseline algorithms on a variety of problem instances.

下载PDF全文

下载文献需遵守相关版权规定

论文标题