Paper Title

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget

Paper Authors

Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine

Paper Abstract

In many applications, it is desirable to extract only the relevant information from complex input data, which involves making a decision about which input features are relevant. The information bottleneck method formalizes this as an information-theoretic optimization problem by maintaining an optimal tradeoff between compression (throwing away irrelevant input information) and predicting the target. In many problem settings, including the reinforcement learning problems we consider in this work, we might prefer to compress only part of the input. This is typically the case when we have a standard conditioning input, such as a state observation, and a "privileged" input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent. In such cases, we might prefer to compress the privileged input, either to achieve better generalization (e.g., with respect to goals) or to minimize access to costly information (e.g., in the case of communication). Practical implementations of the information bottleneck based on variational inference require access to the privileged input in order to compute the bottleneck variable, so although they perform compression, this compression operation itself needs unrestricted, lossless access. In this work, we propose the variational bandwidth bottleneck, which decides for each example on the estimated value of the privileged information before seeing it, i.e., only based on the standard input, and then accordingly chooses stochastically whether to access the privileged input or not. We formulate a tractable approximation to this framework and demonstrate in a series of reinforcement learning experiments that it can improve generalization and reduce access to computationally costly information.
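
The abstract describes the core mechanism only in prose: a value estimate computed from the standard input alone stochastically gates whether the privileged input is accessed at all. The sketch below is a minimal, assumed PyTorch parameterization of that idea; the class name, the gate architecture, the prior fallback, and the single-example forward pass are illustrative choices, not the authors' implementation, and training would additionally require a differentiable relaxation of the discrete access decision, which is omitted here.

```python
# Minimal sketch (assumed, not the authors' code) of a bandwidth-style bottleneck:
# a gate computed from the standard input alone decides, stochastically and
# per example, whether to fetch the costly privileged input.
import torch
import torch.nn as nn

class BandwidthBottleneckSketch(nn.Module):
    def __init__(self, std_dim, priv_dim, z_dim):
        super().__init__()
        # Gate: estimates, from the standard input only, how valuable the
        # privileged input is expected to be (hypothetical parameterization).
        self.gate = nn.Sequential(nn.Linear(std_dim, 1), nn.Sigmoid())
        # Encoder that sees the privileged input only when it is accessed.
        self.encoder = nn.Linear(std_dim + priv_dim, z_dim)
        self.z_dim = z_dim

    def forward(self, x_std, fetch_privileged):
        """x_std: (1, std_dim) standard input; fetch_privileged: callable
        returning the (costly) privileged input, invoked only if accessed."""
        p_access = self.gate(x_std)          # estimated value of access, in [0, 1]
        access = torch.bernoulli(p_access)   # stochastic access decision
        if access.item() > 0:
            x_priv = fetch_privileged()      # pay the cost only when accessed
            z = self.encoder(torch.cat([x_std, x_priv], dim=-1))
        else:
            # Without access, fall back to an uninformative code drawn from a
            # fixed standard-normal prior (an assumption for this sketch).
            z = torch.randn(x_std.shape[0], self.z_dim)
        return z, p_access

# Example (batch of one, per-example decision as described in the abstract):
# m = BandwidthBottleneckSketch(std_dim=4, priv_dim=3, z_dim=2)
# z, p_access = m(torch.randn(1, 4), fetch_privileged=lambda: torch.randn(1, 3))
```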
