Optimal policies for Bayesian olfactory search in turbulent flows

Paper title

Optimal policies for Bayesian olfactory search in turbulent flows

Authors

Heinonen, Robin A.; Biferale, Luca; Celani, Antonio; Vergassola, Massimo

Abstract

In many practical scenarios, a flying insect must search for the source of an emitted cue which is advected by the atmospheric wind. On the macroscopic scales of interest, turbulence tends to mix the cue into patches of relatively high concentration over a background of very low concentration, so that the insect will only detect the cue intermittently and cannot rely on chemotactic strategies which simply climb the concentration gradient. In this work, we cast this search problem in the language of a partially observable Markov decision process (POMDP) and use the Perseus algorithm to compute strategies that are near-optimal with respect to the arrival time. We test the computed strategies on a large two-dimensional grid, present the resulting trajectories and arrival time statistics, and compare these to the corresponding results for several heuristic strategies, including (space-aware) infotaxis, Thompson sampling, and QMDP. We find that the near-optimal policy found by our implementation of Perseus outperforms all heuristics we test by several measures. We use the near-optimal policy to study how the search difficulty depends on the starting location. We discuss additionally the choice of initial belief and the robustness of the policies to changes in the environment. Finally, we present a detailed and pedagogical discussion about the implementation of the Perseus algorithm, including the benefits -- and pitfalls -- of employing a reward shaping function.
