An Experimental Study on Learning Correlated Equilibrium in Routing Games

Paper Title

An Experimental Study on Learning Correlated Equilibrium in Routing Games

Authors

Zhu, Yixian; Savla, Ketan

Abstract

We study route choice in a repeated routing game where an uncertain state of nature determines link latency functions, and agents receive private route recommendations. The state is sampled i.i.d. in every round from a publicly known distribution, and the recommendations are generated by a randomization policy whose mapping from the state is publicly known. In a one-shot setting, agents are said to obey a recommendation if it gives the smallest travel time in a posteriori expectation. A plausible extension to the repeated setting is that the likelihood of following a recommendation in a round is related to regret from previous rounds. If the regret is of satisficing type with respect to a default choice and is averaged over past rounds and over all agents, then the asymptotic outcome under an obedient recommendation policy coincides with the one-shot outcome. We report findings from an experiment in which one participant at a time makes repeated route choice decisions on a computer. In every round, the participant is shown the travel time distribution for each route, a route recommendation generated by an obedient policy, and a rating suggestive of the average experience of previous participants with the quality of recommendations. Once the participant enters a route choice, the actual travel times are revealed. The participant then evaluates the quality of the recommendation by submitting a review, which is combined with historical reviews to update the rating for the next round. Analysis of data from 33 participants, each completing 100 rounds, suggests a moderate negative correlation between the displayed rating and the average regret, and a strong positive correlation between the rating and the likelihood of following the recommendation. Overall, under an obedient recommendation policy, the rating converges close to its maximum value by the end of the experiments, together with a very high frequency of following recommendations.
