Paper Title
Reward is not enough: can we liberate AI from the reinforcement learning paradigm?
Paper Authors
Paper Abstract
I present arguments against the hypothesis put forward by Silver, Singh, Precup, and Sutton (https://www.sciencedirect.com/science/article/pii/S0004370221000862): reward maximisation is not enough to explain many activities associated with natural and artificial intelligence, including knowledge, learning, perception, social intelligence, evolution, language, generalisation and imitation. I show that such reductio ad lucrum has its intellectual origins in the political economy of Homo economicus and substantially overlaps with the radical version of behaviourism. I show why the reinforcement learning paradigm, despite its demonstrable usefulness in some practical applications, is an incomplete framework for intelligence -- natural and artificial. Complexities of intelligent behaviour are not simply second-order complications on top of reward maximisation. This fact has profound implications for the development of practically usable, smart, safe and robust artificially intelligent agents.