Paper Title
Reward is not enough: can we liberate AI from the reinforcement learning paradigm?
Paper Authors
Paper Abstract
I present arguments against the hypothesis put forward by Silver, Singh, Precup, and Sutton (https://www.sciencedirect.com/science/article/pii/S0004370221000862): reward maximisation is not enough to explain many activities associated with natural and artificial intelligence, including knowledge, learning, perception, social intelligence, evolution, language, generalisation and imitation. I show that such reductio ad lucrum has its intellectual origins in the political economy of Homo economicus and substantially overlaps with the radical version of behaviourism. I show why the reinforcement learning paradigm, despite its demonstrable usefulness in some practical applications, is an incomplete framework for intelligence -- natural and artificial. Complexities of intelligent behaviour are not simply second-order complications on top of reward maximisation. This fact has profound implications for the development of practically usable, smart, safe and robust artificially intelligent agents.