A Number Sense as an Emergent Property of the Manipulating Brain

Paper Title

A Number Sense as an Emergent Property of the Manipulating Brain

Authors

Neehar Kondapaneni, Pietro Perona

Abstract

The ability to understand and manipulate numbers and quantities emerges during childhood, but the mechanism through which humans acquire and develop this ability is still poorly understood. We explore this question through a model, assuming that the learner is able to pick up and place small objects from, and to, locations of its choosing, and will spontaneously engage in such undirected manipulation. We further assume that the learner's visual system will monitor the changing arrangements of objects in the scene and will learn to predict the effects of each action by comparing perception with a supervisory signal from the motor system. We model perception using standard deep networks for feature extraction and classification, and gradient descent learning. Our main finding is that, from learning the task of action prediction, an unexpected image representation emerges exhibiting regularities that foreshadow the perception and representation of numbers and quantity. These include distinct categories for zero and the first few natural numbers, a strict ordering of the numbers, and a one-dimensional signal that correlates with numerical quantity. As a result, our model acquires the ability to estimate numerosity, i.e. the number of objects in the scene, as well as subitization, i.e. the ability to recognize at a glance the exact number of objects in small scenes. Remarkably, subitization and numerosity estimation extrapolate to scenes containing many objects, far beyond the three objects used during training. We conclude that important aspects of a facility with numbers and quantities may be learned with supervision from a simple pre-training task. Our observations suggest that cross-modal learning is a powerful learning mechanism that may be harnessed in artificial intelligence.
