Paper title
Learning word-referent mappings and concepts from raw inputs
Paper authors
Paper abstract
How do children learn correspondences between language and the world from noisy, ambiguous, naturalistic input? One hypothesis is via cross-situational learning: tracking words and their possible referents across multiple situations allows learners to disambiguate correct word-referent mappings (Yu & Smith, 2007). However, previous models of cross-situational word learning operate on highly simplified representations, side-stepping two important aspects of the actual learning problem. First, how can word-referent mappings be learned from raw inputs such as images? Second, how can these learned mappings generalize to novel instances of a known word? In this paper, we present a neural network model trained from scratch via self-supervision that takes in raw images and words as inputs, and show that it can learn word-referent mappings from fully ambiguous scenes and utterances through cross-situational learning. In addition, the model generalizes to novel word instances, locates referents of words in a scene, and shows a preference for mutual exclusivity.
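The cross-situational mechanism the abstract describes can be illustrated with a minimal tabular sketch (this is a toy co-occurrence counter over hypothetical data, not the paper's neural model): each situation pairs an ambiguous utterance with a set of candidate referents, and no single situation determines any mapping, but aggregating co-occurrence counts across situations does.

```python
from collections import Counter

# Hypothetical situations: (words heard, candidate referents in the scene).
# Within any single situation the mapping is fully ambiguous.
situations = [
    ({"ball", "dog"}, {"BALL", "DOG"}),
    ({"ball", "cup"}, {"BALL", "CUP"}),
    ({"dog", "cup"}, {"DOG", "CUP"}),
]

# Count how often each word co-occurs with each referent across situations.
cooccur = Counter()
for words, referents in situations:
    for w in words:
        for r in referents:
            cooccur[(w, r)] += 1

def best_referent(word):
    # The correct referent co-occurs with the word in every situation,
    # while distractors co-occur only sometimes.
    candidates = {r: c for (w, r), c in cooccur.items() if w == word}
    return max(candidates, key=candidates.get)
```

Here `best_referent("ball")` resolves to `"BALL"`, since it co-occurs with "ball" in both situations containing that word, whereas each distractor appears only once. The paper's contribution is to move beyond such symbolic tallies to raw images and words.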