Bridging the Gap to Real-World Object-Centric Learning

Paper Title

Bridging the Gap to Real-World Object-Centric Learning

Authors

Seitzer, Maximilian, Horn, Max, Zadaianchuk, Andrii, Zietlow, Dominik, Xiao, Tianjun, Simon-Gabriel, Carl-Johann, He, Tong, Zhang, Zheng, Schölkopf, Bernhard, Brox, Thomas, Locatello, Francesco

Abstract

Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. Our approach, DINOSAUR, significantly outperforms existing image-based object-centric learning models on simulated data and is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC. DINOSAUR is conceptually simple and shows competitive performance compared to more involved pipelines from the computer vision literature.
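The key idea in the abstract is that the training target is the output of a frozen self-supervised model rather than raw pixels. The NumPy sketch below illustrates only that objective. It is not the authors' implementation, and several pieces are assumptions made for illustration. Random arrays stand in for the frozen patch features. group_into_slots is a simplified slot-attention-style grouping step with no learned projections or GRU. decode is a hypothetical linear spatial-broadcast decoder that predicts one feature vector and one mask logit per slot and patch. All sizes are illustrative.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_into_slots(feats, num_slots, iters=3, seed=0):
    """Hypothetical simplified slot-attention-style grouping.

    feats: (N, D) patch features. Returns slots (K, D) and attention (N, K).
    No learned projections or GRU update; slots are weighted means of patches.
    """
    rng = np.random.default_rng(seed)
    _, d = feats.shape
    slots = rng.normal(size=(num_slots, d))
    for _ in range(iters):
        logits = feats @ slots.T / np.sqrt(d)              # (N, K)
        attn = softmax(logits, axis=1)                     # slots compete for each patch
        weights = attn / attn.sum(axis=0, keepdims=True)   # normalize over patches
        slots = weights.T @ feats                          # (K, D) weighted mean update
    return slots, attn

def decode(slots, pos_emb, w_out):
    """Hypothetical linear spatial-broadcast decoder with per-slot masks.

    slots: (K, D), pos_emb: (N, D), w_out: (D, D + 1).
    """
    h = slots[:, None, :] + pos_emb[None, :, :]        # broadcast each slot to all N patches
    out = h @ w_out                                    # (K, N, D + 1)
    pred, alpha_logits = out[..., :-1], out[..., -1]   # per-slot features and mask logits
    alpha = softmax(alpha_logits, axis=0)              # masks sum to 1 over slots per patch
    recon = (alpha[..., None] * pred).sum(axis=0)      # (N, D) mask-weighted combination
    return recon, alpha

def feature_reconstruction_loss(target_feats, recon):
    # MSE in feature space: the target is the frozen self-supervised features, not pixels.
    return float(np.mean((target_feats - recon) ** 2))

# Usage with stand-in data (illustrative sizes, e.g. a 14x14 patch grid).
rng = np.random.default_rng(42)
N, D, K = 196, 64, 7
feats = rng.normal(size=(N, D))          # stand-in for frozen self-supervised patch features
pos_emb = 0.1 * rng.normal(size=(N, D))  # hypothetical positional embeddings
w_out = 0.1 * rng.normal(size=(D, D + 1))
slots, attn = group_into_slots(feats, K)
recon, alpha = decode(slots, pos_emb, w_out)
loss = feature_reconstruction_loss(feats, recon)
print(slots.shape, recon.shape, alpha.shape)  # (7, 64) (196, 64) (7, 196)
```

In this sketch the mask softmax runs over slots, so every patch is explained by a convex combination of slots. In a trained model of this kind, the per-slot masks alpha are what one would read out as object segments. Here nothing is trained, so the loss value itself is meaningless.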
