Title
Solving Reasoning Tasks with a Slot Transformer
Authors
Abstract
The ability to carve the world into useful abstractions in order to reason about time and space is a crucial component of intelligence. In order to perceive and act effectively using our senses, we must parse and compress large amounts of information for further downstream reasoning to take place, allowing increasingly complex concepts to emerge. If there is any hope of scaling representation learning methods to work with real-world scenes and temporal dynamics, then there must be a way to learn accurate, concise, and composable abstractions across time. We present the Slot Transformer, an architecture that leverages slot attention, transformers, and iterative variational inference on video scene data to infer such representations. We evaluate the Slot Transformer on the CLEVRER, Kinetics-600 and CATER datasets and demonstrate that the approach allows us to develop robust modeling and reasoning around complex behaviours, as well as scores on these datasets that compare favourably to existing baselines. Finally, we evaluate the effectiveness of key components of the architecture, the model's representational capacity and its ability to predict from incomplete input.
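The abstract names slot attention as a core component. The sketch below is a minimal NumPy illustration of the slot-attention update (in the style of Locatello et al.): a fixed set of slot vectors attend to input features with a softmax taken over the *slot* axis, so slots compete to explain each feature, and the update is iterated. The random projections standing in for learned query/key/value weights, and all names and dimensions, are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs, num_slots=4, num_iters=3, seed=0):
    """Illustrative slot-attention sketch.

    inputs: (n, d) array of input features (e.g. flattened image patches).
    Returns (num_slots, d) slot representations.
    Random matrices play the role of learned q/k/v projections (assumption).
    """
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    slots = rng.normal(size=(num_slots, d))          # randomly initialised slots
    Wq = rng.normal(size=(d, d)) / np.sqrt(d)        # stand-in for learned weights
    Wk = rng.normal(size=(d, d)) / np.sqrt(d)
    Wv = rng.normal(size=(d, d)) / np.sqrt(d)
    for _ in range(num_iters):
        q = slots @ Wq                               # (num_slots, d)
        k = inputs @ Wk                              # (n, d)
        v = inputs @ Wv                              # (n, d)
        logits = q @ k.T / np.sqrt(d)                # (num_slots, n)
        attn = softmax(logits, axis=0)               # softmax over slots: competition
        attn = attn / attn.sum(axis=1, keepdims=True)  # normalise per slot
        slots = attn @ v                             # weighted mean of values
    return slots
```

In the full architecture described by the abstract, updates like this would be combined with a transformer over time and iterative variational inference; the sketch only shows the per-frame competitive-attention idea.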