Paper Title

Scalable Adaptive Computation for Iterative Generation

Paper Authors

Allan Jabri, David Fleet, Ting Chen

Paper Abstract

Natural data is redundant, yet predominant architectures tile computation uniformly across their input and output space. We propose the Recurrent Interface Network (RIN), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data. RINs focus the bulk of computation (i.e. global self-attention) on a set of latent tokens, using cross-attention to read and write (i.e. route) information between latent and data tokens. Stacking RIN blocks allows bottom-up (data to latent) and top-down (latent to data) feedback, leading to deeper and more expressive routing. While this routing introduces challenges, it is less problematic in recurrent computation settings where the task (and routing problem) changes gradually, such as iterative generation with diffusion models. We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e. latent self-conditioning. RINs yield state-of-the-art pixel diffusion models for image and video generation, scaling to 1024×1024 images without cascades or guidance, while being domain-agnostic and up to 10× more efficient than 2D and 3D U-Nets.
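
The abstract's read/compute/write routing can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: `RINBlock`, the dimensions, and the toy shapes are assumptions for illustration, and the paper's actual blocks also interleave MLPs, layer normalization, and learned initial latent embeddings.

```python
import torch
import torch.nn as nn

class RINBlock(nn.Module):
    """Sketch of one RIN block: read (data -> latents), compute
    (self-attention over latents), write (latents -> data)."""

    def __init__(self, latent_dim: int, data_dim: int, num_heads: int = 8):
        super().__init__()
        # Read: latent tokens cross-attend to data tokens.
        self.read = nn.MultiheadAttention(
            latent_dim, num_heads, kdim=data_dim, vdim=data_dim,
            batch_first=True)
        # Compute: the bulk of computation is self-attention over the
        # small set of latent tokens, independent of data dimensionality.
        self.compute = nn.MultiheadAttention(
            latent_dim, num_heads, batch_first=True)
        # Write: data tokens cross-attend to latents to receive updates.
        self.write = nn.MultiheadAttention(
            data_dim, num_heads, kdim=latent_dim, vdim=latent_dim,
            batch_first=True)

    def forward(self, latents: torch.Tensor, data: torch.Tensor):
        latents = latents + self.read(latents, data, data)[0]
        latents = latents + self.compute(latents, latents, latents)[0]
        data = data + self.write(data, latents, latents)[0]
        return latents, data

# Stacking blocks gives bottom-up (data -> latent) and top-down
# (latent -> data) feedback; cost is dominated by latent self-attention,
# not by the (much larger) number of data tokens.
blocks = nn.ModuleList([RINBlock(latent_dim=512, data_dim=256)
                        for _ in range(4)])
data = torch.randn(2, 4096, 256)    # e.g. many patch/pixel tokens
latents = torch.zeros(2, 128, 512)  # far fewer latent tokens

for block in blocks:
    latents, data = block(latents, data)
```

For latent self-conditioning, the latents produced at one denoising step would warm-start the latents of the next step of the reverse diffusion process instead of being discarded; as the abstract describes, training approximates this recurrence by conditioning on latents computed in a prior forward pass (applied with a stop-gradient, per the paper).
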
