Paper Title
Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable Arrays
Paper Authors
Paper Abstract
While coarse-grained reconfigurable arrays (CGRAs) have emerged as promising programmable accelerator architectures, applications running on CGRAs must be pipelined to ensure high maximum clock frequencies. Current CGRA compilers either lack pipelining techniques, resulting in low performance, or perform exhaustive pipelining, resulting in high energy and resource consumption. We introduce Cascade, an application pipelining toolkit for CGRAs, including a CGRA application frequency model, automated pipelining techniques for CGRA application compilers that work with both dense and sparse applications, and hardware optimizations for improving application frequency. Compared to a compiler without pipelining, Cascade enables 7-34x lower critical path delays and 7-190x lower energy-delay product (EDP) across a variety of dense image processing and machine learning workloads, and 2-4.4x lower critical path delays and 1.5-4.2x lower EDP on sparse workloads.