用于卷积操作加速的基于FPGA的解决方案

论文标题

用于卷积操作加速的基于FPGA的解决方案

An FPGA-based Solution for Convolution Operation Acceleration

论文作者

Pham, Trung Dinh, Bach, Bao Gia, Luu, Lam Trinh, Nguyen, Minh Dinh, Pham, Hai Duc, Anh, Khoa Bui, Nguyen, Xuan Quang, Quoc, Cuong Pham

论文摘要

基于硬件的加速度是促进许多计算密集型数学操作的广泛尝试。本文提出了一个基于FPGA的体系结构，以加速卷积操作 - 在许多卷积神经网络模型中出现的复杂且昂贵的计算步骤。我们将设计定为标准卷积操作，打算以边缘-AI解决方案启动产品。该项目的目的是产生一个可以一次处理卷积层的FPGA IP核心。系统开发人员可以使用Verilog HDL作为体系结构的主要设计语言来部署IP核心。实验结果表明，我们在简单的边缘计算FPGA板上合成的单个计算核心可以提供0.224 GOPS。当董事会充分利用时，可以实现4.48 GOP。

Hardware-based acceleration is an extensive attempt to facilitate many computationally-intensive mathematics operations. This paper proposes an FPGA-based architecture to accelerate the convolution operation - a complex and expensive computing step that appears in many Convolutional Neural Network models. We target the design to the standard convolution operation, intending to launch the product as an edge-AI solution. The project's purpose is to produce an FPGA IP core that can process a convolutional layer at a time. System developers can deploy the IP core with various FPGA families by using Verilog HDL as the primary design language for the architecture. The experimental results show that our single computing core synthesized on a simple edge computing FPGA board can offer 0.224 GOPS. When the board is fully utilized, 4.48 GOPS can be achieved.

下载PDF全文

下载文献需遵守相关版权规定

论文标题