FPDeep: Acceleration and Load Balancing of CNN Training on FPGA Clusters

2018 · 81 citations · Journal Article

Authors

Tong Geng · Boston University

Tianqi Wang · University of Science and Technology of China

Ahmed Sanaullah · Boston University

Chen Yang · Boston University

Rui Xu · University of Science and Technology of China

Rushi Patel · Boston University

Martin Herbordt ·

Abstract

FPGA-based CNN accelerators have advantages in flexibility and power efficiency and so are being deployed by a number of cloud computing service providers, including Microsoft, Amazon, Tencent, and Alibaba. Given the increasing complexity of neural networks, however, it is becoming challenging to efficiently map CNNs to multi-FPGA platforms. In this work, we present a scalable framework, FPDeep, which helps engineers map a specific CNN's training logic to a multi-FPGA cluster or cloud and build RTL implementations for the target network. With FPDeep, multi-FPGA accelerators work in a deeply pipelined manner using a simple 1-D topology; this enables the accelerators to map directly onto many existing platforms, including Catapult, Catapult2, and almost any tightly coupled FPGA cluster. FPDeep uses two mechanisms to achieve high performance and energy efficiency. First, FPDeep provides a strategy to balance workload among FPGAs, leading to improved utilization. Second, training of CNNs is executed in a fine-grained inter- and intra-layer pipelined manner, minimizing the time that features need to remain available while waiting for back-propagation. This reduces the storage demand to the point where only on-chip memory is required for convolution layers. Experiments show that FPDeep scales well to a large number of FPGAs, with the limiting factor being the FPGA-to-FPGA bandwidth. Using six transceivers per FPGA, FPDeep scales linearly up to 60 FPGAs. We evaluate energy efficiency in GOPs/J and find that FPDeep provides up to 3.4 times higher energy efficiency than the Tesla K80 GPU.
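
The workload-balancing idea in the abstract can be illustrated with a small sketch. This is not FPDeep's actual partitioning algorithm, which the abstract does not specify; it is a minimal illustration that assumes each layer's cost is a single operation count and that a layer may be split at any point between adjacent devices in the 1-D chain. The function name `balance_chain` and the per-layer operation counts are hypothetical.

```python
from fractions import Fraction


def balance_chain(layer_ops, num_fpgas):
    """Assign layers, in order, to a 1-D chain of devices with equal work each.

    layer_ops: non-negative integer operation counts per layer (hypothetical).
    Returns one list per device of (layer_index, fraction_of_layer) pairs.
    Illustrative only: assumes a layer can be split at any point.
    """
    if num_fpgas < 1:
        raise ValueError("num_fpgas must be at least 1")
    share = Fraction(sum(layer_ops), num_fpgas)  # exact per-device target
    plan = [[] for _ in range(num_fpgas)]
    dev, room = 0, share
    for layer, ops in enumerate(layer_ops):
        remaining = Fraction(ops)
        while remaining > 0:
            take = min(remaining, room)  # fill the current device first
            plan[dev].append((layer, take / ops))
            remaining -= take
            room -= take
            if room == 0 and dev < num_fpgas - 1:
                dev, room = dev + 1, share  # move to the next device in the chain
    return plan


def device_work(plan, layer_ops):
    """Total operations assigned to each device under a plan."""
    return [sum(frac * layer_ops[layer] for layer, frac in dev) for dev in plan]
```

Because consecutive layers (or slices of one layer) always land on the same or adjacent devices, a plan of this shape needs only neighbor-to-neighbor links, which matches the simple 1-D topology the abstract describes. Exact rational arithmetic is used so that every device receives exactly the same share.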

Topics & Keywords

Advanced Neural Network Applications · Advanced Memory and Neural Computing · Adversarial Robustness in Machine Learning

UN Sustainable Development Goals

Affordable and clean energy

Publication Details

DOI: 10.1109/fccm.2018.00021

Field-Weighted Citation Impact: 6.06