Graph neural network (GNN) analysis engines are vital for real-world problems that use large graph models. Challenges for a GNN hardware platform include the ability to 1) host a variety of GNNs; 2) handle high sparsity in input vertex feature vectors and the graph adjacency matrix, along with the accompanying random memory access patterns; and 3) maintain load-balanced computation in the face of uneven workloads induced by high sparsity and power-law vertex degree distributions. This article proposes GNNIE, an accelerator designed to run a broad range of GNNs. It tackles workload imbalance by 1) splitting vertex feature operands into blocks; 2) reordering and redistributing computations; and 3) using a novel flexible MAC architecture. It adopts a graph-specific, degree-aware caching policy that is well suited to real-world graph characteristics. The policy enhances on-chip data reuse and avoids random memory access to DRAM. GNNIE achieves average speedups of 7197× over a CPU and 17.81× over a GPU across multiple datasets on graph attention networks (GATs), graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool. Compared to prior approaches, GNNIE achieves an average speedup of 5× over HyGCN (which cannot implement GATs) for GCN, GraphSAGE, and GINConv, and an average speedup of 1.3× over AWB-GCN (which runs only GCNs), despite using 3.4× fewer processing units.
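To make the idea of degree-aware caching concrete, the following is a minimal sketch in Python. It is not GNNIE's actual policy, which the abstract does not detail. It assumes a simplified rule (keep the highest-degree vertices on chip) and uses hypothetical helper names (`pick_cached_vertices`, `cache_hit_fraction`) on a toy edge list to show why a skewed degree distribution lets a small cache serve a large share of accesses.

```python
from collections import Counter


def degree_counts(edges):
    """Count the undirected degree of each vertex in an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def pick_cached_vertices(edges, capacity):
    """Hypothetical policy: keep the `capacity` highest-degree vertices on chip."""
    deg = degree_counts(edges)
    # Sort by degree (descending), breaking ties by vertex id for determinism.
    ranked = sorted(deg, key=lambda v: (-deg[v], v))
    return set(ranked[:capacity])


def cache_hit_fraction(edges, cached):
    """Fraction of edge-endpoint accesses that touch a cached vertex."""
    accesses = 2 * len(edges)
    hits = sum((u in cached) + (v in cached) for u, v in edges)
    return hits / accesses if accesses else 0.0


# Toy graph (an assumption for illustration): vertex 0 acts as a hub.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 5)]
cached = pick_cached_vertices(edges, capacity=2)
print(cached, cache_hit_fraction(edges, cached))
```

In this toy graph, caching two of the six vertices covers half of all endpoint accesses, because the hub vertex accounts for most of the edges. Real graphs with power-law degree distributions show the same skew at a much larger scale.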
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume 42, Issue 12, pp. 4844-4857