In recent years, convolutional neural networks (CNNs) have been used extensively in image-related machine learning tasks due to their exceptional accuracy. The multiply-accumulate (MAC) operations in convolutional layers make them computationally expensive, and these layers account for roughly 90% of the total computation. Several researchers have exploited pruning of weights and activations to reduce this computational load. Pruning techniques fall into two categories: 1) unstructured pruning of the weights can achieve high pruning ratios, but it unbalances data access and computation; consequently, the compression coding needed to index non-zero data grows, which inflates the memory footprint. 2) Structured pruning removes weights according to a specified pattern, regularizing both computation and memory access, but it cannot reach the high pruning ratios of unstructured pruning. In this paper, we propose Quasi-Structured Pruning (QSP), which retains the high pruning ratio of unstructured pruning while also incorporating the load-balancing property of structured pruning. Implementation results of our accelerator running VGG16 on a Xilinx XC7Z100 show 616.94 GOP/s in dense mode and 1437.7 GOP/s in sparse mode at only 7.8 W of power consumption. Experimental results show that the accelerator achieves 1.38×, 1.1×, 2.77×, 2.87×, 1.91×, and 1.18× higher DSP efficiency than previous accelerators in dense mode. In addition, it achieves 1.9×, 2.92×, 1.67×, and 1.11× higher DSP efficiency and 4.52×, 5.31×, 10.38×, and 1.1× better energy efficiency than other state-of-the-art sparse accelerators.
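The two pruning categories above can be illustrated with a minimal NumPy sketch. This is not the paper's QSP scheme: the 75% unstructured target and the 2-of-4 group pattern are illustrative assumptions chosen to show why unstructured pruning yields irregular per-row work while pattern-based structured pruning keeps it uniform.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # toy weight matrix

# Unstructured pruning: zero the globally smallest-magnitude weights.
# High pruning ratios are possible, but non-zeros land at arbitrary
# positions, so per-row work and memory access become unbalanced.
target_ratio = 0.75
k = int(W.size * target_ratio)
thresh = np.sort(np.abs(W), axis=None)[k - 1]
unstructured = np.where(np.abs(W) > thresh, W, 0.0)

# Structured (pattern) pruning: keep the same number of weights in
# every group of 4, here the 2 largest per group. Work per row is
# identical across rows, but the ratio is capped by the pattern (50%).
groups = W.reshape(-1, 4)
idx = np.argsort(np.abs(groups), axis=1)[:, :2]  # 2 smallest per group
structured = groups.copy()
np.put_along_axis(structured, idx, 0.0, axis=1)
structured = structured.reshape(W.shape)

# Non-zeros per row: irregular for unstructured, uniform for structured.
print(np.count_nonzero(unstructured, axis=1))
print(np.count_nonzero(structured, axis=1))  # every row keeps exactly 4
```

The irregular first printout is what forces unstructured accelerators to carry non-zero index metadata and load-balancing hardware; the uniform second printout is the regularity that structured (and quasi-structured) schemes exploit.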