Dongyue Li

- Affiliation: Shanghai Jiao Tong University
In this paper, we present ReRAM-Sharing, a software-hardware co-design scheme that explores fine-grained weight-sharing compression for ReRAM-based accelerators. Because of the limited ADC bandwidth and the limited number of ADCs, DNN computation on ReRAM crossbars is carried out at a smaller granularity, denoted as an Operation Unit (OU). Motivated by this, we propose the ReRAM-Sharing algorithm, which applies weight sharing at the OU level to exploit fine-grained sparsity. ReRAM-Sharing reduces the redundancy of DNNs while maintaining their representation capability. Moreover, because the ReRAM-Sharing algorithm is orthogonal to traditional pruning techniques, the two can be combined to shrink the model size further. We then propose the ReRAM-Sharing architecture, which adds an index table and adders to a traditional ReRAM-based accelerator to support the ReRAM-Sharing algorithm.
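The idea of OU-level weight sharing can be sketched as follows: partition the weight matrix into Operation Units, then replace the weights inside each OU with a small codebook of shared values plus an index table. This is a minimal illustrative sketch, not the paper's implementation; the OU size (9×8), codebook size `k=4`, and the 1-D k-means clustering used here are all assumptions chosen for illustration.

```python
import numpy as np

def share_weights_ou(W, ou_rows=9, ou_cols=8, k=4, iters=20):
    """Per-OU weight sharing: cluster each Operation Unit's weights
    into k shared values (codebook) plus an index table.
    OU size and k are illustrative assumptions, not the paper's values."""
    H, Wc = W.shape
    codebooks, indices = {}, {}
    W_shared = np.empty_like(W)
    for r in range(0, H, ou_rows):
        for c in range(0, Wc, ou_cols):
            block = W[r:r + ou_rows, c:c + ou_cols]
            flat = block.ravel()
            # Simple 1-D k-means: initialize centers at evenly spaced quantiles
            centers = np.quantile(flat, np.linspace(0, 1, k))
            for _ in range(iters):
                idx = np.abs(flat[:, None] - centers[None, :]).argmin(axis=1)
                for j in range(k):
                    if np.any(idx == j):
                        centers[j] = flat[idx == j].mean()
            idx = np.abs(flat[:, None] - centers[None, :]).argmin(axis=1)
            # Store the codebook and index table; the hardware side would
            # keep these in the index table and reconstruct sums via adders
            codebooks[(r, c)] = centers
            indices[(r, c)] = idx.reshape(block.shape)
            W_shared[r:r + ou_rows, c:c + ou_cols] = centers[idx].reshape(block.shape)
    return W_shared, codebooks, indices
```

Because each OU now holds at most `k` distinct values, the crossbar only needs to program those shared values, and the index table recovers which weight position maps to which shared value. This sketch is also compatible with pruning: zeros surviving pruning simply become one of the shared values.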