Circuit Techniques for Efficient Acceleration of Deep Neural Network Inference with Analog-AI

Abstract

By performing parallelized multiply–accumulate operations in the analog domain at the location of weight data, crossbar-array “tiles” of analog non-volatile memory (NVM) devices can potentially accelerate the forward inference of Deep Neural Networks (DNNs). Such systems will need to 1) achieve high neural-network classification accuracies, indistinguishable from those achieved with conventional approaches, and 2) be highly efficient, both when performing analog-AI operations at each tile and when conveying the resulting neuron-excitation data vectors from tile to tile. Towards the first goal, we describe row-wise Phase-Change Memory (PCM) programming schemes for rapid yet accurate weight programming. Towards the second, we describe micro-architectural design ideas including source-follower-based readout, array segmentation, and transmit-by-duration.