Performance Walls in Machine Learning and Neuromorphic Systems

Abstract

At a fundamental level, an energy imbalance exists between training and inference in machine learning (ML) systems. Inference involves recall using a fixed or learned set of parameters, which can be energy-optimized using compression and sparsification techniques. Training, by contrast, involves searching over the entire set of parameters and hence requires repeated memorization, caching, pruning, and annealing. In this paper, we introduce three performance walls that determine training energy efficiency: the memory-wall, the update-wall, and the consolidation-wall. While emerging compute-in-memory ML architectures can address the memory-wall bottleneck (the energy dissipated due to repeated memory access), the approach is agnostic to the energy dissipated due to the number and precision of the training updates (the update-wall) and due to information transfer between short-term and long-term memories (the consolidation-wall). To overcome these performance walls, we propose a learning-in-memory (LIM) paradigm that prescribes ML system memories that exhibit metaplasticity and whose thermodynamical properties match the physics and energetics of learning.