Video s3
    Details
    Presenter(s)
    Yan Xiong
    Affiliation
    Arizona State University
    Abstract

    Recent work on neural network architectures has focused on bridging the gap between performance/efficiency and programmability. We consider implementations of three popular neural networks, ResNet, AlexNet, and the ASGD weight-dropped recurrent neural network (AWD RNN), on a low-power programmable architecture. The architecture consists of lightweight cores interconnected by caches and crossbars that support run-time reconfiguration between shared and private cache modes. We present efficient implementations of key neural network kernels and evaluate the performance of each kernel under the different cache modes. The best-performing cache modes are then used in the implementation of the end-to-end networks. Simulation results show high energy efficiency, with ResNet, AlexNet, and AWD RNN achieving 196.0 GOPS/W, 150.5 GOPS/W, and 120.7 GOPS/W, respectively, in a 14 nm technology node.
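
    As a rough illustration of the per-kernel tuning step described in the abstract, the sketch below shows how a best cache mode might be chosen for each kernel from simulated efficiency numbers before assembling the end-to-end network. The kernel names, cache-mode labels, and GOPS/W values are placeholders for illustration only, not measurements or code from the presented work.

```python
# Hypothetical sketch: pick the best cache mode per kernel from simulated
# efficiency (GOPS/W). All names and numbers below are placeholders, not
# results from the presented work.

# Simulated per-kernel efficiency under each cache mode (placeholder values).
kernel_results = {
    "conv2d": {"shared": 180.0, "private": 150.0},
    "gemm":   {"shared": 140.0, "private": 170.0},
    "lstm":   {"shared": 110.0, "private": 125.0},
}

def best_cache_mode(results):
    """Return a mapping of kernel -> cache mode with the highest GOPS/W."""
    return {kernel: max(modes, key=modes.get) for kernel, modes in results.items()}

if __name__ == "__main__":
    for kernel, mode in best_cache_mode(kernel_results).items():
        print(f"{kernel}: run in {mode} cache mode")
```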
