Video s3
    Details
    Author(s)
    Display Name
    Chen Shen
    Affiliation
    VIRTUS, Nanyang Technological University
    Display Name
    Junran Pu
    Affiliation
    Nanyang Technological University
    Display Name
    Yi Sheng Chong
    Affiliation
    Nanyang Technological University
    Display Name
    Zhongyi Zhang
    Affiliation
    VIRTUS, Nanyang Technological University
    Display Name
    Wang Ling Goh
    Affiliation
    Nanyang Technological University
    Display Name
    Bin Zhao
    Affiliation
    Institute of Materials Research and Engineering, Agency for Science, Technology and Research
    Display Name
    Anh Tuan Do
    Affiliation
    Agency for Science, Technology and Research
    Display Name
    Yuan Gao
    Affiliation
    Institute of Microelectronics, Agency for Science, Technology and Research
    Abstract

    This paper presents an ultra-low-power keyword spotting (KWS) chip for the always-on ambient sensing function of Artificial Intelligence of Things (AIoT) devices. The core KWS engine is based on a spiking convolutional neural network (SCNN) model, chosen for its sparse activation and the addition-only operations inside its spiking neurons. The proposed SCNN model improves the existing frame-wise incremental computation structure by adding a spike processing unit (SPU) that reduces the number of computation cycles. The power and latency of the whole system are reduced by 16.5% and 43.2%, respectively. Extensive network quantization reduces the weight bit width to 4 bits, and only 1-bit activations are required. The chip also supports power gating through an energy-based voice activity detection (VAD) module to further reduce power consumption in random and sparse event (RSE) scenarios. Full-chip simulation results show that the chip consumes only 110 nW, with a 2.15% false alarm rate and a 3.00% false reject rate in a 10% voice event stream test. It achieves state-of-the-art recognition accuracy of 99% and 96% for one- and two-keyword detection tasks, respectively.
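
    The abstract's three efficiency ideas (4-bit weights, 1-bit spike activations with addition-only accumulation, and energy-based VAD gating) can be illustrated with a minimal sketch. It is not the chip's implementation: the function names, the uniform quantization scheme, the integrate-and-fire neuron with a subtractive reset, and all thresholds are assumptions made here for illustration only.

```python
from typing import List, Sequence, Tuple


def quantize_4bit(w: float, scale: float) -> int:
    """Round a weight to a signed 4-bit integer in [-8, 7].

    Uniform symmetric quantization is an assumption; the abstract only
    states that weights are reduced to 4 bits.
    """
    q = round(w / scale)
    return max(-8, min(7, q))


def if_layer_step(
    spikes_in: Sequence[int],
    weights: Sequence[Sequence[int]],
    potentials: Sequence[int],
    threshold: int,
) -> Tuple[List[int], List[int]]:
    """One time step of an integrate-and-fire layer with 1-bit inputs.

    Because inputs are 0 or 1, each synapse contributes either nothing or
    a plain addition of its weight, so no multiplications are needed.
    """
    spikes_out: List[int] = []
    new_potentials: List[int] = []
    for w_row, v in zip(weights, potentials):
        for s, w in zip(spikes_in, w_row):
            if s:  # sparse activation: silent inputs cost nothing
                v += w
        if v >= threshold:
            spikes_out.append(1)
            v -= threshold  # subtractive reset (assumed, not from the paper)
        else:
            spikes_out.append(0)
        new_potentials.append(v)
    return spikes_out, new_potentials


def frame_energy(samples: Sequence[int]) -> int:
    """Sum of squared samples in one audio frame."""
    return sum(x * x for x in samples)


def vad_gate(samples: Sequence[int], energy_threshold: int) -> bool:
    """Return True when the frame is loud enough to wake the KWS engine."""
    return frame_energy(samples) >= energy_threshold


if __name__ == "__main__":
    weights = [[3, -2, 5], [1, 1, 1]]  # already-quantized 4-bit weights
    frame = [3, -4, 2, 1]              # hypothetical audio samples
    if vad_gate(frame, energy_threshold=10):
        out, pots = if_layer_step([1, 0, 1], weights, [0, 0], threshold=4)
        print(out, pots)
```

    In this sketch the spiking layer runs only when `vad_gate` returns True, which mirrors the power-gating idea: in a stream where voice events are rare, most frames never reach the network.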