    Details
    Presenter(s)
    Marian Verhelst
    Affiliation
    KU Leuven
    Abstract

    TinyML strives for powerful machine inference on resource-scarce distributed devices. Intelligent applications at ultra-low energy and low latency require (1) compact compute and memory structures that are (2) used at very high utilization. This has resulted in a wide variety of accelerator designs in the state of the art. However, it is becoming increasingly clear that every intelligent edge device will need a diverse set of heterogeneous co-processors, so that each workload can run on the most compatible accelerator, or combination of accelerators. Moreover, by using multiple cores in parallel and streaming data between them, the required amount of on-chip memory and IO bandwidth can be reduced, yielding area, energy, and latency savings. This talk will introduce the benefits and challenges of such heterogeneous ML systems, supported by practical examples of efficient deep inference.
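    The idea of matching each workload to its most compatible accelerator can be illustrated with a minimal greedy mapping sketch. Everything here is assumed for illustration: the accelerator names, the layer types, and the cost numbers are hypothetical, and the policy (minimizing an energy-delay product per layer) is just one plausible mapping criterion, not the method presented in the talk.

    ```python
    # Hypothetical sketch: greedily map each network layer onto the
    # heterogeneous accelerator with the lowest energy-delay product.
    # Accelerator names, layer types, and cost numbers are illustrative only.

    ACCELERATORS = ["dense-array", "sparse-engine", "dw-conv-core"]

    # Illustrative cost table: (energy in nJ, latency in us) per layer/core pair.
    COST = {
        ("conv3x3",   "dense-array"):   (10.0, 5.0),
        ("conv3x3",   "sparse-engine"): (14.0, 9.0),
        ("conv3x3",   "dw-conv-core"):  (12.0, 7.0),
        ("dw-conv",   "dense-array"):   (11.0, 8.0),
        ("dw-conv",   "sparse-engine"): (13.0, 9.0),
        ("dw-conv",   "dw-conv-core"):  (4.0,  3.0),
        ("sparse-fc", "dense-array"):   (9.0,  6.0),
        ("sparse-fc", "sparse-engine"): (3.0,  2.0),
        ("sparse-fc", "dw-conv-core"):  (15.0, 10.0),
    }

    def map_layers(layers):
        """Assign each layer to the accelerator minimizing energy * latency."""
        mapping = {}
        for layer in layers:
            best = min(
                ACCELERATORS,
                key=lambda acc: COST[(layer, acc)][0] * COST[(layer, acc)][1],
            )
            mapping[layer] = best
        return mapping

    network = ["conv3x3", "dw-conv", "sparse-fc"]
    print(map_layers(network))
    # Each layer lands on the core that suits it best under this cost model.
    ```

    A real mapper would also model inter-core data movement, since streaming intermediate tensors between cores (rather than spilling to shared memory) is where the on-chip memory and IO savings mentioned above come from.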