Details
![Arne Symons Headshot](https://confcats-catavault.s3.amazonaws.com/CATAVault/ieeecass/master/files/styles/cc_user_photo/s3/user-pictures/11111.jpg?h=f025acff&itok=Guro3cji)
- Affiliation: KU Leuven
- Country: Belgium
The scheduling, or temporal mapping, of a neural network (NN) on a given hardware (HW) accelerator strongly impacts its execution energy and latency. Unfortunately, the mapping space is huge and varies greatly depending on the NN-HW combination. Many design space exploration (DSE) frameworks aim at automatically exploring this vast mapping space. Yet, state-of-the-art (SotA) frameworks suffer from being slow (e.g., exhaustive search), inflexible across a wide range of HW architectures (e.g., no support for uneven mapping), or unable to guarantee global optimality (e.g., because they rely on user-defined constraints or on random sampling). Moreover, existing frameworks are typically unable to predict the required CPU run-time and peak CPU memory in advance, and are thus unable to trade off search time against optimality in a deterministic manner. This work proposes LOMA, a fast auto-scheduling methodology based on loop-order-based memory allocation, which overcomes the above bottlenecks. LOMA's capabilities are demonstrated at scale by finding the optimal schedules of the complete MobileNetV3 and NASNet networks.
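To give a feel for why this mapping space explodes, the sketch below (a hypothetical illustration, not LOMA's actual algorithm or cost model) treats the temporal mapping of a single layer as an ordering of its nested loops and exhaustively scores every ordering with a toy cost function. Even six loops already yield 720 orderings per layer, and a real DSE framework must additionally consider loop tiling and memory allocation, which is why naive exhaustive search becomes slow:

```python
from itertools import permutations

# Toy loop dimensions of a convolutional layer (outermost to innermost):
# output channels K, input channels C, output spatial OX/OY, filter FX/FY.
loops = ["K", "C", "OX", "OY", "FX", "FY"]

def toy_cost(order):
    """Hypothetical cost model: reward keeping the channel loops C and K
    innermost (i.e., late in the ordering). A real framework would instead
    estimate energy/latency from the memory accesses each order implies."""
    last = len(order) - 1
    return (last - order.index("C")) + (last - order.index("K"))

# Exhaustively enumerate all temporal orderings and pick the cheapest one.
all_orders = list(permutations(loops))
best = min(all_orders, key=toy_cost)

print(len(all_orders))  # 6! = 720 orderings for a single layer
print(best)
```

The factorial growth shown here, compounded over every layer of a network such as MobileNetV3, motivates methodologies like LOMA that prune or structure the search while still guaranteeing optimality.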