An SRAM optimized approach for constant memory consumption and ultra-fast execution of ML classifiers on TinyML hardware
Date
2021-11-15

Authors
Sudharsan, Bharath
Yadav, Piyush
Breslin, John G.
Ali, Muhammad Intizar
Recommended Citation
Sudharsan, Bharath, Yadav, Piyush, Breslin, John G., & Ali, Muhammad Intizar. (2021). An SRAM optimized approach for constant memory consumption and ultra-fast execution of ML classifiers on TinyML hardware. Paper presented at the IEEE International Conference on Services Computing (SCC), Online Virtual Congress, 05-11 September. doi:10.1109/SCC53864.2021.00045
Abstract
With the introduction of ultra-low-power machine learning (TinyML), IoT devices are becoming smarter as they are driven by Machine Learning (ML) models. However, any increase in the training data results in a linear increase in the space complexity of the ML models, making it highly challenging to deploy such models on IoT devices with limited memory (TinyML hardware). To alleviate such memory issues, in this paper we present an SRAM-optimized classifier porting, stitching, and efficient deployment approach. The proposed method enables large classifiers to be comfortably executed on microcontroller unit (MCU) based IoT devices and to perform ultra-fast classifications while consuming 0 bytes of SRAM. We tested our SRAM-optimized approach by using it to port and execute 7 dataset-trained classifiers on 7 popular MCU boards, and report their inference time and memory (Flash and SRAM) consumption. The experimental results show that: (i) the classifiers ported using our proposed approach are of varied sizes but have constant SRAM consumption, so the approach enabled the deployment of larger ML classifier models even on the tiny ATmega328P MCU-based Arduino Nano, which has only 2 kB of SRAM; (ii) even the resource-constrained 8-bit MCUs performed unit inference faster (in less than a millisecond) than an NVIDIA Jetson Nano GPU and a Raspberry Pi 4 CPU; (iii) the majority of models produced 1-4x faster inference results in comparison with the models ported by the sklearn-porter, m2cgen, and emlearn libraries.
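
The paper details the actual porting and stitching procedure; purely as an illustration of the general idea of keeping model parameters out of SRAM, the following minimal Arduino-style sketch (not the authors' implementation) stores the weights of a hypothetical logistic-regression classifier in AVR flash via PROGMEM and streams them at inference time, so SRAM consumption stays constant regardless of model size. All identifiers and values here are assumptions for illustration.

    #include <avr/pgmspace.h>
    #include <math.h>

    // Hypothetical model: a 4-feature logistic-regression classifier.
    // Parameters live in flash (PROGMEM), not SRAM; values are illustrative.
    #define N_FEATURES 4
    const float WEIGHTS[N_FEATURES] PROGMEM = {0.42f, -1.30f, 0.07f, 2.15f};
    const float BIAS PROGMEM = -0.58f;

    // Predict the class of one sample. Each coefficient is streamed from
    // flash with pgm_read_float(), so SRAM usage is a few stack bytes no
    // matter how large the parameter array grows.
    int predict(const float *x) {
      float z = pgm_read_float(&BIAS);
      for (int i = 0; i < N_FEATURES; i++) {
        z += pgm_read_float(&WEIGHTS[i]) * x[i];
      }
      return (1.0f / (1.0f + expf(-z))) >= 0.5f;  // sigmoid threshold
    }

    void setup() {
      Serial.begin(9600);
      const float sample[N_FEATURES] = {5.1f, 3.5f, 1.4f, 0.2f};
      Serial.println(predict(sample));  // prints the predicted class (0 or 1)
    }

    void loop() {}

On an ATmega328P, a pattern like this leaves the 2 kB of SRAM free for application state, since the coefficients are read directly from the 32 kB flash during each inference.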