NeuroPilot-Micro is a framework for executing ultra-low power machine learning applications on microcontrollers (MCUs). It includes a micro-runtime as well as operator libraries that optimize machine learning inference to run efficiently under low power consumption (milliwatts), limited memory capacity (megabytes), and modest computing power (clock speeds in the MHz range).
Tiny Machine Learning (TinyML) is an emerging software technology that enables machines to learn and become smarter over time. The ability to deploy machine learning algorithms onto small embedded devices, such as the MT3620 MCU, has for the first time allowed vision, audio, and sensor-based AI applications to operate in an always-on mode with ultra-low power consumption.
The ability to incorporate machine-learning models into microcontroller applications will enable a whole new ecosystem. However, the following challenges remain:
- Model size and accuracy: Machine learning software algorithms must be small enough to fit into microcontrollers while still yielding good accuracy.
- Hardware/software design: Well-integrated software design that leverages the limited hardware resources is mandatory. Unlike cloud servers, microcontrollers usually offer limited resources such as smaller memory sizes, tighter energy budgets, and fewer computational capabilities.
To address the above challenges and realize the benefits of AI applications, MediaTek has created a TinyML framework offering called NeuroPilot-Micro. The main objective of NeuroPilot-Micro is to simplify and optimize the deployment of custom, application-specific machine learning models on microcontrollers. This, coupled with MediaTek AIoT solutions, can enable more streamlined and optimized high-performing products.
The video above shows several always-on applications, including object detection, handwriting recognition, shelf detection, person detection, and a dual-core processing demo. These were implemented using MediaTek's NeuroPilot-Micro framework and deployed on the MT3620, a high-performance, tri-core AIoT microcontroller.
Flash memory is normally used in microcontrollers, but one of its limitations is that access speeds are generally slow. If a machine learning model runs directly from Flash, detection response times can stretch to several seconds. NeuroPilot-Micro's optimizations on the MT3620 avoid this problem: as shown in the video above, the response time drops from several seconds to approximately one second.
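One common mitigation for slow Flash access, sketched below in plain C, is to copy performance-critical model data from Flash into RAM at start-up so that inference reads it at SRAM speed. The array names and sizes here are illustrative assumptions, not part of the NeuroPilot-Micro API.

```c
#include <stdint.h>
#include <string.h>

/* Model data as the toolchain would place it in Flash
 * (const data typically resides in the read-only/Flash section).
 * Contents here are placeholder bytes for illustration. */
static const uint8_t model_in_flash[] = { 0x1c, 0x00, 0x00, 0x00 };

/* Writable copy in SRAM; reads from here avoid Flash wait states. */
static uint8_t model_in_ram[sizeof(model_in_flash)];

/* Copy the model out of Flash once at start-up and hand back the
 * fast RAM-resident copy for the runtime to use. */
const uint8_t *load_model_to_ram(void) {
    memcpy(model_in_ram, model_in_flash, sizeof(model_in_flash));
    return model_in_ram;
}
```

The trade-off is RAM usage: on memory-constrained parts, only the most frequently accessed tensors are worth relocating this way.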
NeuroPilot-Micro takes TensorFlow Lite for Microcontrollers (TFLM) models as its input. Models built with other AI frameworks, such as TensorFlow, Caffe, or ONNX, must first be converted into the TensorFlow Lite (TFLite) format; useful converter tools are available as open source or third-party software. Note, however, that TFLM supports a smaller set of operations than TensorFlow Lite, which must be taken into account when developing for microcontrollers.
The TFLM neural network model training and inference flow is depicted in the figure above.
The figure above illustrates the workflow of NeuroPilot-Micro. The box labeled "Edge" is the NeuroPilot inference framework, and the box labeled "Tiny" is the NeuroPilot-Micro inference framework; the NeuroPilot-Micro framework leverages some functionality from the NeuroPilot SDK.
The components in the orange-colored boxes are the major functions provided by the NeuroPilot-Micro framework.
- NeuroPilot ML Kit (Quantization, Pruning)
For model optimization, the TensorFlow model can be quantized into a fixed-point format, or its size can be reduced before deployment with pruning tools. The NeuroPilot ML Kit includes the Model Converter, the Quantization Tool, and the Network Reduction Tool.
- Micro-runtime
The micro-runtime offers memory and low-power optimization mechanisms that enable the trained model to run inference quickly and efficiently on microcontrollers.
- Optimized library (Optimized lib)
The NeuroPilot-Micro SDK supports operations optimized for different microcontrollers. On the MT3620 platform, it uses the Arm CMSIS-NN software library to deliver optimized performance.