1. Article purpose
This article describes how to run an inference on STM32MPx platforms using the STAI MPU C++ API, based on an image classification application example. The unified architecture of the API allows the same application to be deployed on all STM32MPx platforms.
Information: This article provides a simple inferencing example using the STAI MPU C++ API. If you wish to explore all the functions provided by the API, please refer to the STAI MPU Cpp Reference.
2. STAI MPU C++ API
STAI MPU is a cross-STM32MPx platform machine learning and computer vision inferencing API with a flexible interface to run several deep learning model formats, such as Network Binary Graph (NBG), TensorFlow™ Lite[1] and ONNX™[2]. If you wish to learn more about the API structure, please refer to STAI MPU: AI unified API for STM32MPUs.
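At its core, the API exposes a model through the stai_mpu_network class: the constructor loads the model file, set_input() feeds the pre-processed data, run() executes the inference, and get_output() returns the predictions. The minimal sketch below summarizes this call sequence using the functions of the full example given later in this article; the run_once function name and its arguments are illustrative only, not part of the API.
#include "stai_mpu_network.h"
#include <cstdint>
#include <string>
#include <vector>

// Minimal call sequence, assuming a model with a single input tensor whose
// pre-processed data is already available in input_buffer.
void run_once(const std::string& model_path, uint8_t* input_buffer) {
    stai_mpu_network model(model_path, true);   // load the .nb, .tflite or .onnx model
    std::vector<stai_mpu_tensor> output_infos = model.get_output_infos();
    model.set_input(0, input_buffer);           // feed input tensor 0
    model.run();                                // execute the inference
    void* raw_output = model.get_output(0);     // read output tensor 0
    // Post-process raw_output according to output_infos[0] dtype and shape,
    // as shown in the full image-classification example below.
}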
In the next section we explore, through a basic image-classification example, how to run inference with your models on the board using the STAI MPU C++ API, whether you are running an NBG, TFLite™ or ONNX™ model on STM32MP2 series or STM32MP1 series boards.
3. Running an inference using the STAI MPU C++ API
3.1. Install runtime prerequisites on the target
After configuring the AI OpenSTLinux package, you can install the X-LINUX-AI components and the packages needed to run the example.
Then, install the API plugin required at runtime, depending on the model format used for the inference:
- If you are using a TFLite™ model, please run the following command:
x-linux-ai -i stai-mpu-tflite
- If you are using an ONNX™ model, please run the following command:
x-linux-ai -i stai-mpu-ort
- If you are running your model on an STM32MP2 series board and using an NBG model, please run the following command:
x-linux-ai -i stai-mpu-ovx
Information: The stai-mpu-ovx package is not available on STM32MP1 series boards. The TFLite™ and ONNX™ runtimes supported by the API run exclusively on the CPU.
3.2. Install and launch the X-LINUX-AI SDK
First of all, install the X-LINUX-AI SDK on your host machine to be able to cross-compile AI applications for STM32 boards.
Information: The SDK environment setup script must be run once on each new working terminal on which you cross-compile.
Once the OpenSTLinux SDK is installed, go to the installation directory and source the environment:
cd <working directory absolute path>/Developer-Package/SDK
For STM32MP2 series boards:
source environment-setup-cortexa35-ostl-linux
export SYSROOT="<working directory absolute path>/Developer-Package/SDK/sysroots/cortexa35-ostl-linux"
For STM32MP1 series boards:
source environment-setup-cortexa7t2hf-neon-vfpv4-ostl-linux-gnueabi
export SYSROOT="<working directory absolute path>/Developer-Package/SDK/sysroots/cortexa7t2hf-neon-vfpv4-ostl-linux-gnueabi"
3.3. Write a simple NN inference C++ program
The example below shows how to load an NN model using the STAI MPU API, read the input and output tensor information, access the quantization parameters from the model, and run an inference to get the output prediction. Start by creating the following C++ source file and saving it as stai_mpu_img_cls.cc in the sources/stai_mpu/examples directory:
#include "stai_mpu_network.h"
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <sys/time.h>
#include <fstream>
#include <chrono>
/////////////////////////////////////////////////
/// Helper function to read labels ///
/////////////////////////////////////////////////
std::vector<std::string> readLabels(const std::string& filename) {
std::vector<std::string> labels;
std::ifstream file(filename);
std::string line;
if (file.is_open()) {
while (std::getline(file, line)) {
labels.push_back(line);
}
file.close();
} else {
std::cerr << "Unable to open file: " << filename << std::endl;
}
return labels;
}
int main (int argc, char* argv[]){
if (argc != 4)
return 0;
/////////////////////////////////////////////////
/// Loading the model and metadata ///
/////////////////////////////////////////////////
std::string model_path = argv[1]; // .onnx or .tflite or .nb file
stai_mpu_network stai_model = stai_mpu_network(model_path, true);
/* Getting the number of input and output tensors */
int num_inputs = stai_model.get_num_inputs();
int num_outputs = stai_model.get_num_outputs();
/* Instanciating quantization parameters for per tensor quantization */
float input_scale, output_scale;
uint32_t input_zp, output_zp;
/* Reading the input and output tensors information */
std::vector<stai_mpu_tensor> input_infos = stai_model.get_input_infos();
std::vector<stai_mpu_tensor> output_infos = stai_model.get_output_infos();
std::vector<int> input_shape(input_infos[0].get_rank());
std::vector<int> output_shape(output_infos[0].get_rank());
/* Accessing and printing the input tensors informations */
for (int i = 0; i < num_inputs; i++) {
stai_mpu_tensor input_info = input_infos[i];
std::cout << "** Input node: " << i;
std::cout << " -Input name: " << input_info.get_name();
std::cout << " -Input dims: " << input_info.get_rank();
std::cout << " -Input type: " << input_info.get_dtype();
input_shape = input_info.get_shape();
std::cout << std::endl;
/* Reading quantization parameters from inputs */
if (input_info.get_qtype() == stai_mpu_qtype::STAI_MPU_QTYPE_STATIC_AFFINE){
/* Instanciating the stai_mpu_quant_params that contains all quant params */
stai_mpu_quant_params input_qparams = input_info.get_qparams();
/* Accessing quant params depending on quantization type, here static affine */
input_scale = input_qparams.static_affine.scale;
input_zp = input_qparams.static_affine.zero_point;
}
}
/* Accessing and printing the output tensors informations */
for (int i = 0; i < num_outputs; i++) {
stai_mpu_tensor output_info = output_infos[i];
std::cout << "** Output node: " << i;
std::cout << " -Output name: " << output_info.get_name();
std::cout << " -Output dims: " << output_info.get_rank();
std::cout << " -Output type: " << output_info.get_dtype();
output_shape = output_info.get_shape();
std::cout << std::endl;
/* Reading quantization parameters from outputs */
if (output_info.get_qtype() == stai_mpu_qtype::STAI_MPU_QTYPE_STATIC_AFFINE){
/* Instanciating the stai_mpu_quant_params that contains all quant params */
stai_mpu_quant_params output_qparams = output_info.get_qparams();
/* Accessing quant params depending on quantization type, here static_affine */
output_scale = output_qparams.static_affine.scale;
output_zp = output_qparams.static_affine.zero_point;
}
}
////////////////////////////////////////////////////
/// Pre-processing the Image ///
////////////////////////////////////////////////////
/* Read the labels file in to vector of labels */
std::string labels_path = argv[3]; // the labels file
std::vector<std::string> labels = readLabels(labels_path);
/* Prepare the input image with pre-processing */
std::string image_path = argv[2]; // the image .jpg file
cv::Mat img_bgr = cv::imread(image_path);
cv::Mat img_nn;
cv::Size size_nn(input_width, input_height);
cv::resize(img_bgr, img_nn, size_nn);
cv::cvtColor(img_nn, img_nn, cv::COLOR_BGR2RGB);
uint8_t* input_data = img_nn.data;
bool floating_model = false;
float input_mean = 127.5f;
float input_std = 127.5f;
///////////////////////////////////////////////////
/// Setting input and infer ///
///////////////////////////////////////////////////
int input_width = input_shape[1];
int input_height = input_shape[2];
int input_channels = input_shape[3];
auto size_in_bytes = input_height * input_width * input_channels;
/* Prepare the input tensors in uint8_t type */
uint8_t* input_tensor_int = new uint8_t[size_in_bytes];
float* input_tensor_f = new float[size_in_bytes];
if (input_infos[0].get_dtype() == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT32
|| input_infos[0].get_dtype() == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT16)
floating_model = true;
/* Check if the model is in floating point and make normalization*/
if (floating_model) {
for (int i = 0; i < size_in_bytes; i++)
input_tensor_f[i] = (input_data[i] - input_mean) / input_std;
stai_model.set_input(0, input_tensor_f);
} else {
for (int i = 0; i < size_in_bytes; i++)
input_tensor_int[i] = input_data[i];
stai_model.set_input(0, input_tensor_int);
}
///////////////////////////////////////////////////
/// Run the inference ///
///////////////////////////////////////////////////
auto start = std::chrono::high_resolution_clock::now();
/* Run the inference */
stai_model.run();
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> duration = end - start;
std::cout << "Inference time: " << duration.count() * 1000 << "ms" << std::endl;
///////////////////////////////////////////////////
/// Reading and post-processing output ///
///////////////////////////////////////////////////
void* outputs_tensor = stai_model.get_output(0);
/* Read the number of output dimensions */
int output_dims = output_infos[0].get_rank();
/* Read the output data type */
stai_mpu_dtype output_dtype = output_infos[0].get_dtype();
/* Read the output shape */
output_shape = output_infos[0].get_shape();
int output_size = output_shape[output_dims-1];
std::vector<int> results_idx(5);
std::vector<float> results_accu(5);
/* Cast the data depending on the output data type */
if (output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT32 || output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT16) {
float* output_data = static_cast<float*>(outputs_tensor);
/* Pick the index of the max value of predictions */
for (int i = 0; i < 5; i++) {
results_idx[i] = std::distance(&output_data[0],
std::max_element(&output_data[0], &output_data[output_size]));
results_accu[i] = output_data[results_idx[i]];
output_data[results_idx[i]] = 0;
}
/* Cast the data depending on the output data type */
} else if (output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_UINT8){
uint8_t* output_data = static_cast<uint8_t*>(outputs_tensor);
/* Pick the index of the max value of predictions */
for (int i = 0; i < 5; i++) {
results_idx[i] = std::distance(&output_data[0],
std::max_element(&output_data[0], &output_data[output_size]));
results_accu[i] = output_data[results_idx[i]] / 255.0;
output_data[results_idx[i]] = 0;
}
}
/* Check if the model used is an NBG and free memory to avoid memory leak*/
if (stai_model.get_backend_engine() == stai_mpu_backend_engine::STAI_MPU_OVX_NPU_ENGINE)
free(outputs_tensor);
for (int i = 0; i < 5; i++) {
std::cout << results_accu[i] << ": " << labels[results_idx[i]] << std::endl;
}
}
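Note that the uint8_t branch above converts the raw 8-bit scores into a confidence by dividing by 255, which is a simple approximation of the dequantization. The static-affine parameters read earlier (output_scale and output_zp) give the exact conversion: real_value = (quantized_value - zero_point) * scale. The sketch below shows a small helper that could be used for this; the dequantize_uint8 name is illustrative only and not part of the API.
#include <cstdint>

/* Dequantize one static-affine quantized 8-bit value:
   real_value = (quantized_value - zero_point) * scale */
static inline float dequantize_uint8(uint8_t q, uint32_t zero_point, float scale) {
    return (static_cast<int>(q) - static_cast<int>(zero_point)) * scale;
}

/* In the uint8_t branch of the example, the confidence could then be computed as:
   results_accu[i] = dequantize_uint8(output_data[results_idx[i]], output_zp, output_scale); */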
3.4. Create the Makefile
Create the following Makefile in the sources/stai_mpu/examples directory:
OPENCV_PKGCONFIG?="opencv4"
ARCHITECTURE?=""
TARGET_BIN = stai_mpu_img_cls

CXXFLAGS += -Wall $(shell pkg-config --cflags $(OPENCV_PKGCONFIG))
CXXFLAGS += -std=c++17 -O3
CXXFLAGS += -I$(SYSROOT)/usr/include/stai_mpu

LDFLAGS += -lpthread -lopencv_core -lopencv_imgproc -lopencv_imgcodecs
LDFLAGS += -lstai_mpu -ldl

SRCS = stai_mpu_img_cls.cc
OBJS = $(SRCS:.cc=.o)

all: $(TARGET_BIN)

$(TARGET_BIN): $(OBJS)
	$(CXX) -o $@ $^ $(LDFLAGS)

$(OBJS): $(SRCS)
	$(CXX) $(CXXFLAGS) -c $^

clean:
	rm -rf $(OBJS) $(TARGET_BIN)
Information: The runtime plugin libraries such as libstai_mpu_ovx.so, libstai_mpu_tflite.so or libstai_mpu_ort.so are loaded dynamically at runtime, so there is no need to link them into your application at build time. Only the -lstai_mpu linking flag is required to link with the STAI MPU C++ API.
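Since the backend library is resolved dynamically when the model is loaded, it can be useful to check at runtime which engine was actually selected, for instance to decide whether the buffer returned by get_output() must be freed, as in the NBG case of the example above. The following minimal sketch reuses the get_backend_engine() call from the example; only the STAI_MPU_OVX_NPU_ENGINE value is used in this article, so the check is limited to that case.
#include "stai_mpu_network.h"
#include <iostream>

// Illustrative sketch: load a model and report whether it is dispatched to the
// OpenVX/NPU backend (NBG model) or to one of the CPU runtimes.
int main(int argc, char* argv[]) {
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " <model_file>" << std::endl;
        return 1;
    }
    stai_mpu_network model(argv[1], true);
    if (model.get_backend_engine() == stai_mpu_backend_engine::STAI_MPU_OVX_NPU_ENGINE)
        std::cout << "Model dispatched to the OpenVX/NPU backend (NBG)" << std::endl;
    else
        std::cout << "Model dispatched to a CPU runtime (TFLite or ONNX Runtime)" << std::endl;
    return 0;
}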
3.5. Download and prepare test data
First create the directory to store test data:
mkdir stai_mpu_cpp_example
Next download the models and the test pictures:
wget -O stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.nb https://github.com/STMicroelectronics/meta-st-x-linux-ai/raw/refs/heads/main/recipes-samples/image-classification/models/files/mobilenet_v2_1.0_224_int8_per_tensor.nb
wget -O stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.tflite https://github.com/STMicroelectronics/meta-st-x-linux-ai/raw/refs/heads/main/recipes-samples/image-classification/models/files/mobilenet_v2_1.0_224_int8_per_tensor.tflite
wget -O stai_mpu_cpp_example/labels_imagenet_2012.txt https://raw.githubusercontent.com/STMicroelectronics/meta-st-x-linux-ai/refs/heads/main/recipes-samples/image-classification/models/files/labels_imagenet_2012.txt
wget -O stai_mpu_cpp_example/bird.jpg https://farm3.staticflickr.com/8008/7523974676_40bbeef7e3_o.jpg
wget -O stai_mpu_cpp_example/plant.jpg https://c2.staticflickr.com/1/62/184682050_db90d84573_o.jpg
Once you have downloaded the data and the labels files needed for the inference, it is time to cross-compile your STAI MPU C++ based application.
Information: If you wish to run your own NBG model generated from your quantized TFLite™ or ONNX™ model, follow this article to convert your model to the NBG format.
3.6. Cross-compilation and launch
From an SDK sourced terminal, run the cross-compilation:
cd ..
make
Once the compilation is finished, a binary file named stai_mpu_img_cls should be created.
Copy the binary file and the test data directory onto the board:
scp -r stai_mpu_cpp_example/ root@<board_ip>:/path/
scp stai_mpu_img_cls root@<board_ip>:/path/
Information: The runtime plugin corresponding to your model format must be installed on the target before running the binary.
Connect to the board and launch the example:
./stai_mpu_img_cls stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.nb stai_mpu_cpp_example/bird.jpg stai_mpu_cpp_example/labels_imagenet_2012.txt
Loading dynamically: /usr/lib/libstai_mpu_ovx.so.5
** Input node: 0 -Input name:  -Input dims: 4 -Input type: 4
** Output node: 0 -Output name:  -Output dims: 2 -Output type: 11
Inference time: 13.7656ms
0.972656: chickadee
0.0078125: junco
0.00390625: magpie
0: tench
0: tench
The index of the maximum value represents the class detected, and the value itself represents the confidence. On these particular pictures, the bird detected is a black-capped chickadee (Poecile atricapillus) and the plant is classified as a daisy. The index and the name of each class are available in the labels_imagenet_2012.txt file stored in the stai_mpu_cpp_example directory.
4. References