
How to run inference using the STAI MPU Cpp API



1. Article purpose

This article describes how to run an inference on STM32MPx platforms using the STAI MPU C++ API, based on an image classification application example. The unified architecture of the API allows the same application to be deployed on all STM32MPx platforms.

2. STAI MPU C++ API

STAI MPU is a cross-platform machine learning and computer vision inference API for STM32MPx, with a flexible interface able to run several deep learning model formats such as Network Binary Graph (NBG), TensorFlow™ Lite[1] and ONNX™[2]. If you wish to learn more about the API structure, please refer to STAI MPU: AI unified API for STM32MPUs.
In the next section, we explore, through a basic image-classification example, how to run inference on your models on the board using the STAI MPU C++ API, whether you are running an NBG, TFLite™ or ONNX™ model on STM32MP2 series' boards or STM32MP1 series' boards.
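As a quick preview of the workflow detailed in the next section, here is a minimal (non error-checked) sketch of an inference with the C++ API. It only uses calls that appear in the full example later in this article; the model path and input size are purely illustrative:

#include "stai_mpu_network.h"
#include <cstdint>
#include <vector>

int main() {
    /* Load the model (.nb, .tflite or .onnx): the API dispatches it to the matching runtime plugin */
    stai_mpu_network model("path/to/model.tflite");

    /* Query the input and output tensor information */
    std::vector<stai_mpu_tensor> input_infos = model.get_input_infos();
    std::vector<stai_mpu_tensor> output_infos = model.get_output_infos();

    /* Set an already pre-processed input buffer, run the inference and read back the raw output */
    std::vector<uint8_t> input_buffer(224 * 224 * 3); /* illustrative size, derive it from input_infos */
    model.set_input(0, input_buffer.data());
    model.run();
    void* raw_output = model.get_output(0);
    (void)raw_output; /* post-processing is model-dependent, see the full example below */
    return 0;
}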

3. Running an inference using the STAI MPU C++ API

3.1. Install runtime prerequisites on the target

After having configured the AI OpenSTLinux package, you can install the X-LINUX-AI components and the packages needed to run the example.

Then, install the API plugin required at runtime, depending on the model format used for the inference (a small runtime check of the dispatched plugin is sketched after this list):

  • If you are using a TFLite™ model, please run the following command:
 x-linux-ai -i  stai-mpu-tflite
  • If you are using an ONNX™ model, please run the following command:
 x-linux-ai -i  stai-mpu-ort
  • If you are running an NBG model on STM32MP2 series' boards, please run the following command:
 x-linux-ai -i  stai-mpu-ovx
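For reference, the backend plugin actually dispatched by the API can also be checked at runtime from the application itself. The minimal sketch below only tests the NBG/OpenVX enumeration value used later in this article; the model path is purely illustrative, and equivalent enumeration values exist for the TensorFlow™ Lite and ONNX™ runtimes:

#include "stai_mpu_network.h"
#include <iostream>

int main() {
    /* Illustrative model path: any supported .nb, .tflite or .onnx file */
    stai_mpu_network model("mobilenet_v2_1.0_224_int8_per_tensor.nb");

    /* get_backend_engine() reports which runtime plugin handles the loaded model */
    if (model.get_backend_engine() == stai_mpu_backend_engine::STAI_MPU_OVX_NPU_ENGINE)
        std::cout << "Model dispatched to the OpenVX (NPU) backend plugin" << std::endl;
    else
        std::cout << "Model dispatched to the TFLite or ONNX Runtime backend plugin" << std::endl;
    return 0;
}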

3.2. Install and source the X-LINUX-AI SDK

First of all, the X-LINUX-AI SDK must be installed on your host machine in order to cross-compile AI applications for STM32 boards.

Once the OpenSTLinux SDK is installed, go to the installation directory and source the environment:

 cd <working directory absolute path>/Developer-Package/SDK
  • On STM32MP2 series' boards:
  source environment-setup-cortexa35-ostl-linux
  export SYSROOT="<working directory absolute path>/Developer-Package/SDK/sysroots/cortexa35-ostl-linux"
  • On STM32MP1 series' boards:
  source environment-setup-cortexa7t2hf-neon-vfpv4-ostl-linux-gnueabi
  export SYSROOT="<working directory absolute path>/Developer-Package/SDK/sysroots/cortexa7t2hf-neon-vfpv4-ostl-linux-gnueabi"
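To quickly check that the SDK environment is correctly set in the current terminal, you can, for instance, verify that the cross-compiler variable exported by the environment-setup script points to the SDK toolchain:

 echo $CXX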

3.3. Write a simple NN inference C++ program

The example below shows how to load an NN model using the STAI MPU API, read the input and output tensor information, access the quantization parameters of the model, and run an inference to get the output predictions. We start by creating the following C++ source file and saving it as stai_mpu_img_cls.cc in the sources/stai_mpu/examples directory:

#include "stai_mpu_network.h"
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <sys/time.h>
#include <fstream>
#include <chrono>

/////////////////////////////////////////////////
///     Helper function to read labels        ///
/////////////////////////////////////////////////
std::vector<std::string> readLabels(const std::string& filename) {
    std::vector<std::string> labels;
    std::ifstream file(filename);
    std::string line;
    if (file.is_open()) {
        while (std::getline(file, line)) {
            labels.push_back(line);
        }
        file.close();
    } else {
        std::cerr << "Unable to open file: " << filename << std::endl;
    }
    return labels;
}

int main (int argc, char* argv[]){
    if (argc != 4) {
        std::cerr << "Usage: " << argv[0] << " <nn_model_file> <image_file> <labels_file>" << std::endl;
        return 1;
    }
    /////////////////////////////////////////////////
    ///     Loading the model and metadata        ///
    /////////////////////////////////////////////////
    std::string model_path = argv[1]; // .onnx or .tflite or .nb file
    stai_mpu_network stai_model = stai_mpu_network(model_path);

    /* Getting the number of input and output tensors */
    int num_inputs = stai_model.get_num_inputs();
    int num_outputs = stai_model.get_num_outputs();

    /* Declaring quantization parameters for per-tensor quantization */
    float input_scale, output_scale;
    uint32_t input_zp, output_zp;

    /* Reading the input and output tensors information  */
    std::vector<stai_mpu_tensor> input_infos = stai_model.get_input_infos();
    std::vector<stai_mpu_tensor> output_infos = stai_model.get_output_infos();
    std::vector<int> input_shape(input_infos[0].get_rank());
    std::vector<int> output_shape(output_infos[0].get_rank());

    /* Accessing and printing the input tensors information */
    for (int i = 0; i < num_inputs; i++) {
        stai_mpu_tensor input_info = input_infos[i];
        std::cout << "** Input node: " << i;
        std::cout << " -Input name: " << input_info.get_name();
        std::cout << " -Input dims: " << input_info.get_rank();
        std::cout << " -Input type: " << input_info.get_dtype();
        input_shape = input_info.get_shape();
        std::cout << std::endl;
        /* Reading quantization parameters from inputs */
        if (input_info.get_qtype() == stai_mpu_qtype::STAI_MPU_QTYPE_AFFINE_PER_TENSOR){
            /* Instantiating the stai_mpu_quant_params that contains all quant params */
            stai_mpu_quant_params input_qparams = input_info.get_qparams();
            /* Accessing quant params depending on quantization type, here affine_per_tensor */
            input_scale = input_qparams.affine_per_tensor.scale;
            input_zp = input_qparams.affine_per_tensor.zero_point;
        }
    }

    /* Accessing and printing the output tensors information */
    for (int i = 0; i < num_outputs; i++) {
        stai_mpu_tensor output_info = output_infos[i];
        std::cout << "** Output node: " << i;
        std::cout << " -Output name: " << output_info.get_name();
        std::cout << " -Output dims: " << output_info.get_rank();
        std::cout << " -Output type: " << output_info.get_dtype();
        output_shape = output_info.get_shape();
        std::cout << std::endl;
        /* Reading quantization parameters from outputs */
        if (output_info.get_qtype() == stai_mpu_qtype::STAI_MPU_QTYPE_AFFINE_PER_TENSOR){
            /* Instantiating the stai_mpu_quant_params that contains all quant params */
            stai_mpu_quant_params output_qparams = output_info.get_qparams();
            /* Accessing quant params depending on quantization type, here affine_per_tensor */
            output_scale = output_qparams.affine_per_tensor.scale;
            output_zp = output_qparams.affine_per_tensor.zero_point;
        }
    }
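    /* Informative note: with affine per-tensor quantization, a quantized value q
       maps back to a real value as real = scale * (q - zero_point). The scales and
       zero-points read above can therefore be used to dequantize raw integer outputs. */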

    ////////////////////////////////////////////////////
    ///           Pre-processing the Image           ///
    ////////////////////////////////////////////////////
    /* Read the labels file in to vector of labels */
    std::string labels_path = argv[3]; // the labels file
    std::vector<std::string> labels = readLabels(labels_path);

    /* Retrieve the input tensor dimensions before resizing the image */
    int input_width = input_shape[1];
    int input_height = input_shape[2];
    int input_channels = input_shape[3];

    /* Prepare the input image with pre-processing */
    std::string image_path = argv[2]; // the image .jpg file
    cv::Mat img_bgr = cv::imread(image_path);
    cv::Mat img_nn;
    cv::Size size_nn(input_width, input_height);
    cv::resize(img_bgr, img_nn, size_nn);
    cv::cvtColor(img_nn, img_nn, cv::COLOR_BGR2RGB);
    uint8_t* input_data = img_nn.data;
    bool floating_model = false;
    float input_mean = 127.5f;
    float input_std = 127.5f;

    ///////////////////////////////////////////////////
    ///         Setting input and infer             ///
    ///////////////////////////////////////////////////
    auto size_in_bytes = input_height * input_width * input_channels;

    /* Prepare the input tensors in uint8_t type */
    uint8_t* input_tensor_int = new uint8_t[size_in_bytes];
    float* input_tensor_f = new float[size_in_bytes];
    if (input_infos[0].get_dtype() == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT32)
        floating_model = true;

    /* Check if the model is floating point and apply normalization */
    if (floating_model) {
        for (int i = 0; i < size_in_bytes; i++)
            input_tensor_f[i] = (input_data[i] - input_mean) / input_std;
        stai_model.set_input(0, input_tensor_f);
    } else {
        for (int i = 0; i < size_in_bytes; i++)
            input_tensor_int[i] = input_data[i];
        stai_model.set_input(0, input_tensor_int);
    }

    ///////////////////////////////////////////////////
    ///             Run the inference               ///
    ///////////////////////////////////////////////////
    auto start = std::chrono::high_resolution_clock::now();
    /* Run the inference */
    stai_model.run();
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> duration = end - start;
    std::cout << "Inference time: " << duration.count() * 1000 << "ms" << std::endl;

    ///////////////////////////////////////////////////
    ///      Reading and post-processing output     ///
    ///////////////////////////////////////////////////
    void* outputs_tensor = stai_model.get_output(0);
    /* Read the number of output dimensions */
    int output_dims = output_infos[0].get_rank();
    /* Read the output data type  */
    stai_mpu_dtype output_dtype = output_infos[0].get_dtype();
    /* Read the output shape  */
    output_shape = output_infos[0].get_shape();
    int output_size = output_shape[output_dims-1];
    std::vector<int> results_idx(5);
    std::vector<float> results_accu(5);
    /* Cast the data depending on the output data type */
    if (output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT32 || output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT16) {
        float* output_data = static_cast<float*>(outputs_tensor);
        /* Pick the index of the max value of predictions */
        for (int i = 0; i < 5; i++) {
            results_idx[i] = std::distance(&output_data[0],
                        std::max_element(&output_data[0], &output_data[output_size]));
            results_accu[i] = output_data[results_idx[i]];
            output_data[results_idx[i]] = 0;
        }
    /* Cast the data depending on the output data type */
    } else if (output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_UINT8){
        uint8_t* output_data = static_cast<uint8_t*>(outputs_tensor);
        /* Pick the index of the max value of predictions */
        for (int i = 0; i < 5; i++) {
            results_idx[i] = std::distance(&output_data[0],
                        std::max_element(&output_data[0], &output_data[output_size]));
            results_accu[i] = output_data[results_idx[i]] / 255.0;
            output_data[results_idx[i]] = 0;
        }
    }
    /* Check if the model used is an NBG and free memory */
    if (stai_model.get_backend_engine() == stai_mpu_backend_engine::STAI_MPU_OVX_NPU_ENGINE)
        free(outputs_tensor);
    for (int i = 0; i < 5; i++) {
        std::cout << results_accu[i] << ": "  << labels[results_idx[i]] << std::endl;
    }
    /* Free the input buffers allocated for the pre-processed input data */
    delete[] input_tensor_int;
    delete[] input_tensor_f;
    return 0;
}

3.4. Create the Makefile

Create the following Makefile in the sources/stai_mpu/examples directory:

OPENCV_PKGCONFIG?="opencv4"
ARCHITECTURE?=""
TARGET_BIN = stai_mpu_img_cls
CXXFLAGS += -Wall $(shell pkg-config --cflags $(OPENCV_PKGCONFIG))
CXXFLAGS += -std=c++17 -O3
CXXFLAGS += -I$(SYSROOT)/usr/include/stai_mpu

LDFLAGS += -lpthread -lopencv_core -lopencv_imgproc -lopencv_imgcodecs
LDFLAGS += -lstai_mpu -ldl

SRCS = stai_mpu_img_cls.cc
OBJS = $(SRCS:.cc=.o)

all: $(TARGET_BIN)

$(TARGET_BIN): $(OBJS)
	$(CXX) -o $@ $^ $(LDFLAGS)

$(OBJS): $(SRCS)
	$(CXX) $(CXXFLAGS) -c $^

clean:
	rm -rf $(OBJS) $(TARGET_BIN)
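The OPENCV_PKGCONFIG variable is only a default: if your SDK exposes OpenCV under a different pkg-config name, it can be overridden on the make command line, for example (the alternative name below is purely illustrative):

 make OPENCV_PKGCONFIG="opencv"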


3.5. Download and prepare test data

First create the directory to store test data:

 mkdir stai_mpu_cpp_example

Next, download the models, the labels file, and the test pictures:

 wget -O stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.nb https://github.com/STMicroelectronics/meta-st-x-linux-ai/raw/refs/heads/main/recipes-samples/image-classification/models/files/mobilenet_v2_1.0_224_int8_per_tensor.nb
 wget -O stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.tflite https://github.com/STMicroelectronics/meta-st-x-linux-ai/raw/refs/heads/main/recipes-samples/image-classification/models/files/mobilenet_v2_1.0_224_int8_per_tensor.tflite
 wget -O stai_mpu_cpp_example/labels_imagenet_2012.txt https://raw.githubusercontent.com/STMicroelectronics/meta-st-x-linux-ai/refs/heads/main/recipes-samples/image-classification/models/files/labels_imagenet_2012.txt
 wget -O stai_mpu_cpp_example/bird.jpg https://farm3.staticflickr.com/8008/7523974676_40bbeef7e3_o.jpg
 wget -O stai_mpu_cpp_example/plant.jpg https://c2.staticflickr.com/1/62/184682050_db90d84573_o.jpg

Once the models, test pictures, and labels file needed for the inference are downloaded and ready, it is time to cross-compile your STAI MPU C++ application.

3.6. Cross-compilation and launch

From an SDK-sourced terminal, run the cross-compilation:

 cd ..
 make

Once the compilation is finished, a binary file named stai_mpu_img_cls should be created.
Copy the binary file and the test data directory onto the board:

 scp -r stai_mpu_cpp_example/ root@<board_ip>:/path/
 scp stai_mpu_img_cls root@<board_ip>:/path/

Connect to the board and launch the example:

 ./stai_mpu_img_cls stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.nb stai_mpu_cpp_example/bird.jpg stai_mpu_cpp_example/labels_imagenet_2012.txt
Loading dynamically: /usr/lib/libstai_mpu_ovx.so.5
** Input node: 0 -Input name:  -Input dims: 4 -Input type: 4
** Output node: 0 -Output name:  -Output dims: 2 -Output type: 11
Inference time: 13.7656ms
0.972656: chickadee
0.0078125: junco
0.00390625: magpie
0: tench
0: tench

The index of the maximum value corresponds to the class detected, and the value itself represents the confidence. On these particular pictures, the bird is detected as a chickadee (Poecile atricapillus, black-capped chickadee) and the plant as a daisy. The index and the name of each class are available in the labels_imagenet_2012.txt file stored in the stai_mpu_cpp_example directory.
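Since the API abstracts the model format, the same binary can also be launched, unchanged, with the TensorFlow™ Lite model downloaded earlier (provided that the stai-mpu-tflite runtime plugin is installed), for instance on the second test picture:

 ./stai_mpu_img_cls stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.tflite stai_mpu_cpp_example/plant.jpg stai_mpu_cpp_example/labels_imagenet_2012.txt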

4. References