
How to run inference using the STAI MPU Cpp API



1. Article purpose

This article describes how to run an inference on STM32MPx platforms using the STAI MPU C++ API, based on an image classification application example. The unified architecture of the API allows the same application to be deployed on all STM32MPx platforms.

2. STAI MPU C++ API

STAI MPU is a cross-platform machine learning and computer vision inference API for STM32MPx, with a flexible interface able to run several deep learning model formats such as Network Binary Graph (NBG), TensorFlow™ Lite[1] and ONNX™[2]. If you wish to learn more about the API structure, please refer to STAI MPU: AI unified API for STM32MPUs.
In the next section, we explore, through a basic image-classification example, how to run inference on your models on the board using the STAI MPU C++ API, whether you are running an NBG, TFLite™ or ONNX™ model on STM32MP2 series' boards or STM32MP1 series' boards.
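As a quick preview of the workflow detailed in the next section, here is a minimal (non error-checked) sketch of an inference with the C++ API. It only uses calls that appear in the full example later in this article; the model path and input size are purely illustrative:

#include "stai_mpu_network.h"
#include <cstdint>
#include <vector>

int main() {
    /* Load the model (.nb, .tflite or .onnx): the API dispatches it to the matching runtime plugin */
    stai_mpu_network model("path/to/model.tflite");

    /* Query the input and output tensor information */
    std::vector<stai_mpu_tensor> input_infos = model.get_input_infos();
    std::vector<stai_mpu_tensor> output_infos = model.get_output_infos();

    /* Set an already pre-processed input buffer, run the inference and read back the raw output */
    std::vector<uint8_t> input_buffer(224 * 224 * 3); /* illustrative size, derive it from input_infos */
    model.set_input(0, input_buffer.data());
    model.run();
    void* raw_output = model.get_output(0);
    (void)raw_output; /* post-processing is model-dependent, see the full example below */
    return 0;
}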

3. Running an inference using the STAI MPU C++ API

3.1. Install runtime prerequisites on the target

After having configured the AI OpenSTLinux package, you can install the X-LINUX-AI components and the packages needed to run the example.

Then, install the API plugin required at runtime, depending on the model format used for the inference (a small runtime check of the dispatched plugin is sketched after this list):

  • If you are using a TFLite™ model, please run the following command:
 x-linux-ai -i  stai-mpu-tflite
  • If you are using an ONNX™ model, please run the following command:
 x-linux-ai -i  stai-mpu-ort
  • If you are running an NBG model on STM32MP2 series' boards, please run the following command:
 x-linux-ai -i  stai-mpu-ovx
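For reference, the backend plugin actually dispatched by the API can also be checked at runtime from the application itself. The minimal sketch below only tests the NBG/OpenVX enumeration value used later in this article; the model path is purely illustrative, and equivalent enumeration values exist for the TensorFlow™ Lite and ONNX™ runtimes:

#include "stai_mpu_network.h"
#include <iostream>

int main() {
    /* Illustrative model path: any supported .nb, .tflite or .onnx file */
    stai_mpu_network model("mobilenet_v2_1.0_224_int8_per_tensor.nb");

    /* get_backend_engine() reports which runtime plugin handles the loaded model */
    if (model.get_backend_engine() == stai_mpu_backend_engine::STAI_MPU_OVX_NPU_ENGINE)
        std::cout << "Model dispatched to the OpenVX (NPU) backend plugin" << std::endl;
    else
        std::cout << "Model dispatched to the TFLite or ONNX Runtime backend plugin" << std::endl;
    return 0;
}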

3.2. Install and source the X-LINUX-AI SDK

First of all, the X-LINUX-AI SDK must be installed on your host machine in order to cross-compile AI applications for STM32 boards.

Once the OpenSTLinux SDK is installed, go to the installation directory and source the environment:

 cd <working directory absolute path>/Developer-Package/SDK
  • On STM32MP2 series' boards:
  source environment-setup-cortexa35-ostl-linux
  export SYSROOT="<working directory absolute path>/Developer-Package/SDK/sysroots/cortexa35-ostl-linux"
  • On STM32MP1 series' boards:
  source environment-setup-cortexa7t2hf-neon-vfpv4-ostl-linux-gnueabi
  export SYSROOT="<working directory absolute path>/Developer-Package/SDK/sysroots/cortexa7t2hf-neon-vfpv4-ostl-linux-gnueabi"
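To quickly check that the SDK environment is correctly set in the current terminal, you can, for instance, verify that the cross-compiler variable exported by the environment-setup script points to the SDK toolchain:

 echo $CXX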

3.3. Write a simple NN inference C++ program

The example below shows how to load an NN model using the STAI MPU API, read the input and output tensor information, access the quantization parameters of the model, and run an inference to get the output predictions. We start by creating the following C++ source file and saving it as stai_mpu_img_cls.cc in the sources/stai_mpu/examples directory:

#include "stai_mpu_network.h"
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <sys/time.h>
#include <fstream>
#include <chrono>

/////////////////////////////////////////////////
///     Helper function to read labels        ///
/////////////////////////////////////////////////
std::vector<std::string> readLabels(const std::string& filename) {
    std::vector<std::string> labels;
    std::ifstream file(filename);
    std::string line;
    if (file.is_open()) {
        while (std::getline(file, line)) {
            labels.push_back(line);
        }
        file.close();
    } else {
        std::cerr << "Unable to open file: " << filename << std::endl;
    }
    return labels;
}

int main (int argc, char* argv[]){
    if (argc != 4) {
        std::cerr << "Usage: " << argv[0] << " <nn_model_file> <image_file> <labels_file>" << std::endl;
        return 1;
    }
    /////////////////////////////////////////////////
    ///     Loading the model and metadata        ///
    /////////////////////////////////////////////////
    std::string model_path = argv[1]; // .onnx or .tflite or .nb file
    stai_mpu_network stai_model = stai_mpu_network(model_path);

    /* Getting the number of input and output tensors */
    int num_inputs = stai_model.get_num_inputs();
    int num_outputs = stai_model.get_num_outputs();

    /* Declaring quantization parameters for per-tensor quantization */
    float input_scale, output_scale;
    uint32_t input_zp, output_zp;

    /* Reading the input and output tensors information  */
    std::vector<stai_mpu_tensor> input_infos = stai_model.get_input_infos();
    std::vector<stai_mpu_tensor> output_infos = stai_model.get_output_infos();
    std::vector<int> input_shape(input_infos[0].get_rank());
    std::vector<int> output_shape(output_infos[0].get_rank());

    /* Accessing and printing the input tensors information */
    for (int i = 0; i < num_inputs; i++) {
        stai_mpu_tensor input_info = input_infos[i];
        std::cout << "** Input node: " << i;
        std::cout << " -Input name: " << input_info.get_name();
        std::cout << " -Input dims: " << input_info.get_rank();
        std::cout << " -Input type: " << input_info.get_dtype();
        input_shape = input_info.get_shape();
        std::cout << std::endl;
        /* Reading quantization parameters from inputs */
        if (input_info.get_qtype() == stai_mpu_qtype::STAI_MPU_QTYPE_AFFINE_PER_TENSOR){
            /* Instantiating the stai_mpu_quant_params that contains all quant params */
            stai_mpu_quant_params input_qparams = input_info.get_qparams();
            /* Accessing quant params depending on quantization type, here affine_per_tensor */
            input_scale = input_qparams.affine_per_tensor.scale;
            input_zp = input_qparams.affine_per_tensor.zero_point;
        }
    }

    /* Accessing and printing the output tensors information */
    for (int i = 0; i < num_outputs; i++) {
        stai_mpu_tensor output_info = output_infos[i];
        std::cout << "** Output node: " << i;
        std::cout << " -Output name: " << output_info.get_name();
        std::cout << " -Output dims: " << output_info.get_rank();
        std::cout << " -Output type: " << output_info.get_dtype();
        output_shape = output_info.get_shape();
        std::cout << std::endl;
        /* Reading quantization parameters from outputs */
        if (output_info.get_qtype() == stai_mpu_qtype::STAI_MPU_QTYPE_AFFINE_PER_TENSOR){
            /* Instantiating the stai_mpu_quant_params that contains all quant params */
            stai_mpu_quant_params output_qparams = output_info.get_qparams();
            /* Accessing quant params depending on quantization type, here affine_per_tensor */
            output_scale = output_qparams.affine_per_tensor.scale;
            output_zp = output_qparams.affine_per_tensor.zero_point;
        }
    }
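    /* Informative note: with affine per-tensor quantization, a quantized value q
       maps back to a real value as real = scale * (q - zero_point). The scales and
       zero-points read above can therefore be used to dequantize raw integer outputs. */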

    ////////////////////////////////////////////////////
    ///           Pre-processing the Image           ///
    ////////////////////////////////////////////////////
    /* Read the labels file in to vector of labels */
    std::string labels_path = argv[3]; // the labels file
    std::vector<std::string> labels = readLabels(labels_path);

    /* Retrieve the input tensor dimensions before resizing the image */
    int input_width = input_shape[1];
    int input_height = input_shape[2];
    int input_channels = input_shape[3];

    /* Prepare the input image with pre-processing */
    std::string image_path = argv[2]; // the image .jpg file
    cv::Mat img_bgr = cv::imread(image_path);
    cv::Mat img_nn;
    cv::Size size_nn(input_width, input_height);
    cv::resize(img_bgr, img_nn, size_nn);
    cv::cvtColor(img_nn, img_nn, cv::COLOR_BGR2RGB);
    uint8_t* input_data = img_nn.data;
    bool floating_model = false;
    float input_mean = 127.5f;
    float input_std = 127.5f;

    ///////////////////////////////////////////////////
    ///         Setting input and infer             ///
    ///////////////////////////////////////////////////
    auto size_in_bytes = input_height * input_width * input_channels;

    /* Prepare the input tensors in uint8_t type */
    uint8_t* input_tensor_int = new uint8_t[size_in_bytes];
    float* input_tensor_f = new float[size_in_bytes];
    if (input_infos[0].get_dtype() == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT32)
        floating_model = true;

    /* Check if the model is floating point and apply normalization */
    if (floating_model) {
        for (int i = 0; i < size_in_bytes; i++)
            input_tensor_f[i] = (input_data[i] - input_mean) / input_std;
        stai_model.set_input(0, input_tensor_f);
    } else {
        for (int i = 0; i < size_in_bytes; i++)
            input_tensor_int[i] = input_data[i];
        stai_model.set_input(0, input_tensor_int);
    }

    ///////////////////////////////////////////////////
    ///             Run the inference               ///
    ///////////////////////////////////////////////////
    auto start = std::chrono::high_resolution_clock::now();
    /* Run the inference */
    stai_model.run();
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> duration = end - start;
    std::cout << "Inference time: " << duration.count() * 1000 << "ms" << std::endl;

    ///////////////////////////////////////////////////
    ///      Reading and post-processing output     ///
    ///////////////////////////////////////////////////
    void* outputs_tensor = stai_model.get_output(0);
    /* Read the number of output dimensions */
    int output_dims = output_infos[0].get_rank();
    /* Read the output data type  */
    stai_mpu_dtype output_dtype = output_infos[0].get_dtype();
    /* Read the output shape  */
    output_shape = output_infos[0].get_shape();
    int output_size = output_shape[output_dims-1];
    std::vector<int> results_idx(5);
    std::vector<float> results_accu(5);
    /* Cast the data depending on the output data type */
    if (output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT32 || output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT16) {
        float* output_data = static_cast<float*>(outputs_tensor);
        /* Pick the index of the max value of predictions */
        for (int i = 0; i < 5; i++) {
            results_idx[i] = std::distance(&output_data[0],
                        std::max_element(&output_data[0], &output_data[output_size]));
            results_accu[i] = output_data[results_idx[i]];
            output_data[results_idx[i]] = 0;
        }
    /* Cast the data depending on the output data type */
    } else if (output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_UINT8){
        uint8_t* output_data = static_cast<uint8_t*>(outputs_tensor);
        /* Pick the index of the max value of predictions */
        for (int i = 0; i < 5; i++) {
            results_idx[i] = std::distance(&output_data[0],
                        std::max_element(&output_data[0], &output_data[output_size]));
            results_accu[i] = output_data[results_idx[i]] / 255.0;
            output_data[results_idx[i]] = 0;
        }
    }
    /* Check if the model used is an NBG and free memory */
    if (stai_model.get_backend_engine() == stai_mpu_backend_engine::STAI_MPU_OVX_NPU_ENGINE)
        free(outputs_tensor);
    for (int i = 0; i < 5; i++) {
        std::cout << results_accu[i] << ": "  << labels[results_idx[i]] << std::endl;
    }
    /* Free the input buffers allocated for the pre-processed input data */
    delete[] input_tensor_int;
    delete[] input_tensor_f;
    return 0;
}

3.4. Create the Makefile

Create the following Makefile in the sources/stai_mpu/examples directory:

OPENCV_PKGCONFIG?="opencv4"
ARCHITECTURE?=""
TARGET_BIN = stai_mpu_img_cls
CXXFLAGS += -Wall $(shell pkg-config --cflags $(OPENCV_PKGCONFIG))
CXXFLAGS += -std=c++17 -O3
CXXFLAGS += -I$(SYSROOT)/usr/include/stai_mpu

LDFLAGS += -lpthread -lopencv_core -lopencv_imgproc -lopencv_imgcodecs
LDFLAGS += -lstai_mpu -ldl

SRCS = stai_mpu_img_cls.cc
OBJS = $(SRCS:.cc=.o)

all: $(TARGET_BIN)

$(TARGET_BIN): $(OBJS)
	$(CXX) -o $@ $^ $(LDFLAGS)

$(OBJS): $(SRCS)
	$(CXX) $(CXXFLAGS) -c $^

clean:
	rm -rf $(OBJS) $(TARGET_BIN)
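The OPENCV_PKGCONFIG variable is only a default: if your SDK exposes OpenCV under a different pkg-config name, it can be overridden on the make command line, for example (the alternative name below is purely illustrative):

 make OPENCV_PKGCONFIG="opencv"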


3.5. Download and prepare test data

First create the directory to store test data:

 mkdir stai_mpu_cpp_example

Next, download the models, the labels file, and the test pictures:

 wget -O stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.nb https://github.com/STMicroelectronics/meta-st-x-linux-ai/raw/refs/heads/main/recipes-samples/image-classification/models/files/mobilenet_v2_1.0_224_int8_per_tensor.nb
 wget -O stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.tflite https://github.com/STMicroelectronics/meta-st-x-linux-ai/raw/refs/heads/main/recipes-samples/image-classification/models/files/mobilenet_v2_1.0_224_int8_per_tensor.tflite
 wget -O stai_mpu_cpp_example/labels_imagenet_2012.txt https://raw.githubusercontent.com/STMicroelectronics/meta-st-x-linux-ai/refs/heads/main/recipes-samples/image-classification/models/files/labels_imagenet_2012.txt
 wget -O stai_mpu_cpp_example/bird.jpg https://farm3.staticflickr.com/8008/7523974676_40bbeef7e3_o.jpg
 wget -O stai_mpu_cpp_example/plant.jpg https://c2.staticflickr.com/1/62/184682050_db90d84573_o.jpg

Once the models, test pictures, and labels file needed for the inference are downloaded and ready, it is time to cross-compile your STAI MPU C++ application.

3.6. Cross-compilation and launch

From an SDK-sourced terminal, run the cross-compilation:

 cd ..
 make

Once the compilation is finished, a binary file named stai_mpu_img_cls should be created.
Copy the binary file and the test data directory onto the board:

 scp -r stai_mpu_cpp_example/ root@<board_ip>:/path/
 scp stai_mpu_img_cls root@<board_ip>:/path/

Connect to the board and launch the example:

 ./stai_mpu_img_cls stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.nb stai_mpu_cpp_example/bird.jpg stai_mpu_cpp_example/labels_imagenet_2012.txt
Loading dynamically: /usr/lib/libstai_mpu_ovx.so.5
** Input node: 0 -Input name:  -Input dims: 4 -Input type: 4
** Output node: 0 -Output name:  -Output dims: 2 -Output type: 11
Inference time: 13.7656ms
0.972656: chickadee
0.0078125: junco
0.00390625: magpie
0: tench
0: tench

The index of the maximum value corresponds to the class detected, and the value itself represents the confidence. On these particular pictures, the bird is detected as a chickadee (Poecile atricapillus, black-capped chickadee) and the plant as a daisy. The index and the name of each class are available in the labels_imagenet_2012.txt file stored in the stai_mpu_cpp_example directory.
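Since the API abstracts the model format, the same binary can also be launched, unchanged, with the TensorFlow™ Lite model downloaded earlier (provided that the stai-mpu-tflite runtime plugin is installed), for instance on the second test picture:

 ./stai_mpu_img_cls stai_mpu_cpp_example/mobilenet_v2_1.0_224_int8_per_tensor.tflite stai_mpu_cpp_example/plant.jpg stai_mpu_cpp_example/labels_imagenet_2012.txt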

4. References