How to run inference using the STAI MPU Python API

Applicable for STM32MP13x lines, STM32MP15x lines, STM32MP25x lines


1. Article purpose

This article describes how to run an inference on the STM32MPx using the STAI MPU Python API. It is an example based on an image classification application. The unified architecture of the API allows deploying the same application on all the STM32MPx platforms using several model formats.

Information
This article provides a simple inferencing example using the STAI MPU Python API. If you wish to explore more of the functions provided by the API, please refer to the STAI MPU Python Reference.

2. STAI MPU Python API

STAI MPU is a cross-platform machine learning and computer vision inferencing API for STM32MPx devices. Its flexible interface runs several deep learning model formats, such as Network Binary Graph (NBG), TensorFlow™ Lite[1] and ONNX™[2]. To learn more about the API structure, refer to STAI MPU: AI unified API for STM32MPUs.
The next section uses a basic image-classification example to show how to run inference on the board with the STAI MPU Python API, whether the model is an NBG, a TFLite™ or an ONNX™ model, on either STM32MP2 series boards or STM32MP1 series boards.

Important
The STM32MP1 series boards have no AI hardware acceleration chip. Therefore, running inference with an NBG model on these platforms results in an error.

3. Running an inference using the STAI MPU Python API

3.1. Install runtime prerequisites on the target

After having configured the AI OpenSTLinux package, you can install the X-LINUX-AI components and the packages needed to run the example. First, install the main packages needed for image processing, Python NumPy and Python OpenCV:

Board $> apt-get install python3-numpy python3-opencv

Next, install the python3-libstai module by running the following command:

 x-linux-ai -i  python3-libstai

Then, we will need to install the API plugins required during runtime depending on the model format used for the inference:

  • If you are using a TFLite™ model, please run the following command:
 x-linux-ai -i  stai-mpu-tflite
  • If you are using an ONNX™ model, please run the following command:
 x-linux-ai -i  stai-mpu-ort
  • If you are running an NBG model on STM32MP2 series boards, please run the following command:
 x-linux-ai -i  stai-mpu-ovx
Information
The stai-mpu-ovx package is not available on STM32MP1 series boards. The TFLite™ and ONNX™ runtimes supported by the API run exclusively on the CPU.
Warning
The software package is provided AS IS, and by downloading it, you agree to be bound to the terms of the software license agreement (SLA0048). The detailed content licenses can be found here.
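
The mapping between model format and runtime plugin described above can be summarised in a few lines of Python. This is only an illustrative sketch: the helper plugin_for_model and its has_ai_accelerator parameter are hypothetical names and not part of the STAI MPU API, and the .onnx file extension is an assumption (the .nb and .tflite extensions are those of the models used later in this article).

```python
from pathlib import Path

# Model file extension -> X-LINUX-AI package, as listed in the install commands above.
# The ".onnx" extension is an assumption made for this sketch.
PLUGIN_BY_EXTENSION = {
    ".tflite": "stai-mpu-tflite",
    ".onnx": "stai-mpu-ort",
    ".nb": "stai-mpu-ovx",
}


def plugin_for_model(model_path, has_ai_accelerator=True):
    """Return the plugin package to install for a model file (hypothetical helper).

    has_ai_accelerator is supplied by the caller: pass False for
    STM32MP1 series boards, which cannot run NBG models.
    """
    ext = Path(model_path).suffix.lower()
    if ext not in PLUGIN_BY_EXTENSION:
        raise ValueError(f"Unsupported model format: {ext!r}")
    if ext == ".nb" and not has_ai_accelerator:
        raise ValueError("NBG models require AI hardware acceleration")
    return PLUGIN_BY_EXTENSION[ext]


print(plugin_for_model("mobilenet_v2_1.0_224_int8_per_tensor.tflite"))
```

The returned name can then be passed to x-linux-ai -i on the board.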

3.2. Write a simple NN inference Python script

The example below shows how to load a NN model using the STAI MPU Python API, read the input and output tensor information, access the quantization parameters of the model, and run an inference to get the output prediction. Create the following Python source file and save it as stai_mpu_img_cls.py in the sources/stai_mpu/examples directory:

from stai_mpu import stai_mpu_network
from numpy.typing import NDArray
from typing import Any, List
from pathlib import Path
from PIL import Image
from argparse import ArgumentParser
from timeit import default_timer as timer
import cv2 as cv
import numpy as np
import time

def load_labels(filename):
    with open(filename, 'r') as f:
        return [line.strip() for line in f.readlines()]

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('-i','--image', help='image to be classified.')
    parser.add_argument('-m','--model_file',help='model to be executed.')
    parser.add_argument('-l','--label_file', help='name of labels file.')
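    # type=float ensures values passed on the command line are parsed as numbers
    parser.add_argument('--input_mean', default=127.5, type=float, help='input mean')
    parser.add_argument('--input_std', default=127.5, type=float, help='input standard deviation')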
    args = parser.parse_args()

    stai_model = stai_mpu_network(model_path=args.model_file)
    # Read input tensor information
    num_inputs = stai_model.get_num_inputs()
    input_tensor_infos = stai_model.get_input_infos()
    for i in range(0, num_inputs):
        input_tensor_shape = input_tensor_infos[i].get_shape()
        input_tensor_name = input_tensor_infos[i].get_name()
        input_tensor_rank = input_tensor_infos[i].get_rank()
        input_tensor_dtype = input_tensor_infos[i].get_dtype()
        print("**Input node: 0 -Input_name:{} -Input_dims:{} - input_type:{} -Input_shape:{}".format(input_tensor_name,
                                                                                                    input_tensor_rank,
                                                                                                    input_tensor_dtype,
                                                                                                    input_tensor_shape))
        if input_tensor_infos[i].get_qtype() == "affinePerTensor":
            # Reading the input scale and zero point variables
            input_tensor_scale = input_tensor_infos[i].get_scale()
            input_tensor_zp = input_tensor_infos[i].get_zero_point()
        if input_tensor_infos[i].get_qtype() == "dynamicFixedPoint":
            # Reading the dynamic fixed point position
            input_tensor_dfp_pos = input_tensor_infos[i].get_fixed_point_pos()


    # Read output tensor information
    num_outputs = stai_model.get_num_outputs()
    output_tensor_infos = stai_model.get_output_infos()
    for i in range(0, num_outputs):
        output_tensor_shape = output_tensor_infos[i].get_shape()
        output_tensor_name = output_tensor_infos[i].get_name()
        output_tensor_rank = output_tensor_infos[i].get_rank()
        output_tensor_dtype = output_tensor_infos[i].get_dtype()
        print("**Output node: 0 -Output_name:{} -Output_dims:{} -  Output_type:{} -Output_shape:{}".format(output_tensor_name,
                                                                                                        output_tensor_rank,
                                                                                                        output_tensor_dtype,
                                                                                                        output_tensor_shape))
        if output_tensor_infos[i].get_qtype() == "affinePerTensor":
            # Reading the output scale and zero point variables
            output_tensor_scale = output_tensor_infos[i].get_scale()
            output_tensor_zp = output_tensor_infos[i].get_zero_point()
        if output_tensor_infos[i].get_qtype() == "dynamicFixedPoint":
            # Reading the dynamic fixed point position
            output_tensor_dfp_pos = output_tensor_infos[i].get_fixed_point_pos()

    # Reading input image
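    # Assumes an NHWC input shape: (batch, height, width, channels)
    input_height = input_tensor_shape[1]
    input_width = input_tensor_shape[2]
    # PIL resize expects (width, height)
    input_image = Image.open(args.image).resize((input_width,input_height))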
    input_data = np.expand_dims(input_image, axis=0)
    if input_tensor_dtype == np.float32:
        input_data = (np.float32(input_data) - args.input_mean) /args.input_std

    stai_model.set_input(0, input_data)
    start = timer()
    stai_model.run()
    end = timer()

    print("Inference time: ", (end - start) *1000, "ms")
    output_data = stai_model.get_output(index=0)
    results = np.squeeze(output_data)
    top_k = results.argsort()[-5:][::-1]
    labels = load_labels(args.label_file)
    for i in top_k:
        if output_tensor_dtype == np.uint8:
            print('{:08.6f}: {}'.format(float(results[i] / 255.0), labels[i]))
        else:
            print('{:08.6f}: {}'.format(float(results[i]), labels[i]))
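
The script above reads the quantization parameters (scale, zero point, fixed point position) but does not use them, and it approximates the uint8 case by dividing by 255. As a minimal sketch of how such parameters are commonly applied, the functions below convert quantized values back to real numbers and reproduce the float normalization step. The function names are hypothetical, and the formulas are the usual affine and dynamic fixed point conventions, assumed here rather than taken from the STAI MPU documentation.

```python
import numpy as np


def dequantize_affine(q, scale, zero_point):
    """Affine per-tensor convention: real = scale * (q - zero_point)."""
    return scale * (np.asarray(q, dtype=np.float32) - zero_point)


def dequantize_dfp(q, fixed_point_pos):
    """Dynamic fixed point convention (assumed): real = q * 2**(-fixed_point_pos)."""
    return np.asarray(q, dtype=np.float32) * (2.0 ** -fixed_point_pos)


def normalize(image, mean=127.5, std=127.5):
    """Same normalization as the script applies to float32 inputs."""
    return (np.asarray(image, dtype=np.float32) - mean) / std


# Toy values for illustration only, not taken from a real model
print(dequantize_affine(np.array([0, 128, 255], dtype=np.uint8), 1 / 256, 0))
print(dequantize_dfp(np.array([64], dtype=np.int8), 7))
print(normalize(np.array([0.0, 127.5, 255.0])))
```

With the default mean and standard deviation of 127.5, pixel values in [0, 255] are mapped to [-1, 1].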

3.3. Download and prepare test data

First create the directory to store test data:

 mkdir stai_mpu_python_example

Next download the models and the test pictures:

 wget -O stai_mpu_python_example/mobilenet_v2_1.0_224_int8_per_tensor.nb https://github.com/STMicroelectronics/meta-st-x-linux-ai/raw/refs/heads/main/recipes-samples/image-classification/models/files/mobilenet_v2_1.0_224_int8_per_tensor.nb
 wget -O stai_mpu_python_example/mobilenet_v2_1.0_224_int8_per_tensor.tflite https://github.com/STMicroelectronics/meta-st-x-linux-ai/raw/refs/heads/main/recipes-samples/image-classification/models/files/mobilenet_v2_1.0_224_int8_per_tensor.tflite
 wget -O stai_mpu_python_example/labels_imagenet_2012.txt https://raw.githubusercontent.com/STMicroelectronics/meta-st-x-linux-ai/refs/heads/main/recipes-samples/image-classification/models/files/labels_imagenet_2012.txt
 wget -O stai_mpu_python_example/bird.jpg https://farm3.staticflickr.com/8008/7523974676_40bbeef7e3_o.jpg
 wget -O stai_mpu_python_example/plant.jpg https://c2.staticflickr.com/1/62/184682050_db90d84573_o.jpg

Once the data and label files needed for inferencing are downloaded and ready, it is time to deploy your application script and run the inference.
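
Before copying anything to the board, it can help to confirm that every download succeeded. The following is a small sketch using only the Python standard library; the helper name missing_files is hypothetical, and the file names are those used in the wget commands above.

```python
from pathlib import Path

# File names taken from the wget commands above
EXPECTED_FILES = [
    "mobilenet_v2_1.0_224_int8_per_tensor.nb",
    "mobilenet_v2_1.0_224_int8_per_tensor.tflite",
    "labels_imagenet_2012.txt",
    "bird.jpg",
    "plant.jpg",
]


def missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in directory."""
    base = Path(directory)
    return [name for name in expected if not (base / name).is_file()]


missing = missing_files("stai_mpu_python_example")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All test data is present.")
```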

Information
If you wish to run your own NBG model generated from a quantized TFLite™ or ONNX™ model, follow this article to convert your model to NBG format.

3.4. Script and models deployment and launch of application

Copy the script file and the test data directory containing your model onto the board:

 scp -r stai_mpu_python_example/ root@<board_ip>:/path/
 scp stai_mpu_img_cls.py root@<board_ip>:/path/
Information
The runtime plugin corresponding to your model must be installed before running the script.

Connect to the board and launch the example. On STM32MP2 series boards, use the NBG model for best performance:

 python3 stai_mpu_img_cls.py -m stai_mpu_python_example/mobilenet_v2_1.0_224_int8_per_tensor.nb -i stai_mpu_python_example/bird.jpg -l stai_mpu_python_example/labels_imagenet_2012.txt
Loading dynamically: /usr/lib/libstai_mpu_ovx.so.5
**Input node: 0 -Input_name: -Input_dims:4 - input_type:uint8 -Input_shape:(1, 224, 224, 3)
**Output node: 0 -Output_name: -Output_dims:2 -  Output_type:float16 -Output_shape:(1, 1000)
Inference time:  14.044007999473251 ms
0.941406: chickadee
0.027344: magpie
0.003906: junco
0.003906: water ouzel
0.003906: bulbul

The index of the highest output value is the index of the detected class, and the value itself is the confidence. In these particular pictures, the bird detected is a Poecile atricapillus (black-capped chickadee) and the plant is a Helianthus annuus (daisy). The index and the name of each class are available in the labels_imagenet_2012.txt file stored in the stai_mpu_python_example directory.
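
The relationship between output values, class indices and labels can be illustrated on a small example. This is a sketch with made-up scores and labels, not real model output; the top_k function name is hypothetical, and it uses the same argsort-based selection as the script.

```python
import numpy as np


def top_k(scores, labels, k=5):
    """Return the k (label, confidence) pairs with the highest scores, best first."""
    scores = np.squeeze(np.asarray(scores))
    # argsort is ascending, so take the last k indices and reverse them
    order = np.argsort(scores)[-k:][::-1]
    return [(labels[i], float(scores[i])) for i in order]


# Toy data for illustration only
toy_labels = ["class_a", "class_b", "class_c", "class_d"]
toy_scores = [[0.05, 0.80, 0.10, 0.05]]  # shape (1, 4), like a batched model output

for label, confidence in top_k(toy_scores, toy_labels, k=2):
    print("{:08.6f}: {}".format(confidence, label))
```

Each index returned by argsort is a line index into the labels file, which is how the script maps a score to a class name.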

4. References