ST Edge AI: Guide for MPU

1. Description

1.1. What is ST Edge AI?

The ST Edge AI utility provides a complete and unified Command Line Interface (CLI) to generate, from a pre-trained Neural Network (NN), an optimized model or a library for all STM32 devices including MPU, MCU, ISPU and MLC. It consists of three main commands: analyze, generate and validate. Each command can be used independently of the others, together with the common options (model files, output directory, and so on) and any command-specific options.

In the case of the STM32MPU, you can use the generate command in ST Edge AI to convert a Neural Network (NN) model to an optimized Network Binary Graph (NBG). This NBG is the only format that allows you to run an NN model using the STM32MP2x Neural Processing Unit (NPU) acceleration. You can also use the validate command to verify that the model outputs are similar on an STM32MPU platform and on a Linux PC.

Information

The analyze command in ST Edge AI for STM32MPUs is currently under development and will be available in a future release.

1.2. Main features

ST Edge AI is delivered as an archive containing an installer for the tool. The installer lets you select which ST Edge AI components to install. Some components are not available on all operating systems; the STM32MP2 component, for example, is available only for Linux.

Important

The STM32MP2 component of ST Edge AI is available only for Linux.

The tool already includes the complete Python environment required to run a conversion. The objective is to let you convert and execute an NN model easily on STM32MP2x platforms. To this end, the tool converts a quantized TensorFlow™ Lite [1] or ONNX™ [2] model into the NBG format.

The Network Binary Graph (NBG) is the precompiled NN model format using the OpenVX™ graph representation. This is the only NN format that can be loaded and executed directly on the NPU of STM32MP2x boards.

The model provided to the tool must be quantized with the 8-bit per-tensor asymmetric quantization scheme for the best performance. If the quantization scheme is 8-bit per-channel, the model runs mainly on the GPU instead of the NPU.
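
To make the quantization requirement concrete, the sketch below shows the arithmetic behind 8-bit per-tensor asymmetric quantization: one scale and one zero point are shared by the whole tensor. This is only an illustration, not code from ST Edge AI; the function names are hypothetical, and the scale of 1/255 with a zero point of 0 is an assumed example for an input normalized to [0, 1].

```python
def quantize(values, scale, zero_point):
    """Map floats to uint8 using a single (scale, zero_point) pair for the whole tensor."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        out.append(max(0, min(255, q)))  # clamp to the uint8 range
    return out


def dequantize(qvalues, scale, zero_point):
    """Recover approximate floats from uint8 values."""
    return [(q - zero_point) * scale for q in qvalues]


# Illustrative parameters (assumed): input range [0, 1] mapped onto [0, 255].
scale, zero_point = 1.0 / 255.0, 0
q = quantize([0.0, 0.25, 1.0], scale, zero_point)
print(q)
print(dequantize(q, scale, zero_point))
```

A per-channel scheme would instead keep one scale and zero point per output channel of a layer, which is the case the paragraph above says runs mainly on the GPU.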

The tool does not support custom operators:

- If the provided model contains custom operations, they are automatically removed or the generation fails.
- If the output of the model is already post-processed by a custom operator, such as the TFLite post-process, the post-processing layer is deleted.

To prevent this situation, use one of the following approaches:

- Provide a model that does not include the custom post-process layer, and code the post-process function in your application. The model runs on the NPU, and the post-process is executed on the CPU.
- Split your model to execute its core on the NPU using the NBG model, and the post-processing layer on the CPU using your TensorFlow™ Lite or ONNX™ model with the stai_mpu API.

Once the NBG is generated, you can benchmark it or develop your AI application. For further information, refer to the articles on how to deploy your NN model and how to benchmark your NN model.

2. Installation

Important

The STM32MP2 component of ST Edge AI is available only for Linux.

Download the ST Edge AI tool here: https://www.st.com/en/development-tools/stedgeai-core.html

If you need specific proxy settings, follow the step-by-step procedure to set up the proxy.

Then, follow the step-by-step procedure to install the tool.

Information

A maintenance tool (maintenancetool) is also installed. It lets you add, remove or update ST Edge AI components.

The maintenancetool is an executable file located in your installation folder. When launched, it lets you add or remove a component, update an existing component (if an update is available), and uninstall the ST Edge AI tool.

3. How to use the tool

3.1. Script utilization

The main tool interface is the stedgeai binary. First, go to the binary directory:

cd <your_installation_path>/2.0/Utilities/linux

The features of the stedgeai binary for the stm32mp25 target are the following:

Information

The stedgeai --help command prints many options, but only the following ones affect the stm32mp25 target.

usage: stedgeai --model FILE --target stm32mp25 [--workspace DIR] [--output DIR] [--no-report] [--no-workspace] 
                [--input-data-type float32] [--output-data-type float32] [--entropy ENTROPY] [--batch-size INT] 
                [--mode host|target|host-io-only|target-io-only] [--desc DESC] [--valinput FILE [FILE ...]] 
                [--valoutput FILE [FILE ...]] [--range MIN MAX [MIN MAX ...]] [--save-csv] [--classifier] 
                [--no-check] [--no-exec-model] [--seed SEED] [-h] [--version] [--tools-version] 
                [--verbosity [0|1|2|3]] [--quiet]
                generate|validate

ST Edge AI Core v2.0.0 (STM32 MP2 module v2.0.0)

command:
 generate|validate     must be the first argument (default: generate)
                       validate
                       	validate the converted model by itself or versus the original model
                       generate
                       	generate the converted model for the target device


common options:
 --model FILE, -m FILE
                       paths of the original model files
 --target stm32|stellar-e|ispu|mlc
                       target/device selector
 --workspace DIR, -w DIR
                       workspace folder to use (default: st_ai_ws)
 --output DIR, -o DIR  folder where the generated files are saved (default: st_ai_output)

 --input-data-type float32
                       indicate the expected inputs data type of the generated model
                       Multiple inputs: in_data_type_1,in_data_type_2,...
                       If one data type is given, it will be applied for all inputs
 --output-data-type float32
                       indicate the expected outputs data type of the generated model
                       Multiple outputs: out_data_type_1,out_data_type_2,...
                       If one data type is given, it will be applied for all outputs

specific generate options:
 --entropy ENTROPY     requested entropy when converting per-channel model to per-tensor NBG model

specific validate options:
 --batch-size INT, -b INT
                       number of samples for the validation
 --mode host|target|host-io-only|target-io-only
                       validation mode to use
 --desc DESC, -d DESC  
                       COM port and baud rate to use for communication with the board. Syntax:
                           serial[:COMPORT][:baudrate]
 --valinput FILE [FILE ...], -vi FILE [FILE ...]
                       files containing data to use as input for validation
 --valoutput FILE [FILE ...], -vo FILE [FILE ...]
                       files containing data to use as reference output for validation
 --range MIN MAX [MIN MAX ...]
                       range of values to generate the random input data (default: [0 1])
 --save-csv            force the storage of all io data in csv files
 --classifier          specify that the model is a classifier (otherwise auto-detection is performed)
 --no-check            disable internal checks (mode/model type dependent)
 --no-exec-model       disable execution of the original model
 --seed SEED  

additional options:
 --no-report           do not generate the report file
 --no-workspace        do not create the workspace folder
 -h, --help            show this help message and exit (use --target to get target specific help)
 --version             print the version of the tool
 --tools-version       print the versions of the third party packages used by the tool
 --verbosity [0|1|2|3], -v [0|1|2|3], --verbose [0|1|2|3]
                       set verbosity level
 --quiet               disable the progress-bar

examples:
 stedgeai analyze mode is not currently supported
 stedgeai generate --target <target> -m myquantizedmodel.tflite
 stedgeai validate --target <target> -m mymodel.tflite --mode target

Warning

The analyze options are not yet supported for the stm32mp25 target.

3.2. Using generate command

To generate an NBG model, provide a quantized TensorFlow™ Lite or ONNX™ model and select the correct target: stm32mp25.

Information

The model type is not required, since the tool recognizes it automatically.

3.2.1. Generate NBG from TensorFlow™ Lite or ONNX™

To convert a TensorFlow™ Lite model to NBG:

./stedgeai generate -m path/to/tflite/model --target stm32mp25

To convert an ONNX™ model to NBG:

./stedgeai generate -m path/to/onnx/model --target stm32mp25

This command generates two files: the .nb file, which is the NBG model, and a .txt file, which is the report of the generate command.

- The .nb file is located in the output path specified by the --output option. By default, it is located in the st_ai_output directory.
- The report (named report_modelName_stm32mp25.txt) is located in the workspace folder specified by the --workspace option. By default, it is located in the st_ai_ws directory.

3.2.2. Generate NBG from TensorFlow™ Lite or ONNX™ without report

To generate an NBG model without the report, add the --no-report option to your command line.

To convert a TensorFlow™ Lite model to NBG without the report:

./stedgeai generate -m path/to/tflite/model --target stm32mp25 --no-report

To convert an ONNX™ model to NBG without the report:

./stedgeai generate -m path/to/onnx/model --target stm32mp25 --no-report

This command generates a .nb file in the output folder. If you specify a directory with the --output option, the NBG is written there; otherwise it is written to the default st_ai_output folder.

3.2.3. Generate NBG from TensorFlow™ Lite or ONNX™ with float I/O

To generate an NBG model with float32 input and output, or with float32 output only, add the --input-data-type and --output-data-type options to your command line.

Information

The --input-data-type float32 argument has no effect unless --output-data-type is also set to float32. However, --output-data-type float32 can be used alone to generate a model with float32 output.

To convert a model to NBG with float32 I/O:

./stedgeai generate -m path/to/TfliteorOnnx/model --target stm32mp25 --input-data-type float32 --output-data-type float32

To convert a model to NBG with float32 output only:

./stedgeai generate -m path/to/TfliteorOnnx/model --target stm32mp25 --output-data-type float32

This command generates a .nb file in the output folder. If you specify a directory with the --output option, the NBG is written there; otherwise it is written to the default st_ai_output folder.
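
The generate variants in this section differ only by a few flags, so they can be assembled programmatically when several models must be converted. The following is a minimal sketch under stated assumptions: build_generate_command is a hypothetical helper (not part of ST Edge AI), it uses only the flags listed in the help text above, and it assumes the script is launched from the binary directory so that ./stedgeai resolves.

```python
import shlex


def build_generate_command(model_path, output_dir=None, report=True,
                           float_input=False, float_output=False):
    """Return the stedgeai generate argument list for the stm32mp25 target."""
    if float_input and not float_output:
        # --input-data-type float32 has no effect without --output-data-type float32
        raise ValueError("float_input requires float_output")
    cmd = ["./stedgeai", "generate", "-m", model_path, "--target", "stm32mp25"]
    if output_dir is not None:
        cmd += ["--output", output_dir]
    if not report:
        cmd.append("--no-report")
    if float_input:
        cmd += ["--input-data-type", "float32"]
    if float_output:
        cmd += ["--output-data-type", "float32"]
    return cmd


cmd = build_generate_command("path/to/tflite/model", report=False, float_output=True)
print(shlex.join(cmd))
# To run it from the binary directory: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell quoting issues when model paths contain spaces; shlex.join is used only to display the command.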

3.3. Using validate command

Once the model is converted to NBG with the ST Edge AI generate command, you can verify that the NBG outputs are similar to those of the original TensorFlow™ Lite or ONNX™ model.

To validate a model on an STM32MPU platform, you need an STM32MP25 board connected to your host Linux PC. There are two ways to do it:

- Connect the PC to the board with a USB-C/USB-C or USB-A/USB-C cable on the USB-C DRD port of the STM32MP25. This creates an Ethernet connection via USB with the following IP address: 192.168.7.1.
- Or connect the STM32MP25 to the internet, so that it has an IP address.

The validate command will send the model to the board and run the inference directly on the CPU or on the GPU/NPU, depending on the options passed as arguments. In parallel, the same model will run on the host PC, and the results of the two executions will be compared.

A set of metrics is returned:

acc  : Accuracy (class, axis=-1)
rmse : Root Mean Squared Error
mae  : Mean Absolute Error
l2r  : L2 relative error
mean : Mean error
std  : Standard deviation error
nse  : Nash-Sutcliffe efficiency criteria, bigger is better, best=1, range=(-inf, 1]
cos  : COsine Similarity, bigger is better, best=1, range=(0, 1]

These metrics show whether the model executed on the board produces results similar to the one executed on the host PC.
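
For readers who want to reproduce such metrics on their own data, here is a minimal sketch using common textbook definitions. It is an assumption that the tool computes them exactly this way; validation_metrics is a hypothetical function, and the error sign convention (reference minus C-model) is chosen to agree with the sample output shown later in this article.

```python
import math


def validation_metrics(ref, pred):
    """Compute error metrics between reference and predicted flat lists of floats."""
    n = len(ref)
    err = [r - p for r, p in zip(ref, pred)]  # assumed sign: reference minus prediction
    mean_err = sum(err) / n
    mean_ref = sum(ref) / n
    sq_err = sum(e * e for e in err)
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    return {
        "rmse": math.sqrt(sq_err / n),
        "mae": sum(abs(e) for e in err) / n,
        "l2r": math.sqrt(sq_err) / norm_ref,
        "mean": mean_err,
        "std": math.sqrt(sum((e - mean_err) ** 2 for e in err) / n),
        "nse": 1.0 - sq_err / sum((r - mean_ref) ** 2 for r in ref),
        "cos": sum(r * p for r, p in zip(ref, pred)) / (norm_ref * norm_pred),
    }


metrics = validation_metrics([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
for name, value in metrics.items():
    print(f"{name:5s}: {value:.6f}")
```

The sketch assumes a non-constant, non-zero reference; otherwise the nse, l2r and cos denominators are zero.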

The -d/--desc descriptor argument is mandatory for the validate command. The different options of the descriptor are separated by the : character. On STM32MPUs, the descriptor can contain four values:

mpu : selects the execution of the STM32MPUs pass. This is the only mandatory option.
<ip_address> : by default, the IP address is set to 192.168.7.1, which is the address assigned when the board is connected to a host PC through the USB Type-C DRD port.
cpu or npu : selects CPU or GPU/NPU execution (the example below uses npu).
model_path : (optional) path of the model. If it is omitted, the model path is automatically recovered from the -m/--model option.

./stedgeai validate -m path/to/TfliteorOnnx/model --target stm32mp25 --mode target --desc mpu:<your_ip_addr>:npu
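
The descriptor is a plain colon-separated string, so it can be composed with a small helper. The sketch below is illustrative only: build_descriptor is a hypothetical function, the field order (mpu, IP address, execution engine, model path) is assumed from the list and the example command above, and the accepted engine values cpu and npu are inferred from the same text.

```python
def build_descriptor(ip_address="192.168.7.1", engine="npu", model_path=None):
    """Join the descriptor fields with ':' in the assumed order mpu, IP, engine, model path."""
    if engine not in ("cpu", "npu"):
        raise ValueError("engine must be 'cpu' or 'npu'")
    fields = ["mpu", ip_address, engine]
    if model_path is not None:
        fields.append(model_path)  # optional fourth field
    return ":".join(fields)


# Default USB connection address, NPU execution.
print(build_descriptor())
# Board reachable at another address, CPU execution.
print(build_descriptor("10.0.0.5", engine="cpu"))
```

The resulting string is what would be passed to the --desc option of the validate command.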

This is the expected output:

ST Edge AI Core v2.0.0
Setting validation data...
generating random data, size=10, seed=42, range=(0, 1)
I[1]: (10, 256, 256, 3)/float32, min/max=[0.000000, 0.999999], mean/std=[0.499873, 0.288562]
c/I[1] conversion [Q(0.00392157,0)]-> (10, 256, 256, 3)/uint8, min/max=[0, 255], mean/std=[127.467535, 73.584602]
m/I[1] conversion [Q(0.00392157,0)]-> (10, 256, 256, 3)/uint8, min/max=[0, 255], mean/std=[127.467535, 73.584602]
no output/reference samples are provided
Creating c (debug) info json file /local/home/brissona/STM32MPU_workspace/tmp/STEdgeAI/github/ai_build_tools/build/pkg/Utilities/linux/stm32ai_ws/network_c_info.json
Exec/report summary (validate)
----------------------------------------------------------------------------------------------------------------------------------------------
model file         : path/to/Tflite/model
type               :   tflite
c_name             :   network
options            :   allocate-inputs, allocate-outputs
optimization       :   balanced
target/series      :   stm32mp25
workspace dir      :   path/to/stm32ai_ws
output dir         :   path/to/stm32ai_output
model_fmt          :   sa/ua per tensor
model_name         :   yolov8n_256_quant_pt_uf_pose_cocost
model_hash         :   0x81fc0c54da35ba88814f17f3d99a7ce7
params #           :   3,303,423 items (3.17 MiB)
----------------------------------------------------------------------------------------------------------------------------------------------
input 1/1          :   'serving_default_images0', uint8(1x256x256x3), 192.00 KBytes, QLinear(0.003921569,0,uint8), activations
output 1/1         :   'conversion_287', f32(1x56x1344), 294.00 KBytes, activations
macc               :   741,876,319
weights (ro)       :   3,321,028 B (3.17 MiB) (1 segment) / -9,892,664(-74.9%) vs float model
activations (rw)   :   778,304 B (760.06 KiB) (1 segment) *
ram (total)        :   778,304 B (760.06 KiB) = 778,304 + 0 + 0
----------------------------------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers can be used from the activations buffer
Running the TFlite model... 
Running the ST.AI c-model (AI RUNNER)...(name=network, mode=TARGET)
Target inference running on NPU using NBG model: ./stm32ai_output/yolov8n_256_quant_pt_uf_pose_coco-st.nb
ETHERNET:10.48.104.43:./stm32ai_output/yolov8n_256_quant_pt_uf_pose_coco-st.nb:connected ['network']
Summary 'network' - ['network']
--------------------------------------------------------------------------------------------------
I[1/1] 'input_0_'    :   uint8[1,256,256,3], 196608 Bytes, QLinear(0.003921569,0.0,uint8), None
O[1/1] 'output_0_'   :   f32[1,56,1344], 301056 Bytes, None
n_nodes              :   None
activations          :   []
weights              :   []
compile_datetime     :   <undefined>
--------------------------------------------------------------------------------------------------
protocol             :   <undefined>
tools                :   ST.AI v5.0.0
runtime lib          :   v5.1.0
capabilities         :   IO_ONLY
device.desc          :   MP2 VSI NPU
--------------------------------------------------------------------------------------------------
Warning: C-network signature checking is skipped on STM32 MPU
ST.AI Profiling results v2.0 - "network"
---------------------------------------------------------------
nb sample(s)   :   10
duration       :   18.505 ms by sample (17.865/23.583/1.694)
---------------------------------------------------------------
Statistic per tensor
-----------------------------------------------------------------------------------
tensor   #    type[shape]:size              min     max      mean      std  name
-----------------------------------------------------------------------------------
10   u8[1,256,256,3]:196608          0     255   127.468   73.585
10   f32[1,56,1344]:301056      -0.114   1.149     0.411    0.297
-----------------------------------------------------------------------------------
Saving validation data...
output directory:path/to/stm32ai_output
creating path/to/stm32ai_output/network_val_io.npz
m_outputs_1: (10, 56, 1, 1344)/float32, min/max=[-0.093570, 1.176313], mean/std=[0.410561, 0.297534], conversion_287
c_outputs_1: (10, 56, 1, 1344)/float32, min/max=[-0.113647, 1.149414], mean/std=[0.410806, 0.297441], conversion_287
Computing the metrics...
Cross accuracy report #1 (reference vs C-model)
----------------------------------------------------------------------------------------------------
notes: - ACC metric is not computed ("--classifier" option can be used to force it)
- the output of the reference model is used as ground truth/reference value
- 10 samples (75264 items per sample)
acc=n.a. rmse=0.015041845 mae=0.008427310 l2r=0.029657720 mean=-0.000245 std=0.015040 nse=0.997444 cos=0.999561
Evaluation report (summary)
----------------------------------------------------------------------------------------------------------------------------------------------
Output       acc    rmse          mae           l2r           mean        std        nse        cos        tensor
----------------------------------------------------------------------------------------------------------------------------------------------
X-cross #1   n.a.   0.015041845   0.008427310   0.029657720   -0.000245   0.015040   0.997444   0.999561   'conversion_287', 10 x f32(1x56x1344), m_id=[287]
----------------------------------------------------------------------------------------------------------------------------------------------
acc  : Accuracy (class, axis=-1)
rmse : Root Mean Squared Error
mae  : Mean Absolute Error
l2r  : L2 relative error
mean : Mean error
std  : Standard deviation error
nse  : Nash-Sutcliffe efficiency criteria, bigger is better, best=1, range=(-inf, 1]
cos  : COsine Similarity, bigger is better, best=1, range=(0, 1]
Creating txt report file path/to/stm32ai_output/network_validate_report.txt
elapsed time (validate): 64.462s
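
When validation runs are automated, the metrics line of this output can be parsed and compared against acceptance criteria. The following sketch assumes the name=value format shown above stays stable; parse_metrics is a hypothetical helper, and the thresholds are arbitrary examples, not values recommended by the tool.

```python
import re


def parse_metrics(line):
    """Parse 'name=value' pairs from a metrics line; 'n.a.' becomes None."""
    metrics = {}
    for name, value in re.findall(r"(\w+)=(\S+)", line):
        metrics[name] = None if value == "n.a." else float(value)
    return metrics


line = ("acc=n.a. rmse=0.015041845 mae=0.008427310 l2r=0.029657720 "
        "mean=-0.000245 std=0.015040 nse=0.997444 cos=0.999561")
m = parse_metrics(line)

# Hypothetical acceptance thresholds; choose values suited to your application.
ok = m["cos"] > 0.99 and m["rmse"] < 0.05
print(m["rmse"], m["cos"], ok)
```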

4. References

[1] TensorFlow™ Lite

[2] ONNX™