
How to use hardware acceleration with TensorFlow Lite and ONNX Runtime frameworks

Applicable for STM32MP25x lines


1. Article purpose[edit | edit source]

The main purpose of this article is to describe the main steps, and to give advice, on how to use GPU/NPU hardware acceleration on STM32MP25x lines More info.png with the TensorFlow LiteTM and ONNX RuntimeTM frameworks.

2. Hardware acceleration with TensorFlow LiteTM[edit | edit source]

2.1. Prerequisites[edit | edit source]

First, it is mandatory to use a model that supports acceleration with the GPU/NPU of the STM32MP25x lines More info.png. To make sure the model can be accelerated, follow this wiki article to deploy the NN model correctly.

For TensorFlow LiteTM model hardware acceleration, in addition to OpenVX (NBG model), an external delegate for the TensorFlow LiteTM runtime named tflite-vx-delegate has been delivered since the X-LINUX-AI v6.0.0 release. It allows .tflite models to be run directly on the GPU/NPU of the STM32MP25x lines More info.png through the TensorFlow LiteTM runtime.

To install the tflite-vx-delegate packages on the board:

 x-linux-ai -i tflite-vx-delegate
 x-linux-ai -i tflite-vx-delegate-example

These commands install the libvx_delegate library in the /usr/lib directory of the board.

Then, check if the libraries are correctly installed:

 ls /usr/lib | grep -e libvx_delegate.so 

The different libraries are listed as follows with X, Y, and Z representing the version of the library:

libvx_delegate.so.X
libvx_delegate.so.X.Y.Z
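
Optionally, the delegate can also be loaded from Python to verify the installation. The following is a minimal sketch, assuming the installed library version is the .so.2 used in the examples below:

 import tflite_runtime.interpreter as tflr
 
 # Try to load the VX delegate library to check that it is correctly installed
 vx_delegate = tflr.load_delegate(library="/usr/lib/libvx_delegate.so.2")
 print("tflite-vx-delegate successfully loaded")
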
Info white.png Information
These packages are automatically downloaded on the STM32MP2x series with packagegroup-x-linux-ai-demo (for both packages), as well as with packagegroup-x-linux-ai-tflite for tflite-vx-delegate.
Warning DB.png Important
For the TFLite vx-delegate, inference source code examples in C++ and Python are available in the meta-layer x-linux-ai in recipes-frameworks/tflite-vx-delegate/files/tflite-vx-delegate-example and on the board in /usr/local/bin/tflite-vx-delegate-example/.

Now that everything is correctly set up, let's see how to use the GPU/NPU hardware acceleration with TensorFlow LiteTM.


2.2. Acceleration with TensorFlow LiteTM[edit | edit source]

There are two different ways to use GPU/NPU hardware acceleration on the STM32MP25x lines More info.png using TensorFlow LiteTM.

2.2.1. C++ TensorFlow LiteTM API[edit | edit source]

Here is a snippet showing how to modify your C++ application to include support for the TensorFlow LiteTM external delegate for the STM32MP25x lines More info.png GPU/NPU:

 std::unique_ptr<tflite::FlatBufferModel> model;
 std::unique_ptr<tflite::Interpreter> interpreter;
 model = tflite::FlatBufferModel::BuildFromBuffer(_model, _len);
 std::string vx_delegate_path = "/usr/lib/libvx_delegate.so.2";
 
 model->error_reporter();
 tflite::ops::builtin::BuiltinOpResolver resolver;
 
 /* Add custom operator from TFLite VX delegate */
 resolver.AddCustom(kNbgCustomOp, tflite::ops::custom::Register_VSI_NPU_PRECOMPILED());
 
 tflite::InterpreterBuilder(*model, resolver)(&interpreter);
 if (!interpreter) {
   std::cout << "FATAL: Failed to construct interpreter" << std::endl;
   exit(-1);
 }
 
 /* Define the path to the TFLite external delegate */
 const char * delegate_path = vx_delegate_path.c_str();
 
 /* Set external delegate options */
 auto ext_delegate_option = TfLiteExternalDelegateOptionsDefault(delegate_path);
 
 /* Add optional features to skip the warmup time if a Network Binary Graph is already generated */
 ext_delegate_option.insert(&ext_delegate_option, "cache_file_path", "/path/to/my_nn_network.nb");
 ext_delegate_option.insert(&ext_delegate_option, "allowed_cache_mode", "true");
 auto ext_delegate_ptr = TfLiteExternalDelegateCreate(&ext_delegate_option);
 
 /* Set the interpreter to use the external delegate */
 interpreter->ModifyGraphWithDelegate(ext_delegate_ptr);
  
 /* Optionally set the number of CPU threads for fallback */
 interpreter->SetNumThreads(number_of_threads);
 
 if (interpreter->AllocateTensors() != kTfLiteOk) {
   std::cout << "FATAL: Failed to allocate tensors!" << std::endl;
 [...]

The TfLiteExternalDelegateOptionsDefault options must be initialized with the path to the delegate library. In this case, the library is libvx_delegate, located in the user file system.

Then, it is possible to add specific options available for the chosen delegate; defining them is not mandatory. In the case of libvx_delegate, two options are of interest:

  • cache_file_path: if an .nb file pointed to by this variable is found, it is used to skip the warmup time, that is, the compilation of the .tflite model on the target. If no .nb file is found, it generates an .nb file in this path.
  • allowed_cache_mode: if this variable is set to "true", the .nb file pointed to by cache_file_path is used or generated.

Finally, apply the delegate to the interpreter graph with the ModifyGraphWithDelegate function.

Info white.png Information
For more information on how to use this delegate or for any issues, refer to the tflite-vx-delegate GitHub[1].

2.2.2. Python TensorFlow LiteTM API[edit | edit source]

The following is a snippet showing how to modify the Python application to include support for the TensorFlow LiteTM external delegate for the STM32MP25x lines More info.png GPU/NPU:

 import tflite_runtime.interpreter as tflr
 
 vx_delegate = tflr.load_delegate(library="/usr/lib/libvx_delegate.so.2",
                                  options={"cache_file_path":"<path/to/.nb/model>", "allowed_cache_mode":"true"})
 
 self._interpreter = tflr.Interpreter(model_path=<path/to/your/model/.tflite>,
                                      num_threads = 2,                           # Number of CPU cores
                                      experimental_delegates=[vx_delegate])
 
 self._interpreter.allocate_tensors()
 self._input_details = self._interpreter.get_input_details()
 self._output_details = self._interpreter.get_output_details()
 [...]

The TensorFlow LiteTM Interpreter must be initialized with the path to the model and with the experimental_delegates parameter pointing to the delegate library. In this case, the library is libvx_delegate, located in the user file system.

It is possible to add specific options available for the chosen delegate; defining them is not mandatory.
In the case of libvx_delegate, two options are of interest:

  • cache_file_path: if an .nb file pointed to by this variable is found, it is used to skip the warmup time, that is, the compilation of the .tflite model on the target. If no .nb file is found, it generates an .nb file in this path.
  • allowed_cache_mode: if this variable is set to "true", the .nb file pointed to by cache_file_path is used or generated.

The TFLite interpreter can be used in the same way with or without the declaration of the delegate. It has no impact on the rest of the code.
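
Because the delegate declaration is optional, the application can also fall back to a pure CPU interpreter when the delegate library is not present on the target. The following is a minimal sketch of such a fallback, assuming the /usr/lib/libvx_delegate.so.2 library path used above; the model and .nb cache paths are placeholders to adapt:

 import os
 import tflite_runtime.interpreter as tflr
 
 VX_DELEGATE_LIB = "/usr/lib/libvx_delegate.so.2"
 
 def create_interpreter(model_path, nb_cache_path=None):
     """Return a TFLite interpreter, accelerated on the GPU/NPU when the VX delegate is available."""
     delegates = []
     if os.path.exists(VX_DELEGATE_LIB):
         options = {"allowed_cache_mode": "true"}
         if nb_cache_path:
             options["cache_file_path"] = nb_cache_path
         delegates.append(tflr.load_delegate(library=VX_DELEGATE_LIB, options=options))
     # With an empty delegate list, the interpreter simply runs on the CPU
     return tflr.Interpreter(model_path=model_path,
                             num_threads=2,
                             experimental_delegates=delegates)
 
 interpreter = create_interpreter("my_model.tflite", "/path/to/my_nn_network.nb")
 interpreter.allocate_tensors()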

Info white.png Information
For more information on how to use this delegate or for any issues, refer to the tflite-vx-delegate GitHub[1].

3. Hardware acceleration with ONNXTM Runtime[edit | edit source]

3.1. Prerequisites[edit | edit source]

For ONNX model hardware acceleration, in addition to OpenVX (NBG model), an execution provider for ONNX Runtime named VSINPU has been delivered since the X-LINUX-AI v6.0.0 release. It allows .onnx models to be run directly on the GPU/NPU of the STM32MP25x lines More info.png through the ONNX Runtime.

To install the ONNX Runtime packages, which include the VSINPU execution provider, and the associated examples on the board:

 x-linux-ai -i onnxruntime 
 x-linux-ai -i python3-onnxruntime 
 x-linux-ai -i ort-vsinpu-ep-example-cpp
 x-linux-ai -i ort-vsinpu-ep-example-python

The VSINPU execution provider is upstreamed directly into ONNX Runtime, thus it is automatically available with the onnxruntime package for the STM32MP2x series.

These commands install the libonnxruntime library in the /usr/lib directory of the board.

Then, check if the libraries are correctly installed:

 ls /usr/lib | grep -e libonnxruntime.so 

The different libraries are listed as follows with X, Y, and Z representing the version of the library:

libonnxruntime.so.X
libonnxruntime.so.X.Y.Z
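
In addition to checking the library files, the availability of the VSINPU execution provider can be verified directly from Python once the python3-onnxruntime package is installed. This is a minimal check, using the provider name that appears later in this article:

 import onnxruntime as ort
 
 # List the execution providers available in this ONNX Runtime build
 providers = ort.get_available_providers()
 print(providers)
 
 if "VSINPUExecutionProvider" in providers:
     print("GPU/NPU acceleration is available through the VSINPU execution provider")
 else:
     print("VSINPU execution provider not found, inference will fall back to the CPU")
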
Info white.png Information
These packages are automatically downloaded on the STM32MP2x series with packagegroup-x-linux-ai-demo (for both packages), as well as with packagegroup-x-linux-ai-onnx for the VSINPU execution provider.
Warning DB.png Important
For the ONNX Runtime VSINPU execution provider, inference source code examples in C++ and Python are available in the meta-layer x-linux-ai in recipes-frameworks/onnxruntime/files/ort-vsinpu-ep-example and on the board in /usr/local/bin/ort-vsinpu-ep-example/.

Now that everything is correctly set up, let's see how to use the GPU/NPU hardware acceleration with ONNXTM Runtime.

3.2. Acceleration with ONNXTM Runtime[edit | edit source]

3.2.1. C++ ONNXTM Runtime API[edit | edit source]

The following is a snippet showing how to modify the C++ application to include support for the ONNXTM Runtime VSINPU execution provider for the STM32MP25x lines More info.png GPU/NPU:

 Ort::Env ort_env(ORT_LOGGING_LEVEL_WARNING, "Onnx_environment");
 Ort::SessionOptions session_options;
 
 /* Set the VSINPU AI execution provider */
 OrtStatus* status = OrtSessionOptionsAppendExecutionProvider_VSINPU(session_options);
 if (status != nullptr) {
     std::cerr << "Failed to set VSINPU AI execution provider: " << Ort::GetApi().GetErrorMessage(status) << std::endl;
     Ort::GetApi().ReleaseStatus(status);
     throw std::runtime_error("[ORT] Failed: VSINPU AI execution provider runtime error");
 }
 
 session_options.DisableCpuMemArena();
 session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
 
 /* Create a session from the ONNX model file */
 Ort::Session session(ort_env, model_path.c_str(), session_options);
 
 [...]
 /* Get the input shape and prepare the input data */
 [...]
 Ort::RunOptions run_options;
 auto output_tensors = session.Run(run_options, input_name_data,
                                   input_tensor_data, num_of_input,
                                   output_data, num_of_output);
 [...]

The ONNXTM Runtime session_options must be modified with the VSINPUExecutionProvider to run the NN model on the GPU/NPU. The OrtSessionOptionsAppendExecutionProvider_VSINPU function is used to execute the model on the GPU/NPU instead of the CPU. If an operation is not supported, the execution of this operation falls back to the CPU.

Info white.png Information
For more information on how to use this execution provider or for any issues, refer to the ONNX Runtime GitHub[2].

3.2.2. Python ONNXTM Runtime API[edit | edit source]

The following is a snippet showing how to modify the Python application to include support for the ONNXTM Runtime VSINPU execution provider for the STM32MP25x lines More info.png GPU/NPU:

 import onnxruntime as ort
 
 # Load the ONNX model and create an inference session with the VSI NPU execution provider
 session_options = ort.SessionOptions()
 session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
 session = ort.InferenceSession(model_path, sess_options=session_options, providers=['VSINPUExecutionProvider'])
 
 # Get input and output details
 input_name = session.get_inputs()[0].name
 input_shape = session.get_inputs()[0].shape
 input_type = session.get_inputs()[0].type
 
 # Prepare input data
 [...]
 # Run the inference
 output_data = session.run(None, {input_name: input_data})
 [...]

The Python API of ONNX Runtime is very simple to use, as the VSINPU execution provider is one of the officially supported ONNX Runtime execution providers. The only required modification is to set the providers option of the inference session to VSINPUExecutionProvider.
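
To check that the session is really dispatched to the VSINPU execution provider (and not silently executed on the CPU), the providers attached to the session can be inspected. The following is a minimal sketch with a CPU fallback; model_path is a placeholder to adapt:

 import onnxruntime as ort
 
 model_path = "my_model.onnx"  # placeholder, replace with your .onnx model
 
 # Prefer the VSINPU execution provider when available, otherwise fall back to the CPU
 available = ort.get_available_providers()
 providers = ["VSINPUExecutionProvider"] if "VSINPUExecutionProvider" in available else ["CPUExecutionProvider"]
 
 session_options = ort.SessionOptions()
 session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
 session = ort.InferenceSession(model_path, sess_options=session_options, providers=providers)
 
 # Execution providers actually used by this session, in priority order
 print(session.get_providers())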

Info white.png Information
For more information on how to use this execution provider or for any issues, refer to the ONNX Runtime GitHub[2].

4. Hardware acceleration with the STAI_MPU API[edit | edit source]

It is also possible to accelerate TFLiteTM or ONNXTM models on the GPU/NPU via the STAI_MPU API by setting an option in the STAI_MPU constructor.
Under the hood, the STAI_MPU API relies on the TensorFlow LiteTM and ONNXTM Runtime mechanisms explained in the previous sections.
To find out more about how to enable hardware acceleration using the STAI_MPU API, refer to this article: STAI_MPU:_AI_unified_API_for_STM32MPUs.
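
As an illustration only, here is a minimal Python sketch of this option. It assumes the stai_mpu Python module and its use_hw_acceleration constructor flag as described in the STAI_MPU article referenced above; the model path is a placeholder, and that article remains the authoritative reference for the API.

 from stai_mpu import stai_mpu_network
 
 # Instantiate the unified network wrapper and request GPU/NPU acceleration
 # (use_hw_acceleration is described in the STAI_MPU article referenced above)
 stai_model = stai_mpu_network(model_path="my_model.tflite",   # .tflite or .onnx model, placeholder path
                               use_hw_acceleration=True)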

5. References[edit | edit source]