This article describes how to measure the performance of an ONNX model using ONNX Runtime on STM32MPU platforms.
1. Installation
1.1. Installing from the OpenSTLinux AI package repository
After configuring the AI OpenSTLinux package, install the X-LINUX-AI components needed for this application. The minimum package required is:
x-linux-ai -i onnxruntime-tools
The model used in this example can be installed from the following package:
x-linux-ai -i img-models-mobilenetv2-10-224
2. How to use the benchmark application
2.1. Executing with the command line
The onnxruntime_perf_test executable is located in the userfs partition:
/usr/local/bin/onnxruntime-*/tools/onnxruntime_perf_test
It accepts the following input parameters:
usage: ./onnxruntime_perf_test [options...] model_path [result_file]
Options:
    -m [test_mode]: Specifies the test mode. Value could be 'duration' or 'times'. Provide 'duration' to run the test for a fix duration, and 'times' to repeated for a certain times.
    -M: Disable memory pattern.
    -A: Disable memory arena
    -I: Generate tensor input binding (Free dimensions are treated as 1.)
    -c [parallel runs]: Specifies the (max) number of runs to invoke simultaneously. Default:1..
    -r [repeated_times]: Specifies the repeated times if running in 'times' test mode.Default:1000.
    -t [seconds_to_run]: Specifies the seconds to run for 'duration' mode. Default:600.
    -p [profile_file]: Specifies the profile name to enable profiling and dump the profile data to the file.
    -s: Show statistics result, like P75, P90. If no result_file provided this defaults to on.
    -v: Show verbose information.
    -x [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes, A value of 0 means ORT will pick a default. Must >=0.
    -y [inter_op_num_threads]: Sets the number of threads used to parallelize the execution of the graph (across nodes), A value of 0 means ORT will pick a default. Must >=0.
    -f [free_dimension_override]: Specifies a free dimension by name to override to a specific value for performance optimization. Syntax is [dimension_name:override_value]. override_value must > 0
    -F [free_dimension_override]: Specifies a free dimension by denotation to override to a specific value for performance optimization. Syntax is [dimension_denotation:override_value]. override_value must > 0
    -P: Use parallel executor instead of sequential executor.
    -o [optimization level]: Default is 99 (all). Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all). Please see onnxruntime_c_api.h (enum GraphOptimizationLevel) for the full list of all optimization levels.
    -u [optimized_model_path]: Specify the optimized model path for saving.
    -z: Set denormal as zero. When turning on this option reduces latency dramatically, a model may have denormals.
    -h: help
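When the benchmark is launched from a script, the argument list can be assembled programmatically. The following is a minimal Python sketch, not part of X-LINUX-AI: the helper names find_perf_test and build_command and the placeholder model.onnx are assumptions made for illustration, and only options listed in the usage text above are emitted.

```python
import glob

# Wildcard path taken from this article; the version directory varies per release.
PERF_TEST_GLOB = "/usr/local/bin/onnxruntime-*/tools/onnxruntime_perf_test"


def find_perf_test(pattern=PERF_TEST_GLOB):
    """Return the first path matching the pattern, or None if nothing matches."""
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


def build_command(executable, model_path, mode="times", repeat=1000, seconds=600,
                  intra_threads=None, inter_threads=None,
                  parallel_executor=False, verbose=False):
    """Assemble an argument list using only options from the usage text."""
    if mode not in ("times", "duration"):
        raise ValueError("mode must be 'times' or 'duration'")
    cmd = [executable, "-I", "-m", mode]
    # -r applies to 'times' mode, -t applies to 'duration' mode.
    cmd += ["-r", str(repeat)] if mode == "times" else ["-t", str(seconds)]
    if parallel_executor:
        cmd.append("-P")
    if intra_threads is not None:
        cmd += ["-x", str(intra_threads)]
    if inter_threads is not None:
        cmd += ["-y", str(inter_threads)]
    if verbose:
        cmd.append("-v")
    cmd.append(model_path)
    return cmd


if __name__ == "__main__":
    # Off-target, fall back to the unresolved pattern so the sketch still prints.
    exe = find_perf_test() or PERF_TEST_GLOB
    print(" ".join(build_command(exe, "model.onnx", repeat=8)))
```

The returned list can be passed directly to subprocess.run on the target, which avoids shell quoting issues with model paths.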
2.2. Testing with MobileNet
The model used for testing is mobilenet_v2_1.0_224_int8_per_tensor.onnx, installed by the img-models-mobilenetv2-10-224 package.
It is a model used for image classification.
On the target, the model is located here:
/usr/local/x-linux-ai/image-classification/models/mobilenet/
To benchmark an ONNX model with onnxruntime_perf_test, use the following command:
/usr/local/bin/onnxruntime-*/tools/onnxruntime_perf_test -I -m times -r 8 /usr/local/x-linux-ai/image-classification/models/mobilenet/mobilenet_v2_1.0_224_int8_per_tensor.onnx
Console output:
Session creation time cost: 0.632683 s
Total inference time cost: 1.41846 s
Total inference requests: 8
Average inference time cost: 177.308 ms
Total inference run time: 1.4186 s
Number of inferences per second: 5.63936
Avg CPU usage: 98 %
Peak working set size: 52129792 bytes
Avg CPU usage:98
Peak working set size:52129792
Runs:8
Min Latency: 0.17603 s
Max Latency: 0.182825 s
P50 Latency: 0.176624 s
P90 Latency: 0.182825 s
P95 Latency: 0.182825 s
P99 Latency: 0.182825 s
P999 Latency: 0.182825 s
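The summary figures in this output are consistent with two simple relations: the average inference time matches the total inference time cost divided by the number of requests, and the throughput matches the number of requests divided by the total run time. The Python sketch below parses a few of the lines copied from the output above and recomputes both values; the function name parse_metrics is hypothetical, and the parsing assumes the "label: value unit" layout shown here.

```python
import re

# Lines copied from the console output above (run without extra thread flags).
SAMPLE = """\
Session creation time cost: 0.632683 s
Total inference time cost: 1.41846 s
Total inference requests: 8
Average inference time cost: 177.308 ms
Total inference run time: 1.4186 s
Number of inferences per second: 5.63936
"""


def parse_metrics(text):
    """Map each 'label: number [unit]' line to a float, keyed by its label."""
    metrics = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*([0-9.]+)", line)
        if match:
            metrics[match.group(1).strip()] = float(match.group(2))
    return metrics


m = parse_metrics(SAMPLE)
# Average latency in milliseconds, recomputed from the totals.
avg_ms = 1000.0 * m["Total inference time cost"] / m["Total inference requests"]
# Throughput in inferences per second, recomputed from the total run time.
rate = m["Total inference requests"] / m["Total inference run time"]
print(f"average: {avg_ms:.2f} ms, throughput: {rate:.2f} inferences/s")
```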
To look for better performance, add the flags -P -x 2 -y 1 so that the benchmark uses the parallel executor and more than one thread. The most suitable thread counts depend on the hardware used.
/usr/local/bin/onnxruntime-*/tools/onnxruntime_perf_test -I -m times -r 8 -P -x 2 -y 1 /usr/local/x-linux-ai/image-classification/models/mobilenet/mobilenet_v2_1.0_224_int8_per_tensor.onnx
Console output:
Setting intra_op_num_threads to 2
Setting inter_op_num_threads to 1
Session creation time cost: 0.483545 s
Total inference time cost: 1.43479 s
Total inference requests: 8
Average inference time cost: 179.349 ms
Total inference run time: 1.43495 s
Number of inferences per second: 5.57509
Avg CPU usage: 96 %
Peak working set size: 43249664 bytes
Avg CPU usage:96
Peak working set size:43249664
Runs:8
Min Latency: 0.177462 s
Max Latency: 0.181616 s
P50 Latency: 0.178953 s
P90 Latency: 0.181616 s
P95 Latency: 0.181616 s
P99 Latency: 0.181616 s
P999 Latency: 0.181616 s
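Whether the extra flags help is best checked by comparing the two runs rather than assumed. The Python sketch below computes the relative change in average inference time and in peak working set size, using values copied from the two console outputs in this section; the helper relative_change is hypothetical. With only 8 runs per configuration, a small difference may simply reflect run-to-run variation, so a larger -r value is advisable before drawing conclusions.

```python
# Values copied from the two console outputs in this section.
baseline = {"avg_ms": 177.308, "peak_bytes": 52129792}   # without -P -x 2 -y 1
threaded = {"avg_ms": 179.349, "peak_bytes": 43249664}   # with -P -x 2 -y 1


def relative_change(before, after):
    """Return the change from 'before' to 'after' as a percentage of 'before'."""
    return 100.0 * (after - before) / before


# Positive values mean the second run was slower or used more memory.
latency_change = relative_change(baseline["avg_ms"], threaded["avg_ms"])
memory_change = relative_change(baseline["peak_bytes"], threaded["peak_bytes"])

print(f"average inference time change: {latency_change:+.1f} %")
print(f"peak working set change: {memory_change:+.1f} %")
```

For these two sample runs the average inference time is nearly unchanged, which illustrates why the thread settings should be tuned and measured on the actual hardware.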
To display more information, use the flag -v.
3. References