1. Description
1.1. Hardware and software capabilities
The STM32MP2x boards integrate the GCNanoUltra31-VIP to accelerate AI processing.
This unit combines a GPU and an NPU, which share the same parallel processing unit (shaders).
The NPU part consists of one AI core that delivers 1.2 TOPS at 800 MHz and can be overdriven to 900 MHz to reach 1.35 TOPS.
On the GPU side, the computing power is 12.8 GFLOPS at 800 MHz when processing 16-bit data.
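The throughput figures above are consistent with a simple linear scaling with the clock frequency. The short sketch below reproduces that arithmetic. It assumes throughput scales linearly with frequency, and the per-cycle values are derived here only for illustration; they are not official figures.

```python
# Figures quoted in the text above.
NPU_TOPS_AT_NOMINAL = 1.2      # TOPS at 800 MHz
GPU_GFLOPS_AT_NOMINAL = 12.8   # GFLOPS at 800 MHz, 16-bit data
NOMINAL_MHZ = 800
OVERDRIVE_MHZ = 900


def scale_with_clock(throughput, from_mhz, to_mhz):
    """Scale a throughput figure linearly with the clock (assumption)."""
    return throughput * to_mhz / from_mhz


# NPU throughput in overdrive mode: 1.2 * 900 / 800.
npu_tops_overdrive = scale_with_clock(NPU_TOPS_AT_NOMINAL, NOMINAL_MHZ, OVERDRIVE_MHZ)

# Derived per-cycle figures (illustrative only).
npu_ops_per_cycle = NPU_TOPS_AT_NOMINAL * 1e12 / (NOMINAL_MHZ * 1e6)
gpu_flop_per_cycle = GPU_GFLOPS_AT_NOMINAL * 1e9 / (NOMINAL_MHZ * 1e6)

print(f"NPU at {OVERDRIVE_MHZ} MHz: {npu_tops_overdrive:.2f} TOPS")
print(f"NPU operations per cycle: {npu_ops_per_cycle:.0f}")
print(f"GPU 16-bit FLOP per cycle: {gpu_flop_per_cycle:.0f}")
```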
To give a clearer view of the performance you can reach with this NPU, the following table lists the results for several common models:
Model | Input shape | Frame rate |
---|---|---|
MobilenetV2_1.0 | 224x224 | 72 FPS |
SSD_MobilenetV2 FPNLite | 256x256 | 36 FPS |
YoloV8n | 256x256 | 59 FPS |
DeepLabV3 | 257x257 | 17 FPS |
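To relate these frame rates to a per-inference time budget, the sketch below converts each figure from the table into an approximate latency. It assumes frames are processed one at a time, so latency is simply the reciprocal of the frame rate. The 30 FPS camera target is a hypothetical example, not a figure from this article.

```python
# Frame rates taken from the table above (frames per second).
frame_rates = {
    "MobilenetV2_1.0": 72,
    "SSD_MobilenetV2 FPNLite": 36,
    "YoloV8n": 59,
    "DeepLabV3": 17,
}

# Approximate time per inference in milliseconds (assumes one frame at a time).
latency_ms = {name: 1000.0 / fps for name, fps in frame_rates.items()}

# Hypothetical real-time target: a 30 FPS camera stream.
TARGET_FPS = 30
budget_ms = 1000.0 / TARGET_FPS

for name, latency in latency_ms.items():
    verdict = "fits" if latency <= budget_ms else "exceeds"
    print(f"{name}: {latency:.1f} ms per frame ({verdict} the {budget_ms:.1f} ms budget)")
```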
1.2. Restriction and usage
To access and run a neural network (NN) model on the NPU, you need to use the OpenVX software stack. To make the NPU software stack easier to use, we have developed the stai_mpu unified API, which allows you to run an NN model easily. For more information, visit the wiki article on how to use stai_mpu API.
This NPU IP only supports 8-bit NN models quantized with the per-tensor asymmetric quantization scheme. If the model uses a different quantization scheme, such as per-channel, it runs mainly on the GPU instead of the NPU. The next section lists the operations supported on the NPU and on the GPU, together with the data formats required for execution on the hardware.
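To make the difference between the two schemes concrete, here is a minimal pure-Python sketch of 8-bit asymmetric quantization. It uses the common min/max formulation (one scale and one zero point mapping real values onto 0..255). The helper names are illustrative and are not part of the OpenVX stack or the stai_mpu API, and actual conversion tools may compute the parameters differently.

```python
def asymmetric_params(values, qmin=0, qmax=255):
    """Compute (scale, zero_point) so that [min, max] maps onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # the range must contain 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero tensor
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Quantize real values to integers, clamping to the 8-bit range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map quantized integers back to approximate real values."""
    return [(q - zero_point) * scale for q in quantized]


# A toy weight tensor with two output channels.
channels = [
    [-1.0, 0.0, 0.5, 1.55],
    [-0.1, 0.05, 0.1, 0.15],
]

# Per-tensor: a single (scale, zero_point) pair for the whole tensor.
flat = [v for channel in channels for v in channel]
tensor_scale, tensor_zero_point = asymmetric_params(flat)
quantized_first_channel = quantize(channels[0], tensor_scale, tensor_zero_point)

# Per-channel: one (scale, zero_point) pair per channel.
per_channel_params = [asymmetric_params(channel) for channel in channels]

print("per-tensor :", tensor_scale, tensor_zero_point)
print("per-channel:", per_channel_params)
```

With the per-tensor scheme the whole tensor shares one pair of parameters, which is the layout described above as supported by the NPU. The per-channel variant yields one pair per channel.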
The NPU/GPU does not support custom operators coming from other frameworks such as TFLite™ or ONNX™. If the model contains such operators, they are removed or the conversion to NBG format fails. However, it is possible to define your own OpenVX operator.
2. Operation support
2.1. Basic operations
This is the list of the basic operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_CONV2D | asym-u8 / asym-i8 | ||
VSI_NN_OP_CONV2D | fp32 / fp16 | ||
VSI_NN_OP_CONV1D | asym-u8 / asym-i8 | ||
VSI_NN_OP_CONV1D | fp32 / fp16 | ||
VSI_NN_OP_CONV3D | asym-u8 / asym-i8 | ||
VSI_NN_OP_CONV3D | fp32 / fp16 | ||
VSI_NN_OP_DECONVOLUTION | asym-u8 / asym-i8 | ||
VSI_NN_OP_DECONVOLUTION | fp32 / fp16 | ||
VSI_NN_OP_DECONVOLUTION1D | asym-u8 | ||
VSI_NN_OP_DECONVOLUTION1D | asym-i8 | ||
VSI_NN_OP_DECONVOLUTION1D | fp32 / fp16 | ||
VSI_NN_OP_FCL2 | asym-u8 / asym-i8 | ||
VSI_NN_OP_FCL2 | fp32 / fp16 | ||
VSI_NN_OP_GROUPED_CONV1D | asym-u8 | ||
VSI_NN_OP_GROUPED_CONV1D | asym-i8 | ||
VSI_NN_OP_GROUPED_CONV1D | fp32 / fp16 | ||
VSI_NN_OP_GROUPED_CONV2D | asym-u8 | ||
VSI_NN_OP_GROUPED_CONV2D | asym-i8 | ||
VSI_NN_OP_GROUPED_CONV2D | fp32 / fp16 | ||
2.2. Activation operations
This is the list of the OVXLIB activation operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_ABS | asym-u8 / asym-i8 | ||
VSI_NN_OP_ABS | fp32 / fp16 | ||
VSI_NN_OP_ACOSH | asym-u8 / asym-i8 | ||
VSI_NN_OP_ACOSH | fp32 / fp16 | ||
VSI_NN_OP_ATAN | asym-u8 / asym-i8 | ||
VSI_NN_OP_ATAN | fp32 / fp16 | ||
VSI_NN_OP_ATANH | asym-u8 / asym-i8 | ||
VSI_NN_OP_ATANH | fp32 / fp16 | ||
VSI_NN_OP_CELU | asym-u8 / asym-i8 | ||
VSI_NN_OP_CELU | fp32 / fp16 | ||
VSI_NN_OP_CLIP | asym-u8 / asym-i8 | ||
VSI_NN_OP_CLIP | fp32 / fp16 | ||
VSI_NN_OP_COS | asym-u8 / asym-i8 | ||
VSI_NN_OP_COS | fp32 / fp16 | ||
VSI_NN_OP_ELU | asym-u8 / asym-i8 | ||
VSI_NN_OP_ELU | fp32 / fp16 | ||
VSI_NN_OP_ERF | asym-u8 / asym-i8 | ||
VSI_NN_OP_ERF | fp32 / fp16 | ||
VSI_NN_OP_EXP | asym-u8 / asym-i8 | ||
VSI_NN_OP_EXP | fp32 / fp16 | ||
VSI_NN_OP_GELU | asym-u8 / asym-i8 | ||
VSI_NN_OP_GELU | fp32 / fp16 | ||
VSI_NN_OP_HARD_SIGMOID | asym-u8 / asym-i8 | ||
VSI_NN_OP_HARD_SIGMOID | fp32 / fp16 | ||
VSI_NN_OP_INVERSE_SIGMOID | asym-u8 / asym-i8 | ||
VSI_NN_OP_INVERSE_SIGMOID | fp32 / fp16 | ||
VSI_NN_OP_LEAKY_RELU | asym-u8 / asym-i8 | ||
VSI_NN_OP_LEAKY_RELU | fp32 / fp16 | ||
VSI_NN_OP_LINEAR | asym-u8 / asym-i8 | ||
VSI_NN_OP_LINEAR | fp32 / fp16 | ||
VSI_NN_OP_LOG | asym-u8 / asym-i8 | ||
VSI_NN_OP_LOG | fp32 / fp16 | ||
VSI_NN_OP_LOG_SOFTMAX | asym-u8 / asym-i8 | ||
VSI_NN_OP_LOG_SOFTMAX | fp32 / fp16 | ||
VSI_NN_OP_MISH | asym-u8 / asym-i8 | ||
VSI_NN_OP_MISH | fp32 / fp16 | ||
VSI_NN_OP_NEG | asym-u8 / asym-i8 | ||
VSI_NN_OP_NEG | fp32 / fp16 | ||
VSI_NN_OP_PRELU | asym-u8 / asym-i8 | ||
VSI_NN_OP_PRELU | fp32 / fp16 | ||
VSI_NN_OP_RCP | asym-u8 / asym-i8 | ||
VSI_NN_OP_RCP | fp32 / fp16 | ||
VSI_NN_OP_RELU | asym-u8 / asym-i8 | ||
VSI_NN_OP_RELU | fp32 / fp16 | ||
VSI_NN_OP_RELUN | asym-u8 / asym-i8 | ||
VSI_NN_OP_RELUN | fp32 / fp16 | ||
VSI_NN_OP_RSQRT | asym-u8 / asym-i8 | ||
VSI_NN_OP_RSQRT | fp32 / fp16 | ||
VSI_NN_OP_SELU | asym-u8 / asym-i8 | ||
VSI_NN_OP_SELU | fp32 / fp16 | ||
VSI_NN_OP_SIGMOID | asym-u8 / asym-i8 | ||
VSI_NN_OP_SIGMOID | fp32 / fp16 | ||
VSI_NN_OP_SIGN | asym-u8 / asym-i8 | ||
VSI_NN_OP_SIGN | fp32 / fp16 | ||
VSI_NN_OP_SIN | asym-u8 / asym-i8 | ||
VSI_NN_OP_SIN | fp32 / fp16 | ||
VSI_NN_OP_SOFTMAX | asym-u8 / asym-i8 | ||
VSI_NN_OP_SOFTMAX | fp32 / fp16 | ||
VSI_NN_OP_SOFTRELU | asym-u8 / asym-i8 | ||
VSI_NN_OP_SOFTRELU | fp32 / fp16 | ||
VSI_NN_OP_SOFTSIGN | asym-u8 / asym-i8 | ||
VSI_NN_OP_SOFTSIGN | fp32 / fp16 | ||
VSI_NN_OP_SQRT | asym-u8 / asym-i8 | ||
VSI_NN_OP_SQRT | fp32 / fp16 | ||
VSI_NN_OP_SQUARE | asym-u8 / asym-i8 | ||
VSI_NN_OP_SQUARE | fp32 / fp16 | ||
VSI_NN_OP_SWISH | asym-u8 / asym-i8 | ||
VSI_NN_OP_SWISH | fp32 / fp16 | ||
VSI_NN_OP_TANH | asym-u8 / asym-i8 | ||
VSI_NN_OP_TANH | fp32 / fp16 | ||
2.3. Elementwise operations
This is the list of the elementwise operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_ADD | asym-u8 / asym-i8 | ||
VSI_NN_OP_ADD | fp32 / fp16 | ||
VSI_NN_OP_ADDN | asym-u8 / asym-i8 | ||
VSI_NN_OP_ADDN | fp32 / fp16 | ||
VSI_NN_OP_DIVIDE | asym-u8 / asym-i8 | ||
VSI_NN_OP_DIVIDE | fp32 / fp16 | ||
VSI_NN_OP_FLOORDIV | asym-u8 / asym-i8 | ||
VSI_NN_OP_FLOORDIV | fp32 / fp16 | ||
VSI_NN_OP_LOGICAL_NOT | bool8 | ||
VSI_NN_OP_LOGICAL_OPS | bool8 | ||
VSI_NN_OP_MATRIXMUL | asym-u8 / asym-i8 | ||
VSI_NN_OP_MATRIXMUL | fp32 / fp16 | ||
VSI_NN_OP_MAXIMUM | asym-u8 / asym-i8 | ||
VSI_NN_OP_MAXIMUM | fp32 / fp16 | ||
VSI_NN_OP_MINIMUM | asym-u8 / asym-i8 | ||
VSI_NN_OP_MINIMUM | fp32 / fp16 | ||
VSI_NN_OP_MOD | asym-u8 / asym-i8 | ||
VSI_NN_OP_MOD | fp32 / fp16 | ||
VSI_NN_OP_MULTIPLY | asym-u8 / asym-i8 | ||
VSI_NN_OP_MULTIPLY | fp32 / fp16 | ||
VSI_NN_OP_POW | asym-u8 / asym-i8 | ||
VSI_NN_OP_POW | fp32 / fp16 | ||
VSI_NN_OP_RELATIONAL_OPS | asym-u8 / asym-i8 | ||
VSI_NN_OP_RELATIONAL_OPS | fp32 / fp16 | ||
VSI_NN_OP_RELATIONAL_OPS | bool8 | ||
VSI_NN_OP_SELECT | asym-u8 / asym-i8 | ||
VSI_NN_OP_SELECT | fp32 / fp16 | ||
VSI_NN_OP_SELECT | bool8 | ||
VSI_NN_OP_SUBTRACT | asym-u8 / asym-i8 | ||
VSI_NN_OP_SUBTRACT | fp32 / fp16 | ||
2.4. Normalization operations
This is the list of the normalization operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_BATCH_NORM | asym-u8 / asym-i8 | ||
VSI_NN_OP_BATCH_NORM | fp32 / fp16 | ||
VSI_NN_OP_BATCHNORM_SINGLE | asym-u8 / asym-i8 | ||
VSI_NN_OP_BATCHNORM_SINGLE | fp32 / fp16 | ||
VSI_NN_OP_GROUP_NORM | asym-u8 / asym-i8 | ||
VSI_NN_OP_GROUP_NORM | fp32 / fp16 | ||
VSI_NN_OP_INSTANCE_NORM | asym-u8 / asym-i8 | ||
VSI_NN_OP_INSTANCE_NORM | fp32 / fp16 | ||
VSI_NN_OP_L2_NORMALIZE | asym-u8 / asym-i8 | ||
VSI_NN_OP_L2_NORMALIZE | fp32 / fp16 | ||
VSI_NN_OP_LAYER_NORM | asym-u8 / asym-i8 | ||
VSI_NN_OP_LAYER_NORM | fp32 / fp16 | ||
VSI_NN_OP_LPNORM | asym-u8 / asym-i8 | ||
VSI_NN_OP_LPNORM | fp32 / fp16 | ||
VSI_NN_OP_LRN2 | asym-u8 / asym-i8 | ||
VSI_NN_OP_LRN2 | fp32 / fp16 | ||
VSI_NN_OP_MOMENTS | asym-u8 / asym-i8 | ||
VSI_NN_OP_MOMENTS | fp32 / fp16 | ||
2.5. Reshape operations
This is the list of the reshape operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_ARGMAX | asym-u8 / asym-i8 | ||
VSI_NN_OP_ARGMAX | fp32 / fp16 | ||
VSI_NN_OP_ARGMIN | asym-u8 / asym-i8 | ||
VSI_NN_OP_ARGMIN | fp32 / fp16 | ||
VSI_NN_OP_BATCH2SPACE | asym-u8 / asym-i8 | ||
VSI_NN_OP_BATCH2SPACE | fp32 / fp16 | ||
VSI_NN_OP_CONCAT | asym-u8 / asym-i8 | ||
VSI_NN_OP_CONCAT | fp32 / fp16 | ||
VSI_NN_OP_DEPTH2SPACE | asym-u8 / asym-i8 | ||
VSI_NN_OP_DEPTH2SPACE | fp32 / fp16 | ||
VSI_NN_OP_EXPAND_BROADCAST | asym-u8 / asym-i8 | ||
VSI_NN_OP_EXPAND_BROADCAST | fp32 / fp16 | ||
VSI_NN_OP_PAD2 | asym-u8 / asym-i8 | ||
VSI_NN_OP_PAD2 | fp32 / fp16 | ||
VSI_NN_OP_PERMUTE | asym-u8 / asym-i8 | ||
VSI_NN_OP_PERMUTE | fp32 / fp16 | ||
VSI_NN_OP_REDUCE | asym-u8 / asym-i8 | ||
VSI_NN_OP_REDUCE | fp32 / fp16 | ||
VSI_NN_OP_REORG | asym-u8 / asym-i8 | ||
VSI_NN_OP_REORG | fp32 / fp16 | ||
VSI_NN_OP_RESHAPE2 | asym-u8 / asym-i8 | ||
VSI_NN_OP_RESHAPE2 | fp32 / fp16 | ||
VSI_NN_OP_REVERSE | asym-u8 / asym-i8 | ||
VSI_NN_OP_REVERSE | fp32 / fp16 | ||
VSI_NN_OP_SHUFFLECHANNEL | asym-u8 / asym-i8 | ||
VSI_NN_OP_SHUFFLECHANNEL | fp32 / fp16 | ||
VSI_NN_OP_SLICE | asym-u8 / asym-i8 | ||
VSI_NN_OP_SLICE | fp32 / fp16 | ||
VSI_NN_OP_SPACE2BATCH | asym-u8 / asym-i8 | ||
VSI_NN_OP_SPACE2BATCH | fp32 / fp16 | ||
VSI_NN_OP_SPACE2DEPTH | asym-u8 / asym-i8 | ||
VSI_NN_OP_SPACE2DEPTH | fp32 / fp16 | ||
VSI_NN_OP_SPLIT | asym-u8 / asym-i8 | ||
VSI_NN_OP_SPLIT | fp32 / fp16 | ||
VSI_NN_OP_SQUEEZE | asym-u8 / asym-i8 | ||
VSI_NN_OP_SQUEEZE | fp32 / fp16 | ||
VSI_NN_OP_STACK | asym-u8 / asym-i8 | ||
VSI_NN_OP_STACK | fp32 / fp16 | ||
VSI_NN_OP_STRIDED_SLICE | asym-u8 / asym-i8 | ||
VSI_NN_OP_STRIDED_SLICE | fp32 / fp16 | ||
VSI_NN_OP_UNSTACK | asym-u8 / asym-i8 | ||
VSI_NN_OP_UNSTACK | fp32 / fp16 | ||
2.6. RNN operations
This is the list of the recurrent neural network (RNN) operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_CONV2D_LSTM | asym-u8 / asym-i8 | ||
VSI_NN_OP_CONV2D_LSTM | fp32 / fp16 | ||
VSI_NN_OP_CONV2D_LSTM_CELL | asym-u8 / asym-i8 | ||
VSI_NN_OP_CONV2D_LSTM_CELL | fp32 / fp16 | ||
VSI_NN_OP_GRU | asym-u8 / asym-i8 | ||
VSI_NN_OP_GRU | fp32 / fp16 | ||
VSI_NN_OP_GRUCELL | asym-u8 / asym-i8 | ||
VSI_NN_OP_GRUCELL | fp32 / fp16 | ||
VSI_NN_OP_LSTM_OVXLIB | asym-u8 / asym-i8 | ||
VSI_NN_OP_LSTM_OVXLIB | fp32 / fp16 | ||
VSI_NN_OP_LSTMUNIT_OVXLIB | asym-u8 / asym-i8 | ||
VSI_NN_OP_LSTMUNIT_OVXLIB | fp32 / fp16 | ||
VSI_NN_OP_SVDF | asym-u8 / asym-i8 | ||
VSI_NN_OP_SVDF | fp32 / fp16 | ||
2.7. Pooling operations
This is the list of the pooling operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_AVG_POOL3D | asym-u8 / asym-i8 | ||
VSI_NN_OP_AVG_POOL3D | fp32 / fp16 | ||
VSI_NN_OP_GLOBALLPPOOL | asym-u8 / asym-i8 | ||
VSI_NN_OP_GLOBALLPPOOL | fp32 / fp16 | ||
VSI_NN_OP_LPPOOL | asym-u8 / asym-i8 | ||
VSI_NN_OP_LPPOOL | fp32 / fp16 | ||
VSI_NN_OP_MAX_POOL3D | asym-u8 / asym-i8 | ||
VSI_NN_OP_MAX_POOL3D | fp32 / fp16 | ||
VSI_NN_OP_MAXPOOLWITHARGMAX | asym-u8 / asym-i8 | ||
VSI_NN_OP_MAXPOOLWITHARGMAX | fp32 / fp16 | ||
VSI_NN_OP_MAXUNPOOL | asym-u8 / asym-i8 | ||
VSI_NN_OP_MAXUNPOOL | fp32 / fp16 | ||
VSI_NN_OP_POOL | asym-u8 / asym-i8 | ||
VSI_NN_OP_POOL | fp32 / fp16 | ||
VSI_NN_OP_POOLWITHARGMAX | asym-u8 / asym-i8 | ||
VSI_NN_OP_POOLWITHARGMAX | fp32 / fp16 | ||
VSI_NN_OP_ROI_POOL | asym-u8 / asym-i8 | ||
VSI_NN_OP_ROI_POOL | fp32 / fp16 | ||
VSI_NN_OP_UPSAMPLE | asym-u8 / asym-i8 | ||
VSI_NN_OP_UPSAMPLE | fp32 / fp16 | ||
2.8. Miscellaneous operations
This is the list of other operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_BUCKETIZE | asym-u8 / asym-i8 | ||
VSI_NN_OP_BUCKETIZE | fp32 / fp16 | ||
VSI_NN_OP_CAST | all types | ||
VSI_NN_OP_CEIL | asym-u8 / asym-i8 | ||
VSI_NN_OP_CEIL | fp32 / fp16 | ||
VSI_NN_OP_CONCATSHIFT | asym-u8 / asym-i8 | ||
VSI_NN_OP_CONCATSHIFT | fp32 / fp16 | ||
VSI_NN_OP_CUMSUM | asym-u8 / asym-i8 | ||
VSI_NN_OP_CUMSUM | fp32 / fp16 | ||
VSI_NN_OP_DATACONVERT | asym-u8 / asym-i8 | ||
VSI_NN_OP_DROPOUT | asym-u8 / asym-i8 | ||
VSI_NN_OP_DROPOUT | fp32 / fp16 | ||
VSI_NN_OP_EMBEDDING_LOOKUP | asym-u8 / asym-i8 | ||
VSI_NN_OP_EMBEDDING_LOOKUP | fp32 / fp16 | ||
VSI_NN_OP_FLOOR | asym-u8 / asym-i8 | ||
VSI_NN_OP_FLOOR | fp32 / fp16 | ||
VSI_NN_OP_GATHER | asym-u8 / asym-i8 | ||
VSI_NN_OP_GATHER | fp32 / fp16 | ||
VSI_NN_OP_GATHER_ELEMENTS | asym-u8 / asym-i8 | ||
VSI_NN_OP_GATHER_ELEMENTS | fp32 / fp16 | ||
VSI_NN_OP_GATHER_ND | asym-u8 / asym-i8 | ||
VSI_NN_OP_GATHER_ND | fp32 / fp16 | ||
VSI_NN_OP_GRID_SAMPLE | asym-u8 / asym-i8 | ||
VSI_NN_OP_GRID_SAMPLE | fp32 / fp16 | ||
VSI_NN_OP_ONE_HOT | asym-u8 / asym-i8 | ||
VSI_NN_OP_ONE_HOT | fp32 / fp16 | ||
VSI_NN_OP_PROPOSAL | asym-u8 / asym-i8 | ||
VSI_NN_OP_PROPOSAL | fp32 / fp16 | ||
VSI_NN_OP_REPEAT | asym-u8 / asym-i8 | ||
VSI_NN_OP_REPEAT | fp32 / fp16 | ||
VSI_NN_OP_RESIZE | asym-u8 / asym-i8 | ||
VSI_NN_OP_RESIZE | fp32 / fp16 | ||
VSI_NN_OP_RESIZE_1D | asym-u8 / asym-i8 | ||
VSI_NN_OP_RESIZE_1D | fp32 / fp16 | ||
VSI_NN_OP_RESIZE_3D | asym-u8 / asym-i8 | ||
VSI_NN_OP_RESIZE_3D | fp32 / fp16 | ||
VSI_NN_OP_REVERSESEQUENCE | asym-u8 / asym-i8 | ||
VSI_NN_OP_REVERSESEQUENCE | fp32 / fp16 | ||
VSI_NN_OP_ROUND | asym-u8 / asym-i8 | ||
VSI_NN_OP_ROUND | fp32 / fp16 | ||
VSI_NN_OP_SCATTER_ELEMENTS | asym-u8 / asym-i8 | ||
VSI_NN_OP_SCATTER_ELEMENTS | fp32 / fp16 | ||
VSI_NN_OP_SCATTER_ND | asym-u8 / asym-i8 | ||
VSI_NN_OP_SCATTER_ND | fp32 / fp16 | ||
VSI_NN_OP_SCATTER_ND_UPDATE | asym-u8 / asym-i8 | ||
VSI_NN_OP_SCATTER_ND_UPDATE | fp32 / fp16 | ||
VSI_NN_OP_SEQUENCE_MASK | asym-u8 / asym-i8 | ||
VSI_NN_OP_SEQUENCE_MASK | fp32 / fp16 | ||
VSI_NN_OP_SIGNAL_FRAME | asym-u8 / asym-i8 | ||
VSI_NN_OP_SIGNAL_FRAME | fp32 / fp16 | ||
VSI_NN_OP_TILE | asym-u8 / asym-i8 | ||
VSI_NN_OP_TILE | fp32 / fp16 | ||
VSI_NN_OP_UPSAMPLESCALE | asym-u8 / asym-i8 | ||
VSI_NN_OP_UPSAMPLESCALE | fp32 / fp16 | ||
VSI_NN_OP_VARIABLE | asym-u8 / asym-i8 | ||
VSI_NN_OP_VARIABLE | fp32 / fp16 | ||
2.9. Fuse operation support
This is the list of the operation combinations that the NPU can fuse.
Second operation (rows) / First operation (columns) | CONV2D | CONV1D | DW_2D | FCL2 | PERMUTE |
---|---|---|---|---|---|
ABS | |||||
ACOSH | |||||
ADD | |||||
ATAN | |||||
CELU | |||||
CLIP | |||||
CONV1D | |||||
CONV2D | |||||
DEPTH2SPACE | |||||
DW_2D | |||||
ELU | |||||
ERF | |||||
GELU | |||||
HARD_SIGMOID | |||||
HSWISH | |||||
INVERSE_SIGMOID | |||||
LEAKY_RELU | |||||
LOG | |||||
MAX_POOL | |||||
MISH | |||||
MULTIPLY | |||||
NEG | |||||
PERMUTE | |||||
PRELU | |||||
RCP | |||||
RELU | |||||
RELUN | |||||
RESHAPE | |||||
RSQRT | |||||
SELU | |||||
SIGMOID | |||||
SOFTRELU | |||||
SOFTSIGN | |||||
SPACE2DEPTH | |||||
SQRT | |||||
SQUARE | |||||
SUBTRACT | |||||
SWISH | |||||
TANH | |||||
MAX_POOL + ABS | |||||
MAX_POOL + ACOSH | |||||
MAX_POOL + ADD | |||||
MAX_POOL + ATAN | |||||
MAX_POOL + BATCH_NORM | |||||
MAX_POOL + CELU | |||||
MAX_POOL + CLIP | |||||
MAX_POOL + ELU | |||||
MAX_POOL + ERF | |||||
MAX_POOL + GELU | |||||
MAX_POOL + HARD_SIGMOID | |||||
MAX_POOL + HSWISH | |||||
MAX_POOL + INVERSE_SIGMOID | |||||
MAX_POOL + LEAKY_RELU | |||||
MAX_POOL + MISH | |||||
MAX_POOL + MULTIPLY | |||||
MAX_POOL + NEG | |||||
MAX_POOL + PRELU | |||||
MAX_POOL + RCP | |||||
MAX_POOL + RELU | |||||
MAX_POOL + RELUN | |||||
MAX_POOL + RSQRT | |||||
MAX_POOL + SELU | |||||
MAX_POOL + SIGMOID | |||||
MAX_POOL + SOFTRELU | |||||
MAX_POOL + SOFTSIGN | |||||
MAX_POOL + SQRT | |||||
MAX_POOL + SQUARE | |||||
MAX_POOL + SUBTRACT | |||||
MAX_POOL + SWISH | |||||
MAX_POOL + TANH |
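As a rough illustration of how this table can be read, the sketch below scans a linear sequence of layer types and reports the groups whose first and second operations both appear as labels in the table. It is only a candidate finder: whether a given pair is actually fused depends on the corresponding table cell, which is not encoded here. The operation sets are abridged from the table labels, the function name is hypothetical, and reading "MAX_POOL + X" as a first operation followed by MAX_POOL and then X is an assumption.

```python
# Column labels of the fuse table: operations that can start a fused group.
FIRST_OPS = {"CONV2D", "CONV1D", "DW_2D", "FCL2", "PERMUTE"}

# Abridged subset of the single-operation row labels of the fuse table.
SECOND_OPS = {
    "ADD", "CLIP", "LEAKY_RELU", "MAX_POOL", "MULTIPLY", "PRELU",
    "RELU", "RELUN", "RESHAPE", "SIGMOID", "SWISH", "TANH",
}

# Abridged subset of the "MAX_POOL + X" row labels (only the X part).
AFTER_MAX_POOL_OPS = {
    "ADD", "BATCH_NORM", "CLIP", "LEAKY_RELU", "RELU", "RELUN",
    "SIGMOID", "TANH",
}


def fusion_candidates(ops):
    """Return (index, (first_op, second_label)) for each candidate group."""
    candidates = []
    for i, op in enumerate(ops):
        if op not in FIRST_OPS:
            continue
        second = ops[i + 1] if i + 1 < len(ops) else None
        third = ops[i + 2] if i + 2 < len(ops) else None
        if second == "MAX_POOL" and third in AFTER_MAX_POOL_OPS:
            # Prefer the longer "MAX_POOL + X" pattern when it matches.
            candidates.append((i, (op, "MAX_POOL + " + third)))
        elif second in SECOND_OPS:
            candidates.append((i, (op, second)))
    return candidates


# Hypothetical layer sequence of a small network.
layers = ["CONV2D", "RELU", "CONV2D", "MAX_POOL", "RELU", "SOFTMAX", "FCL2", "SIGMOID"]
for index, (first, second) in fusion_candidates(layers):
    print(f"layer {index}: {first} -> {second}")
```

Each reported group should then be checked against the matching row and column of the table before assuming the NPU fuses it.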