
On-device learning for object detection

Applicable for STM32MP23x lines, STM32MP25x lines


1. Article purpose

This article explains how to use the teacher-student learning feature for object detection applications with on-device automatic labeling. It demonstrates the concept of transfer learning using the ONNXRuntime training API on an STM32MP2 series board.

2. Description

The application demonstrates the teacher-student machine learning use case for object detection. Frames are grabbed from a camera sensor, then processed and labeled using a powerful and accurate model, known as the teacher model, which offers high accuracy with loose real-time constraints. This local dataset is used to retrain the student model, validate its loss convergence, and export a new inference model that runs in the same application within a GStreamer pipeline. If data are collected in a uniform and representative way, the user will notice an improvement in the inference accuracy of the student model.
The models used with this application are:

  • The SSD MobileNet V2 as the student: defined and exported from the PyTorch-SSD repo[1], and trained on the PASCAL VOC dataset. The training artifacts of this model are provided along with the application package.
  • The RT-DETR as the teacher: exported in its large version from the Ultralytics[2] Python module and trained on the larger COCO dataset. The ONNX model is provided with the application package, but the user can optionally export it to ONNX format, as detailed in this article.
Information
  • For this application, the use case is to detect two classes (BACKGROUND and Person).
  • To change the class, you can modify the label.txt file passed as an argument. Ensure that the class is among the 80 classes supported by the COCO dataset, as these are the classes known to the teacher model.
  • To adapt the training artifacts for more than two classes, refer to the how to generate training artifacts article.


3. Installation

3.1. Install from the OpenSTLinux AI package repository

Warning
The software package is provided AS IS, and by downloading it, you agree to be bound to the terms of the software license agreement (SLA0048). The detailed content licenses can be found here.

After having configured the AI OpenSTLinux package, you can install the X-LINUX-AI components for the on-device learning application.

3.1.1. Install the GTK+3.0 UI application

The application is available only in Python and only on STM32MP2 series boards.

  • To install this application, use the following command:
x-linux-ai -i on-device-learning-obj-detect-python
Information
The C++ ONNXRuntime training API is fully integrated within the X-LINUX-AI components; you are free to explore building your own C++ on-device learning application by referring to the ONNXRuntime training C++ API.[3]


  • Then, restart the demo launcher:
systemctl restart weston-graphical-session.service


3.2. Export the teacher model: RT-DETR (optional)

The application package comes with the teacher model already integrated. It can be found at the following path on the target: /usr/local/x-linux-ai/on-device-learning/teacher_model/rt-detr/rtdetr-l.onnx. However, the user can also export it manually, by first installing the Ultralytics Python package with the following command:

pip install ultralytics

Once the package is installed on your host machine, you can export the RT-DETR model to ONNX[4] format by running the following script:

from ultralytics import RTDETR

# Load a model
model = RTDETR("rtdetr-l.pt")

# Export the model to ONNX format
path = model.export(format="onnx", imgsz=256)

The new rtdetr-l.onnx model is generated. It can then be deployed to your target, at the same location as the original model:

scp <path/to/model>/rtdetr-l.onnx <your-board-ip-addr>:/usr/local/x-linux-ai/on-device-learning/teacher_model/rt-detr/

3.3. Generate the student model training artifacts (optional)

The application package comes with the student model training artifacts already installed. You can find them at the following path on your target: /usr/local/x-linux-ai/on-device-learning/student_model/ssd_mobilenet_v2/training_artifacts/. However, the user can also generate them by following the dedicated article. Once generated, they can be deployed to your application by running the commands below:

scp <path/to/model>/*_model.onnx <your-board-ip-addr>:/usr/local/x-linux-ai/on-device-learning/student_model/training_artifacts/
scp <path/to/model>/checkpoint <your-board-ip-addr>:/usr/local/x-linux-ai/on-device-learning/student_model/training_artifacts/
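For reference, the sketch below illustrates how such training artifacts are typically generated on the host with the onnxruntime-training Python package. It is a minimal, hypothetical example: the layer names, the ssd_mobilenet_v2.onnx file name, and the loss handling are assumptions; follow the dedicated article for the exact procedure.

# Minimal sketch (host side): generate ONNXRuntime training artifacts for the student model.
# File names, layer selection, and loss handling are assumptions.
import onnx
from onnxruntime.training import artifacts

model = onnx.load("ssd_mobilenet_v2.onnx")  # hypothetical exported student model

# Update only the last layers on device; keep the backbone frozen
requires_grad = [p.name for p in model.graph.initializer if "classification" in p.name]
frozen_params = [p.name for p in model.graph.initializer if p.name not in requires_grad]

# Produces training_model.onnx, eval_model.onnx, optimizer_model.onnx and checkpoint
artifacts.generate_artifacts(
    model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=None,  # the detection loss is assumed to be part of the exported graph
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="training_artifacts",
)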

3.4. Source code location

  • In the OpenSTLinux Distribution with X-LINUX-AI Expansion Package:
<Distribution Package installation directory>/layers/meta-st/meta-st-x-linux-ai/recipes-samples/on-device-learning/files/
  • On GitHub:
recipes-samples/on-device-learning/files/teacher-student

4. How to use the application

4.1. Launching via the demo launcher

You can click on the icon to run the Python application installed on your STM32MP2 series board.

Demo launcher

4.2. Executing with the command line

The on-device learning for object detection Python application is located in the userfs partition:

/usr/local/x-linux-ai/on-device-learning/odl_teacher_student_obj_detect.py

It accepts the following input parameters:

  • In the Python application:
 
Usage: python3 odl_teacher_student_obj_detect.py -t <model .onnx> -l <label .txt file> --training_artifacts_path <artifacts parent dir>

-t --teacher_model <.onnx file path>:        .onnx teacher model to be executed for data annotation 
-l --label_file <label file path>:           Name of file containing labels
--training_artifact_path <directory path>:   Path of the directory containing the training artifacts
--inference_model_path <file path>:          The initial inference model path in case there is no new inference model
--frame_width  <val>:                        Width of the camera frame (default is 640)
--frame_height <val>:                        Height of the camera frame (default is 480)
--framerate <val>:                           Framerate of the camera (default is 15fps)
--conf_threshold <val>:                      Threshold of accuracy above which the boxes are displayed (default 0.60)
--iou_threshold <val>:                       Threshold of intersection over union above which the boxes are displayed (default 0.45)
--nb_calib_img                               Number of images to consider for static quantization parameters
--help:                                      Show this help
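As an example, the application can be launched manually with the files installed by the package. The label file path below is an assumption; adapt it to your setup:

python3 /usr/local/x-linux-ai/on-device-learning/odl_teacher_student_obj_detect.py \
  -t /usr/local/x-linux-ai/on-device-learning/teacher_model/rt-detr/rtdetr-l.onnx \
  -l /usr/local/x-linux-ai/on-device-learning/label.txt \
  --training_artifacts_path /usr/local/x-linux-ai/on-device-learning/student_model/ssd_mobilenet_v2/training_artifacts/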


4.3. Navigating through the tabs

The application is developed with GTK[5] and provided in the form of a GTK notebook, allowing smooth navigation through all the steps of the teacher-student workflow.

4.3.1. Data retrieval tab

The primary advantage of on-device learning is that data remain on the device, ensuring enhanced privacy and security. This approach eliminates the need to transfer sensitive data to external servers for processing, thereby reducing the risk of data breaches and unauthorized access; hence the use of the on-device camera sensor.
On this tab, the user has the option to choose the retrieval frequency of the images and the number of samples to grab and save.

Data retrieval tab
Information
To avoid wasted samples, the camera sensor must be pointed at the object of interest before launching the retrieval process.
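Conceptually, the retrieval step grabs a frame at the chosen frequency and saves it for later annotation, as in the hypothetical sketch below. The real application relies on its GStreamer pipeline; the directory name and default values here are assumptions.

# Hypothetical sketch of the retrieval loop: grab and save N frames at a fixed period
import os
import time
import cv2

NB_SAMPLES = 20     # number of samples to grab and save (user setting)
PERIOD_S = 2.0      # retrieval period in seconds, derived from the chosen frequency

os.makedirs("new_images", exist_ok=True)
cap = cv2.VideoCapture(0)
for i in range(NB_SAMPLES):
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(f"new_images/frame_{i:04d}.jpg", frame)
    time.sleep(PERIOD_S)
cap.release()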

Once the retrieval process is done, you are now set to move to the next tab.

4.3.2. Data Visualization

This tab displays the retrieved images so that you can visually inspect their quality and identify potential errors or inconsistencies in the data retrieval process. There are two sections called Old data and New data. This is a solution to a common problem known as catastrophic forgetting, an issue occurring when a neural network trained on new tasks or data loses performance on previously learned tasks. This happens because updating the model's parameters to optimize for new data overwrites the knowledge encoded from old data, akin to a system forgetting past learning. Keeping a small subset of old data and interleaving it with the new data during training mitigates the issue, by refreshing the model's memory of prior patterns.

  • Start by setting the percentage of old data you want to add to your new data by moving the scale.
  • Choose the percentage of the data you want to set for training. 10% is set by default for testing. The remaining part is allocated for evaluation during the training phase.
Data visualization tab

Once the dataset splitting process is done, you will be redirected automatically to the next tab.

Information
  • After pressing split, 20% of the newly retrieved images are copied to the old_images directory for future training sessions.
  • As seen in the figure above, it is normal to have an empty old images directory if you are running the application for the first time.
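A minimal sketch of the mixing and splitting logic described above is shown below. The directory names and percentages are assumptions made for illustration only.

# Hypothetical sketch of the dataset mixing and splitting step
import random
from pathlib import Path

OLD_DATA_RATIO = 0.20   # share of old images interleaved with the new ones (scale setting)
EVAL_RATIO = 0.10       # share of the mixed dataset kept aside for evaluation

new_images = sorted(Path("new_images").glob("*.jpg"))
old_images = sorted(Path("old_images").glob("*.jpg"))

# Interleave a subset of old images to mitigate catastrophic forgetting
nb_old = int(len(new_images) * OLD_DATA_RATIO)
dataset = new_images + random.sample(old_images, min(nb_old, len(old_images)))

random.shuffle(dataset)
nb_eval = int(len(dataset) * EVAL_RATIO)
eval_set, train_set = dataset[:nb_eval], dataset[nb_eval:]
print(f"{len(train_set)} training images, {len(eval_set)} evaluation images")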

4.3.3. Data annotation using the teacher model

To annotate the collected data, the application runs inferences using the STAI_MPU API and the RT-DETR model previously converted to ONNX format and deployed on the target, as shown in this section.
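A minimal sketch of a single annotation inference with the teacher model through the STAI_MPU Python API is given below. The pre- and post-processing are application specific and only hinted at; treat those details as assumptions.

# Hypothetical sketch: one teacher-model inference through the STAI_MPU Python API
import numpy as np
from stai_mpu import stai_mpu_network

TEACHER = "/usr/local/x-linux-ai/on-device-learning/teacher_model/rt-detr/rtdetr-l.onnx"

# The teacher model runs on the CPU only (see the note below)
stai_model = stai_mpu_network(model_path=TEACHER, use_hw_acceleration=False)

# Pre-process the grabbed frame to the teacher input (assumed 1x3x256x256, float32)
frame = np.random.rand(1, 3, 256, 256).astype(np.float32)  # placeholder for a real frame
stai_model.set_input(0, frame)
stai_model.run()
detections = stai_model.get_output(0)

# Post-process detections (boxes, scores, classes) into annotation files for training
print(detections.shape)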

Data annotation tab

The use of this tab is straightforward: pressing the launch annotation button starts the annotation process. The annotated images are displayed one after another on this tab, to help you visually monitor the annotation process, which may take some time.

Important
  • Some of the model's operators are not supported by the NPU/GPU compiler, hence the only execution provider for these inference sessions is the CPU. For this reason, and because of the teacher model size, each inference takes around 13 seconds to produce high-accuracy predictions.
  • The inference session of the teacher model occupies both CPU cores. Be patient during this process.
Information
At the end of the teacher model inference session, you can visually inspect the results of the annotation process by pressing twice on each image. A pop-up appears, showing the image with the bounding box around the object of interest. Pressing once on the pop-up closes it.

Once the annotation process is done, you are now set to move to the next tab, which represents the training phase.

4.3.4. Training, quantizing and evaluating the student model

The next step is to train the student model using the annotations generated by the teacher model. This process involves feeding the pre-processed images and their corresponding labels into the student model, optimizing the model parameters through iterative learning. The goal is to achieve a model that efficiently and accurately detects objects, even with potentially fewer resources or simpler architecture compared to the teacher model.
As you can see on this tab, you have the option to set the number of epochs for which you want to train the student model, the learning rate, and the batch size.

Warning
For the batch size, we recommend keeping the default configuration. Using a higher batch size may result in an out-of-memory error, and the application may crash due to the limited RAM on the device.

Once the parameters are set, you can launch a training session by pressing the associated button. The training session may take a few minutes, depending on the number of samples and the number of epochs.
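For reference, the on-device training step follows the pattern of the ONNXRuntime training Python API sketched below. This is a simplified illustration: the artifact file names, graph output names, and the dataloader of annotated samples are assumptions; the actual logic lives in odl_teacher_student_obj_detect.py.

# Simplified sketch of the on-device training loop with the ONNXRuntime training API
from onnxruntime.training.api import CheckpointState, Module, Optimizer

ARTIFACTS = "/usr/local/x-linux-ai/on-device-learning/student_model/ssd_mobilenet_v2/training_artifacts"

def train_student(train_dataloader, nb_epochs=5, learning_rate=1e-3):
    state = CheckpointState.load_checkpoint(f"{ARTIFACTS}/checkpoint")
    module = Module(f"{ARTIFACTS}/training_model.onnx", state,
                    f"{ARTIFACTS}/eval_model.onnx", device="cpu")
    optimizer = Optimizer(f"{ARTIFACTS}/optimizer_model.onnx", module)
    optimizer.set_learning_rate(learning_rate)

    for _ in range(nb_epochs):
        module.train()
        for images, labels in train_dataloader:   # batches of pre-processed images and teacher labels
            loss = module(images, labels)         # forward and backward pass on the training graph
            optimizer.step()                      # update the trainable parameters
            module.lazy_reset_grad()              # reset gradients before the next batch

    # Export the updated weights as a plain inference model (output names are assumptions)
    module.export_model_for_inferencing("inference_model.onnx", ["scores", "boxes"])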

Training tab

The final step of this tab is to run an evaluation on the evaluation set images, to observe the mean average precision of the model before and after training, on both old and new data images, and thereby monitor any forgetting of old patterns. After the end of the training session and the evaluation, the new inference model is automatically exported and quantized on device, so that it can run in inference mode on the NPU/GPU through the STAI_MPU API, as shown in the next paragraph.

Information
The newly generated inference model after training is quantized using a calibration dataset, which is a subset of the training set. The quantization scheme is static per-channel, to minimize the accuracy loss after quantization.
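A minimal sketch of such a static per-channel quantization with the onnxruntime quantization API is shown below, assuming a hypothetical calibration reader fed with pre-processed calibration images.

# Hypothetical sketch of the static per-channel quantization of the exported model
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class CalibReader(CalibrationDataReader):
    """Feeds a few pre-processed calibration images to the quantizer."""
    def __init__(self, images, input_name="input"):   # input name is an assumption
        self.data = iter([{input_name: img} for img in images])
    def get_next(self):
        return next(self.data, None)

# Placeholder arrays standing in for real calibration images (assumed 1x3x300x300 input)
calib_images = [np.random.rand(1, 3, 300, 300).astype(np.float32) for _ in range(10)]

quantize_static(
    "inference_model.onnx",            # float model exported after training
    "inference_model_quant.onnx",      # quantized model used by the inference tab
    CalibReader(calib_images),
    per_channel=True,                  # static per-channel scheme to limit accuracy loss
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
)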

4.3.5. Inferencing using the newly updated student model

The final step of this workflow is to run the newly trained and exported model in an inference application based on the live video stream from the camera sensor. This enables a visual validation of the new model's behavior compared to the old model. You can switch between the two models by pressing one of the buttons at the bottom.

Inference tab
Information
The inference time for this model is around 290 ms, as it runs on the CPU with a per-tensor quantization scheme. It is possible to run it on the GPU/NPU using the ONNXRuntime Execution Provider.

5. References