
SWIOTLB mechanism overview

Applicable for STM32MP25x lines

1. Article purpose

This article explains the Linux® kernel SWIOTLB (SoftWare Input Output Translation Lookaside Buffer) mechanism. SWIOTLB was introduced for platforms that have no hardware IOMMU and that embed DMA masters unable to address more than a 32-bit address space. SWIOTLB can be seen as a software IOMMU. It is a native Linux® kernel feature and is enabled by default.

2. STM32 memory space

To understand why the SWIOTLB feature is needed, look at the STM32 memory space. It can be summarized as follows:

  • The first 2GB [0x0000 0000 - 0x7fff ffff] of the memory space is used for internal memories and internal peripherals.
  • The rest of the memory space is used for the DDR:
    • [0x8000 0000 - 0xffff ffff] for DDR up to 2GB.
    • [0x8000 0000 - 0x17fff ffff] for DDR up to 4GB.
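The DDR ranges above follow from simple arithmetic: the DDR starts at 0x8000 0000, so its last address is 0x8000 0000 + size - 1. The sketch below is a standalone user-space illustration. It computes that end address and how many DDR bytes lie beyond the 32-bit limit. The names DDR_BASE, SZ_1G, ddr_last_addr and ddr_bytes_above_4g are hypothetical and do not come from any ST or kernel header.

```c
#include <stdint.h>

#define DDR_BASE 0x80000000ULL   /* DDR starts right after the first 2GB */
#define SZ_1G    0x40000000ULL   /* 1GB in bytes */

/* Last physical address covered by a DDR of ddr_size bytes. */
uint64_t ddr_last_addr(uint64_t ddr_size)
{
    return DDR_BASE + ddr_size - 1;
}

/* Number of DDR bytes located above the 32-bit boundary (0xffffffff). */
uint64_t ddr_bytes_above_4g(uint64_t ddr_size)
{
    const uint64_t limit = 0x100000000ULL;   /* first address past 32 bits */
    uint64_t end = DDR_BASE + ddr_size;      /* first address past the DDR */
    return end > limit ? end - limit : 0;
}
```

For a 4GB DDR, ddr_last_addr(4 * SZ_1G) gives 0x17fffffff and ddr_bytes_above_4g(4 * SZ_1G) gives 2GB. Half of that DDR sits above 0xffffffff.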

The CPU (Arm® Cortex®-A35) and some peripherals are bus "masters" and can access the DDR directly. These peripherals are essentially the ones that embed a DMA: HPDMA, ETH, SDMMC, DCMIPP, USB3, USBH, LTDC, VDENC, VENC, PCIE.

Figure: STM32 memory space

On STM32MP25x lines, all bus master peripherals (except the Arm® Cortex®-A35) are only 32-bit capable, meaning that they cannot access any address above 0xffffffff. This is not a problem for a DDR of 2GB or less, which fits entirely in [0x8000 0000 - 0xffff ffff]. It is a problem for a DDR larger than 2GB: if the application allocates a buffer in the area above the first 2GB of DDR, bus master peripherals cannot access it.
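The 32-bit limitation boils down to one test: a 32-bit master can use a buffer only if its last byte is at or below DMA_BIT_MASK(32), i.e. 0xffffffff. The sketch below is plain user-space C. DMA_BIT_MASK() reproduces the definition of the kernel macro of the same name. buffer_reachable() is a hypothetical helper that only models the idea and is not the kernel's own check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same definition as the DMA_BIT_MASK() macro of the Linux kernel. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: true when every byte of [addr, addr + size)
 * is addressable by a master limited to dma_mask. */
bool buffer_reachable(uint64_t addr, size_t size, uint64_t dma_mask)
{
    uint64_t last = addr + size - 1;   /* address of the last byte */
    /* reject empty buffers and address wrap-around */
    return size > 0 && last >= addr && last <= dma_mask;
}
```

A buffer at 0xc000 0000 passes the check with DMA_BIT_MASK(32). A buffer at 0x1 0000 0000 (third GB of a 4GB DDR) fails it, and so does a buffer that starts below 0xffffffff but ends above it.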

The SWIOTLB mechanism solves this issue.

3. SWIOTLB feature

The SWIOTLB feature prevents a DMA from accessing a buffer outside its addressing capability. As soon as a dma_map_xxx API is called, the SWIOTLB code checks the addressing capability (32-bit or more) of the peripheral's DMA:

  • If the buffer to transmit lies within the DMA addressing range, the DMA uses it directly and no extra work is needed.
  • If the buffer to transmit lies above the DMA addressing range, SWIOTLB copies it into an area that the DMA can access (the "aperture"). The DMA then accesses this newly allocated copy (the "bounce buffer"). The copy is done with a memcpy and can impact performance.
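The two cases above can be modeled in user space. The sketch below is a toy model and not the kernel implementation. struct buf, map_for_device(), unmap_from_device(), APERTURE_PHYS and the 64KB single-slot aperture are all hypothetical, and physical addresses are only simulated by a field. A reachable buffer is returned unchanged. An unreachable one is copied with memcpy into the aperture, and the aperture's address is what the DMA is given. The unmap step covers device-to-memory transfers, where the bounce buffer contents must be copied back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MASK_32BIT    0xFFFFFFFFULL  /* addressing limit of a 32-bit master */
#define APERTURE_PHYS 0xF8000000ULL  /* hypothetical aperture address, below 4GB */
#define APERTURE_SIZE (64 * 1024)    /* toy size; the real default is 64MB */

/* Toy buffer: CPU-visible bytes plus a simulated physical address. */
struct buf {
    uint64_t phys;
    uint8_t *data;
    size_t size;
};

/* Simulated aperture memory (a single slot, for simplicity). */
static uint8_t aperture[APERTURE_SIZE];

/* Returns the buffer the device must use; *bounced tells whether a copy was
 * made. Returns a buffer with data == NULL if the aperture is too small. */
struct buf map_for_device(struct buf cpu, bool *bounced)
{
    *bounced = false;
    if (cpu.phys + cpu.size <= MASK_32BIT + 1)
        return cpu;                          /* reachable: zero copy */
    if (cpu.size > APERTURE_SIZE)
        return (struct buf){0, NULL, 0};     /* cannot bounce */
    memcpy(aperture, cpu.data, cpu.size);    /* the extra memcpy */
    *bounced = true;
    return (struct buf){APERTURE_PHYS, aperture, cpu.size};
}

/* Device-to-memory case: copy what the device wrote back to the CPU buffer. */
void unmap_from_device(struct buf cpu, struct buf dev, bool bounced)
{
    if (bounced)
        memcpy(cpu.data, dev.data, cpu.size);
}
```

Unlike the real SWIOTLB, this model has a single slot, so it supports only one mapping at a time. Its purpose is only to show where the extra copy happens.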

More details are given in the SWIOTLB tutorial[1].

Figure: STM32 memory space

Useful information about SWIOTLB:

  • By default, SWIOTLB reserves 64MB of DDR for the aperture. If this is not enough, the following kernel log is printed:
swiotlb buffer is full (sz: 64 bytes), total 32768(slots), used 32768 (slots)
  • This can be fixed by increasing the size of the SWIOTLB aperture through the kernel command line.
  • Add swiotlb=n, where n is the number of TLB slabs requested. On STM32 platforms a slab is 2KB, so if a 128MB aperture is needed, set swiotlb=65536 (128MB / 2KB) on the kernel command line. See the kernel command line documentation[2].
  • SWIOTLB cannot bounce a single buffer larger than 256KB. Mapping a larger buffer produces the following kernel log:
swiotlb buffer is full (sz: 1081345 bytes), total 32768 (slots), used 80 (slots)
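The figures in this list come from slab arithmetic. The sketch below reproduces them, assuming a 2KB slab and a limit of 128 contiguous slabs per mapping. IO_TLB_SHIFT and IO_TLB_SEGSIZE are named after the constants in the kernel's include/linux/swiotlb.h, but their values should be checked against the kernel version in use. The helper function names are hypothetical.

```c
#include <stdint.h>

#define IO_TLB_SHIFT   11                      /* slab = 2KB */
#define IO_TLB_SIZE    (1ULL << IO_TLB_SHIFT)
#define IO_TLB_SEGSIZE 128ULL                  /* max contiguous slabs per mapping */

/* Number of slabs needed to hold size bytes (rounded up). */
uint64_t slots_needed(uint64_t size)
{
    return (size + IO_TLB_SIZE - 1) >> IO_TLB_SHIFT;
}

/* Value to pass as swiotlb=n for an aperture of aperture_bytes. */
uint64_t swiotlb_param_for(uint64_t aperture_bytes)
{
    return slots_needed(aperture_bytes);
}

/* Largest buffer a single mapping can bounce: 128 * 2KB = 256KB. */
uint64_t max_bounce_bytes(void)
{
    return IO_TLB_SEGSIZE * IO_TLB_SIZE;
}
```

The default 64MB aperture gives 32768 slots, matching "total 32768" in both logs, and a 128MB aperture gives swiotlb=65536. The 1081345-byte buffer of the second log needs 529 slabs, far more than the 128 allowed for one mapping. That is why the mapping fails although only 80 slots are in use.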

4. Good practices

SWIOTLB is an efficient way to make every DMA transfer possible, whatever the buffer location. However, as explained above, it adds an extra memcpy that impacts performance.

To avoid this drawback, avoid allocating buffers that the DMA cannot access.
On STM32 platforms, this means avoiding buffers located above the first 2GB of DDR (above 0xffffffff). In practice, allocate the buffer either with the GFP_DMA flag instead of GFP_KERNEL, or with dma_alloc_coherent.

More details are given in the kernel DMA API documentation[3].

5. References

  1. SWIOTLB tutorial
  2. Kernel command line documentation
  3. Linux® kernel DMA API