
AN0012 SPU input and output DMA

TriCore™ TC3xx AURIX™ family

About this document

Scope and purpose

The second generation AURIX™ offers enormous computing power and fast, responsive real-time behavior. It also supports low-, mid-, and high-end Radar applications. To provide the fastest signal processing capabilities, a high-performance signal processing unit (SPU) is integrated.

This document describes the input and output direct memory access (DMA) mechanisms of the SPU in second generation AURIX™ devices.

This application note applies to all second generation AURIX™ devices with at least one SPU instance (TC39x, TC35x, TC33x).

Intended audience

This document is intended for design engineers, technicians, and developers of electronic systems with basic knowledge of radar processing and the SPU.

Overview

Input and output DMA in the signal processing unit (SPU) subsystem

Figure 1 shows the input and output DMA (marked in yellow) in the SPU subsystem. The input DMA engine can either passively accept data pushed by the analog-to-digital converter (ADC) interfaces (RIF, Radar Interface) or actively load data of an existing data cube from the Radar Memory, also referred to as Extension Memory (EMEM). The task of the input DMA is to arrange data in the buffer memory for subsequent processing in the SPU pipeline. The input DMA therefore has to ensure that all data required for the configured processing steps is fed into the processing pipeline together. Examples where a complete data set is required include:

  1. Fast Fourier Transforms (FFT): All data points to perform the FFT need to be available
  2. Integration of results across multiple antennas: Data points from all antennas need to be available

The output DMA engine is used to write results to Radar Memory (EMEM).

Figure 1. Input DMA and output DMA in the SPU system


Working principles of input DMA

Depiction of radar data in Radar Memory

In radar applications, data can often be ordered in three dimensions: multiple samples taken along a frequency sweep, multiple subsequent frequency sweeps (ramps), and data recorded by multiple antennas. To account for these three dimensions, the schematic shown in Figure 2 is used to represent the flat 1D data array stored in Radar Memory. Depending on context, the two left columns serve as antenna, ramp, or sample indices.

For better understanding, concrete values are used in the figures; they can be adapted to each use case. The values used are summarized in Table 1.

Table 1. Parameters and values of radar data dimensions used in this document
Parameter | Symbol | Value
Number of samples per ramp | $N_{samples}$ | 1024
Number of antennae | $N_{ant}$ | 4
Number of ramps | $N_{ramps}$ | 128
Number of range bins after 1st FFT | $N_{rangeb}$ | 512

Figure 2. Structure of radar data in Radar Memory


Note:
The marked memory location (orange square) corresponds to ramp 1, antenna 0, sample 0.

Using EMEM input

The input DMA operates either with the RIF or the Radar Memory (EMEM) as data source (compare Figure 1). This section describes the use of EMEM data as input.

The input DMA engine is designed to perform data transfers from Radar Memory to the SPU processing pipeline using address offsets from a configurable base address. Data is read from Radar Memory in 256-bit words and the input DMA can read 256 bits of data every two clock cycles.

To access data in Radar Memory, the three-dimensional structure is handled by three configurable nested loops that generate the address offset from the configured base address. The loops are configured by:

  • Three offset values (two configurable and one that has to be set to sample size)
  • Three configurable repeat values

These three nested loops allow rotation of the 3-dimensional data cube when the SPU is loading data from or writing data to the EMEM. The offsets are used in a “bin loop”, an “inner loop” and an “outer loop” processing flow. The inner loop can potentially be executed multiple times for each execution of the outer loop and the bin loop can be executed multiple times for each execution of the inner loop. Code Listing 1 shows how the address offset is calculated from loop parameters.

Code Listing 1. Pseudocode illustrating the calculation of the address offset from loop parameters

/* Loop parameters (olr, olo, ilr, ilo, blr and blo are input DMA configuration parameters)
   adroffset: address offset from the configured base address
   ol_cnt:    outer loop counter
   olr:       outer loop maximum repeat value
   olo:       outer loop offset
   il_cnt:    inner loop counter
   ilr:       inner loop maximum repeat value
   ilo:       inner loop offset
   bl_cnt:    bin loop counter
   blr:       bin loop maximum repeat value
   blo:       bin loop offset
*/
for (ol_cnt = 0; ol_cnt < olr; ol_cnt++) {
    for (il_cnt = 0; il_cnt < ilr; il_cnt++) {
        for (bl_cnt = 0; bl_cnt < blr; bl_cnt++) {
            adroffset = blo * bl_cnt + ilo * il_cnt + olo * ol_cnt;
        }
    }
}

Example data read for first stage (range) FFT

This section provides an example where the bin loop offset is set to sample size. The examples in the sections "Example data read for first stage (range) FFT", "Example data read for second stage (Doppler) FFT: How to use transpose read", and "Setting inner loop offset to sample size" differ in which loop is set to sample size. The mandatory settings for addressing mode, processing mode, and number of blocks described in each section also apply to other concrete settings (e.g., a different number of ramps) and to cases where the other two loops are swapped.

To derive the loop offset and loop repeat configuration parameters (Table 2) for a range FFT, refer to Figure 3. In this case, the bins to be used as consecutive inputs for each FFT dataset are stored at consecutive addresses in Radar Memory. Hence, each 256-bit read fetches multiple samples for the same FFT, e.g., four, eight, or sixteen bins depending on the precision of the data. Note that the inner and outer loop offsets depend on the number of samples and the number of antennae.

Table 2. Example loop offset and loop repeat configuration parameters (related to Figure 2)
Name | Abbreviation | Value
Bin loop offset | blo | sample size in bytes
Inner loop offset | ilo | $N_{samples}$
Outer loop offset | olo | $N_{samples} \times N_{ant}$
Bin loop repeat | blr | $N_{samples}$
Inner loop repeat | ilr | $N_{ant}$
Outer loop repeat | olr | $N_{ramps}$

As consecutive samples to be used as inputs to the FFT are stored at consecutive addresses in Radar Memory, address transposition is not required and the linear addressing mode must be selected. Bandwidth optimization is not needed in this mode, as all data fetched in a 256-bit read is used in the same FFT dataset (compare Figure 3). For details on address transposition and bandwidth optimization, refer to Section "Example data read for second stage (Doppler) FFT: How to use transpose read".

Figure 3. Radar data structure in Radar Memory prior to range FFT


Note:
All data fetched in a 256-bit read is used in the same range FFT dataset.

To perform an integration across the results from multiple antennae in the Math2 unit of the SPU, data from all antennae has to be available in Math2. The input DMA ensures this when its Processing Mode is set to Integration.

Table 3. ID_RM_CONF register settings required when bin loop offset is set to sample size
Description | Field | Value | Comment
Addressing Mode | TRNSPS | 0 (linear) | Mandatory setting
Processing Mode | PM | 0 or 1 (default or integration) | Can be selected
Number of simultaneous data blocks from RAM (bandwidth optimization) | BLOCKS | 0 (1 data block) | Mandatory setting

Example data read for second stage (Doppler) FFT: How to use transpose read

This section provides an example where the outer loop offset is set to sample size. The examples in the sections "Example data read for first stage (range) FFT", "Example data read for second stage (Doppler) FFT: How to use transpose read", and "Setting inner loop offset to sample size" differ in which loop is set to sample size. The mandatory settings for addressing mode, processing mode, and number of blocks described in each section also apply to other concrete settings (e.g., a different number of ramps) and to cases where the other two loops are swapped.

In many radar processing chains, the first (range) FFT is followed by a second FFT referred to as Doppler FFT. This FFT is performed across all ramps of a single range bin of a single antenna. The data that the input DMA needs to read for this Doppler FFT is marked in purple in Figure 4. As consecutive samples to be used as inputs to the FFT are stored at non-consecutive addresses in Radar Memory, address transposition is required. The input DMA supports this with a hardware transpose unit, which is enabled by selecting the transpose addressing mode. As the outer loop offset is set to sample size, integration mode needs to be enabled as well (compare the mandatory settings in Table 5).

Figure 4. Radar data structure in Radar Memory after range FFT


Note:
To make efficient use of the entire 256-bit word read, bandwidth optimization can be enabled.

Loop parameter settings for the second stage FFT example are provided in Table 4. Note that the number of samples is set to the number of range bins in this case ($N_{samples} = N_{rangeb}$).

Table 4. Example loop offset and loop repeat configuration parameters for second stage FFT (related to Figure 4)
Name | Abbreviation | Value
Bin loop offset | blo | $N_{samples} \times N_{ant}$
Inner loop offset | ilo | $N_{samples}$
Outer loop offset | olo | sample size in bytes
Bin loop repeat | blr | $N_{ramps}$
Inner loop repeat | ilr | $N_{ant}$
Outer loop repeat | olr | $N_{samples}$

In the example sketched in Figure 4, the orange rectangle indicates one 256-bit word read by the input DMA from Radar Memory. With the configuration described so far, this word is not used efficiently: only the first range bin (e.g., bin ‘0’ in the first read) is written to the buffer memory, and the rest of the word is discarded.

To avoid discarding data and wasting bandwidth, build multiple data blocks in buffer memory by setting the BLOCKS bitfield to a value other than ‘0’. This is referred to as bandwidth optimization.

Table 5. ID_RM_CONF register settings required when outer loop offset is set to sample size
Description | Field | Value | Comment
Addressing Mode | TRNSPS | 1 (transpose) | Mandatory setting
Processing Mode | PM | 1 (integration) | Mandatory setting
Number of simultaneous data blocks from RAM (bandwidth optimization) | BLOCKS | 0, 1, 3, 7 (1, 2, 4, 8 data blocks) | Can be selected

Setting the BLOCKS parameter to ‘0’ results in one dataset being stored in buffer memory at a time. The first buffer is sketched in Figure 5.

Figure 5. Data structure in first buffer of input DMA with bandwidth optimization disabled (BLOCKS = 0)


Setting the BLOCKS parameter to ‘3’ results in four datasets being stored in buffer memory at a time. The first buffer is sketched in Figure 6.

Note:
The bin offset and bin repeat values do not need to be adapted when changing the BLOCKS parameter.
Figure 6. Data structure in first buffer of input DMA with bandwidth optimization enabled (BLOCKS = 3)


Setting inner loop offset to sample size

This section provides an example where the inner loop offset is set to sample size. The examples in the sections "Example data read for first stage (range) FFT", "Example data read for second stage (Doppler) FFT: How to use transpose read", and "Setting inner loop offset to sample size" differ in which loop is set to sample size. The mandatory settings for addressing mode, processing mode, and number of blocks described in each section also apply to other concrete settings (e.g., a different number of ramps) and to cases where the other two loops are swapped.

The second case in which consecutive samples to be used as inputs to the FFT are stored in non-consecutive addresses in Radar Memory is sketched in Figure 7.

Figure 7. Radar data is structured in Radar Memory after range FFT


Transpose addressing needs to be selected.

Note:
In this case, the inner loop offset is set to sample size (Table 6) and integration mode is not supported (Table 7). Additionally, bandwidth optimization is not possible, and the number of blocks must be 1, i.e., BLOCKS = ‘0’ (the number of blocks equals the BLOCKS field value plus one).

Table 6. Example loop offsets and repeat values (related to Figure 7 and Figure 2)
Name | Abbreviation | Value
Bin loop offset | blo | $N_{samples}$
Inner loop offset | ilo | sample size in bytes
Outer loop offset | olo | $N_{samples} \times N_{ramps}$
Bin loop repeat | blr | $N_{ramps}$
Inner loop repeat | ilr | $N_{samples}$
Outer loop repeat | olr | $N_{ant}$

Table 7. ID_RM_CONF register settings required when inner loop offset is set to sample size
Description | Field | Value | Comment
Addressing Mode | TRNSPS | 1 (transpose) | Mandatory setting
Processing Mode | PM | 0 (default) | Integration mode not supported
Number of simultaneous data blocks from RAM (bandwidth optimization) | BLOCKS | 0 (1 data block) | Mandatory setting

Additional limitations to consider

The buffer RAM size limit of 32 KBytes

Input and output DMA have a buffer RAM size limit of 32 KBytes that has to be considered. A transpose read with a high number of ramps and bandwidth optimization enabled might violate this limit:

  • Example 1: Buffer limit not violated: 256 ramps * 6 antennas * 4 data blocks * 4 bytes per sample = 24,576 bytes (24 KBytes)
  • Example 2: Buffer limit violated: 256 ramps * 6 antennas * 8 data blocks * 4 bytes per sample = 49,152 bytes (48 KBytes)

32-byte data alignment in Radar Memory

Each dataset in memory starts at a 32-byte aligned address. This allows the bandwidth optimization hardware to function and is enforced by the output Data Manager. This restriction has to be observed when using software to build data arrays in memory for reading by the input Data Manager.

Maximum number of 2048 bins in an FFT dataset

The maximum number of bins supported in an FFT dataset is 2048.

Note:
Setting a bin loop repeat value ID_RM_BLR.BLR > 2047, i.e., an FFT size greater than 2048 bins, is not supported and should be avoided.

Using RIF input

The input DMA operates either with the RIF or the Radar Memory (EMEM) as data source (compare Figure 1). This section describes the use of RIF data as input.

The RIF provides a parallel memory interface to the SPU with a 32-bit bus width. When loading data from one or both RIFs, the processing mode is always default, and the input data format and number of antennae must match the RIF configuration settings.

The SPU expects 16-bit signed Qm.n fixed-point values as input from the RIF, delivered to the SPU in 32-bit packets. While the RIF accepts a wider range of formats, it is responsible for adjusting the direction, data length (16 bits), and format of the incoming ADC data to fit the SPU RIF input format.

Configuration settings for using the RIF as a data source for the input DMA are:

  • Data source (ADC IF 0 or ADC IF 1 or both)
  • Ramps per measurement cycle
  • Number of active antennae
  • Input format (signed or unsigned, real, or complex)
  • Samples per ramp

The input DMA implements a ramp counter (partial acquisition counter, PACTR), which can be used to detect excess received data or to delay the start of SPU processing. The counter can be configured to generate an error or an interrupt when a particular value is reached.

Working principles of output DMA

The output DMA Engine has eight independent data channels with one output FIFO per result source (e.g., FFT, NCI, LOG2). The data width of the output DMA write path to the Radar Memory is 256-bit.

For each output channel, a base address can be configured. The write address is incremented automatically so that the results of each channel are stored as an array in memory starting at the configured base address. Each FFT result always starts at a 256-bit (32-byte) aligned address. This is mandatory if the data is to be read back into the SPU, because read operations controlled by the input Data Manager require the start address of each dataset to be 256-bit aligned.

The exception is the “In Place FFT” mode, which writes the FFT results to the same memory locations as the input data used for the FFT. In this mode, the start address is configurable, while the address sequence is calculated from the Loop Repeat and Loop Offset parameters inherited from the input DMA engine configuration. The input and output data sizes in bytes must be the same.


Revision history

Document revision | Date | Description of changes
1.0 | 2024-08-05 | Initial release
1.1 | 2025-05-28 | Template update; no content update