AN0001 AURIX™ TC3xx functional safety (FUSA) in a nutshell

32-bit TriCore™ AURIX™ TC3xx microcontroller

About this document

Scope and purpose

As requirements from functional safety standards in automotive, industrial and other fields are a challenging subject, this document intends to provide a first set of guidelines for users who are unfamiliar using the AURIX™ TC3xx microcontroller unit (MCU) in a functional safety scope.

Intended audience

This application note is intended for all those evaluating the AURIX™ TC3xx MCU, including functional safety engineers on the customer side and application engineers. This includes designers of safety-related systems who:

Are new to functional safety
Want to know more about functional safety (also called “FUSA”) applications
Want to understand in principle how functional safety can be implemented with hardware support
Are looking for functional safety details that cannot be found in the MCU user manual

Structure of the document

This document is divided into four main sections:

Section
Introduction to main safety concepts
, Section
AURIX™ TC3xx products platform and scalability
, and Section
Introduction to FUSA application
are introductory sections to the AURIX™ TC3xx platform, functional safety and first steps in this complex world
Section
AURIX™ TC3xx MCU and FUSA
presents the structure of AURIX™ TC3xx and how safety is built within
Section
ISO 26262 - Electric power steering
and Section
ISO 26262 - XEV traction inverter
discuss application use cases with AURIX™ TC3xx and its interoperability with other Infineon-suitable chips
Section
Safety software enablement and AURIX™ TC3xx
explains how safety software fits with AURIX™ TC3xx built-in features

Disclaimer

Sections about application use cases (Section

ISO 26262 - Electric power steering

and Section

ISO 26262 - XEV traction inverter

) are for training purposes only and are not to be taken as a blueprint for developing such units.

Introduction to main safety concepts

To explain how to proceed when facing functional safety aspects using AURIX™ TC3xx, the following sections provide a brief introduction to basic safety concepts.

Functional safety

Functional safety defines an entire domain of modern industrial activities. In general, safety is used in relation to situations that can cause harm to humans or generally, the risk of physical injury or damage to the overall health of people (that is, a safe system will not cause harm to humans). In general, no system can be created completely safe, so the functional safety domain focuses on reducing the risk of harm to an acceptable level. The acceptable level is society-dependent and can be differently evaluated depending on the social context.

Functional safety is described as follows:

In the umbrella standard (IEC 61508:2010):
As part of the overall safety that relates to the following:
- Equipment under control (EUC)
- Control system of the EUC that depends on the correct functioning of the Electric/Electronic/Programmable (E/E/PE) safety-related systems
- Other risk reduction measures
In the automotive standard (ISO 26262:2018):

Absence of unreasonable risk due to hazards caused by the malfunctioning behavior of E/E systems.

The electronic components are clearly mentioned in the above two definitions; therefore, this domain is relevant to semiconductors.

The functional safety process starts with a hazard analysis and risk assessment (HARA) of the relevant system or subsystem by suitably qualified and experienced personnel.

From the analysis and assessment, individual safety goals are defined with the specific objective of avoiding harm during an operational condition of the vehicle/appliance or of the automated action in general.

To each of these goals, a corresponding safety integrity level (SIL) as specified in the umbrella standard IEC 61508 is assigned based upon the risk evaluation. In the automotive domain, the acceptable risk level is called Automotive Safety Integrity Level (ASIL).

From the system level, the safety goals are translated into safety requirements for subsystems and individual hardware components. Once the design is complete, verification is carried out by a combination of the component manufacturer and the system manufacturer following the 'V'-model.

Systematic and random faults

Faults in a functional safety system can be broadly classified into the following two categories:

Systematic faults: A fault in design or manufacturing that can be present in hardware and software. The existence of systematic faults can be reduced through continual and rigorous process improvement and robust analysis of any new technology or component
Random faults: A fault of a hardware element that follows a probabilistic distribution. Random faults are limited to hardware. The rate of random faults cannot be reduced. It is important to keep the focus on:
- Prevention measures such as process and design (for example, layout rules)
- Detection and mitigation by safety mechanisms (for example, ECC, redundant data storage)

Random hardware faults can be permanent or transient. If the fault is permanent, it will stay there over time.

In case where errors are transient, they can be removed by writing or resetting or setting a new value. In

Figure 2

, it is possible to find a simplified representation of the major cause of transient faults in semiconductors. Alpha and neutron particles cause transient faults that need to be considered when determining the failure rate of a chip.

ISO 26262 and IEC 61508 standards perspective

AURIX™ TC3xx was initially developed for automotive systems and is compliant with the ISO 26262:2018 standard. At the same time, compliance with IEC 61508:2010 was also assessed.

Table 1

summarizes the main differences between the two standards relating to their applicability to AURIX™ TC3xx.

Table 1. ISO 26262 and IEC 61508 standards applicability to AURIX™ TC3xx
Section	ISO 26262	IEC 61508
Application field	12-part standard that is strictly for on-road vehicles, such as passenger cars, trucks, buses and motorcycles, covering the concept up to the production stage for electrical/electronic systems. This standard is tailored to the needs of the automotive industry. Originated from IEC 61508 for automotive.	7-part industrial-related standard; most often used for machinery, oil wells, chemical plants, nuclear sites, forklifts and robots. This standard refers to industrially relevant technical standards for EMC, communication and cybersecurity.
Safety classification	Classification is based on Automotive Safety Integrity Levels (ASIL). ASIL: A (least stringent), B, C, D (most stringent)	Classification is based on Safety Integrity Level (SIL). SIL: 1 (least critical), 2, 3, 4 (most critical)
Functional Safety	definition is in ISO 26262-1:2018 clause 3.67	definition is in IEC 61508-4:2010 clause 3.1.12
Areas covered	it covers safety management, system/HW design, SW design, production and operation of safety-critical E/E/PE systems, but the same is valid for components.	Covers safety management, system/HW design, SW design, production and operation of safety-critical E/E/PE systems.
“Components” view	Automotive systems distinguish system design from hardware component design. “Components” used in the system require specific compliance with the ISO 26262 standard. One life cycle for all (tailoring concept). ISO 26262-11 is specific for semiconductor development.	A hardware component compliant with IEC 61508 is called a “compliant item”. The HW component life cycle is introduced for “ASICs”. ISO 61508-2 Annex E and F are for semiconductors.
How safety is implemented	The safety goal concept requires risk reduction to be a part of the initial control system design.	The safety function concept was initially based on the idea of defining equipment under control (EUC) and then building risk reduction measures for the system.
Documentation	ISO 26262 clearly defines work products for each requirement. Confirmation reviews with independent reviewers, dependent on ASIL, are requested.	General considerations on documentation are reported in Part 1, Clause 5. No confirmation reviews are requested; only assessments with independent assessors. Relating documents to be provided, there are less detailed requirements (no WPs).
SIL and ASIL determination	To determine the ASIL level of a system, a risk assessment must be performed for all hazards identified. Risk comprises three components: severity, exposure and controllability.	The SIL level of a product is determined by three factors: Systematic capability rating : If the quality management system meets the requirements of IEC 61508, a SIL capability rating is issued. Architectural constraints for the element : Architectural constraints are established by Route 1H or Route 2H. Route 1H involves calculating the Safe Failure Fraction for the element. PFH (or PFDavg) calculation for the product : PFH is the average frequency of a dangerous failure of the safety function [h-1] for high demand mode of operation or continuous mode of operation, while PFDavg is the average probability of a dangerous failure on demand of the safety function operating in low demand mode of operation.
Corresponding terms	Item Defined in ISO 26262-1:2018 respectively at clause 3.41	Functional unit Defined in IEC 61508-4:2010 respectively at clause 3.4.5
Corresponding terms	Element, Fault, Failure Defined in ISO 26262-1:2018 respectively at clause 3.41, clause 3.54 and clause 3.50	Element, Fault, Failure Defined in IEC 61508-4:2010 respectively at clause 3.4.5, clause 3.6.1 and clause 3.6.4
Decomposition versus synthesis	ASIL decomposition is defined in ISO 26262-1:2018 clause 3.3 An ASIL D safety requirement can be decomposed as: ASIL D (D) + ASIL QM (D) or ASIL C (D) + ASIL A (D) or ASIL B (D) + ASIL B (D)	According to IEC 61508-2:2010, SIL synthesis essentially allows the synthesis (or combining) of two redundant elements with a systematic capability of ‘N’ to have a systematic capability of ‘N + 1’, with ‘N’ less than or equal to SIL 3. The rules for SIL synthesis according to IEC 61508 are: SIL 2 + SIL 2 gives SIL 3 SIL 1 + SIL 1 gives SIL 2 The IEC 61508 standard does not allow recursive SIL synthesis and in addition, the two combined elements should have the same SIL. IEC 61508 also requires a two-channel implementation for SIL 4 systems (the hardware fault tolerance has to be >0 for a SIL 4 function).
Failure rate (λ) Expressed in FIT (see Failure rate )	λ = λ _SPF + λ _RF + λ _MPF + λ _S	λ = λ _S + λ _D = (λ _SD + λ _SU) + (λ _DD + λ _DU )
Definitions for the different component of failure rate	λ _SPF – Single-point faults λ _RF – Residual faults λ _MPFDP – Detected/perceived multi-point faults λ _MPFL – Latent multi-point faults λ _MPF = λ _MPFDP + λ _MPFL – Multi-point faults λ _S – Safe faults Expressed in FIT	λ _S – Safe failure rate: No impact on safety function λ _SD – Safe detected failure rate − λ _SU – Safe undetected failure rate • λ _D – Dangerous failure rate − Impact on safety function − λ _DD – Dangerous detected failure rate − λ _DU – Dangerous undetected failure rate Expressed in FIT
Metrics	In automotive systems, metric targets are mandatory on the item level and are related to both single- and multi-point faults.	In IEC 61508 metrics, the most relevant factors are single-point faults, even if they include common cause evaluation through a β factor.
Probabilistic metrics	Probabilistic Metric for Random Hardware Failures (PMHF): Quantitative criteria for the residual risk of a safety goal violation due to random hardware failures. In simple terms: A metric to show the robustness of a safety architecture. P _MHF = λ _SPF + λ _RF + 0,5 × λ _{SM1, DPF, latent} × λ _{IF, DPF} × T _lifetime Expressed in FIT	In an architecture without redundancy (1oo1) PFH = λ _DU PFH definition is in IEC 61508-4:2010 clause 3.6.19 Expressed in FIT
Similar metrics terms	Single Point Fault Metric (SPFM): Quantitative criteria for the effectiveness of the safety architecture to cope with single-point and residual faults. In simple terms, metric for the share of remaining dangerous faults in relation to all faults. Expressed in percentage	Safe Failure Fraction (SFF): Ratio of safe and dangerous (but detected) failures in a system safety function to the total failure rate. SFF exact definition is in IEC 61508-4:2010 clause 3.6.15 SFF is calculated at the element (component) or system level for a safety function. It should not be applied to sub-elements. Expressed in percentage
Metrics terms unique to ISO	Latent Fault Metric (LFM): Quantitative criteria for the effectiveness of the safety architecture to cope with latent dual-point faults.In simple terms: A metric for the share of remaining critical latent faults in relation to all dual-point faults. Expressed in percentage
Terms unique to IEC		Low-demand mode safety functions are required to operate at low frequencies, typically once or more per year. Low-demand functions have less stringent requirements on PFDavg (the average probability of a dangerous failure on demand of the safety function) to achieve a specific SIL. High-demand mode safety functions are required to operate at high frequencies, typically many times per hour. High demand and continuous demand functions have more stringent requirements on PFH (average frequency of a dangerous failure of the safety function) to achieve a specific SIL. Continuous-demand mode safety functions operate continuously. For more details refer to IEC 61508-4:2010 clause3.5.16 Type A products are simple products in which all failure modes are known. For more details refer to IEC 61508-2:2010 clause 7.4.4.1.2. Type B products are complex products in which not all failure modes are known (for example, semiconductor). For more details refer to IEC 61508-2:2010 clause 7.4.4.1.3. Hardware Fault Tolerance (HFT) HFT is the number of faults that can occur without failure of the safety function. A hardware fault tolerance of N means that N+1 is the minimum number of faults that can cause a loss of the safety function. For more details refer to IEC 61508-2:2010 clause 7.4.4.1. For AURIX™ TC3xx, HFT is equal to 0. This means that the fault might be detected, but safety functionality is lost with one fault. With a hardware fault tolerance of 0 (in other words, 1oo1 redundancy), the maximum safety integrity level that can be achieved by a Type B (complex semiconductor) safety-related element is SIL 3. HFT > 0 requires redundancy.
Fault Tree Analysis	A Fault Tree Analysis or equivalent top-down analysis is required in the case of ASIL C and ASIL D.	A Fault Tree Analysis or equivalent is only “R” (recommended) in IEC 61508.
Dependent Failure Analysis	DFA is the analysis to identify single events that can cause multiple sub-parts to malfunction (for example, intended function and its safety mechanism) and lead to a violation of a safety requirement or safety goal. DFA is qualitative in automotive standard.	DFA is quantitative and faults in the diagnostic circuit can contribute to FMEDA metrics with the so-called beta factor.

Safety Element out of Context (SEooC) in automotive

AURIX™ TC3xx is an MCU developed for various applications.

Since it is not tailored for a specific item, according to automotive safety standard ISO 26262 part 10, the AURIX™ TC3xx is a Safety Element out of Context (SEooC) hardware component.

As ISO 26262-10:2018 highlights, the development of an MCU starts with an assumption of system-level attributes and requirements. It is the responsibility of the system integrator to integrate the SEooC assumptions of use.

According to the ISO 26262 classification, the MCU is a hardware component that performs a set of functions at the item level as a part of a system. A system, as it is defined in ISO 26262-1, is composed of at least three related elements: a sensor, a controller and an actuator.

Figure 3

shows the typical use of the AURIX™ TC3xx in the context of an electronic control unit (ECU).

Inputs are provided by one or more sensors at the system level, processed by the HW components on the ECU and forwarded to the input channels of the MCU
The MCU processes the data and provides outputs to other hardware components
Hardware components drive one or multiple actuators or transmit data to another ECU via a communication network

Figure 3. AURIX™ TC3xx in the context of an electronic control unit (ECU)

Fail-safe system

A system is said to be fail-safe if it is designed such that in the event of a failure of any element of the system, the system prevents harm to humans.

This is accomplished by having the system enter a safe state if any safety-relevant failure occurs or if it detects a “latent” failure that cannot be corrected immediately.

Failure rate

Failure rate is the frequency or rate with which a system or component fails, expressed in failures per hour.

Symbol: λ(lambda)

Unit: 1 FIT = 10

^-9

^-1

(failure in time)

Failure rates scale depending on time and the number of systems or components.

Examples of different meanings of 1 FIT:

If there are 10
⁹
systems or components, one of them will fail every hour.

If there are 10
⁵
systems or components working 10
⁴
hours consecutively, one of them will fail.

ISO 26262 perspective

One of the key metrics for a functional safety system is the time to reach a safe state after a fault occurs.

This period, known as the Fault Handling Time Interval (FHTI), is the sum of two elements:

Fault detection time (FDTI)
Fault reaction time (FRTI)

A more commonly used term, similar to FHTI, is the Fault Tolerant Time Interval (FTTI) , which is defined in ISO 26262-1:2018 clause 3.61 and provides the minimum time before a system could become dangerous when a fault occurs.

Figure 4

shows a graphical representation of the relationship between these timings.

The worst case for the fault detection time is application-specific and defined by the diagnostic time interval. All hardware safety mechanisms within AURIX™ TC3xx hardware provide a very fast fault detection time, in the order of microseconds.

IEC 61508 perspective

A term corresponding to FTTI in the IEC 61508 standard is the “process safety time”. This time is defined in IEC 61508-4:2010 at clause 3.6.20. In general, the time to react to a fault is longer in industrial applications with respect to automotive ones.

Protective measures

When the need for a protective measure is identified and the classification is determined, the measure must be implemented in the system.

Safety systems can have various principles of operation, for example:

One single device is inherently fail-safe (so without integrated primary or secondary protection)
One single device with periodic self-testing and monitoring, where the control layer and primary and secondary protection layers are integrated into one single device
Two independent devices are compared using the same or different technology Secondary protection is provided by the comparison

Single device, inherently fail-safe

Electronic fail-safe devices can include fuses, circuit breakers or current-limiting circuits, which interrupt electrical currents under overload conditions. As a result, they directly prevent damage to wiring or circuit devices.

Single device with periodic self-testing and monitoring

One of the most common safety architectures is what some industrial standards call a "single device with periodic self-testing and monitoring". In this architecture, protective measures can be implemented in a number of layers, as shown in

Figure 5

Safety-classified functionalities that will lead directly to a hazard are implemented through a control layer plus a primary and secondary protection layer. This means that the system needs to be safe even when two independent faults occur.

The worst case is when two faults happen, one in the control layer and another in the primary protection layer, at a time distance that depends on the acceptable risk for the system (normally 12–24 hours in the most restrictive case). Statistically, it is considered that there is a very low probability that more than two independent faults occur.

The functional layer is intended as the component necessary for the control tasks such as receiving signals from sensors and sending control signals to actuators. This is referred to as the “control layer”. In the absence of any protective measures, failures in combination with normal conditions in the control layer can directly lead to a hazardous situation, such as sending a spurious control signal to operate a valve. Such failures are considered "critical failures".

A second layer is necessary to implement safety measures to detect critical failures. These measures can be considered as forming the second functional layer (primary protection), whose task is to initiate a protective action in the event of a critical failure in combination with all defined “normal conditions”.

Faults that remain without leading to a critical failure are considered latent faults. Latent fault diagnostics can be executed with a lower frequency with respect to faults leading to safety-critical failures. This kind of fault, normally occurring in the protective function, nevertheless leads to a hazardous situation, even years later, in combination with a second fault.

It will be necessary to incorporate safety measures that prevent such a situation. To prevent a dropout of primary protection due to a latent fault, the proper functioning of the "safeguards" is supervised. The necessary function can be considered a third functional layer (secondary protection).

By implementing primary and secondary protection layers, a function with a high safety rating can be realized.

Two independent channels with comparison

Figure 6. Layers of a safety system using two devices with comparison

When adopting the technique of two independent channels with comparison, these two can use the same or different technology targeting the same function. In other terms, it includes homogeneous redundancy or redundancy with diversity.

When applying diversity to a system, it is not necessary to use hardware components from different manufacturers; the goals can also be achieved by using components from a single manufacturer.

This approach is limited to detecting that there is a fault but not determining where the fault is, as opposed to redundant systems with higher number of instances where the majority of voters will determine which channel is faulty (this is, for example, the case of at least two channels giving the same information over three channels present).

The final layer of protection is then provided by the comparator. The comparator itself will be guaranteed in its functionality; therefore, tests need to be run on the comparator to detect faults leading directly to a hazard or to cover latent faults. The comparator itself should also be free from systematic faults as per the rest of the system.

AURIX™ TC3xx products platform and scalability

Infineon’s AURIX™ TC3xx family concept offers both scalable feature sets and scalable pinouts for optimal flexibility, as well as requirements meeting up to ASIL-D and SIL 3 classified requirements. Depending on the specific application and its functional safety system requirements, users can choose the product that best fits to their specific case.

Introduction to FUSA application

Application assumptions and implementation of safety functions

Application assumptions are assumptions at the system level, such as safety goals, fault-tolerant time interval and system safe-state determination. Depending on the application fields, a few examples are listed in Section

ISO 26262 - Electric power steering

and subsequent sections to explain how to proceed with the first architecture approach, considering safety-related aspects.

Standards about functional safety aspects

Standards are guidelines and reflect best practices. The functional safety standards differ depending on the sector of applicability. IEC 61508 is widely considered to be the root of all functional safety standards. Following the IEC 61508 standard, many other standards were created for functional safety, for example, ISO 26262 for the automotive sector and IEC 60730-1 Annex H for automatic controls in household and industrial applications.

Figure 9. Functional safety standards overview

Functional safety levels

When planning an ECU, the engineers have more than one safety functionality to implement. Each of the safety elements that are implemented has its own safety classification. In the same MCU, especially if it is multicore, you can set up various independent functionalities; a few of them are not related to safety; others can have a defined safety level.

Depending on the specific application field, the safety levels have the following names:

Automotive standards (ISO 26262): QM, ASIL A, ASIL B, ASIL C, ASIL D
Industrial standards (IEC 61508): SIL 1 and SIL 2, SIL 3, SIL 4
Automatic electronic controls (IEC 60730): Class A, Class B and Class C

The methodology to be applied has a similar basis. Considering the application that is developed and the given technical specifications, a risk analysis needs to be performed to understand which safety goals are to be achieved and their level of criticality with respect to safety (ASIL, SIL and so on), depending on parameters such as severity, exposure and controllability in the automotive field. It is possible to find an example for automotive safety level classification in

Figure 10

At this point, the project engineers, to whom this book is intended for, can start building their system, considering a good chipset that is adequate to the functionalities they need also from a safety perspective.

AURIX™ TC3xx is designed to be the right MCU for the control board in safety applications, with a significant number of safety mechanisms already built into the hardware. This allows for a fast and reliable system design.

AURIX™ TC3xx MCU and FUSA

Figure 11

shows an overview of the functional blocks available on the AURIX™ TC3xx platform. A clear distinction is made through the colors between cores, memories, peripherals and special features of the product family.

As AURIX™ TC3xx is developed for functional safety use cases, each functional block (apart from a few peripherals) is built to prevent a fault from leading directly to a safety goal violation or to remain latent.

Figure 12

shows a non-exhaustive overview of the hardware safety measures available for each functional block of the AURIX™ TC3xx platform. The main safety features are described in the subsequent subsections.

Safety of MCU infrastructure blocks

To execute any safety application software, it is essential to ensure the correct configuration and monitoring of the MCU infrastructure.

By MCU infrastructure, it means:

Common functional blocks: Considered as possible common-cause failure initiators
Access protection features: Provide freedom from interference between HW/SW elements (see Section
Freedom from interference (FFI)
for more information)
Error management and reporting: Provide a transition to the safe state when a fault is detected (see Section
MCU reaction plan - SMU
for more information)

Common functional blocks

Blocks such as the power management system (PMS), clocks, ports, reset and others that are common to all basic functionalities of the MCU are considered always active. They are usually involved in the nominal (also called “mission”) functions and the monitoring functions of AURIX™ TC3xx. Faults in any of these blocks are considered safety-critical and must be detected. In the following sections, an overview of the safety features that are available in these functional blocks is provided.

Power management system (PMS)

The PMS provides the power infrastructure, generates supply voltages using internal voltage regulators, facilitates power distribution and manages system power modes. Monitoring the supply voltages is the main safety activity implemented in the PMS. This consists of detecting overvoltage or undervoltage events at the different supply voltages.

Power built-in self-test (PBIST)
This test is carried out at reset (Cold PORST; see Section
Types of resets
for details on various resets). The goal is to ensure that external power supplies reach a minimal value before reset release. The MCU will remain in a reset state while the defined voltage thresholds are not exceeded
Primary undervoltage monitor
The primary voltage monitor triggers a reset (Cold PORST) if VEXT, VDDP3 or VDD drops below the lowest possible threshold for the correct operation of the system. Reset threshold values are defined in the product datasheet
Secondary overvoltage and undervoltage monitors
The secondary voltage monitor triggers alarms in case of an overvoltage or undervoltage event in any of the supply rails (that is, VEXT, VDDP3, VDD, VEVRSB and VDDM). These thresholds are to be configured by the user
Note:
VEVRSB and VDDM are not monitored by the primary monitor because it is assumed that the secondary monitors are reliable when VEXT, VDDP3 and VDD are within the operating range.
Note:
External supervisors for VEXT overvoltage are supplied at the system level; for example, see Section
Error-correcting code technique
.

Figure 13. Power supply rail internal primary and secondary safety monitors

Clocking system

The clocking system includes the clock generation unit (clock source), clock scaling (PLLs), clock distribution (CCU) and individual clock configurations (for each MCU peripheral). Since the clock signals are distributed to all peripherals, the clocking system is a potential source of common-cause failures. Therefore, fault detection coverage of this block is an important part of the safety measures that are implemented in AURIX™ TC3xx devices.

Hardware measures are implemented to detect faults in all submodules of the clocking system. The main measure is to compare the frequencies derived from an independent clock source against the operating frequency to be protected.

Clock source
The backup clock (internally generated) and the external crystal oscillator (XTAL OSC) are monitored via a watchdog function. An alarm is generated if the number of backup clock cycles within a window of 512 f
_PLL0
clock cycles (derived from the XTAL OSC) exceeds a configurable value
Clock scaling (PLLs)
The output of PLLs output is monitored by comparing it against a diverse clock (the backup clock); the backup clock output is monitored by f
_PLL0
. A “clock alive” alarm is generated if the monitored clock is below an expected value
Figure 14.
Clock source and clock scaling built-in safety mechanisms

The PLL has a lock detection feature that differentiates between stable and unstable circuit behavior. The PLL may unlock because of a break in the crystal or ceramic resonator or the external clock line. In such a case, a safety management unit (SMU) alarm event is generated.
Clock distribution (CCU)
Clock signals distributed to the individual peripherals must be monitored by the application software. The idea is to compare safety-related peripheral clocks against clocks that are generated from different PLLs, for example:
- f
  _PLL0
  (
  STM
  ) vs f
  _PLL1
  (
  QSPI
  ) and/or
- f
  _PLL0
  (
  STM
  ) vs f
  _PLL2
  (
  ASCLIN
  )
Such a plausibility test detects the wrong settings or states of the clock dividers in the CCU logic.
Figure 15.
Clock distribution protected by plausibility check
Clock configuration
In all clocking system subparts, safety-related configuration registers (SFRs) are protected via safety flip-flop (SFF) mechanisms. Safety flip-flops are special flip-flops that implement a hardware mechanism capable of detecting bit flips within the protected registers, thus preventing single-point faults. Alarms will be generated in the event of bit-flip events

System control unit (SCU)

The SCU is a module that includes several central infrastructure submodules, such as the reset control unit, emergency stop, watchdog timers and trap generator (see Section

Interrupts and trap handling

). A set of protection mechanisms (SFFs, signal redundancy and so on) is implemented in the SCU to detect transient or permanent faults that may lead to severe malfunction of the MCU.

The SCU implements the following HW features that are typically required for a safety application:

Watchdog timers
: These timers monitor access to protected SFRs via Endinit or safety Endinit protection (each CPU has a watchdog timer; in addition, a safety watchdog timer is available for shared resources)
Emergency stop (ES)
: The emergency stop feature provides a fast reaction to an alarm without the intervention of the software. As a reaction to the emergency event, selected output ports can be immediately placed into a defined state (for example, bring the actuators into a known state). An emergency stop can be triggered by the following:
- A transition on the port that is configured as the emergency stop input
- An alarm event or command from the SMU that is configured to generate a port emergency stop
The emergency stop control logic for the ports operates in two modes:
- Synchronous mode (default): The emergency case is activated by hardware and released by software
- Asynchronous mode: Both the activation and release of the emergency case are done by hardware

Die temperature sensor

To avoid the MCU working outside of the expected temperature range, two temperature sensors are implemented in the AURIX™ TC3xx device. This feature can be used as an additional safety mechanism because it enables the generation of an early warning whenever the die temperature is too close to the boundaries of the operating range. Both sensors are located near areas that are the warmest areas of the product. The first instance (PMS_DTS) is located close to the PMS. The second instance (DTS_Core) is located close to the CPU cluster. Each sensor will detect whether the temperature is within the specified limits and set temperature underflow/overflow alarms accordingly.

Safety of processing blocks

Safe computation - CPU

The TC3xx family utilizes the TC1.62P core hardware, which is based on the TC1.6P core with enhancements in memory distribution, protection and other aspects. Additionally, up to four CPUs are protected by a lockstep mechanism, which allows them to run up to ASIL-D or SIL 3 applications without the need to integrate cyclic software-based self-tests for the CPU.

CPU memory and temporal protection

The CPU offers several HW measures for protection of memory and module resource accesses (registers), as well as timer-based mechanisms for detecting timing violations of the SW.

Lockstep CPU

Depending on the device variant, an AURIX™ TC3xx offers up to four lockstep CPUs. The lockstep (LS) CPU monitoring is based on hardware redundancy with online monitoring of the outputs. The lockstep monitoring function compares the outputs of the master and the checker cores and signals a fault to the SMU for appropriate action.

The monitoring function temporarily separates the cores by inserting delays in the signal chain to avoid an external disturbance that affects both cores in the same way and therefore goes undetected by the lockstep mechanism. To achieve this, the redundant core inputs and the master core outputs fed to the comparators are delayed by two clock cycles, realigning the two signals. The lockstep core has no effect on the nominal operation.

The cores equipped with a lockstep also have a continuously running background self-test of the lockstep comparator. The self-test function will inject faults into both inputs of each of the monitored nodes and verify that the fault is correctly detected by the monitoring logic.

Figure 16

shows a simple representation of an arbitrary node comparator and the fault injection mechanism, which is highlighted in green.

All CPU functions are covered by the following lockstep system:

Exception handling
Instruction fetch and execution
Data movements from internal RAM to the core or bus interface

Non-lockstep CPU

The non-lockstep CPU has an identical architecture compared to the lockstep CPU, but it does not include the checker core and the comparator output logic. While the performances of the non-lockstep and lockstep CPUs are the same, the non-lockstep CPU cannot rely on redundant hardware elements. Therefore, a software-based mechanism named software-based self-test (SBST) is required for covering single-point faults and latent faults of the CPU itself.

System timer module (STM)

The system timer is a free-running 64-bit timer that is enabled immediately after an application reset and can be read by the application software. Each CPU has a dedicated STM. The system timer is fundamental for the operating system and task scheduling. It can be configured to generate a compare-match interrupt service routine (ISR) by using dedicated registers. The STM is not part of the duplication area of the CPU, so no specific hardware is dedicated to monitoring the correct behavior of the timer. In cases where the STM is used in safety-relevant applications, the application SW performs plausibility checks using an independent timer.

Error-correcting code technique

Error-correcting code (ECC) is a technique that adds a number of check bits to a message or data, allowing it to detect and correct a limited number of errors. In this scope, the minimum number of bit flips required to change one valid codeword into another valid codeword is called the “hamming distance”. Typical ECCs, such as hamming codes, can detect with certainty up to 2-bit errors and correct 1-bit errors. Hamming codes are used in AURIX™ TC3xx SRAMs. In this case we have single-bit error correction and dual-bit error detection (SECDED). Another type of ECC, such as BCH codes, can be designed to correct multiple-bit errors. BCH codes are used in AURIX™ TC3xx PFlash with dual-bit errors correction and three-bit error detection (DECTED). For DECTED, a hamming distance of six is required.

CPU RAMs

Each CPU utilizes different RAM blocks as local memories, which are represented in

Figure 18

and listed as:

Data Scratch Pad RAM (DSPR)
Program Scratch Pad SRAM (PSPR)
Data Cache (DCache)
Program Cache (PCache)
Distributed Local Memory Unit (DLMU)
Local PFlash Bank (LPB)

Figure 18. Processor core, local memory and connectivity

The processor core connects to these memories and to the following bus interfaces (where these are implemented):

SRI Slave Interface (x2)
SRI Master Interface
SPB Master interface

The CPU RAM can be affected by transient or permanent faults and implements the same safety mechanisms common to all SRAM blocks. See Section

Error-correcting code technique

for more information.

The RAMs of the lockstep CPU are not replicated as part of the lockstep mechanism, but the program memory interface (PMI) and data memory interface (DMI) within the lockstep CPU’s RAM instances , including ECC encoding and decoding, are part of the area of duplication. Lockstep CPU memories are therefore classified as ASIL-D, allowing the user to execute ASIL-D software from a lockstep CPU using any combination of local CPU memory or the memories of another lockstep CPU.

Non-lockstep CPU memories inherit the ASIL-B level from the non-lockstep CPU. If users want to perform ASIL-D read/write operations by using a non-lockstep CPU memory from a lockstep CPU, the system integrator must take care of monitoring data corruption. This means monitoring the data (stored in the CPU.PSPR, CPU.DSPR or CPU.DLMU) of non-lockstep CPUs by using information redundancy.

Nonvolatile memory (NVM)

As shown in

Figure 19

, the MCU features memory resources that are distributed to multiple locations. In particular, the NVM is dedicated to storing data or programs in flash memories. It is composed of the following parts:

Data flash
: Flash banks are used to store application data. Two banks are available (DF0 and DF1)
Program flash
: Flash banks are used to store application SW code or constant data. One bank with local access is available per CPU
User configuration blocks (UCB)
: This block is used for user configuration of the device. It is part of DF0
Flash standard interface (FSI)
: Executes erase, program and verify operations on all flash memories
Configuration sector (CFS)
: This block contains device-specific settings that are not accessible by the user
BootROM (BROM)
: Contains the firmware executed by the device at start-up before user-defined software can be executed
DMU
: Interface the FSI and PFI with data flash, UCB and CFS
Program flash interface (PFI)
: Provides a fast connection between each program flash bank and its CPU

Monitoring

The content of PFlash banks is crucial because it contains the code executed by the CPU. Any fault leading to transient or permanent corruption of the PFlash content can lead to severe malfunctions that are not detectable by SW. Therefore, the NVM offers dedicated safety mechanisms for the monitoring of several failures affecting the PFlash. See Section

Error-correcting code technique

for more information.

Each 256-bit block is protected by an enhanced error detection code (EDC) and ECC logic that can detect up to 3-bit errors (TBE) and correct single-bit errors (SBE) as well as double-bit errors (DBE). In addition, a correctable bit address buffer (CBAB) is available to monitor the number and address of corrupted code words; each address is stored only once and an uncorrectable bit address buffer (UBAB) is present to store the address of an uncorrectable error.

To address the latent fault metric of the ECC logic, the PFlash ECC/EDC decoder is also monitored to detect failure modes of the ECC decoder. In addition to all these HW-based safety mechanisms, before starting to fetch safety-relevant code from a PFlash bank or upon every content update, the user application software will execute additional checks, such as a CRC of the full memory.

The nonvolatile memory also has a dedicated RAM for the FSI interface. The RAM can be affected by transient or permanent faults that can corrupt data and it is protected by the same safety mechanisms common to all SRAM blocks.

Volatile memory

In addition to the volatile memories associated with each CPU, AURIX™ TC3xx has EMEM, LMU and peripheral RAMs. A few volatile memories are in a user-accessible memory range. Other memories are localized to the peripheral modules. RAMs are protected, for the most part, by the ECC mechanism. See Section

Error-correcting code technique

for more information. A few exceptions feature EDC only.

Finally, when the ECC detects the majority of the data corruption, the RAM has additional hardware safety mechanisms capable of detecting errors in the RAM address.

The RAM alarm concept is updated based on the expected fault reaction. From each RAM, the following three alarms are sent to the SMU:

Correctable-error alarm (CE): ECC correction is performed (that is, single-bit error correction)
Un-correctable error alarm (UCE): ECC detection (that is, double-bit error) or RAM address error detection
Miscellaneous error alarm (ME): Non-critical (latent) fault detection

EMEM

The EMEM is a dedicated memory that contains RAM blocks (EMEM tiles), which can be used for ADAS applications, calibration or trace data storage. The EMEM implements interfaces to SRI bus and BBB bus.

LMU

The LMU is an SRI-connected module providing access to volatile memory resources. Its primary purpose is to provide up to 256 KB of local memory for general-purpose usage.

SRAM

In this document, the various SRAMs are identified by the name of the functional block where the memory is located. In cases where one functional block has multiple SRAM instances, each memory has a unique identifier.

Table 2. Example of SRAM instances available in the TC39x
Functional block	Memory name
CPU	DSPR
	PSPR
	DTAG
	PTAG
	PCACHE
	DCACHE
	DLMU
SPU	CONFIG
	BUFFER
	FFT
DMA	RAM
GTM	RAM
EMEM	RAM
LMU	RAM
PSI5	RAM
MCMCAN	RAM
CIF	RAM
HSPDM	RAM
NVM	FSIRAM
TRACE	RAM
ERAY	RAM
AMU	LMU_RAM
GETH	RAM
SDMMC	RAM
SCR	RAM

MCU function - ADAS

AURIX™ TC3xx supports advanced driver assistance systems (ADAS), a suite of technologies that help drivers stay safe on the road. The main blocks for ADAS are radar interface (RIF) and signal processing unit (SPU).

Radar interface (RIF)

The RIF acts as a 32-bit interface between internal or external ADC channels with the SPU module. The RIF is used in ADAS applications, where a high level of safety is required. Therefore, different parts of the RIF are monitored by HW safety mechanisms. The CRC redundancy technique is used for increasing fault coverage on configuration registers and the data interfaces with the monolithic microwave integrated circuit (MMIC) input stage and SPU output stage.

In the event of an error detection in the MMIC, the application SW performs additional actions to handle errors. Redundancy is applied to the RIF data path and safety mechanisms for increasing data integrity. RIF is a slave node of the FPI and is protected by common access protection safety mechanisms.

Signal processing unit (SPU)

The SPU is a semi-autonomous accelerator for performing Fast Fourier Transforms (FFTs) on data from one or more dedicated ADC interfaces. The SPU uses a three-stage streaming architecture to provide data pre-processing, FFT and data post-processing operations. The SPU uses the radar memory to store datasets and has internal buffer memories, which are used to store the data currently progressing through the processing pipeline.

The SPU is composed mainly of these parts:

SPU core: Computational unit for FFT calculations
SPU lockstep: Full redundancy in case the second SPU is used as a lockstep unit
SPU RAMs: Used for storing data (FFT, BUFFER) and configuration (CONFIG)

The SPU offers several safety mechanisms that monitor the correct behavior of the unit. During runtime, the SPU configuration data and control flow of the operation are periodically checked. The SPU interfaces with RIF and EMEM are protected by hardware built-in safety mechanisms. The second SPU instance can be configured for full redundancy (comparison of control and data outputs), partial redundancy (comparison of control only) or no redundancy (no comparison).

In the event that SPU is not configured for full redundancy, additional external measures will be implemented at the SW level. A class of faults in the SPU can cause a deadlock in the SPU. A SW-based self-test (SBST) is provided and will detect and signal an error in case the test execution time takes longer than expected. The integrity of a few SPU safety mechanisms is monitored by other dedicated safety mechanisms. SPU is a slave node of the FPI and is protected by common access protection safety mechanisms.

SPU has three different types of RAM, each of which is dedicated to specific usage. Each RAM can be affected by transient or permanent faults that can corrupt data and have the same safety mechanisms common to all SRAM blocks.

Debug and test functionalities

Trace and debug modules are slave nodes of the FPI bus and are therefore protected by common access protection safety mechanisms. Their functionality is disabled during operation. These blocks are protected by hardware safety mechanisms for providing freedom from interference, as described in Section

Freedom from interference (FFI)

SRI and FPI busses

SRI bus

The SRI bus connects the CPU, the DMA module and other high-bandwidth requestors to high-bandwidth memories and other resources for instruction fetches and data accesses. The SRI interconnect supports parallel transactions between SRI masters and independent SRI slaves.

FPI bus

The FPI connects the high-speed peripherals (CPU and DMA) to the medium- and low-bandwidth peripherals. The AURIX™ TC3xx family has up to two FPI bus instances:

System peripheral bus (SPB): Main non-ADAS system and communication peripherals
Back bone bus (BBB): Emulation device-related and ADAS-related peripherals, available in ADAS/Emulation

SRI and FPI safety mechanisms

Any R/W operation of the MCU buses can be affected by several faults during the address phase or the data phase, resulting in incorrect or missing data, wrong addressing and so on. SRI and FPI slaves are protected by built-in hardware mechanisms against these possible faults.

MCU function - MCU communication

DMA

The DMA moves data from source modules to destination modules without the intervention of the CPU or other on-chip devices. A data move is defined by DMA configuration data. A DMA channel operation is initiated by a DMA hardware request or a DMA software request.

During DMA operations, transactions can be subject to permanent or transient faults that can affect the success of the data moves in several ways. In addition, the DMA source or destination address can be corrupted, resulting in the wrong data at the destination. Faults in the DMA move engine can lead to lost or delayed transactions. All these kinds of faults are addressed by internal hardware mechanisms.

Interrupts and trap handling

The TriCore™ architecture manual defines how the CPUs deal with interrupts and traps. The interrupt router (IR) module is responsible for scheduling service requests (also called interrupts) to the correct service provider. In the TC3xx architecture, internal peripherals, external hardware or application software can raise a service request. The service providers are all CPU’s and DMA.

The IR is a critical block since a fault in its logic, coming from hardware or software, can affect one or more service providers or the interrupt service routine (ISR). The IR is connected to all internal functional blocks, so a failure in a peripheral can generate malfunctions in the IR and propagate to the CPU, DMA or other peripherals. The correct behavior of the IR and its monitoring functions during runtime are crucial parts of the safety measures implemented in the TC3xx architecture. This is achieved by a combination of internal safety mechanisms built into the hardware and a few software checks.

Trap generation (TRAP) is a functionality of the SCU, which hosts a cluster of sub-modules that control various system functions. SCU trap generation determines which CPU receives a trap based on the trap event trigger. Interrupts can interfere heavily with the sequential execution of the program or be executed as an error reaction, while traps are generated when the core of the MCU detects an error.

Safety of application dependent blocks

Application-dependent blocks are parts of the MCU for which the fulfillment of the safety requirements requires a combination of application-level safety mechanisms and safety mechanisms provided by the MCU. Typical application dependent parts are peripheral modules participating in data-acquisition, actuation control and system-level communication. Correct and safe functionality of the MCU application-dependent blocks can be guaranteed with different techniques.

One of the most common techniques in SEooC peripherals is redundancy with comparison. When redundancy is applied, a dependent failure analysis (DFA) and coverage of common block functionality with additional controls are necessary.

In AURIX™ TC3xx, each identified cause of dependent failures is controlled by an adequate safety measure. One of these measures is the correct pin distribution, as explained in Section

Considerations on common-cause failures on pins and packages

Most of the application-dependent blocks use:

GTM modules for generating or capturing signals
ADC modules (EVADC or EDSADC) to perform analog signal acquisition

It is important to briefly introduce the GTM and ADC module concepts before exposing the application-dependent scenarios from a safety perspective.

Overview of GTM

Below is a summary representation of the generic timer module (GTM) that is often used in application-dependent use cases to implement digital acquisition and digital actuation. GTM TIM modules can acquire PWM, while GTM TOM or ATOM modules can generate PWM signals for digital actuation.

The safety mechanism is not in the resource itself (apart from access protection and SRAM ECC), but in the redundancy that is required as an assumption of use, which means implemented by the application engineer.

Overview of EVADC/EDSADC

Below is a summary representation of the enhanced analog-to-digital converter (EVADC) block and enhanced delta-sigma analog-to-digital converter (EDSADC) that are often used in application dependent use cases.

The safety mechanism is not in the resource itself (apart access protection) but in the redundancy that is required as an assumption of use.

There are four ADC types present in the MCU:

Delta Sigma: 13 ENOB (effective number of bits), ≤ 200 ksps
Primary SAR: 12-bit, ≤ 2.5 Msps
Secondary SAR: 12-bit, ≤ 1.4 Msps
Fast Compare: 10-bit, ≤ 5 Msps

Safe analog acquisition

In AURIX™ TC3xx, the key to achieve the required safety level for analog acquisition is to have a redundant channel called the “monitoring” channel in addition to the functional channel, also known as the “mission” channel. Depending on the safety level required, the input pin for both channels (mission and monitor) will be the same or different. When selecting AD converters for mission and monitoring, the user should ensure to choose ADC blocks guaranteeing physical separation.

Figure 24

shows the case when redundant safety-related analog signals are delivered by the system and redundantly processed by internal resources of the ADC module. The results of the redundant processing are transported from the ADC module to volatile memory and compared by the CPU.

Note: The system integrator implements a check of the ADC reference voltage, either by an external monitor or by internally converting a known signal and compares the result with the expected value.

Safe digital acquisition

In AURIX™ TC3xx, the key to achieve the required safety level for digital acquisition is to have a redundant channel called the "monitoring" channel in addition to the functional channel, also known as the "mission" channel. Depending on the safety level required, the input pin for both channels (mission and monitor) will be the same or different.

Additionally, regarding the internal peripherals to be used for mission and monitoring, different combinations are possible using independent TIM (timer input) channels of the GTM block or a TIM channel and a "diverse" input timer from an independent peripheral (for example, using CCU6, which is an independent timer module). The CPU reads and compares the results of the signal measurement.

Figure 25. Simplified example of digital acquisition using two independent TIM channels of GTM

For digital acquisition, it is recommended to avoid the use of adjacent pins to prevent common-cause failures of the ports and package (see Section

Considerations on common-cause failures on pins and packages

Safe digital actuation

In AURIX™ TC3xx, the key to achieve the required safety level for digital actuation is to have a redundant channel called the “monitoring” channel in addition to the functional channel, also known as the “mission” channel. When selecting internal peripherals to be used for mission and monitoring, different combinations are possible using independent timer output channels and comparing them or reading back. For example, a GTM output timer using a GTM input timer and comparing these two signals.

When a GTM output resource (TOM or ATOM channel) is used to generate the PWM signal, this can be sent back from the external actuator to a GTM input resource (TIM) and application SW can perform a comparison of the PWM output with the PWM feedback signal.

Figure 26. Simplified overview for digital actuation TOM-TIM-SW

For digital actuation, it is recommended to avoid the use of adjacent pins to prevent common-cause failures of the ports and package (see Section

Considerations on common-cause failures on pins and packages

Safe E2E communication

Safe end-to-end (E2E) communication is often applied as a safety measure to communication ports instead of using the safety measure “redundancy with comparison”. E2E protocol is a generally well-known safety measure and is also stated in ISO 26262. It is not an Infineon-specific AURIX™ TC3xx safety feature. The AUTOSAR standard also describes a few specifics about this measure.

Figure 27. Types of communications possible with AURIX™ TC3xx

The safety measures are facilitated by using additional meta-data such as CRC, counters and timestamps as part of the payload data. Application software initiates the safe communication safety measures during every communication event (transmission and reception). On detection of a failure, the application SW triggers the reaction. For example, in the case of a transmission error, application SW must re-transmit the same package again. In case of a reception error, application SW will not acknowledge the received package.

The E2E profiles provide a consistent set of data protection mechanisms, designed to protect against the faults considered in the fault model for communication blocks.

Each E2E profile uses a subset of the following data protection mechanisms:

A CRC, provided by the CRC library
A sequence counter is incremented at every transmission request; the value is checked at the receiver side for the correct increment
An alive counter is incremented at every transmission request; the value is checked at the receiver side if it changes, but the correct increment is not checked
A specific ID for every message
Timeout detection: Receiver communication timeout and sender acknowledgement timeout

MCU reaction plan - SMU

The TC3xx is built to be fail-safe, which means the MCU must ensure entering a safe state upon fault detection. To achieve that, the MCU implements a HW infrastructure that is responsible for collecting alarms (fault notification) and triggering appropriate reactions, which is ensured by the SMU. The reaction of each alarm can be configured accordingly to the needs of the application.

The SMU module is connected to all safety mechanisms that are within the MCU to collect alarms. The SMU is also connected to the system control unit, the interrupt router, the ports and the power management system to trigger the configured reaction when an alarm is set.

To mitigate the potential common cause faults, the AURIX™ TC3xx SMU is portioned into two parts:

SMU_core: Located in the core domain
SMU_stdby: Located in the stand-by domain

The SMU_core and SMU_stdby are designed differently and they are located in different clock and power domains with physical isolation between them. The SMU_core collects most alarm signals from the hardware monitors and safety mechanisms according to the safety concept, while the SMU_stdby collects alarms from modules that detect core alive signals, power or temperature failures. This enables the SMU to process any incoming alarm, regardless of the clock frequency used to generate the alarm.

The SMU, in combination with AURIX™ TC3xx embedded safety mechanisms, ensures the detection and reporting of more than 99% of the critical failure modes of the MCU within the fault tolerance time interval.

SMU behavior

SMU offers the following failure reporting alternatives (see

Figure 29

Internal
External
Alternate

Internal failure reporting

The internal failure reporting interface enables the MCU to indicate, via SMU_core, the presence of an internal MCU failure.

The SMU_core can be configured to request one of the following internal reactions to a failure being detected:

Interrupt request for one or multiple CPUs (ISR)
Non-maskable interrupt (NMI)
CPU reset request for one or multiple CPUs
An application or system reset

External failure reporting

The external failure reporting interface enables the communication of the presence of an internal MCU failure to an external safe state controller via SMU_core. Through fault signaling protocol (FSP) pin(s), the alarm information goes to an independent monitor, for example, Infineon technologies OPTIREG™ PMIC TLF35584. This external alarm signal can be delayed by configuring the recovery timer (RT). See

Figure 30

as an example. The SMU can also receive an error notification from an external device via the emergency stop ports and can react to it without the intervention of a CPU.

Alternate external failure reporting

The alternate external failure reporting interface enables the communication of the presence of an MCU common-cause failure to an external safe state controller via an alternative diverse path from SMU_stdby.

On detection of common-cause failure (for example, clock failure, power failure, SMU failure, high or low temperature detection), the SMU_stdby can be configured to set the fault signaling protocol error pin(s) in high impedance state regardless of the port configuration.

Figure 29. SMU internal and external reactions (simplified)

Figure 30. External failure reporting interface - failure reaction example

System level hardware requirements

To cover additional safety aspects at the system level, AURIX™ TC3xx requires the following external safety measures:

Overvoltage monitoring of the two main supplies:
- VEXT (V
  _µC
  )
- VEVRSB (V
  _StBy
  )
External watchdog
External error signaling to activate an independent secondary safety path

This can be achieved by using, for example, an external power management chip (see

Figure 31

One of the primary roles of this device is to monitor the voltage supplies of the system, whether internally generated by the device or from other on-board regulators and, if necessary, disconnect the MCU from the power supply to avoid a violation of the safety goals.

The Infineon OPTIREG™ PMIC TLF35584 is capable of detecting dependent failures that affect both the function and the diagnostic, such as a watchdog error. When this happens, then the safety power supply can initiate a return to a safe state by driving output pins to disconnect the power feed to the actuators and/or triggering a reset of the MCU.

The safety power supply also monitors the fault signaling protocol (FSP) pin of the MCU that signals an internal failure, indicating that the MCU response is no longer reliable. In this case, the power supply is the ‘last man standing’ and its built-in safe state controller triggers a safe state to meet the safety goals for the system.

As the building blocks of a functional safety system are reviewed and understood, the benefits of sourcing from a single supplier both MCU and PMIC become immediately apparent. Each of the elements of the AURIX™ TC3xx system is specifically designed and tested to work alongside each other and contain signals and controls that significantly ease the task of building a system capable of reaching the highest safety integrity levels.

Safety path

The safety path is the signal chain and circuitry that enables and maintains the system’s safe state. For several applications, the safe state is achieved by disabling actuators, communication channels or the complete system.

In such systems, the safety path is therefore referred to as the safety shutdown path.

Primary safety path

The primary safety path is a safety path (shutdown path) that is managed directly by the safety microcontroller. The microcontroller can keep the application in a safe state if the assumptions of use are respected (for example, operating conditions in a valid range as described in the datasheet) and it is possible to act directly on motor-control signals, communication signals or other possible signals that activate a defined safe state of the system.

Secondary safety path

A secondary safety path is a safety (or safety shutdown) path established through the PMIC’s safety functionality or other external hardware. For example, the safe state of the system in

Figure 32

is intended to be achieved by the PMIC's safe state outputs SS1 and SS2, which should be connected to system circuitry that can release and assert the safe state.

The secondary safety path will be implemented with a high degree of independence from the primary safety path of AURIX™ TC3xx to provide a redundant mechanism for cases in which the primary safety path is unreliable.

Figure 32. Example of primary and secondary safety paths for a communication block

Considerations on common-cause failures on pins and packages

In this section, the topic of common-cause failures (CCFs) in pins and packages will be addressed.

Based on the safety concept for TC3xx, a few functional blocks’ safety mechanisms require the use of two redundant channels (for example, ADC redundant channel acquisition):

Mission channel
Monitoring channel

Figure 33. Channel redundancy representation in the ADC functional block

Common-cause failures are the failure of two or more elements of an item resulting from a single specific event or root cause, which can affect both the mission and the monitor input/output (I/O) signals, potentially leading to failures.

Figure 34. Abstract representation of a common-cause failure

An example of a common-cause failure from the package perspective is when a BGA ball shorts its neighboring balls.

Figure 35. Illustrative example of a common-cause failure from the BGA ball level

Figure 36

shows how a common-cause failure can affect neighboring balls at the package level.

The ball-out of packages LFBGA516 and LFBGA292 is separated into eight different groups. Those highlighted groups indicate that the balls from one group are appropriately separated from the balls from other non-adjacent groups.

The mission I/O pin can be used from one group and the monitor signal can be used from any other group, so that the mission and the monitor groups are not adjacent. This is valid not only for a generic I/O port but also for GTM and ADC modules’ pins when applying the redundancy principles.

As shown in

Figure 37

, if the mission signal is connected to the group highlighted in blue (group 1), then avoid connecting the monitor signal to the same blue group or to the adjacent group marked in yellow (group 2) and the group marked in purple (group 8).

Figure 38. TC3971 LFBGA292 I/O configuration

AURIX™ TC3xx safety package for customers

To enable customers to reach the target safety level for their project, Infineon provides a comprehensive library file that contains all the functional safety documentation related to each specific TC3xx device and is required for the design of a safety-relevant system based on this specific microcontroller. This package is made available to customers under non-disclosure agreement for both ISO 26262 compliance and IEC 61508 compliance justification.

The "safety package" for ISO 26262 is composed of the documents as shown in

Table 3

Table 3. Documents of the safety package
Document	Description
Safety case report	Serves as the work product requested by ISO 26262:2018 Part 2 Clause 6.5.4 and follows the guidance of ISO 26262, Part 10 Clause 5.3.
Safety manual	Provides guidance for integrating the device into a safety system, assumptions of use, safety mechanisms and implementation hints.
FMEDA template	Calculation tool to compute customized ISO 26262 random-fault-related metrics for TC3xx devices depending on the specific configuration for the customer application.
Safety analysis summary report	Describes the performed safety analysis for the AURIX™ TC3xx devices and provides reference to the corresponding safety analysis results.
Safety package release note	Present the set of customer-relevant safety documents, taken all together, to enable Infineon to substantiate the functional safety claims. Communicate to the integrator of the AURIX™ TC3xx device the major changes between the documents in this safety package and the documents referenced in the previous package.

The subsequent sections provide more details about the concept related to metrics such as base failure rate, soft error rate and FMEDA.

Base failure rate (BFR)

The base failure rate (BFR) is the first input of the FMEDA. It is related to the hardware permanent errors only. Data commonly comes from the SN29500 or IEC TR 62380 standards. Infineon’s AURIX™ BFR template according to TR 62380 (or better ISO 26262-11:2018), provides the BFR for permanent faults (hard errors) for the die and the package separately.

The default mission profile is “Motor Control”, as it is the worst-case condition, and working and dormant times for the calculation of τ_i, τ_on + τ_off = 1 were considered. The integrators can change these values based on their own calculations for the mission profiles of the application.

Soft error rate (SER)

The soft error rate (SER) is related to hardware soft (non-permanent) errors and is composed of NSER and ASER data with a package-related adjustment factor, where:

NSER is a soft error rate caused by neutron radiation from cosmic rays at the earth's surface, obtained during accelerated neutron testing. The value depends on the altitude and the location on earth and is referenced to "New York Sea Level". The BFR related to SE caused by neutron radiation (NSER) is multiplied by the value entered in the related cell "Flux Factor for Neutron Particles". For scaling instructions, see JESD89 A.3
ASER is an alpha particle-originating soft error rate obtained during accelerated testing and caused by impurities of process and package materials, for example mainly because of mold compound, and is referenced to low alpha materials. It can also be caused by other materials, such as solder bumps. For bare die applications, the “Flux Factor for Alpha Particles” must be used to scale the SER with an appropriate scaling that considers the alpha activity of the material set for encapsulation and interconnect

As all AURIX™ TC3xx MCUs use low-alpha mold compound (alpha particle emission is 0.0010 cph/cm2) for the packages, the following equation for the SER is applied in the FMEDA.

𝑆𝐸𝑅 = 𝑁𝑆𝐸𝑅 + 𝐴𝑆𝐸𝑅

The soft error rate is independent from the mission profile.

Failure modes, effects and diagnostic analysis (FMEDA)

Failure modes, effects and diagnostic analysis (FMEDA) is the analysis of the effect of random hardware faults on a safety requirement or safety goal, including the quantitative estimation of failure rates and the probability/rate of a safety goal violation.

Infineon provides support for an accurate estimation of the failure rate and diagnostic coverage of the AURIX™ TC3xx MCU, providing a FMEDA template that is fully configurable by the customer for their specific use case for both ISO 26262 and IEC 61508 standards.

Infineon FMEDA supply metrics for:

Permanent faults/hard errors (HE)
: Random HW fault that occurs and stays (for example open, short and so on)
Transient HW faults/soft errors (SE)
: Random HW fault that occurs once and subsequently disappears (for example bit-flip in SRAM because of alpha radiation)

Inputs for the FMEDA Excel sheet come from the BFR calculation and SER.

Figure 39. Infineon FMEDA template, inputs and outputs

ISO 26262 - Electric power steering

To better understand how AURIX™ TC3xx safety features are employed, it is important to discuss a typical application from the automotive sector (an electric power steering (EPS)). A complete chipset, meeting ISO 26262 requirements and supporting fail-safe EPS systems, will be described.

Note: High availability and fail-operational reliability can be achieved by adding a redundant functional system.

The EPS system assists a driver to steer the vehicle with less manual force. The application example of an EPS solution is presented in

Figure 40

The electronic control unit (ECU) directly controls an electric brushless direct current (BLDC) motor with 3, 6 or 12 phases, which applies additional torque or force to the steering column or directly to the steering rack. The main hazard is “unwanted steering” which is detected within a fault-tolerant time interval of in the order of milliseconds. The ASIL rating assigned is D.

The critical safety hazard “unwanted steering” comprises faults that lead to unintended generation of torque or moves by the BLDC motor that may lead to steering the vehicle in a hazardous direction. In the event of a detected error, the EPS system will inform the driver with a warning and the driver must be able to control the vehicle manually without interference from the EPS system.

Figure 41. Simplified block diagram of an EPS system

To perform specific tasks, such as steering angle measurement and electric motor control, a number of dedicated integrated circuits are utilized, which are discussed in the subsequent sections. Moreover, their requirements in terms of functionalities and safety-related specifications are described.

Initial assumptions

This example does not cover the steer-by-wire use case; therefore, mechanical steering is considered to be still present. However, the basic requirements for a fail-safe EPS also apply in steer-by-wire systems for each of the redundant functional channels.

Need for protection

EPS uses an electric motor to assist in the steering of a vehicle. A sensor detects the torque exerted on the steering wheel by the driver and an ECU applies assisted torque via the motor. The mechanical linkage between the steering wheel and the steering gear is retained as a backup, so the driver can manually steer the car.

System safe state

: Current flow to the motor is cut off and the motor remains in freewheeling (phase cut-off or active freewheeling). Fail operational may be required, especially for heavy vehicles.

Note: To have a high-availability use case, the example available in this section needs to be reviewed using redundancy.

The EPS system description is represented in

Figure 41

. The main block is the AURIX™ TC3xx microprocessor, which handles and controls all the major functionalities.

Hardware components

The major requirements and the key features of the selected components are:

Safe system supply optimized for EPS with ASIL-D monitoring and supervision
Safe bridge driver, optimal for EPS and brake booster, supporting ASIL-D for safe state off
ISO 26262 compliant angle sensor, torque sensor and motor position sensors designed for most demanding safety applications (ASIL-D) with superior accuracy performance
Robust MOSFET with superior switching behavior
AURIX™ TC3xx- microcontroller that needs to guarantee safe calculation because the output values of the control algorithms heavily influence the generation and control of the BLDC motor. In addition, PWM signals (typically at 20 kHz) for the high side and the low side switches of the half bridge drivers need to be provided safely by the MCU, as well as the SPI for communication with the power supply and watchdog
Current sensors are applied for torque control loops and use multiple redundant ADC channels and converters, for example, dual sensing of 1-3 shunts, using a 4th or using a SPI interface to bridge driver ASICs
CAN (FD) messages for steering angle signal values transmitted outside of the ECU
CAN (FD) and FlexRay (optional) for getting commands in steer-by-wire systems (for example, ADAS commands)

Power supply

The power management IC device can manage and monitor the power supply for a range of ECU component systems, including electric power steering, engine control units and advanced driver assistance systems.

An important functional safety feature of this integrated circuit is its ability to detect and report faults in the power supply rails, such as overvoltage, undervoltage and overcurrent conditions. The device includes a range of built-in protection mechanisms, such as voltage and current clamping, to help to prevent damage to sensitive electronic components in the event of a fault.

The power supply circuit also includes several features to ensure reliable and stable power delivery to critical vehicle systems, including multiple regulated outputs. Each of those can be programmed to a specific voltage and current limit. Another very relevant safety feature is the presence of a watchdog timer to monitor the system for malfunctions and automatically reset the device if necessary.

In addition to its functional safety features, a good power supply integrated circuit is designed to be highly efficient and reliable. The device need to be capable of operating at high temperatures and includes advanced thermal protection features to prevent damage from overheating. An example of such an IC is the TLF35584 presented in

Figure 42

Pinout AURIX™ TC3xx-PMIC

Table 4

lists an example of connections required for functional and safety purposes when using

TLF35584

as the power supply chip for an EPS system. It is necessary to go to the Infineon website to check which is the most appropriate and up-to-date chip variant before making any decision about the project.

Table 4. AURIX™ TC3xx-TLF35584 connections
NR	AURIX™	PMIC	Description
1	DATA_IN (RX)	MISO (SDO)	Digital SPI signaling output port refers to the VEXT supply voltage. Connect to the SPI port "data input" of the MCU.
2	DATA_OUT (TX)	MOSI (SDI)	Digital SPI signaling input port refers to the VEXT supply voltage. Connect to the SPI port "data output" of the MCU.
3	CHIP SELECT	CSN (SCS)	Digital active-low SPI signaling input port refers to the VEXT supply voltage. Connect to the SPI port "chip select" of the MCU.
4	CLOCK	CLK_SPI (SCL)	Digital SPI signaling input port refers to the VEXT supply voltage. Connect to the SPI port "clock" of the MCU.
5	FSP	ERR	Diagnostic output signal from AURIX™ TC3xx to TLF to activate an independent safety path.
6	ESR1	INT	Safety output to AURIX™ TC3xx
7	PORTX.Y	WDI	Watchdog input signal from AURIX™ TC3xx
8	PORST	ROT	Reset to AURIX™ TC3xx
9	PORTA.B	SS1	For the startup test of SS1 output effectiveness (optional)
10	PORTC.D	SS2	For the startup test of SS2 output effectiveness (optional)

3-phase bridge driver

A gate driver IC is dedicated to controlling six external N-channel MOSFETs, forming an inverter for 3-phase motor drives. Such chips are often referred to as gate driver units (GDUs). An example of a block diagram of such an integrated circuit can be seen in

Figure 43

From a functional safety perspective, a three-phase driver must include safety features to prevent dangerous situations from occurring in the event of faults.

Some of the key safety features that are typically included in a three-phase gate driver are:

Short-circuit protection of the motor winding
Chip overtemperature protection
Chip under-voltage protection
Overcurrent protection of the motor winding: The internal current amplifier is a circuit that measures the current flowing through the motor windings and amplifies the signal to a level that can be read by the device's control circuitry to ensure safe and reliable operation of the motor
Diagnostic features for chip health management

Pinout AURIX™ TC3xx to the gate driver

Table 5

shows the connections required for functional and safety purposes when using TLE9183QK as the 3-phase motor driver chip. It is necessary to visit the Infineon website to check which is the most appropriate and up-to-date device before making any decisions about the project.

Table 5. AURIX™ TC3xx-TLE9183QK connections
NR	AURIX™	Gate driver	Description
1	DATA_IN (RX)	MISO (SDO)	Digital SPI signaling output port refers to the VEXT supply voltage. Connect to the SPI port “data input” of the MCU.
2	DATA_OUT (TX)	MOSI (SDI)	Digital SPI signaling input port refers to the VEXT supply voltage. Connect to the SPI port “data output” of the MCU.
3	CHIP SELECT	CSN (SCS)	Digital active-low SPI signaling input port refers to the VEXT supply voltage. Connect to the SPI port “chip select” of the MCU.
4	CLOCK	CLK_SPI (SCL)	Digital SPI signaling input port refers to the VEXT supply voltage. Connect to the SPI port “clock” of the MCU.
5	GTM TOM module	IH1_N	AURIX™ PWM output, TOM channel of the GTM timer
5	GTM TOM module	IH2_N	AURIX™ PWM output, TOM channel of the GTM timer ¹
6	GTM TOM module	IH3_N	AURIX™ PWM output, TOM channel of the GTM timer ¹
7	GTM TOM module	IL1	AURIX™ PWM output, TOM channel of the GTM timer ¹
8	GTM TOM module	IL2	AURIX™ PWM output, TOM channel of the GTM timer ¹
9	GTM TOM module	IL3	AURIX™ PWM output, TOM channel of the GTM timer ¹
10	ADC module	VRO	AURIX™ ADC input; voltage reference output: The DC output voltage at the outputs of the CSAs (VOx) for zero differential input voltage is defined by the output of the reference buffer at pin VRO.
11	ADC module Z1	VO1	Analog output of current sense amplifier 1 for shunt signal amplification, goes to ADC module Z1 of AURIX™. For functional reasons (synchronization), keep Z1≠Z2; Z2≠Z3; and Z1≠Z3
12	ADC module Z2	VO2	Analog output of current sense amplifier 2 for shunt signal amplification, goes to ADC module Z2 of AURIX™. For functional reasons (synchronization), keep Z1≠Z2; Z2≠Z3; and Z1≠Z3
13	ADC module Z3	VO3	Analog output of current sense amplifier 3 for shunt signal amplification, goes to ADC module Z3 of AURIX™. For functional reasons (synchronization), keep Z1≠Z2; Z2≠Z3; and Z1≠Z3
14	PORTx.A	ERR_N	Input to the MCU for diagnostic purposes
15	PORTy.B	ENA	Enable bridge driver When set to "low", output stages of the gate driver are turned OFF and remain off. Low to high transitions trigger a reset of device-latched errors.
_	_	SOFF_N	Safe Off Not: Switch off paths independent of ENA. Analog active low input pin for external triggering of device safe state. It does not cause the reset of error registers.
_	_	INH_N	Analog active-low inhibit pin. Sets the device into sleep mode for low quiescent current consumption. External FETs are turned off actively before the charge pumps are turned off. Resetting via inhibit requires a new configuration via SPI.
_	_	PFBx	Phase feedback: Not connected to the MCU in this use case

Note: A PWM clock source plausibility check must be considered.

Sensors and their position

A typical EPS includes a set of sensors to read the motor position, the steering angle and the steering torque, as shown in

Figure 44

Torque sensor

The steering torque sensor measures the torque exerted on the steering wheel by the driver. Its signal is fed into the ECU and used to regulate the amount of assisting torque provided by the motor. The torque sensor in this example uses a diverse redundancy such that faults can be detected, for example, using an inverse sensing scheme between the two redundant channels. Torque sensors typically communicate with the AURIX™ TC3xx using the SENT protocol.

An example of a torque sensor is the Infineon hall-based magnetic sensor TLE4999C8, designed for torque sensing applications. One of the key functional safety features is the ability to detect and report faults in the sensor output. This can be achieved through a combination of redundant signal processing and built-in self-test mechanisms.

The redundant signal processing ensures that two independent sensing elements are used to generate differential output signals. These signals can then be compared by the MCU, which does a plausibility check to detect any issue in the measurement chain. The built-in self-test mechanisms allow the sensor to periodically test its own internal components and report any errors or faults to the system control unit.

Figure 45. Dual-die hall-based magnetic sensor characteristic

Another important functional safety feature of the sensor is its ability to withstand harsh environmental conditions, such as high temperatures, humidity and vibration. This can be achieved using specialized packaging and sealing techniques, as well as the integration of built-in protection features such as overvoltage and reverse-polarity protection.

To be easily integrated with other components of the system, such as MCUs and motor drivers, the sensor features a bus-capable digital short-PWM-code (SPC) interface (similar to SENT), which can be easily configured.

Pinout AURIX™ TC3xx to the torque sensor

Figure 46. Infineon TLE4999C8 torque sensor schematic and data example

Table 6

shows the connection that is needed for functional and safety purposes when using TLE4999C8 as a torque sensor chip. It is necessary to visit the Infineon website to check which is the most appropriate and up-to-date chip before making any decisions about the project.

Table 6. AURIX™ TC3xx-TLE4999C8 connections
NR	AURIX™	Torque sensor	Description
1	SENT	SPC	The two sensor outputs are transmitted using the SPC protocol.

Figure 47. Sensor communication to AURIX™ TC3xx via SPC protocol

Steering angle sensor (SAS)

The steering angle sensor is typically mounted at the top of the steering column, close to the steering wheel inside the passenger compartment. It is not required for the basic functionality of the EPS system, but it is required for the electronic stability program (ESP), which assists the driver in critical driving conditions. It usually communicates via SENT or SPI to the MCU.

Some of the key safety features for a steering angle sensor in an EPS application are:

Redundancy
: The SAS should have redundant sensor elements and signal processing circuitry to provide reliable operation in case of a single-point failure. This redundancy can ensure that the sensor continues to operate safely and accurately even in the event of a failure in one of the sensor elements
Diagnostic coverage
: Both internal self-diagnostics and external system-level diagnostics (see below explanation)
Accuracy
: The SAS should provide accurate and reliable position sensing to ensure that the EPS system operates safely and predictably

Figure 48. Dual-die angle output of the steering angle sensor

For the above use cases, it uses a dual-die implementation, that is, two fully redundant measurement outputs in one package. In this example, for both chip 1 and chip 2, two different SENT connections are present to transmit the angle sensor measurement; one pin of each chip is the interface for channel 1 and a second pin of each chip is the interface for channel 2. For each single chip, the MCU will take care of a plausibility check to ensure that the two measures are coherent. In addition, another plausibility check is performed to evaluate the information coming from the two different chips and correlate it.

Figure 49. Two outputs from two dual-die-angle sensors

Pinout AURIX™ TC3xx to the steering angle sensor

Table 7

lists an example of connections needed for functional and safety purposes when using the TLE5014D angle sensor chip with SENT configuration. It is necessary to go to the Infineon website to check which is the most appropriate and up-to-date chip before making any decisions about the project.

Table 7. AURIX™ TC3xx-TLE5014D connections
NR	AURIX™	Angle sensor	Description
1	SENT 1	Chip 1 IFB-1	Input to MCU - SENT/SPC/PWM/SICI interface for channel 1 First redundant sensor
2	SENT 2	Chip 1 IFB-2	Input to MCU - SENT/SPC/PWM/SICI interface for channel 2 First redundant sensor
3	SENT 3	Chip 2 IFB-1	Input to MCU - SENT/SPC/PWM/SICI interface for channel 1 Second redundant sensor
4	SENT 4	Chip 2 IFB-2	Input to MCU - SENT/SPC/PWM/SICI interface for channel 2 Second redundant sensor

Figure 50. Sensor communication to AURIX™ TC3xx via SENT

Rotor position sensor

The rotor position sensor is mounted directly at the end of the shaft of the EPS motor, which is commonly a highly efficient BLDC motor. It can use multiple analog signals (sin and cos, resolver) or a digital signal (SENT). There are multiple options for sensing the motor position, for example, magnetic sensors, resolvers and encoders.

Figure 52. Example of a rotor position sensor with redundancy and diversity

One way to sense motor position is to use a sensor that includes two independent sensing elements, with each element providing a single/differential output signal for both the sine and cosine components of the magnetic field. This results in a total of two/four signals, with one/two sets of redundant signals for each component. The redundant single/differential signals for the sine and cosine components provide several benefits for functional safety.

The bottom sensor element is an anisotropic magnetoresistance (AMR) sensor. Therefore, in the angle range of 180° to 360° of the giant magnetoresistance (GMR) sensor, the AMR sensor output signal will be in the range of 0° to 180° again. This is represented by the blue line in

Figure 52

In case a diverse output of the two sensors is desired, the connections to the SIN_N and SIN_P or COS_N and COS_P pins on the printed circuit board can be interchanged. The consequence of this change of connections is that either the differential sine or the cosine signal is inverted, as represented by the dotted blue line in

Figure 52

If differential signaling is used (8 connections instead of 4), this can reduce the effects of electromagnetic interference (EMI) on the signal, which can improve the accuracy and reliability of the sensor. This solution also provides a redundancy feature that can help detect and isolate faults in the signal.

Pinout AURIX™ TC3xx to the rotor position sensor

Table 8

lists an example of connections needed for functional and safety purposes when using TLE5309A16(D) as a rotor position sensor chip. It is necessary to go to the Infineon website to check which is the most appropriate and up-to-date chip before making any decisions about the project.

The TLE5309A16(D) sensor can be used in single-ended or differential output mode.

Figure 52

shows a typical application circuit for the TLE5309A16(D) in single-ended output mode using the positive output channels. For single-ended operation, positive or negative output channels can be used. Unused single-ended output pins should preferably be floating or connected to GND with a high-ohmic resistance (>100k Ω). The TLE5309A16(D) contains separate supply pins for the GMR sensor and the AMR sensor.

Table 8. AURIX™ TC3xx-TLE5309A16(D) connections
NR	AURIX™	Rotor position sensor	Description
1	DSADCX_a	SIN_P1	Input to cluster X of DSADC
2	DSADCX_b	COS_P1	Input to cluster X of DSADC
3	DSADCY_a	SIN_P2	Input to cluster Y of DSADC
4	DSADCY_b	COS_P2	Input to cluster Y of DSADC
5	EVADCZ1.a	VDIAG1	Input to cluster Z1 of DSADC
6	EVADCZ2.b	VDIAG2	Input to cluster Z2 of DSADC

Note: To be able to detect any common-cause failure coming from the MCU, it is recommended that X≠Y and Z1≠Z2. In other words, there must be enough independence between the two redundant ADC acquisitions (P1 and P2) and the same is true for the VDIAGx signal. Separation between sin_P1 and sin_P2, cos_P1 and cos_P2, VDIAG1 and VDIAG2 pins also need to be considered, as explained in Section Considerations on common-cause failures on pins and packages

CAN transceiver

To enable the EPS to interact with the entire car system, an integrated circuit that provides the CAN physical layer is needed. For this reason, a CAN transceiver IC must be selected to enable the AURIX™ TC3xx MCU to communicate using the specific bus protocol.

Some of the key features of a robust CAN transceiver are:

Low current consumption allows the system to accomplish the CAN communication with a small power budget
Fail-safe features such as TxD time-out, RxD recessive clamping (that is, fail-safe feature that prevents sending data on the bus if the RxD line is clamped to high) and overtemperature shut-down allow the system to behave in a predictable manner in safety-critical situations. Other safety measures also report the CAN short circuit proof to ground, battery and VCC, as well as undervoltage detection for the supply voltages
Local failure diagnostics should also be implemented by specifically designed output pins

This IC should be chosen by considering the earlier mentioned characteristics to allow for a safe and reliable communication with the system, as mentioned in Section

Safe E2E communication

Figure 53. CAN TLE9252V transceiver pinout

Pinout AURIX™ TC3xx-CAN transceiver

Table 9

shows the AURIX™ TC3xx-TLE9252V connections.

Table 9. AURIX™ TC3xx-TLE9252V connections
NR	AURIX™	CAN	Description
1	_	CANH	Output of the transceiver to the CAN bus line
2	_	CANL	Output of the transceiver to the CAN bus line
3	PORTX1.A1	NERR	Error flag output, failure and wake-up indication
4	PORTX2.A2	WAKE	Input, sensitive to rising and falling edges
5	PORTX3.A3	NSTB	Standby control input
6	_	INH	Output from external control circuitry—not to be connected to the MCU
7	CAN RX	RxD	Receive data output from the MCU
8	CAN TX	TxD	Transmit data input from the MCU

FlexRay communication (optional)

FlexRay is often used instead of CAN in applications where high data transfer rates, deterministic communication and fault tolerance are critical. CAN is a widely used communication protocol in the automotive industry and is suitable for many applications, but it has some limitations that make it less than ideal for safety-critical systems.

One of the key advantages of FlexRay over CAN is its higher data transfer rates. FlexRay supports data transfer rates of up to 10 Mbps, which is significantly faster than the bit rate supported by CAN. This makes FlexRay ideal for applications where large amounts of data need to be transferred quickly and reliably, such as in advanced driver assistance systems (ADAS) and autonomous vehicles.

Another advantage of FlexRay is its deterministic communication. Unlike CAN, which uses a non-deterministic arbitration scheme to resolve conflicts between nodes competing for access to the bus, FlexRay uses a deterministic scheduling scheme that guarantees message transmission times and latencies. This is important for safety-critical systems, where timing is critical and unpredictable latencies can lead to system failures.

FlexRay is also designed to be fault-tolerant, which means that it can continue to operate even if one or more nodes in the network fail. This is important for safety-critical systems, where a single failure can have dangerous consequences.

Some examples of communications in a car that may be implemented using FlexRay instead of CAN include:

Advanced driver assistance systems (ADAS)
: FlexRay can be used to transmit sensor data (for example, radar, lidar, camera) to an ADAS control unit for processing and decision-making
Brake-by-wire systems
: FlexRay can be used to transmit signals from the brake pedal to the brake actuator, providing a more responsive and reliable braking system
Electric power steering (EPS) systems
: FlexRay can be used to transmit steering angle and torque data between the EPS control unit and the steering motor, allowing for precise and accurate steering control
Active suspension systems
: FlexRay can be used to transmit data between suspension sensors and the suspension control unit, allowing for real-time adjustment of the suspension system

CAN is still widely used in many automotive applications and is often the preferred choice for simpler, less demanding communication tasks; however, for safety-critical systems or applications that require higher performance, more robust communication and deterministic timing, FlexRay may be a better option.

Summary

It is important highlighting that it is not possible to add safety features in the last phase of the project development; otherwise, it can be that the MCU does not have enough resources in terms of GPIO and peripherals for redundancy. The correct set of safety features needs to be established at the time the MCU is chosen; otherwise, important safety mechanisms can be missing.

This section underlines that the choice of the correct chipset that is used in a safety application requires a good knowledge of application-related risks, safety goals and safety measures required by each specific functionality.

New trends

For automated driving vehicles or for heavy vehicles, the EPS system must be highly available (ISO 26262: safety-related availability). Typically, two EPS systems are used to achieve fail-safe operational steering. These systems individually apply the same safety methods listed above. The two EPS chipsets either operate in parallel or operate in “hot standby” for the second channel.

In case one channel detects an error, it will stop operation (fail silent). However, in contrast to the standard, there are mechanisms installed that automatically detect a fail silent situation on the other channel, such that the remaining channel can take over the full operation and report the situation to higher-level systems, informing the driver.

Figure 54. Transition from Fail-Safe to Fault-Tolerant EPS

ISO 26262 - XEV traction inverter

The most important system-level blocks of a powertrain system in an electric vehicle (EV) are the electric motor itself, the traction inverter drive, the DC/DC converter, the high-voltage Li-ion battery and the on-board charger (OBC). The traction inverter system is described in detail in the subsequent sections. This system needs to ensure that the vehicle powertrain is operating safely and at optimum efficiency.

Figure 55. Simplified block diagram of the traction inverter system

To control the traction inverter, Infineon’s AURIX™ TC3xx MCU family implements advanced features for signal acquisition with the highest safety level (multicore and lockstep architecture, DS-ADC-enabled direct resolver-to-MCU interface, customized PWM pattern generation). Surrounding the 3-phase power stage that will contain Si IGBTs or SiC MOSFETs, there are driver devices that translate the signals from the MCU and provide the necessary isolation.

To provide the multiple rails needed in a traction inverter, Infineon OPTIREG™ power management IC (PMIC) products offer integrated, multi-rail solutions specifically for the harsh automotive environment. Fast 750 V and 1200 V switching devices such as CoolSiC™ MOSFETs are best driven by galvanically isolated gate-driver ICs.

Infineon's EiceDRIVER™ gate driver ICs incorporate the essential features necessary for driving SiC MOSFETs, such as, for example, overcurrent protection, an under- and overvoltage lockout mechanism on all supply lines and support for active short circuit and freewheeling strategies. In addition, the drivers support ASIL-D on the system level, with additional monitoring and supervision functions being integrated to simplify design for safety-relevant applications, including ISO 26262 compliance.

Current sensors in the motor phases and a position sensor on the rotor of the e-motor provide the necessary feedback for precise and energy efficient torque control of the motor, as current sensing is one of the essential measurements within a traction inverter. The Infineon XENSIV™ products offer high-precision miniature coreless magnetic current sensors for AC and DC measurements with an analog interface and fast over-current detection outputs.

Figure 56. Simplified system example of traction inverter

Most state-of-the-art electric vehicles use permanent magnet synchronous machines (PMSM) as the main electric traction motor, typically with three phases. The power range of these motors is from 20 KW up to >100 KW and they normally have 3 or 4 pole pairs with up to ≤25 krpm and a max acceleration of 115.000 rad/s², which requires an output response of ≤ +/-0.2° for 10 krpm. Also, asynchronous e-motors are available, which have less efficiency but the same torque range respect to PMSM. They are anyway bigger and weigh more.

As the manufacturing of permanent magnets requires rare earth material and the recycling process of PMSM often results in the loss of the permanent magnets, the market share of externally excited synchronous machines (EESM) is increasing. These machines generate the magnetic field of the rotor by using copper coils, which are excited by DC current instead of permanent magnets.