AN1001 FuSa in a Nutshell - Introduction to functional safety
About this document
Scope and purpose
As requirements from functional safety standards in automotive, industrial and other fields are a challenging subject, this document intends to provide a first set of guidelines for users who are unfamiliar using Infineon microcontrollers unit (MCU) and other complex chips in a functional safety scope.
This application note is part of a series of document named “FuSa in a nutshell” and listed in
3
.
Intended audience
This application note is intended for all those evaluating Infineon MCUs and other complex chips, including functional safety engineers on the customer side and application engineers. This includes designers of safety-related systems who:
Are new to functional safety
Want to know more about functional safety (also called “FuSa”) applications
Want to understand in principle how functional safety can be implemented with hardware support
Are looking for functional safety details that cannot be found in the user manual of the product
Structure of the document
To explain how to proceed when facing functional safety aspects using Infineon products, the following sections provide a brief introduction to basic safety concepts.
Disclaimer
This series of documents named "FuSa in a nutshell" are for training purposes only and are not to be taken as a blueprint for developing electronic controls unit.
Introduction to main safety concepts
Functional safety
Functional safety defines an entire domain of modern industrial activities. In general, safety is used in relation to situations that can cause harm to humans or generally, the risk of physical injury or damage to the overall health of people (that is, a safe system will not cause harm to humans). In general, no system can be created completely safe, so the functional safety domain focuses on reducing the risk of harm to an acceptable level. The acceptable level is society-dependent and can be differently evaluated depending on the social context.
Functional safety is described as follows:
In the umbrella standard (IEC 61508:2010):
As part of the overall safety that relates to the following:
Equipment under control (EUC)
Control system of the EUC that depends on the correct functioning of the Electric/Electronic/Programmable (E/E/PE) safety-related systems
Other risk reduction measures
In the automotive standard (ISO 26262:2018):
Absence of unreasonable risk due to hazards caused by the malfunctioning behavior of E/E systems.
The electronic components are clearly mentioned in the above two definitions; therefore, this domain is relevant to semiconductors.
The functional safety process starts with a hazard analysis and risk assessment (HARA) of the relevant system or subsystem by suitably qualified and experienced personnel.
From the analysis and assessment, individual safety goals are defined with the specific objective of avoiding harm during an operational condition of the vehicle/appliance or of the automated action in general.
To each of these goals, a corresponding safety integrity level (SIL) as specified in the umbrella standard IEC 61508 is assigned based upon the risk evaluation. In the automotive domain, the acceptable risk level is called Automotive Safety Integrity Level (ASIL).
From the system level, the safety goals are translated into safety requirements for subsystems and individual hardware components. Once the design is complete, verification is carried out by a combination of the component manufacturer and the system manufacturer following the 'V'-model.
Systematic and random faults
Faults in a functional safety system can be broadly classified into the following two categories:
Systematic faults: A fault in design or manufacturing that can be present in hardware and software. The existence of systematic faults can be reduced through continual and rigorous process improvement and robust analysis of any new technology or component.
Random faults: A fault of a hardware element that follows a probabilistic distribution. Random faults are limited to hardware. The rate of random faults cannot be reduced. It is important to keep the focus on:
Prevention measures such as process and design (for example, layout rules)
Detection and mitigation by safety mechanisms (for example, ECC, redundant data storage)
Figure 1. Faults classification
Random hardware faults can be permanent or transient. If the fault is permanent, it will stay there over time.
In case where errors are transient, they can be removed by writing or resetting or setting a new value. In
Figure 2
, it is possible to find a simplified representation of the major cause of transient faults in semiconductors. Alpha and neutron particles cause transient faults that need to be considered when determining the failure rate of a chip.
Figure 2. Alpha particles and neutron particles as possible causes of transient failures
ISO 26262 and IEC 61508 standards perspective
AURIX™ TC3xx was initially developed for automotive systems and is compliant with the ISO 26262:2018 standard. At the same time, compliance with IEC 61508:2010 was also assessed.
summarizes the main differences between the two standards relating to their applicability to AURIX™ TC3xx.
Section | ISO 26262 | IEC 61508 |
---|---|---|
Application field | 12-part standard that is strictly for on-road vehicles, such as passenger cars, trucks, buses and motorcycles, covering the concept up to the production stage for electrical/electronic systems. This standard is tailored to the needs of the automotive industry. Originated from IEC 61508 for automotive. | 7-part industrial-related standard; most often used for machinery, oil wells, chemical plants, nuclear sites, forklifts and robots. This standard refers to industrially relevant technical standards for EMC, communication and cybersecurity. |
Safety classification | Classification is based on Automotive Safety Integrity Levels (ASIL). ASIL: A (least stringent), B, C, D (most stringent) | Classification is based on Safety Integrity Level (SIL). SIL: 1 (least critical), 2, 3, 4 (most critical) |
Functional Safety | definition is in ISO 26262-1:2018 clause 3.67 | definition is in IEC 61508-4:2010 clause 3.1.12 |
Areas covered | it covers safety management, system/HW design, SW design, production and operation of safety-critical E/E/PE systems, but the same is valid for components. | Covers safety management, system/HW design, SW design, production and operation of safety-critical E/E/PE systems. |
“Components” view | Automotive systems distinguish system design from hardware component design. “Components” used in the system require specific compliance with the ISO 26262 standard. One life cycle for all (tailoring concept). ISO 26262-11 is specific for semiconductor development. | A hardware component compliant with IEC 61508 is called a “compliant item”. The HW component life cycle is introduced for “ASICs”. ISO 61508-2 Annex E and F are for semiconductors. |
How safety is implemented | The safety goal concept requires risk reduction to be a part of the initial control system design. | The safety function concept was initially based on the idea of defining equipment under control (EUC) and then building risk reduction measures for the system. |
Documentation | ISO 26262 clearly defines work products for each requirement. Confirmation reviews with independent reviewers, dependent on ASIL, are requested. | General considerations on documentation are reported in Part 1, Clause 5. No confirmation reviews are requested; only assessments with independent assessors. Relating documents to be provided, there are less detailed requirements (no WPs). |
SIL and ASIL determination | To determine the ASIL level of a system, a risk assessment must be performed for all hazards identified. Risk comprises three components: severity, exposure and controllability. | The SIL level of a product is determined by three factors: Systematic capability
rating: If the quality management system meets the requirements of IEC 61508, a SIL capability rating is issued. Architectural constraints for the
element: Architectural constraints are established by Route 1H or Route 2H. Route 1H involves calculating the Safe Failure Fraction for the element. PFH (or PFDavg) calculation for
the product: PFH is the average frequency of a dangerous failure of the safety function [h-1] for high demand mode of operation or continuous mode of operation, while PFDavg is the average probability of a dangerous failure on demand of the safety function operating in low demand mode of operation. |
Corresponding terms | Item Defined in ISO 26262-1:2018 respectively at clause 3.41 | Functional unit Defined in IEC 61508-4:2010 respectively at clause 3.4.5 |
Corresponding terms | Element, Fault, Failure Defined in ISO 26262-1:2018 respectively at clause 3.41, clause 3.54 and clause 3.50 | Element, Fault, Failure Defined in IEC 61508-4:2010 respectively at clause 3.4.5, clause 3.6.1 and clause 3.6.4 |
Decomposition versus synthesis | ASIL decomposition is defined in ISO 26262-1:2018 clause 3.3 An ASIL D safety requirement can be decomposed as: ASIL D (D) + ASIL QM (D) or ASIL C (D) + ASIL A (D) or ASIL B (D) + ASIL B (D) | According to IEC 61508-2:2010, SIL synthesis essentially allows the synthesis (or combining) of two redundant elements with a systematic capability of ‘N’ to have a systematic capability of ‘N + 1’, with ‘N’ less than or equal to SIL 3. The rules for SIL synthesis according to IEC 61508 are:
The IEC 61508 standard does not allow recursive SIL synthesis and in addition, the two combined elements should have the same SIL. IEC 61508 also requires a two-channel implementation for SIL 4 systems (the hardware fault tolerance has to be >0 for a SIL 4 function). |
Failure rate (λ) Expressed in FIT (see Section Failure rate ) | λ = λ SPF + λ RF+ λMPF +λ S | λ = λ S + λ D = (λ SD + λ SU ) + (λ DD + λ DU ) |
Definitions for the different component of failure rate | λ SPF – Single-point faults λ RF – Residual faults λ MPFDP – Detected/perceived multi-point faults λ MPFL – Latent multi-point faults λ MPF = λMPFDP + λMPFL – Multi-point faults λ S – Safe faults Expressed in FIT | λ S – Safe failure rate: No impact on safety function λ SD – Safe detected failure rate − λ SU – Safe undetected failure rate • λ D – Dangerous failure rate − Impact on safety function − λ DD – Dangerous detected failure rate − λ DU – Dangerous undetected failure rate Expressed in FIT |
Metrics | In automotive systems, metric targets are mandatory on the item level and are related to both single- and multi-point faults. | In IEC 61508 metrics, the most relevant factors are single-point faults, even if they include common cause evaluation through a β factor. |
Probabilistic metrics | Probabilistic Metric for Random
Hardware Failures (PMHF): Quantitative criteria for the residual risk of a safety goal violation due to random hardware failures. In simple terms: A metric to show the robustness of a safety architecture. PMHF = λ SPF + λ RF + 0,5 × λ SM1, DPF, latent × λ IF, DPF × T lifetime Expressed in FIT | In an architecture without
redundancy (1oo1) PFH = λ DU PFH definition is in IEC 61508-4:2010 clause 3.6.19 Expressed in FIT |
Similar metrics terms | Single Point Fault Metric
(SPFM): Quantitative criteria for the effectiveness of the safety architecture to cope with single-point and residual faults.In simple terms, metric for the share of remaining dangerous faults in relation to all faults. Expressed in percentage | Safe Failure Fraction (SFF): Ratio of safe and dangerous (but detected) failures in a system safety function to the total failure rate. SFF exact definition is in IEC 61508-4:2010 clause 3.6.15 SFF is calculated at the element (component) or system level for a safety function. It should not be applied to sub-elements. Expressed in percentage |
Metrics terms unique to ISO | Latent Fault Metric
(LFM): Quantitative criteria for the effectiveness of the safety architecture to cope with latent dual-point faults. In simple terms: A metric for the share of remaining critical latent faults in relation to all dual-point faults. Expressed in percentage | |
Terms unique to IEC | Low-demand mode safety functions are required to operate at low frequencies, typically once or more per year. Low-demand functions have less stringent requirements on PFDavg (the average probability of a dangerous failure on demand of the safety function) to achieve a specific SIL. High-demand mode safety functions are required to operate at high frequencies, typically many times per hour. High demand and continuous demand functions have more stringent requirements on PFH (average frequency of a dangerous failure of the safety function) to achieve a specific SIL. Continuous-demand mode safety functions operate continuously. For more details refer to IEC 61508-4:2010 clause3.5.16 Type A products are simple products in which all failure modes are known. For more details refer to IEC 61508-2:2010 clause 7.4.4.1.2. Type B products are complex products in which not all failure modes are known (for example, semiconductor). For more details refer to IEC 61508-2:2010 clause 7.4.4.1.3. Hardware Fault Tolerance
(HFT) HFT is the number of faults that can occur without failure of the safety function. A hardware fault tolerance of N means that N+1 is the minimum number of faults that can cause a loss of the safety function. For more details refer to IEC 61508-2:2010 clause 7.4.4.1. For AURIX™ TC3xx, HFT is equal to 0. This means that the fault might be detected, but safety functionality is lost with one fault. With a hardware fault tolerance of 0 (in other words, 1oo1 redundancy), the maximum safety integrity level that can be achieved by a Type B (complex semiconductor) safety-related element is SIL 3. HFT > 0 requires redundancy. | |
Fault Tree Analysis | A Fault Tree Analysis or equivalent top-down analysis is required in the case of ASIL C and ASIL D. | A Fault Tree Analysis or equivalent is only “R” (recommended) in IEC 61508. |
Dependent Failure Analysis | DFA is the analysis to identify single events that can cause multiple sub-parts to malfunction (for example, intended function and its safety mechanism) and lead to a violation of a safety requirement or safety goal. DFA is qualitative in automotive standard. | DFA is quantitative and faults in the diagnostic circuit can contribute to FMEDA metrics with the so-called beta factor. |
Safety Element out of Context (SEooC) in automotive
AURIX™ TC3xx is an MCU developed for various applications.
Since it is not tailored for a specific item, according to automotive safety standard ISO 26262 part 10, the AURIX™ TC3xx is a Safety Element out of Context (SEooC) hardware component.
As ISO 26262-10:2018 highlights, the development of an MCU starts with an assumption of system-level attributes and requirements. It is the responsibility of the system integrator to integrate the SEooC assumptions of use.
According to the ISO 26262 classification, the MCU is a hardware component that performs a set of functions at the item level as a part of a system. A system, as it is defined in ISO 26262-1, is composed of at least three related elements: a sensor, a controller and an actuator.
Figure 3
shows the typical use of the AURIX™ TC3xx in the context of an electronic control unit (ECU).
Inputs are provided by one or more sensors at the system level, processed by the HW components on the ECU and forwarded to the input channels of the MCU.
The MCU processes the data and provides outputs to other hardware components.
Hardware components drive one or multiple actuators or transmit data to another ECU via a communication network.
Figure 3. AURIX™ TC3xx in the context of an electronic control unit (ECU)
Fail-safe system
A system is said to be fail-safe if it is designed such that in the event of a failure of any element of the system, the system prevents harm to humans.
This is accomplished by having the system enter a safe state if any safety-relevant failure occurs or if it detects a “latent” failure that cannot be corrected immediately.
Failure rate
Failure rate is the frequency or rate with which a system or component fails, expressed in failures per hour.
Symbol: λ(lambda)
Unit: 1 FIT = 10
-9
h
-1
(failure in time)
Failure rates scale depending on time and the number of systems or components.
Examples of different meanings of 1 FIT:
If there are 10
9
systems or components, one of them will fail every hour.
or
If there are 10
5
systems or components working 104 hours consecutively, one of them will fail.
Fault-related timings
ISO 26262 perspective
One of the key metrics for a functional safety system is the time to reach a safe state after a fault occurs.
This period, known as the Fault Handling Time Interval (FHTI), is the sum of two elements:
Fault detection time (FDTI)
Fault reaction time (FRTI)
A more commonly used term, similar to FHTI, is the Fault Tolerant Time Interval (FTTI) , which is defined in ISO 26262-1:2018 clause 3.61 and provides the minimum time before a system could become dangerous when a fault occurs.
shows a graphical representation of the relationship between these timings.
Figure 4. Fault Tolerant Time Interval
The worst case for the fault detection time is application-specific and defined by the diagnostic time interval. All hardware safety mechanisms within AURIX™ TC3xx hardware provide a very fast fault detection time, in the order of microseconds.
IEC 61508 perspective
A term corresponding to FTTI in the IEC 61508 standard is the “process safety time”. This time is defined in IEC 61508-4:2010 at clause 3.6.20. In general, the time to react to a fault is longer in industrial applications with respect to automotive ones.
Protective measures
When the need for a protective measure is identified and the classification is determined, the measure must be implemented in the system.
Safety systems can have various principles of operation, for example:
One single device is inherently fail-safe (so without integrated primary or secondary protection).
One single device with periodic self-testing and monitoring, where the control layer and primary and secondary protection layers are integrated into one single device.
Two independent devices are compared using the same or different technology. Secondary protection is provided by the comparison.
Single device, inherently fail-safe
Electronic fail-safe devices can include fuses, circuit breakers or current-limiting circuits, which interrupt electrical currents under overload conditions. As a result, they directly prevent damage to wiring or circuit devices.
Single device with periodic self-testing and monitoring
One of the most common safety architectures is what some industrial standards call a “single device with periodic self-testing and monitoring”. In this architecture, protective measures can be implemented in a number of layers, as shown in
Figure 5
.
Figure 5. Layers of safety systems in the case of a single device with periodic self-testing and monitoring
Safety-classified functionalities that will lead directly to a hazard are implemented through a control layer plus a primary and secondary protection layer. This means that the system needs to be safe even when two independent faults occur.
The worst case is when two faults happen, one in the control layer and another in the primary protection layer, at a time distance that depends on the acceptable risk for the system (normally 12–24 hours in the most restrictive case). Statistically, it is considered that there is a very low probability that more than two independent faults occur.
The functional layer is intended as the component necessary for the control tasks such as receiving signals from sensors and sending control signals to actuators. This is referred to as the “control layer”. In the absence of any protective measures, failures in combination with normal conditions in the control layer can directly lead to a hazardous situation, such as sending a spurious control signal to operate a valve. Such failures are considered “critical failures”.
A second layer is necessary to implement safety measures to detect critical failures. These measures can be considered as forming the second functional layer (primary protection), whose task is to initiate a protective action in the event of a critical failure in combination with all defined “normal conditions”.
Faults that remain without leading to a critical failure are considered latent faults. Latent fault diagnostics can be executed with a lower frequency with respect to faults leading to safety-critical failures. This kind of fault, normally occurring in the protective function, nevertheless leads to a hazardous situation, even years later, in combination with a second fault.
It will be necessary to incorporate safety measures that prevent such a situation. To prevent a dropout of primary protection due to a latent fault, the proper functioning of the “safeguards” is supervised. The necessary function can be considered a third functional layer (secondary protection).
By implementing primary and secondary protection layers, a function with a high safety rating can be realized.
Two independent channels with comparison
Figure 6. Layers of a safety system using two devices with comparison
When adopting the technique of two independent channels with comparison, these two can use the same or different technology targeting the same function. In other terms, it includes homogeneous redundancy or redundancy with diversity.
When applying diversity to a system, it is not necessary to use hardware components from different manufacturers; the goals can also be achieved by using components from a single manufacturer.
This approach is limited to detecting that there is a fault but not determining where the fault is, as opposed to redundant systems with higher number of instances where the majority of voters will determine which channel is faulty (this is, for example, the case of at least two channels giving the same information over three channels present).
The final layer of protection is then provided by the comparator. The comparator itself will be guaranteed in its functionality; therefore, tests need to be run on the comparator to detect faults leading directly to a hazard or to cover latent faults. The comparator itself should also be free from systematic faults as per the rest of the system.
References
ISO 26262:2018 Road vehicles - Functional safety
; Available online
IEC 61508:2010 Functional safety of electrical/electronic/programmable electronic safety-related systems
; Available online
Infineon Technologies AG:
AN1000 - FuSa in a Nutshell - release note
; Available online
Glossary
Definition | Description | Notes |
---|---|---|
Architectural Element | The smallest element on which the FMEDA is performed | |
ASIL | Automotive Safety Integrity Level; refer to ISO 26262-1:2018, 3.6 | |
CCF | Common-Cause Failure; refer to ISO 26262-1:2018, 3.18 | |
DC | Diagnostic Coverage; refer to ISO 26262-1:2018, 3.33 | |
DFA | Dependent Failure Analysis identifies single events that can cause multiple sub-parts to malfunction (for example, intended function and its safety mechanism) and lead to a violation of a safety requirement or safety goal. | |
DPF | Dual-Point Failure; for the definition refer to ISO 26262-1:2018, clause 3.38 | |
ECU | Electronic Control Unit | |
FHTI | Fault Handling Time Interval is defined in ISO 26262 as the sum of three elements: The fault detection time, the fault reaction time and the time for the system to reach a safe state. | |
FTTI | Fault Tolerant Time Interval; for the definition refer to ISO 26262-1:2018, clause 3.61 | |
FMEA | Failure Mode and Effects Analysis | |
FMEDA | Failure Modes, Effects and Diagnostic Analysis Analysis of the effect of random hardware faults on a safety requirement or safety goal, including quantitative estimation of failure rates and the probability/rate of a safety goal violation | Quantitative Bottom-up HW only |
FTA | Fault Tree Analysis Analysis in which a top-level failure mode is broken down to a combination of lower-level faults (root causes) using a Boolean logic approach | Qualitative (may be quantitative) Top-down HW only |
HARA | Hazard Analysis and Risk Assessment; Refer to ISO 26262-1:2018, 3.76 | |
HW | Hardware | |
IC | Integrated Circuit | |
IEC | International Electrotechnical Commission | |
ISO | International Organization for Standardization | |
MCU | Microcontroller unit | |
SW | Software |
Revision history
Document revision | Date | Description of changes |
---|---|---|
V1.00 | 2024-08-26 | Initial release |
V1.10 | 2025-05-28 | Template update. Updated Disclaimer in About this document |