Microcontroller (ARM Cortex-M class) — Failure Modes & Failure Rate
Programmable processing device that runs the safety-related software. Hardware λ is dominated by single-event upsets (SEU), latch-up, and flash-retention loss, but the safety-related failure picture is also driven by software, which is what ISO 26262 (automotive) and IEC 62304 (medical device software) address.
Failure modes
Single-event upset (SEU) in RAM / registers
- Root causes
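- A cosmic-ray neutron strikes a sensitive node in SRAM and flips a bit. The rate scales with altitude (neutron flux at airliner cruise altitudes is roughly two orders of magnitude higher than at sea level) and with process node (smaller geometry generally means less critical charge per cell, so cells are more sensitive).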
- Detection
- ECC on critical RAM / register banks; lock-step CPU with comparator; periodic memory checksum.
- Mitigation
- Specify SEU-rated parts for aerospace; implement ECC (single-error-correct, double-error-detect) on critical state; use lock-step or 2-out-of-3 voter architecture for ASIL-D.
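
To make the single-error-correct, double-error-detect behaviour concrete, here is a minimal sketch of an extended Hamming (8,4) code protecting one nibble. The function names and the 4-bit word size are illustrative assumptions; real MCU ECC is implemented in hardware on much wider words (for example 32 or 64 data bits), but the logic is the same.

```python
def secded_encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8                      # index = bit position; 0 = overall parity
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]   # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]   # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]   # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1             # makes the total parity even
    return sum(b << i for i, b in enumerate(bits))


def secded_decode(word: int):
    """Return (nibble, status); nibble is None if the word is uncorrectable."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    overall = sum(bits) & 1             # 1 means an odd number of bits flipped

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        bits[syndrome] ^= 1             # single error; syndrome 0 = parity bit itself
        status = "corrected"
    else:
        return None, "uncorrectable"    # even parity but non-zero syndrome: 2 flips

    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return nibble, status


word = secded_encode(0b1011)
print(secded_decode(word))                  # clean word
print(secded_decode(word ^ 0b00100000))     # one bit flipped
print(secded_decode(word ^ 0b00100100))     # two bits flipped
```

A single flip anywhere in the codeword is corrected silently, while a double flip is reported rather than miscorrected; the second case is the one a safety mechanism must escalate to a safe state.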
Hard fault / latch-up
- Root causes
- Voltage transient triggering parasitic SCR action between substrate and well; sustained over-temperature.
- Detection
- Loss of execution; supply current jumps to an abnormally high level and stays there until power is removed; watchdog times out.
- Mitigation
- Latch-up-immune process (e.g. SOI or trench isolation with guard rings); supply transient suppression; current-limited supply with thermal cut-off.
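
The supply-current signature described above can be told apart from a harmless transient by requiring the over-current to persist. The sketch below is one way an external supervisor might do that; the function name, the 200 mA limit, and the three-sample hold time are hypothetical values chosen for illustration, not figures from any datasheet.

```python
def detect_latchup(samples_ma, limit_ma, hold_samples):
    """Return the index at which latch-up is declared, or None.

    Latch-up is declared once the supply current has exceeded limit_ma
    for hold_samples consecutive samples; shorter spikes are ignored.
    """
    run = 0
    for i, current in enumerate(samples_ma):
        run = run + 1 if current > limit_ma else 0
        if run >= hold_samples:
            return i
    return None


# Hypothetical trace: one short spike, then a sustained high current.
trace = [80, 82, 400, 81, 79, 450, 460, 455, 470]
print(detect_latchup(trace, limit_ma=200, hold_samples=3))
```

Because a latched-up device generally recovers only when power is removed, the natural response to a positive detection is a supply power-cycle driven from outside the MCU rather than a software reset.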
Flash-retention loss
- Root causes
- Charge leakage from floating-gate or charge-trap cells over multi-year storage; accelerated by elevated temperature.
- Detection
- CRC of code regions at boot; flash error-corrector flagging uncorrectable cells.
- Mitigation
- ECC on flash; periodic code-region CRC verification; specify automotive-grade parts with 15-year retention guarantee for long-life applications.
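
A boot-time code-region check can be sketched as follows. This uses the CRC-32 from Python's standard `zlib` module purely for illustration; the `code_region_ok` helper, the stand-in image, and the choice of polynomial are assumptions, and a real MCU would typically use its hardware CRC unit with a reference value stored at build time.

```python
import zlib


def code_region_ok(image: bytes, expected_crc: int) -> bool:
    """Compare the CRC-32 of a code region against the stored reference."""
    return (zlib.crc32(image) & 0xFFFFFFFF) == expected_crc


# Stand-in for a flash code region and its build-time reference CRC.
image = bytes(range(256)) * 4
reference = zlib.crc32(image) & 0xFFFFFFFF

# Simulate retention loss: a single cell drifts and one bit reads back wrong.
corrupted = bytearray(image)
corrupted[100] ^= 0x04

print(code_region_ok(image, reference))             # intact image
print(code_region_ok(bytes(corrupted), reference))  # image with one flipped bit
```

A CRC only detects corruption; it cannot repair it. That is why the mitigation list pairs it with ECC on the flash array, which corrects single-bit errors before they reach the CRC check.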
Stuck-at fault (manufacturing defect surfaces in field)
- Root causes
- Latent process defect that wasn't caught at outgoing test — surfaces under thermal or voltage stress.
- Detection
- Built-in self-test (BIST) at boot, covering stuck-at faults in digital logic and memory.
- Mitigation
- Comprehensive boot self-test; vendor selection for known-good test coverage; ASIL-rated parts (with SEooC supplier safety manual).
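
For memory, a boot self-test for stuck-at faults is commonly built from a march algorithm. The sketch below runs MATS+ (write 0 everywhere; ascending read 0 and write 1; descending read 1 and write 0) against a small software RAM model. The `FaultyRam` class and its fault-injection interface are hypothetical and exist only to demonstrate the test.

```python
class FaultyRam:
    """Bit-addressable RAM model with optional stuck-at cells."""

    def __init__(self, size, stuck=None):
        self.cells = [0] * size
        self.stuck = dict(stuck or {})   # address -> forced value (0 or 1)

    def write(self, addr, value):
        # A stuck cell ignores the written value.
        self.cells[addr] = self.stuck.get(addr, value)

    def read(self, addr):
        return self.stuck.get(addr, self.cells[addr])


def mats_plus(ram, size):
    """Run MATS+ and return the sorted list of failing addresses."""
    failures = set()
    for a in range(size):                # any order: write 0
        ram.write(a, 0)
    for a in range(size):                # ascending: read 0, write 1
        if ram.read(a) != 0:
            failures.add(a)
        ram.write(a, 1)
    for a in reversed(range(size)):      # descending: read 1, write 0
        if ram.read(a) != 1:
            failures.add(a)
        ram.write(a, 0)
    return sorted(failures)


ram = FaultyRam(16, stuck={3: 1, 9: 0})  # one stuck-at-1, one stuck-at-0
print(mats_plus(ram, 16))
```

The test is destructive, so on real hardware it is usually run before RAM is initialised, or block by block with the contents saved and restored.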
Typical applications
ECUs in automotive (engine, brake, steering, ADAS); embedded controllers in industrial, medical, and consumer products; the processing core of most modern safety-critical embedded systems.
How to model in a fault tree
For ISO 26262 ASIL work, the microcontroller is rarely a single basic event. It is modelled with separate basic events for SEU, latch-up, flash-retention loss, stuck-at faults, and software faults under the ECU sub-tree. Diagnostic coverage from the safety mechanisms (lockstep, ECC, BIST) reduces the contribution of each hardware event to the PMHF (probabilistic metric for random hardware failures). For lockstep dual-core MCUs (Cortex-R5, Cortex-M7 with DCLS), the architectural metric SPFM (single-point fault metric) is the headline number, not raw λ. See PMHF for how these combine into the ASIL target.
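
A rough worked example of how diagnostic coverage feeds these metrics is sketched below. All FIT values and coverage figures are hypothetical placeholders, not vendor data. The calculation is deliberately simplified: it treats every hardware failure mode as safety-related and ignores latent and dual-point contributions, which a full ISO 26262 analysis would include.

```python
# Hypothetical raw failure rates in FIT (failures per 1e9 hours) and the
# diagnostic coverage (DC) assumed for the safety mechanism on each mode.
FAILURE_MODES = {
    "seu":             (500.0, 0.999),   # ECC + lockstep
    "latch_up":        (5.0,   0.90),    # current monitor + watchdog
    "flash_retention": (10.0,  0.99),    # flash ECC + boot CRC
    "stuck_at":        (20.0,  0.99),    # BIST + lockstep
}


def residual_fit(modes):
    """Residual rate per mode: the part the safety mechanism does not cover."""
    return {name: lam * (1.0 - dc) for name, (lam, dc) in modes.items()}


def spfm(modes):
    """Single-point fault metric: 1 - (sum of residual rates / sum of raw rates)."""
    total = sum(lam for lam, _ in modes.values())
    return 1.0 - sum(residual_fit(modes).values()) / total


def pmhf_fit(modes):
    """Simplified PMHF contribution: the sum of residual rates, in FIT."""
    return sum(residual_fit(modes).values())


print(f"Residual per mode (FIT): {residual_fit(FAILURE_MODES)}")
print(f"SPFM: {spfm(FAILURE_MODES):.4%}")
print(f"PMHF contribution: {pmhf_fit(FAILURE_MODES):.2f} FIT")
```

With these placeholder numbers the residual rate is about 1.3 FIT and the SPFM about 99.76%, against ASIL-D targets of SPFM ≥ 99% and PMHF < 10 FIT. The PMHF budget applies to the whole item, so the MCU can consume only part of it.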