ISO 26262 · Hardware metrics

Hardware random-failure metrics for ASIL D — PMHF, SPFM, LFM

ISO 26262 Part 5 imposes three quantitative hardware-level metrics that any ASIL D component must clear: SPFM (Single Point Fault Metric), LFM (Latent Fault Metric), and PMHF (Probabilistic Metric for Hardware Failures). Each measures a different way the architecture can fail, and passing one says nothing about whether the others pass. The most common audit finding is a team that computed PMHF, declared the design ASIL D, and never noticed an SPFM at 96% on a 99% target. This guide explains the underlying fault classification, defines each metric formally, walks an AEB ECU through all three, and names the operational mistakes.

Worked example: AEB ECU (ASIL D) · Standards: ISO 26262 Part 5 §8–9

What's actually being measured — the hardware fault classification

Before any of the three metrics make sense, the underlying classification of hardware faults has to be in place. ISO 26262 Part 5 §B.1 splits every hardware fault — every transistor failure, every solder-joint break, every register stuck-bit — into one of four buckets, each contributing differently to the safety case. The classification is per-failure-mode, not per-component: a single resistor can have an "open" failure mode that is safe and a "short" failure mode that is dangerous, contributing to different buckets.

| Bucket | Symbol | Definition | Contributes to |
| --- | --- | --- | --- |
| Safe fault | λS | Failure mode that cannot violate the safety goal in any operational context. E.g. a status LED dies (informational only). | Nothing harmful; ignored by all three metrics. |
| Single-point fault | λSPF | Single hardware failure that directly causes safety goal violation, with no diagnostic mechanism in place to detect it. | SPFM (must be small) and PMHF (large contribution). |
| Residual fault | λRF | Single hardware failure that would cause safety goal violation, but is partially covered by a diagnostic; the residual is the uncovered fraction. | SPFM (must be small) and PMHF (residual contribution). |
| Multi-point / latent fault | λMPF | Failure that, on its own, doesn't violate the safety goal, but combined with a second independent failure does. "Latent" is the subset that isn't detected before the second failure occurs. | LFM (must be small). |

The total per-component failure rate is decomposed as λ = λS + λSPF + λRF + λMPF. The bucket assignment is the work of the safety analyst — typically driven by an FMEDA (Failure Mode, Effects and Diagnostic Analysis), which is FMEA augmented with diagnostic-coverage scoring. ISO 26262 Part 5 Annex B walks the FMEDA template; tools like Reliasoft, Ansys medini, and FTA Studio's FMEDA mode produce the per-component breakdown automatically once failure modes and diagnostic coverage percentages are entered.

Three things to internalise about this classification before getting to the metrics:

Why three metrics rather than one
A single PMHF target, say 10⁻⁸/h, could in principle be met by a design that has 99.99% diagnostic coverage on a high-failure-rate part. But that 0.01% residual could be a stuck-at fault on a critical bus, which is a far more dangerous failure mode than the same residual probability spread across many low-impact components. SPFM forces the coverage to be high per fault mode, not just on average. LFM forces multi-point latents to be either covered by a diagnostic or recovered before the second fault has time to arrive. PMHF rolls everything up into a single rate. The three together prevent the architecture from gaming any one of them.

Step 1: The fault classification, applied (choosing buckets)

The bucket assignment for any given component is mechanical once you have the FMEA and the diagnostic mechanism descriptions. Take a representative AEB ECU's main MCU as the running example. The MCU has a per-hour failure rate of λ = 100 FIT (= 10⁻⁷/h), broken down by failure mode and the diagnostic coverage in place:

| Failure mode | λ contribution | Effect | Diagnostic | DC% | Bucket |
| --- | --- | --- | --- | --- | --- |
| Core stuck-at | 40 FIT | Compute path silently produces wrong output | Lockstep dual-core comparison | 99% | 0.4 FIT residual + 39.6 FIT covered (safe-via-mechanism) |
| Core transient (SEU) | 30 FIT | Transient wrong output, recovers next cycle | ECC + lockstep | 99% | 0.3 FIT residual + 29.7 FIT covered |
| RAM bit-flip | 20 FIT | Wrong data, propagates downstream | ECC (single-bit correct, double-bit detect) | 99% (single), 0% (double) | 1.19 FIT residual + 18.81 FIT covered |
| Lockstep monitor stuck-OK | 5 FIT | Monitor reports "compare OK" even when cores diverge (a multi-point latent) | Periodic self-test of the comparator at startup | 50% (only catches half) | 2.5 FIT latent + 2.5 FIT covered |
| Power-supply brown-out | 3 FIT | MCU resets cleanly (fail-safe direction) | n/a | n/a | 3 FIT safe |
| Clock source drift | 2 FIT | Timing errors, wrong outputs | External watchdog with independent clock | 95% | 0.1 FIT residual + 1.9 FIT covered |

Tally per bucket — every failure mode's covered portion lands in the multi-point-detected bucket because "covered" means a diagnostic redirects to safe state, not that the fault itself is intrinsically harmless:

λtotal          = 100 FIT
λS (safe)        = 3 FIT          (power-supply brown-out, fails safe-direction)
λSPF             = 0 FIT          (no fault entirely without a mechanism)
λRF (residual)   = 1.99 FIT       (uncovered fractions of partially-covered SPFs)
λMPF,detected    = 92.51 FIT      (covered by a diagnostic that triggers safe state)
λMPF,latent      = 2.5 FIT        (lockstep-monitor stuck-OK, half not caught by self-test)

The residual breaks down as 0.4 (core stuck-at) + 0.3 (core SEU) + 0.19 (RAM single-bit) + 1.0 (RAM double-bit, no DC) + 0.1 (clock drift). The dominant contributor — the RAM double-bit residual at 1 FIT — accounts for half of all residual fault rate. This is the kind of detail SPFM in Step 2 will surface; PMHF on its own would lose it in the noise.
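The tally can be reproduced mechanically from the failure-mode table. The following is a minimal sketch rather than the output of any FMEDA tool: `FailureMode` and `tally` are illustrative names, and the split of the RAM row into 19 FIT single-bit and 1 FIT double-bit is inferred from the residual breakdown above, not stated in the table. Following the guide's own tally, the uncovered RAM double-bit rate is booked as residual rather than as λSPF.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    fit: float        # failure rate in FIT (1 FIT = 1e-9 /h)
    kind: str         # "safe", "direct" (violates the goal alone) or "multi_point"
    dc: float = 0.0   # diagnostic coverage as a fraction, 0.0 to 1.0

# Assumed encoding of the AEB MCU table; RAM row split 19 + 1 FIT (inferred).
MODES = [
    FailureMode("core stuck-at", 40.0, "direct", 0.99),
    FailureMode("core transient (SEU)", 30.0, "direct", 0.99),
    FailureMode("RAM single-bit", 19.0, "direct", 0.99),
    FailureMode("RAM double-bit", 1.0, "direct", 0.0),
    FailureMode("lockstep monitor stuck-OK", 5.0, "multi_point", 0.50),
    FailureMode("power-supply brown-out", 3.0, "safe"),
    FailureMode("clock source drift", 2.0, "direct", 0.95),
]

def tally(modes):
    """Sum failure rates (FIT) into the four buckets used in this guide."""
    buckets = {"safe": 0.0, "residual": 0.0, "mpf_detected": 0.0, "mpf_latent": 0.0}
    for m in modes:
        if m.kind == "safe":
            buckets["safe"] += m.fit
            continue
        covered = m.fit * m.dc
        uncovered = m.fit - covered
        buckets["mpf_detected"] += covered
        # The uncovered part is residual for direct faults, latent for multi-point ones.
        buckets["residual" if m.kind == "direct" else "mpf_latent"] += uncovered
    return buckets

buckets = tally(MODES)
for name, fit in buckets.items():
    print(f"{name:13s} {fit:6.2f} FIT")
```

Run as written, the four buckets come out at 3.00, 1.99, 92.51 and 2.50 FIT, which sum back to the 100 FIT total.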

Step 2: SPFM (Single Point Fault Metric)

SPFM measures the fraction of fault-rate that is not a single-point or residual fault, computed across the dangerous (non-safe) population. Equivalently: the fraction of the dangerous fault rate that has a working safety mechanism between it and the safety goal. Formally:

SPFM = 1 − (Σ λSPF + Σ λRF) / (Σ λtotal − Σ λS)

The denominator excludes safe faults, which don't matter for the metric. The numerator of the subtracted fraction is the part of the dangerous rate that can directly violate the safety goal without a second failure being involved. ISO 26262 Part 5 Table 4 gives the per-ASIL targets:

| ASIL | SPFM target |
| --- | --- |
| D | ≥ 99% |
| C | ≥ 97% |
| B | ≥ 90% |
| A | No quantitative requirement |

Plugging the AEB MCU bucket totals from Step 1 in:

SPFM = 1 − (0 + 1.99) / (100 − 3)
     = 1 − 1.99 / 97
     = 1 − 0.0205
     = 0.9795 = 97.95%

This MCU passes ASIL C (≥ 97%) but fails ASIL D (≥ 99%) by about a percentage point. The conversation with the design team is exactly the conversation the bucket breakdown teed up: half the residual budget is consumed by RAM double-bit errors with no detection. Either the ECC needs upgrading from SECDED (single-error-correct, double-error-detect — only 99% on single-bit) to SECDED with DUE-trap-to-safe (which moves the 1 FIT into λMPF,detected), or the safety goal has to be met with a different SPFM-friendly mitigation. The metric forces the question; it doesn't suggest the answer.
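The same check is easy to script so it can be rerun whenever a bucket total changes. This sketch implements the SPFM formula exactly as written in this guide; the function name `spfm` and the `SPFM_TARGETS` dictionary are illustrative, not taken from any tool.

```python
# Per-ASIL SPFM targets as listed in this guide.
SPFM_TARGETS = {"D": 0.99, "C": 0.97, "B": 0.90}

def spfm(total_fit, safe_fit, spf_fit, rf_fit):
    """SPFM = 1 - (SPF + RF) / (total - safe), with all rates in FIT."""
    dangerous_fit = total_fit - safe_fit
    if dangerous_fit <= 0:
        raise ValueError("no dangerous failure rate to evaluate")
    return 1.0 - (spf_fit + rf_fit) / dangerous_fit

# AEB MCU bucket totals from Step 1.
value = spfm(total_fit=100.0, safe_fit=3.0, spf_fit=0.0, rf_fit=1.99)
met = [asil for asil, target in SPFM_TARGETS.items() if value >= target]
print(f"SPFM = {value:.2%}; targets met: {met}")
```

For the MCU this reports 97.95% and lists only the C and B targets as met.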

Why SPFM looks pass-able and isn't
For a 100 FIT total component, ASIL D's 99% SPFM target leaves 100 FIT × 1% = a residual budget of about 1 FIT across all failure modes combined. Sounds like plenty until you tally the residuals: every partially-covered failure mode at 99% DC contributes 1% × λ to the residual, which adds up fast across a typical MCU's 6–10 distinct failure-mode classes. With every mode at exactly 99% the budget is already fully spent, so any mode below 99% has to be offset by better-than-99% coverage elsewhere. Reaching 99% SPFM on a complex component routinely requires multiple diagnostics layered: ECC + lockstep + watchdog + voltage monitoring, with the residuals being whatever leaks past all of them. This is the structural reason ASIL D MCUs cost more than ASIL B ones: not the components themselves but the diagnostic infrastructure required to push residual fault rates below the SPFM threshold.
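To make the budget argument concrete, the sketch below turns an SPFM target into a residual budget in FIT and compares it with the AEB MCU's residuals from Step 1. It uses this guide's SPFM definition, so the budget is taken over the non-safe 97 FIT rather than the full 100 FIT; `residual_budget` is an illustrative helper name.

```python
def residual_budget(total_fit, safe_fit, spfm_target):
    """Largest SPF + RF rate (FIT) that still meets the given SPFM target."""
    return (total_fit - safe_fit) * (1.0 - spfm_target)

# Residual contributions of the AEB MCU, in FIT, from the Step 1 breakdown.
residuals = {
    "core stuck-at": 0.4,
    "core SEU": 0.3,
    "RAM single-bit": 0.19,
    "RAM double-bit": 1.0,
    "clock drift": 0.1,
}

budget = residual_budget(total_fit=100.0, safe_fit=3.0, spfm_target=0.99)
used = sum(residuals.values())
overshoot = used - budget
print(f"budget {budget:.2f} FIT, used {used:.2f} FIT, overshoot {overshoot:.2f} FIT")
```

The budget is 0.97 FIT against 1.99 FIT used, so the design is roughly 1 FIT over, which is about the size of the RAM double-bit residual on its own.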

Step 3: LFM (Latent Fault Metric)

LFM measures the fraction of multi-point faults that are detected by a diagnostic, recovered, or otherwise prevented from sitting silently in the system waiting for a second fault. Latent faults are the dangerous category here: a stuck-OK watchdog, an ECC mechanism that has degraded, a redundant channel that has failed but not been noticed. The system looks healthy until the second fault hits.

LFM = 1 − (Σ λMPF,latent) / (Σ λtotal − Σ λS − Σ λSPF − Σ λRF)

The denominator is the multi-point-fault population only, i.e. faults that aren't safe, single-point, or residual. The numerator is the latent (undetected) subset. ISO 26262 Part 5 Table 5 thresholds:

| ASIL | LFM target |
| --- | --- |
| D | ≥ 90% |
| C | ≥ 80% |
| B | ≥ 60% |
| A | No quantitative requirement |

For the AEB MCU:

LFM = 1 − 2.5 / (100 − 3 − 0 − 1.99)
    = 1 − 2.5 / 95.01
    = 1 − 0.0263
    = 0.9737 = 97.37%

This passes ASIL D (≥ 90%) comfortably. The MCU's only meaningful latent is the lockstep-monitor stuck-OK fault (2.5 FIT undetected), and even that is a small fraction of the multi-point pool dominated by the covered core and RAM faults.
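A short script shows how much slack that margin represents. The sketch below implements the LFM formula as written in this guide and sweeps the comparator self-test coverage; `lfm` and `MONITOR_FIT` are illustrative names, and the sweep values other than 50% are hypothetical what-ifs, not figures from the FMEDA.

```python
def lfm(total_fit, safe_fit, spf_fit, rf_fit, latent_fit):
    """LFM = 1 - latent / (total - safe - SPF - RF), with all rates in FIT."""
    mpf_pool_fit = total_fit - safe_fit - spf_fit - rf_fit
    return 1.0 - latent_fit / mpf_pool_fit

# Baseline from Step 1: 2.5 FIT of the 5 FIT monitor fault stays latent.
baseline = lfm(100.0, 3.0, 0.0, 1.99, latent_fit=2.5)

MONITOR_FIT = 5.0  # lockstep monitor stuck-OK failure rate
for self_test_dc in (0.0, 0.5, 0.9):
    latent = MONITOR_FIT * (1.0 - self_test_dc)
    print(f"self-test DC {self_test_dc:.0%}: LFM = {lfm(100.0, 3.0, 0.0, 1.99, latent):.2%}")

# Worst case: no self-test at all, so the whole 5 FIT is latent.
no_self_test = lfm(100.0, 3.0, 0.0, 1.99, latent_fit=MONITOR_FIT)
```

Under these numbers the MCU would clear the 90% target even with no comparator self-test at all (about 94.7%), because the monitor's failure rate is small next to the multi-point pool.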

Two structural points about LFM that change how design teams think about it:

What the two metrics together force the architecture to do
SPFM forces: every dangerous fault mode is covered by a diagnostic with high coverage. LFM forces: every diagnostic is itself covered by a meta-diagnostic. The two together say "build redundancy AND verify the redundancy works". A design that meets both passes the ISO 26262 hardware architectural metrics; a design that meets one and not the other has a structural weakness that PMHF won't catch on its own. The next step's PMHF metric is the integration check, not a substitute for either of these.

Step 4: PMHF (Probabilistic Metric for Hardware Failures)

PMHF is the rate per hour at which the safety goal is violated due to random hardware failures, averaged over the operational lifetime of the vehicle. Where SPFM and LFM are dimensionless ratios, PMHF is a rate — comparable directly against numerical targets without further interpretation. The Part 5 Table 6 thresholds:

| ASIL | PMHF target |
| --- | --- |
| D | ≤ 10⁻⁸/h (10 FIT) |
| C | ≤ 10⁻⁷/h (100 FIT) |
| B | ≤ 10⁻⁷/h (100 FIT) |
| A | No quantitative requirement |

For a single-channel architecture with safety mechanisms — which is what our AEB MCU is, considered standalone — the PMHF contribution decomposes as:

PMHF ≈ Σ λRF                                      ← residual fraction of single-point faults
     + Σ λMPF,latent · λpartner · Tlife / 2     ← latent + partner during lifetime

The residual term contributes directly — every uncovered fraction of a single-point fault adds to the rate at which the safety goal is violated. The latent term is the probabilistic combination of "the latent fault has already occurred" with "the partner fault then occurs during the remaining lifetime"; the factor of Tlife/2 is the average exposure window for a uniformly-distributed second-fault arrival. ISO 26262 Part 5 §9.4.2.3 gives the full decomposition with refinements for proof-test intervals and detected multi-point faults; the simplified form above captures >95% of the contribution for typical automotive mission profiles.
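One way to see where the Tlife/2 factor comes from is the first-order sketch below. It assumes constant failure rates, λ·Tlife ≪ 1, and no proof test that clears the latent fault during the lifetime; the symbols P_latent and h are introduced here for illustration only.

```latex
\begin{align*}
P_{\text{latent}}(t) &\approx \lambda_{\text{MPF,latent}}\, t
  && \text{(latent fault already present at time } t\text{)} \\
h(t) &\approx \lambda_{\text{MPF,latent}}\, t \cdot \lambda_{\text{partner}}
  && \text{(rate at which the partner fault completes the pair)} \\
\frac{1}{T_{\text{life}}} \int_0^{T_{\text{life}}} h(t)\, dt
  &= \lambda_{\text{MPF,latent}}\, \lambda_{\text{partner}}\, \frac{T_{\text{life}}}{2}
  && \text{(average over the lifetime)}
\end{align*}
```

The lifetime average of a rate that grows linearly from zero is half its end-of-life value, which is the latent term in the simplified PMHF expression.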

Plugging the AEB MCU buckets in, with vehicle lifetime Tlife = 10,000 h and the latent's partner-fault rate ≈ 70 FIT (the cores and RAM the lockstep monitor protects):

Residual term:  Σ λRF     = 1.99 FIT = 1.99×10⁻⁹ /h
Latent term:    2.5×10⁻⁹ · 70×10⁻⁹ · 10,000 / 2
              ≈ 8.75×10⁻¹³ /h     ← negligible

PMHF ≈ 1.99×10⁻⁹ + 8.75×10⁻¹³ ≈ 2.0×10⁻⁹ /h

2×10⁻⁹/h vs an ASIL D target of 10⁻⁸/h. Passes ASIL D with a 5× margin. The latent contribution is more than three orders of magnitude smaller than the residual contribution (8.75×10⁻¹³/h against 1.99×10⁻⁹/h, a factor of roughly 2,300). This is an extreme version of the general pattern: for component-level PMHF the residual fault rate is what matters, and the latent term is a rounding error unless λpartner × Tlife is unusually large.
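The two terms can be computed separately to see how lopsided they are. This is a sketch of the simplified PMHF expression used in this guide, not the full Part 5 decomposition; `pmhf_terms` is an illustrative name, and the inputs are the Step 1 buckets plus the 70 FIT partner rate and 10,000 h lifetime assumed above.

```python
FIT = 1e-9                  # 1 FIT = 1e-9 failures per hour
PMHF_TARGET_ASIL_D = 1e-8   # per hour

def pmhf_terms(rf_fit, latent_fit, partner_fit, t_life_h):
    """Return (residual term, latent term) of the simplified PMHF, both in 1/h."""
    residual = rf_fit * FIT
    latent = (latent_fit * FIT) * (partner_fit * FIT) * t_life_h / 2.0
    return residual, latent

residual, latent = pmhf_terms(rf_fit=1.99, latent_fit=2.5,
                              partner_fit=70.0, t_life_h=10_000.0)
pmhf = residual + latent

print(f"residual {residual:.3e} /h, latent {latent:.3e} /h, PMHF {pmhf:.3e} /h")
print(f"residual / latent ratio: {residual / latent:,.0f}")
print(f"margin to ASIL D target: {PMHF_TARGET_ASIL_D / pmhf:.1f}x")
```

With these inputs the residual term is about 2,300 times the latent term, so the PMHF result is effectively the residual fault rate restated in per-hour units.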

PMHF is the easy metric to pass and the easy metric to point at
Component-level PMHF for a typical ASIL D design comes out around 10⁻⁹/h or below, comfortably under the 10⁻⁸/h target. Teams under deadline pressure compute PMHF, see the margin, and report "PMHF: pass, ASIL D met". The reviewer asks for SPFM and LFM next, and the architecture often reveals a 96–97% SPFM that was missed because everyone was looking at the rate. The order in which to compute the metrics is: SPFM first (most likely to fail), then LFM, then PMHF as a sanity check. Reversing the order is the most common audit finding in ASIL D safety cases.

Step 5: Unified verdict on the AEB MCU

The three metrics evaluated against ASIL D targets, side by side:

| Metric | Computed | ASIL D target | Verdict |
| --- | --- | --- | --- |
| SPFM | 97.95% | ≥ 99% | FAIL by 1.05 pp |
| LFM | 97.37% | ≥ 90% | PASS by 7.4 pp |
| PMHF | 2.0×10⁻⁹/h | ≤ 10⁻⁸/h | PASS with 5× margin |
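The side-by-side comparison generalises to every ASIL row. The sketch below checks the three computed values against the target tables quoted in this guide and reports the highest ASIL for which all three pass; `TARGETS`, `verdict`, and `highest_asil` are illustrative names.

```python
# Targets as quoted in this guide: SPFM and LFM are minimums, PMHF is a maximum (1/h).
TARGETS = {
    "D": {"spfm": 0.99, "lfm": 0.90, "pmhf": 1e-8},
    "C": {"spfm": 0.97, "lfm": 0.80, "pmhf": 1e-7},
    "B": {"spfm": 0.90, "lfm": 0.60, "pmhf": 1e-7},
}

def verdict(spfm, lfm, pmhf, asil):
    """Per-metric pass/fail flags against one ASIL's targets."""
    t = TARGETS[asil]
    return {
        "spfm": spfm >= t["spfm"],
        "lfm": lfm >= t["lfm"],
        "pmhf": pmhf <= t["pmhf"],
    }

def highest_asil(spfm, lfm, pmhf):
    """Highest ASIL whose three targets are all met, or None."""
    for asil in ("D", "C", "B"):
        if all(verdict(spfm, lfm, pmhf, asil).values()):
            return asil
    return None

mcu = {"spfm": 0.9795, "lfm": 0.9737, "pmhf": 2.0e-9}
asil_d = verdict(asil="D", **mcu)
best = highest_asil(**mcu)
print(f"ASIL D check: {asil_d}; highest ASIL met on all three: {best}")
```

Keeping the three flags separate, rather than collapsing them to a single pass/fail, preserves which metric is the gating one.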

The MCU does not meet ASIL D. It meets ASIL C on all three, though SPFM clears the 97% bar by less than one percentage point. The single failing metric is SPFM, and the single dominant contributor inside SPFM is the RAM double-bit residual at 1 FIT. Three credible design responses:

  1. Upgrade ECC to detect and safely handle double-bit errors. SECDED with DUE-trap-to-safe (the double-bit error triggers a controlled reset to safe state) moves the 1 FIT from λRF into λMPF,detected. New SPFM: 1 − 0.99/97 = 98.98%. Still fails ASIL D, but only by 0.02 pp — close enough that further small improvements (e.g. tightening the 95% clock-drift DC to 99%) push it over.
  2. Re-architect to eliminate the high-residual component. A different memory controller with intrinsically lower failure rate, or a different memory technology (FRAM, MRAM) where the dominant failure mode isn't bit-flips. Expensive but cleanly removes the gating constraint.
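  3. Apply ASIL decomposition. The safety goal is decomposed into ASIL B(D) + ASIL B(D) at the system level (cf. Article 6), and the MCU becomes one of two independent channels developed to ASIL B(D). Note that ISO 26262 Part 9 leaves the hardware architectural metric targets unchanged by decomposition, so the SPFM target stays at the safety goal's ≥ 99% rather than relaxing to 90%. The relief comes from the architecture: MCU faults that an independent second channel catches stop counting as residual. The decomposition has its own independence-and-CCF cost, but at the MCU level the metric pressure eases.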
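The arithmetic behind the first response can be replayed as a what-if on the residual table. This sketch uses the guide's SPFM definition over the 97 FIT of non-safe faults; `spfm_from_residuals` is an illustrative helper, and the scenarios simply zero or shrink entries in the Step 1 residual breakdown.

```python
DANGEROUS_FIT = 97.0   # total 100 FIT minus 3 FIT of safe faults
ASIL_D_SPFM = 0.99

def spfm_from_residuals(residuals):
    """SPFM from per-mode residual rates (FIT), with lambda_SPF taken as zero."""
    return 1.0 - sum(residuals.values()) / DANGEROUS_FIT

baseline = {
    "core stuck-at": 0.4,
    "core SEU": 0.3,
    "RAM single-bit": 0.19,
    "RAM double-bit": 1.0,
    "clock drift": 0.1,
}
# Response 1: the double-bit error traps to safe state, so its 1 FIT leaves the residual.
with_trap = {**baseline, "RAM double-bit": 0.0}
# Response 1 plus clock-drift DC raised from 95% to 99% on the 2 FIT mode.
with_trap_and_clock = {**with_trap, "clock drift": 2.0 * (1.0 - 0.99)}

for label, scenario in [("baseline", baseline),
                        ("DUE trap", with_trap),
                        ("DUE trap + clock DC 99%", with_trap_and_clock)]:
    value = spfm_from_residuals(scenario)
    status = "PASS" if value >= ASIL_D_SPFM else "FAIL"
    print(f"{label:24s} SPFM = {value:.2%}  {status}")
```

The three scenarios land at roughly 97.95%, 98.98% and 99.06%, so only the combined change crosses the ASIL D line.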

Notice what didn't make the list: "improve the safety mechanisms we already have". The SPFM shortfall is concentrated in a fault mode that has no diagnostic at all (RAM double-bit). Tightening lockstep coverage from 99% to 99.5% wouldn't shift the answer — it improves a residual that's already small. The metric structure tells the design team where to look; the FMEDA tells them what to fix; the decision is which of the three responses fits the cost / schedule / risk envelope.
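Ranking the residual contributors makes the "where to look" point directly. The sketch below sorts the Step 1 residuals by size and then replays the lockstep tightening mentioned above; the variable names are illustrative, and the 99.5% figure is the hypothetical from the paragraph, not a measured coverage.

```python
DANGEROUS_FIT = 97.0  # total 100 FIT minus 3 FIT of safe faults

residuals = {
    "core stuck-at": 0.4,
    "core SEU": 0.3,
    "RAM single-bit": 0.19,
    "RAM double-bit": 1.0,
    "clock drift": 0.1,
}

total_residual = sum(residuals.values())
ranked = sorted(residuals.items(), key=lambda item: item[1], reverse=True)
for name, fit in ranked:
    print(f"{name:15s} {fit:4.2f} FIT  {fit / total_residual:5.1%} of residual")

# What-if: lockstep coverage on the 40 FIT core stuck-at mode goes from 99% to 99.5%.
tighter = {**residuals, "core stuck-at": 40.0 * (1.0 - 0.995)}
spfm_tighter = 1.0 - sum(tighter.values()) / DANGEROUS_FIT
print(f"SPFM with 99.5% lockstep coverage: {spfm_tighter:.2%}")
```

The RAM double-bit entry tops the ranking at about half the residual, while halving the core stuck-at residual only lifts SPFM to about 98.15%, still short of 99%.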

The other thing the table makes obvious: this MCU passes PMHF and LFM with substantial margin and fails SPFM by about one percentage point. A team that had only computed PMHF would have shipped this for ASIL D, and the audit would have caught it. The three-metric structure is what prevents the architecture from passing ISO 26262's hardware integrity claim with a fault profile that the standard's authors knew was dangerous. Compute all three, SPFM first, then LFM, then PMHF. Every time.

The right order to compute, and the right conversation per metric
SPFM first. If it fails, fix the residual fault distribution before doing anything else. Conversation: "which fault modes have inadequate diagnostic coverage, and what would cover them?". LFM second. If it fails, add diagnostics-on-diagnostics. Conversation: "which safety mechanisms can fail silently, and what catches their failure?". PMHF third as the integration check. If SPFM and LFM pass, PMHF will almost always pass too; if it doesn't, the issue is usually a catastrophic-rate component that needs replacing rather than the diagnostic structure. The metrics are not interchangeable; they are sequenced.

Five pitfalls a reviewer will catch
