
ISO 26262 · Hardware metrics

Hardware random-failure metrics for ASIL D — PMHF, SPFM, LFM

ISO 26262 Part 5 imposes three quantitative hardware-level metrics that any ASIL D component must clear: SPFM (Single Point Fault Metric), LFM (Latent Fault Metric), and PMHF (Probabilistic Metric for Hardware Failures). Each measures a different way the architecture can fail, and passing one says nothing about whether the others pass. The most common audit finding is a team that computed PMHF, declared the design ASIL D, and never noticed an SPFM at 96% on a 99% target. This guide explains the underlying fault classification, defines each metric formally, walks an AEB ECU through all three, and names the operational mistakes.

Worked example: AEB ECU (ASIL D) · Standards: ISO 26262 Part 5 §8–9

What's actually being measured — the hardware fault classification

Before any of the three metrics make sense, the underlying classification of hardware faults has to be in place. ISO 26262 Part 5 §B.1 splits every hardware fault — every transistor failure, every solder-joint break, every register stuck-bit — into one of four buckets, each contributing differently to the safety case. The classification is per-failure-mode, not per-component: a single resistor can have an "open" failure mode that is safe and a "short" failure mode that is dangerous, contributing to different buckets.

Bucket	Symbol	Definition	Contributes to
Safe fault	λ_S	Failure mode that cannot violate the safety goal in any operational context. E.g. a status LED dies — informational only.	Nothing harmful; ignored by all three metrics.
Single-point fault	λ_SPF	Single hardware failure that directly causes safety goal violation, with no diagnostic mechanism in place to detect it.	SPFM (must be small) and PMHF (large contribution).
Residual fault	λ_RF	Single hardware failure that would cause safety goal violation, but is partially covered by a diagnostic — the residual is the uncovered fraction.	SPFM (must be small) and PMHF (residual contribution).
Multi-point / latent fault	λ_MPF	Failure that, on its own, doesn't violate the safety goal, but combined with a second independent failure does. "Latent" is the subset that isn't detected before the second failure occurs.	LFM (must be small).

The total per-component failure rate is decomposed as λ = λ_S + λ_SPF + λ_RF + λ_MPF. The bucket assignment is the work of the safety analyst — typically driven by an FMEDA (Failure Mode, Effects and Diagnostic Analysis), which is FMEA augmented with diagnostic-coverage scoring. ISO 26262 Part 5 Annex B walks the FMEDA template; tools like Reliasoft, Ansys medini, and FTA Studio's FMEDA mode produce the per-component breakdown automatically once failure modes and diagnostic coverage percentages are entered.

Three things to internalise about this classification before getting to the metrics:

"Safe fault" is a strong claim. A failure mode is safe only if it provably cannot violate the safety goal — not "unlikely to", not "only matters in edge cases". The fail-safe direction of fault propagation has to be argued. Safe faults are excluded from SPFM denominator, so over-claiming safe-fault status is the most common way to fudge the metric.
Diagnostic coverage is a per-fault-mode percentage. A diagnostic that covers 90% of stuck-at faults but 0% of bridging faults gets credit for the stuck-at modes only; the bridging modes count as fully residual. Lumping the two together at "90% DC" is a frequent FMEDA error.
Multi-point latent faults are easy to ignore and hard to find. They look harmless in isolation. The classical example is a stuck-OK output of a watchdog timer — the watchdog stops doing anything useful, but until the watched-for fault occurs, nothing visible breaks. LFM is the metric that forces this category to be accounted for.
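The second point, per-fault-mode coverage, can be illustrated with a short sketch. The 60/40 FIT split between stuck-at and bridging faults is a made-up illustration rather than a figure from the AEB example, and `residual_fit` is a hypothetical helper name.

```python
def residual_fit(rate_fit: float, dc: float) -> float:
    """Uncovered (residual) part of a failure-mode rate, given its diagnostic coverage."""
    return rate_fit * (1.0 - dc)

# Hypothetical component: 60 FIT of stuck-at faults, 40 FIT of bridging faults.
modes = {
    "stuck-at": {"rate_fit": 60.0, "dc": 0.90},
    "bridging": {"rate_fit": 40.0, "dc": 0.00},  # the diagnostic does not see these
}

# Correct: apply each DC to its own failure mode, then sum the residuals.
per_mode = sum(residual_fit(m["rate_fit"], m["dc"]) for m in modes.values())

# FMEDA error: one lumped "90% DC" applied to the whole component.
total_rate = sum(m["rate_fit"] for m in modes.values())
lumped = residual_fit(total_rate, 0.90)

print(f"per-mode residual: {per_mode:.1f} FIT")
print(f"lumped residual:   {lumped:.1f} FIT")
```

Under these assumed numbers the lumped figure understates the residual rate by more than a factor of four, which is the kind of gap a reviewer looks for.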

Why three metrics rather than one

A single PMHF target (say 10⁻⁸/h) could in principle be met by a design that has 99.99% diagnostic coverage on a high-failure-rate part. But that 0.01% residual could be a stuck-at fault on a critical bus, which is a far more dangerous failure mode than the same residual probability spread across many low-impact components. SPFM forces the coverage to be high per fault mode, not just on average. LFM forces multi-point latents to be either covered by a diagnostic or recovered before the second fault has time to arrive. PMHF rolls everything up into a single rate. The three together prevent the architecture from gaming any one of them.

Step 1. The fault classification, applied: choosing buckets

The bucket assignment for any given component is mechanical once you have the FMEA and the diagnostic mechanism descriptions. Take a representative AEB ECU's main MCU as the running example. The MCU has a per-hour failure rate of λ = 100 FIT (= 10⁻⁷/h), broken down by failure mode and the diagnostic coverage in place:

Failure mode	λ contribution	Effect	Diagnostic	DC%	Bucket
Core stuck-at	40 FIT	Compute path silently produces wrong output	Lockstep dual-core comparison	99%	0.4 FIT residual + 39.6 FIT covered (safe-via-mechanism)
Core transient (SEU)	30 FIT	Transient wrong output, recovers next cycle	ECC + lockstep	99%	0.3 FIT residual + 29.7 FIT covered
RAM bit-flip	20 FIT	Wrong data, propagates downstream	ECC (single-bit correct, double-bit detect)	99% (single), 0% (double)	1.19 FIT residual (0.19 single-bit + 1.0 double-bit) + 18.81 FIT covered
Lockstep monitor stuck-OK	5 FIT	Monitor reports "compare OK" even when cores diverge — a multi-point latent	Periodic self-test of the comparator at startup	50% (only catches half)	2.5 FIT latent + 2.5 FIT covered
Power-supply brown-out	3 FIT	MCU resets cleanly — fail-safe direction	—	—	3 FIT safe
Clock source drift	2 FIT	Timing errors, wrong outputs	External watchdog with independent clock	95%	0.1 FIT residual + 1.9 FIT covered

Tally per bucket — every failure mode's covered portion lands in the multi-point-detected bucket because "covered" means a diagnostic redirects to safe state, not that the fault itself is intrinsically harmless:

λ_total          = 100 FIT
λ_S (safe)        = 3 FIT          (power-supply brown-out, fails in the safe direction)
λ_SPF             = 0 FIT          (no fault entirely without a mechanism)
λ_RF (residual)   = 1.99 FIT       (uncovered fractions of partially-covered SPFs)
λ_MPF,detected    = 92.51 FIT      (covered by a diagnostic that triggers safe state)
λ_MPF,latent      = 2.5 FIT        (lockstep-monitor stuck-OK, half not caught by self-test)

The residual breaks down as 0.4 (core stuck-at) + 0.3 (core SEU) + 0.19 (RAM single-bit) + 1.0 (RAM double-bit, no DC) + 0.1 (clock drift). The dominant contributor — the RAM double-bit residual at 1 FIT — accounts for half of all residual fault rate. This is the kind of detail SPFM in Step 2 will surface; PMHF on its own would lose it in the noise.
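The tally can be reproduced mechanically from the failure-mode table. The sketch below is a minimal illustration, not a tool's actual FMEDA logic: the `kind` labels and the `tally` function are hypothetical, and the 19 FIT / 1 FIT split of the RAM bit-flip rate into single-bit and double-bit modes is inferred from the residual breakdown above.

```python
# (name, rate in FIT, diagnostic coverage, kind)
# kind: "safe" = fails in the safe direction,
#       "sp"   = would violate the safety goal on its own,
#       "mp"   = only dangerous together with a second fault.
FAILURE_MODES = [
    ("core stuck-at",             40.0, 0.99, "sp"),
    ("core transient (SEU)",      30.0, 0.99, "sp"),
    ("RAM single-bit",            19.0, 0.99, "sp"),
    ("RAM double-bit",             1.0, 0.00, "sp"),
    ("lockstep monitor stuck-OK",  5.0, 0.50, "mp"),
    ("power-supply brown-out",     3.0, 0.00, "safe"),
    ("clock source drift",         2.0, 0.95, "sp"),
]

def tally(modes):
    """Sum each failure mode's covered and uncovered parts into the buckets."""
    buckets = {"safe": 0.0, "residual": 0.0, "mpf_detected": 0.0, "mpf_latent": 0.0}
    for _name, rate, dc, kind in modes:
        if kind == "safe":
            buckets["safe"] += rate
            continue
        # The covered part is redirected to a safe state by the diagnostic.
        buckets["mpf_detected"] += rate * dc
        # The uncovered part is residual or latent, depending on the kind.
        uncovered_bucket = "residual" if kind == "sp" else "mpf_latent"
        buckets[uncovered_bucket] += rate * (1.0 - dc)
    return buckets

buckets = tally(FAILURE_MODES)
for name, value in buckets.items():
    print(f"{name:13s} {value:6.2f} FIT")
```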

Step 2. SPFM: Single Point Fault Metric

SPFM measures the fraction of fault-rate that is not a single-point or residual fault, computed across the dangerous (non-safe) population. Equivalently: the fraction of the dangerous fault rate that has a working safety mechanism between it and the safety goal. Formally:

SPFM = 1 − (Σ λ_SPF + Σ λ_RF) / (Σ λ_total − Σ λ_S)

The denominator excludes safe faults, which play no part in this metric. The numerator of the subtracted fraction is the part of the dangerous rate that can directly violate the safety goal without a second failure being involved. ISO 26262 Part 5 Table 4 gives the per-ASIL targets:

ASIL	SPFM target
D	≥ 99%
C	≥ 97%
B	≥ 90%
A	No quantitative requirement

Plugging the AEB MCU bucket totals from Step 1 in:

SPFM = 1 − (0 + 1.99) / (100 − 3)
     = 1 − 1.99 / 97
     = 1 − 0.0205
     = 0.9795 = 97.95%

This MCU passes ASIL C (≥ 97%) but fails ASIL D (≥ 99%) by about a percentage point. The conversation with the design team is exactly the conversation the bucket breakdown teed up: half the residual budget is consumed by RAM double-bit errors with no detection. Either the ECC needs upgrading from SECDED (single-error-correct, double-error-detect — only 99% on single-bit) to SECDED with DUE-trap-to-safe (which moves the 1 FIT into λ_MPF,detected), or the safety goal has to be met with a different SPFM-friendly mitigation. The metric forces the question; it doesn't suggest the answer.
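The SPFM calculation above is small enough to script. This is a minimal sketch of the formula exactly as this guide states it; the function and parameter names are illustrative, not taken from any tool.

```python
def spfm(total_fit: float, safe_fit: float, spf_fit: float, rf_fit: float) -> float:
    """SPFM as defined in this guide: 1 - (SPF + RF) / (total - safe)."""
    dangerous_fit = total_fit - safe_fit
    return 1.0 - (spf_fit + rf_fit) / dangerous_fit

# AEB MCU bucket totals from Step 1.
value = spfm(total_fit=100.0, safe_fit=3.0, spf_fit=0.0, rf_fit=1.99)

print(f"SPFM = {value:.2%}")
print("ASIL C (>= 97%):", "pass" if value >= 0.97 else "fail")
print("ASIL D (>= 99%):", "pass" if value >= 0.99 else "fail")
```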

Why SPFM looks pass-able and isn't

For a 100 FIT total component, ASIL D's 99% SPFM target leaves 100 FIT × 1% = a residual budget of about 1 FIT across all failure modes combined. That sounds like plenty until you tally the residuals: every partially-covered failure mode at 99% DC contributes 1% × λ to the residual, which adds up fast across a typical MCU's 6–10 distinct failure-mode classes. Reaching 99% SPFM on a complex component routinely requires multiple layered diagnostics (ECC + lockstep + watchdog + voltage monitoring), with the residuals being whatever leaks past all of them. This is the structural reason ASIL D MCUs cost more than ASIL B ones: the cost lies not in the components themselves but in the diagnostic infrastructure required to push residual fault rates below the SPFM threshold.
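The budget argument can be checked against the AEB MCU's residuals from Step 1. The sketch below assumes the guide's SPFM definition (safe faults excluded from the denominator), which puts the budget at 1% of 97 FIT rather than 1% of 100 FIT; the variable names are illustrative.

```python
SPFM_TARGET_ASIL_D = 0.99
TOTAL_FIT, SAFE_FIT = 100.0, 3.0

# Residual contributions per failure mode, from the Step 1 breakdown.
residual_fit = {
    "core stuck-at": 0.40,
    "core SEU": 0.30,
    "RAM single-bit": 0.19,
    "RAM double-bit": 1.00,
    "clock drift": 0.10,
}

dangerous_fit = TOTAL_FIT - SAFE_FIT
budget_fit = (1.0 - SPFM_TARGET_ASIL_D) * dangerous_fit  # largest residual that still passes
used_fit = sum(residual_fit.values())

print(f"budget: {budget_fit:.2f} FIT, used: {used_fit:.2f} FIT")
for name, fit in sorted(residual_fit.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {fit:4.2f} FIT = {fit / budget_fit:5.0%} of budget")
```

On these numbers the RAM double-bit mode alone exceeds the whole ASIL D residual budget.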

Step 3. LFM: Latent Fault Metric

LFM measures the fraction of multi-point faults that are detected by a diagnostic, recovered, or otherwise prevented from sitting silently in the system waiting for a second fault. Latent faults are the dangerous category here: a stuck-OK watchdog, an ECC mechanism that has degraded, a redundant channel that has failed but not been noticed. The system looks healthy until the second fault hits.

LFM = 1 − (Σ λ_MPF,latent) / (Σ λ_total − Σ λ_S − Σ λ_SPF − Σ λ_RF)

The denominator is the multi-point-fault population only, i.e. faults that aren't safe and aren't single-point or residual. The numerator is the latent (undetected) subset. ISO 26262 Part 5 Table 5 thresholds:

ASIL	LFM target
D	≥ 90%
C	≥ 80%
B	≥ 60%
A	No quantitative requirement

For the AEB MCU:

LFM = 1 − 2.5 / (100 − 3 − 0 − 1.99)
    = 1 − 2.5 / 95.01
    = 1 − 0.0263
    = 0.9737 = 97.37%

This passes ASIL D (≥ 90%) comfortably. The MCU's only meaningful latent is the lockstep-monitor stuck-OK fault (2.5 FIT undetected), and even that is a small fraction of the multi-point pool dominated by the covered core and RAM faults.
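As with SPFM, the LFM arithmetic can be scripted directly from the bucket totals. This is a minimal sketch of the formula as written in this guide, with illustrative names.

```python
def lfm(total_fit: float, safe_fit: float, spf_fit: float,
        rf_fit: float, latent_fit: float) -> float:
    """LFM as defined in this guide: 1 - latent / (total - safe - SPF - RF)."""
    multi_point_fit = total_fit - safe_fit - spf_fit - rf_fit
    return 1.0 - latent_fit / multi_point_fit

# AEB MCU bucket totals from Step 1.
value = lfm(total_fit=100.0, safe_fit=3.0, spf_fit=0.0, rf_fit=1.99, latent_fit=2.5)

print(f"LFM = {value:.2%}")
print("ASIL D (>= 90%):", "pass" if value >= 0.90 else "fail")
```

Note that the denominator here (95.01 FIT) differs from the SPFM denominator (97 FIT), so neither metric can be derived from the other.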

Two structural points about LFM that change how design teams think about it:

LFM is dominated by the diagnostics on diagnostics. The lockstep-monitor self-test in our example is the LFM-relevant mechanism — it's not protecting against the original core fault, it's protecting against the lockstep monitor itself failing silently. Every safety mechanism in an ASIL D component needs its own diagnostic, otherwise it sits in λ_MPF,latent at full rate. This is the structural reason ASIL D safety-mechanism design produces nested verification: every monitor has a meta-monitor, recursively, until the rates are low enough.
LFM uses a different denominator from SPFM. The set of faults that count in each metric is different, which means a high LFM doesn't help SPFM and vice-versa. Teams that compute one from the other — "we have 99% DC on the watchdog, so LFM and SPFM should both be 99%" — miss the structural difference and produce wrong numbers. Always compute both from the bucket totals separately.

What the two metrics together force the architecture to do

SPFM forces: every dangerous fault mode is covered by a diagnostic with high coverage. LFM forces: every diagnostic is itself covered by a meta-diagnostic. The two together say "build redundancy AND verify the redundancy works". A design that meets both is the ISO 26262 hardware-architectural-metrics-pass; a design that meets one and not the other has a structural weakness that PMHF won't catch on its own. The next step's PMHF metric is the integration check, not a substitute for either of these.

Step 4. PMHF: Probabilistic Metric for Hardware Failures

PMHF is the rate per hour at which the safety goal is violated due to random hardware failures, averaged over the operational lifetime of the vehicle. Where SPFM and LFM are dimensionless ratios, PMHF is a rate — comparable directly against numerical targets without further interpretation. The Part 5 Table 6 thresholds:

ASIL	PMHF target
D	≤ 10⁻⁸/h (10 FIT)
C	≤ 10⁻⁷/h (100 FIT)
B	≤ 10⁻⁷/h (100 FIT)
A	No quantitative requirement

For a single-channel architecture with safety mechanisms — which is what our AEB MCU is, considered standalone — the PMHF contribution decomposes as:

PMHF ≈ Σ λ_RF                                      ← residual fraction of single-point faults
     + Σ λ_MPF,latent · λ_partner · T_life / 2     ← latent + partner during lifetime

The residual term contributes directly — every uncovered fraction of a single-point fault adds to the rate at which the safety goal is violated. The latent term is the probabilistic combination of "the latent fault has already occurred" with "the partner fault then occurs during the remaining lifetime"; the factor of T_life/2 is the average exposure window for a uniformly-distributed second-fault arrival. ISO 26262 Part 5 §9.4.2.3 gives the full decomposition with refinements for proof-test intervals and detected multi-point faults; the simplified form above captures >95% of the contribution for typical automotive mission profiles.
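One way to arrive at the T_life/2 factor is sketched below. It is not the standard's own derivation and rests on simplifying assumptions: constant failure rates, λ·T_life much smaller than 1, and a latent fault that is never detected or repaired once it has occurred.

```latex
% Probability that the latent fault is already present at time t (first-order approximation)
\begin{align*}
P_{\mathrm{latent}}(t) &\approx \lambda_{\mathrm{MPF,latent}} \, t \\
% Instantaneous rate of safety-goal violation: latent fault present AND partner fault arrives
r(t) &\approx \lambda_{\mathrm{MPF,latent}} \, t \cdot \lambda_{\mathrm{partner}} \\
% Average of r(t) over the operational lifetime
\bar{r} &= \frac{1}{T_{\mathrm{life}}} \int_0^{T_{\mathrm{life}}}
           \lambda_{\mathrm{MPF,latent}} \, \lambda_{\mathrm{partner}} \, t \, \mathrm{d}t
         = \lambda_{\mathrm{MPF,latent}} \cdot \lambda_{\mathrm{partner}} \cdot \frac{T_{\mathrm{life}}}{2}
\end{align*}
```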

Plugging the AEB MCU buckets in, with vehicle lifetime T_life = 10,000 h and the latent's partner-fault rate ≈ 70 FIT (the 40 FIT core stuck-at and 30 FIT core transient faults that the lockstep comparison protects against):

Residual term:  Σ λ_RF     = 1.99 FIT = 1.99×10⁻⁹ /h
Latent term:    2.5×10⁻⁹ · 70×10⁻⁹ · 10,000 / 2
              ≈ 8.75×10⁻¹³ /h     ← negligible

PMHF ≈ 1.99×10⁻⁹ + 8.75×10⁻¹³ ≈ 2.0×10⁻⁹ /h

2×10⁻⁹/h vs an ASIL D target of 10⁻⁸/h. Passes ASIL D with a 5× margin. The latent contribution is more than three orders of magnitude smaller than the residual contribution (8.75×10⁻¹³ against 1.99×10⁻⁹, a factor of roughly 2,300). This is an extreme version of the general pattern: for component-level PMHF the residual fault rate is what matters, and the latent term is a rounding error unless λ_partner × T_life is unusually large.
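The same PMHF arithmetic as a short script. It implements only the simplified two-term form given in this step, not the full decomposition in the standard; the function name and the `FIT` constant are illustrative.

```python
FIT = 1e-9  # 1 FIT = 1e-9 failures per hour

def pmhf_terms(rf_fit: float, latent_fit: float, partner_fit: float, t_life_h: float):
    """Return (residual term, latent term) of the simplified PMHF, both per hour."""
    residual_term = rf_fit * FIT
    latent_term = (latent_fit * FIT) * (partner_fit * FIT) * t_life_h / 2.0
    return residual_term, latent_term

# AEB MCU: 1.99 FIT residual, 2.5 FIT latent, 70 FIT partner rate, 10,000 h lifetime.
residual_term, latent_term = pmhf_terms(1.99, 2.5, 70.0, 10_000)
total = residual_term + latent_term

print(f"residual term: {residual_term:.3e} /h")
print(f"latent term:   {latent_term:.3e} /h")
print(f"PMHF:          {total:.3e} /h")
print(f"residual / latent ratio: {residual_term / latent_term:,.0f}")
print("ASIL D (<= 1e-8 /h):", "pass" if total <= 1e-8 else "fail")
```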

PMHF is the easy metric to pass and the easy metric to point at

Component-level PMHF for a typical ASIL D design comes out around 10⁻⁹/h or below, comfortably under the 10⁻⁸ target. Teams under deadline pressure compute PMHF, see the margin, and report "PMHF: pass, ASIL D met". The reviewer asks for SPFM and LFM next, and the architecture often reveals a 96–97% SPFM that was missed because everyone was looking at the rate. The order in which to compute the metrics is: SPFM first (most likely to fail), then LFM, then PMHF as a sanity-check. Reversing the order is the most common audit finding in ASIL D safety cases.

Step 5. Unified verdict on the AEB MCU

The three metrics evaluated against ASIL D targets, side by side:

Metric	Computed	ASIL D target	Verdict
SPFM	97.95%	≥ 99%	FAIL by 1.05 pp
LFM	97.37%	≥ 90%	PASS by 7.4 pp
PMHF	2.0×10⁻⁹/h	≤ 10⁻⁸/h	PASS with 5× margin

The MCU does not meet ASIL D. It meets ASIL C on all three, though SPFM clears the 97% threshold by only 0.95 pp. The single failing metric at ASIL D is SPFM, and the single dominant contributor inside SPFM is the RAM double-bit residual at 1 FIT. Three credible design responses:

Upgrade ECC to detect and safely handle double-bit errors. SECDED with DUE-trap-to-safe (the double-bit error triggers a controlled reset to safe state) moves the 1 FIT from λ_RF into λ_MPF,detected. New SPFM: 1 − 0.99/97 = 98.98%. Still fails ASIL D, but only by 0.02 pp — close enough that further small improvements (e.g. tightening the 95% clock-drift DC to 99%) push it over.
Re-architect to eliminate the high-residual component. A different memory controller with intrinsically lower failure rate, or a different memory technology (FRAM, MRAM) where the dominant failure mode isn't bit-flips. Expensive but cleanly removes the gating constraint.
Apply ASIL decomposition. The MCU hardware stays as designed, with ASIL C-level metrics; the safety goal is decomposed into ASIL B(D) + ASIL B(D) at the system level (cf. Article 6), and the ASIL allocated to the MCU's channel drops to B. The SPFM target relaxes to 90%, which the MCU clears comfortably. The decomposition has its own independence-and-CCF cost, but at the MCU level the metric pressure dissolves.
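The numbers quoted for the first response can be checked with a small what-if calculation. This sketch assumes the DUE trap moves the full 1 FIT of RAM double-bit residual into the detected bucket, and that raising the clock-drift DC from 95% to 99% applies to the full 2 FIT of that mode; the dictionary layout and the `spfm_from_residuals` helper are illustrative.

```python
TOTAL_FIT, SAFE_FIT = 100.0, 3.0

def spfm_from_residuals(residuals: dict) -> float:
    """SPFM per this guide's formula, with no pure single-point faults."""
    return 1.0 - sum(residuals.values()) / (TOTAL_FIT - SAFE_FIT)

baseline = {
    "core stuck-at": 0.40,
    "core SEU": 0.30,
    "RAM single-bit": 0.19,
    "RAM double-bit": 1.00,
    "clock drift": 0.10,
}

# Response 1: double-bit errors now trap to a safe state, so no residual is left.
with_due_trap = {**baseline, "RAM double-bit": 0.0}

# Follow-up: clock-drift DC raised from 95% to 99% on the 2 FIT failure mode.
with_clock_fix = {**with_due_trap, "clock drift": 2.0 * (1.0 - 0.99)}

for label, case in [("baseline", baseline),
                    ("DUE trap", with_due_trap),
                    ("DUE trap + clock DC 99%", with_clock_fix)]:
    value = spfm_from_residuals(case)
    print(f"{label:24s} SPFM = {value:.2%}  ASIL D: {'pass' if value >= 0.99 else 'fail'}")
```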

Notice what didn't make the list: "improve the safety mechanisms we already have". The SPFM shortfall is concentrated in a fault mode that has no diagnostic at all (RAM double-bit). Tightening lockstep coverage from 99% to 99.5% wouldn't shift the answer — it improves a residual that's already small. The metric structure tells the design team where to look; the FMEDA tells them what to fix; the decision is which of the three responses fits the cost / schedule / risk envelope.

The other thing the table makes obvious: this MCU passes PMHF and LFM with substantial margin and fails SPFM by a hair. A team that had only computed PMHF would have shipped this for ASIL D, and the audit would have caught it. The three-metric structure is what prevents the architecture from passing ISO 26262's hardware integrity claim with a fault profile that the standard's authors knew was dangerous. Compute all three. In that order. Every time.
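The side-by-side verdict can also be expressed as a small check against the target tables quoted in Steps 2 to 4. The `TARGETS` table and `verdict` function are an illustrative sketch; the computed values are the rounded results from this worked example.

```python
# Targets as quoted in this guide (SPFM and LFM as fractions, PMHF per hour).
TARGETS = {
    "D": {"spfm": 0.99, "lfm": 0.90, "pmhf": 1e-8},
    "C": {"spfm": 0.97, "lfm": 0.80, "pmhf": 1e-7},
    "B": {"spfm": 0.90, "lfm": 0.60, "pmhf": 1e-7},
}

def verdict(spfm: float, lfm: float, pmhf: float, asil: str) -> dict:
    """Check the metrics in the recommended order: SPFM, then LFM, then PMHF."""
    t = TARGETS[asil]
    return {
        "SPFM": spfm >= t["spfm"],
        "LFM": lfm >= t["lfm"],
        "PMHF": pmhf <= t["pmhf"],
    }

computed = {"spfm": 0.9795, "lfm": 0.9737, "pmhf": 2.0e-9}

for asil in ("D", "C", "B"):
    result = verdict(asil=asil, **computed)
    failed = [name for name, ok in result.items() if not ok]
    status = "PASS" if not failed else "FAIL (" + ", ".join(failed) + ")"
    print(f"ASIL {asil}: {status}")
```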

The right order to compute, and the right conversation per metric

SPFM first. If it fails, fix the residual fault distribution before doing anything else. Conversation: "which fault modes have inadequate diagnostic coverage, and what would cover them?". LFM second. If it fails, add diagnostics-on-diagnostics. Conversation: "which safety mechanisms can fail silently, and what catches their failure?". PMHF third as the integration check. If SPFM and LFM pass, PMHF will almost always pass too; if it doesn't, the issue is usually a catastrophic-rate component that needs replacing rather than the diagnostic structure. The metrics are not interchangeable; they are sequenced.

Five pitfalls a reviewer will catch

Over-claiming "safe" faults. A failure mode is in λ_S only if it provably cannot violate the safety goal across the operational profile — temperature range, voltage range, all input patterns, all internal states. The fail-safe direction has to be proven, not asserted. Reviewers' standard question: "show me the fault-injection test that confirms this fault mode produces a safe state in the worst-case operational corner". Failing to defend a λ_S claim shrinks the safe pool, which moves rate into λ_SPF or λ_RF and tanks SPFM.
Datasheet DC values used directly. Vendor datasheets publish coverage figures ("ECC catches 99.9% of single-bit errors") under specific assumptions: nominal access pattern, room temperature, typical refresh rate. The actual coverage in your system depends on how the part is used. ISO 26262 Part 5 §B.3 requires DC values to come from FMEDA validation, typically a structured fault-injection campaign (for example with Synopsys or Cadence simulation tools), not from datasheets. Reviewers ask for the FMEDA validation report; "vendor says 99%" doesn't survive that question.
Diagnostic test interval forgotten. A watchdog with a 100 ms timeout catches faults within 100 ms. A power-on self-test catches faults at restart only. ISO 26262 Part 5 §B.4.2 reduces the effective DC by the fraction of the operational time the diagnostic isn't actively running — which for periodic diagnostics is meaningful. A "99% DC" diagnostic that only runs once a drive cycle has effective DC under 50% in steady-state operation. Compute test interval explicitly; it shows up as part of the FMEDA.
Component-level pass ≠ safety-goal-level pass. SPFM, LFM and PMHF are computed per component. The safety goal is enforced by an architecture composed of many components, and ISO 26262 Part 5 §9.4.2.5 requires the metrics to be aggregated up to the safety-goal level. A radar ECU passing all three metrics for ASIL D and a fusion ECU also passing for ASIL D doesn't guarantee that the combined system meets ASIL D — common-cause failures, integration faults, and propagation paths between components can degrade the architecture-level metric. The component-level pass is necessary, not sufficient.
DC values not re-validated when the operational profile changes. A DC measured at 25 °C, 5.0 V, nominal access pattern doesn't apply at −40 °C, 4.5 V, or atypical access patterns. If the vehicle's operational profile expands (a radar from passenger-car validation now used in a commercial-truck programme), the FMEDA needs re-running. The most common change-impact-analysis miss in production is reusing DC values across programmes without re-validation; reviewers spot it by asking for the operational-profile envelope referenced in the FMEDA.

Where to go next

Run the FMEDA on your own component. Open FTA Studio: the Failure Rate Database (Enterprise) ships with FIDES, IEC 62380 and Siemens SN 29500 reliability data feeding directly into the FMEDA template. The bucket totals and the three metrics are both computed automatically.
Cross-check against ASIL decomposition. Article 6 covers the architecture-level alternative: when component-level ASIL D is too expensive, drop to ASIL B via decomposition and the per-component metric targets relax accordingly.
For the CCF interaction, Article 5 covers β-factor and MGL — relevant because PMHF for a multi-channel architecture has an explicit β·λ term that often dominates the answer at the architecture level.
For the underlying λ values, our failure-rate reference page covers the typical FIT ranges for automotive components by industry source. Use validated data, not handbook numbers from outside your domain.