
Birnbaum, Fussell-Vesely, RAW, RRW — when to use each

Once you've got cut sets and a top-event probability, the next question is always the same: which basic event matters most? Four importance measures claim to answer that. They give different answers — sometimes wildly different — and the four are not interchangeable. Picking the wrong one for the question you're trying to settle is a common, expensive mistake. This guide defines all four, computes them side-by-side on a real tree, shows where they disagree, and maps each to the standard that asks for it.

Worked tree: rail / SPAD (Article 1). Standards: IEC 61025, ISO 26262, IEC 61508, NRC PRA.

Why we need importance measures

The point of quantifying a fault tree isn't the top-event probability — that's just one number, and on its own it tells you whether the system meets target or doesn't. The actionable output is a rank order of basic events, ordered by how much they're driving the answer. The rank tells the design team where to spend the next pound: which component to make more reliable, which redundancy to add, which monitoring to tighten. Without a rank, FTA stops at "we passed" or "we failed" and the engineering judgement gets done somewhere else.

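Importance is intuitively obvious, but the moment you try to formalise it you find at least four reasonable formalisations, each answering a different question:

How structurally sensitive is the top event to this event's probability?
What fraction of the current top-event probability runs through cut sets that contain this event?
If this event were certain, how many times worse would the top become?
If this event were impossible, how many times better would the top become?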

Birnbaum, Fussell-Vesely, RAW and RRW operationalise these four questions respectively. They will agree on the broad shape — events in the dominant cut sets always rank high — but they disagree at the margin, and the margin is where most design decisions live.

The example we'll use throughout

The same SPAD fault tree built from scratch in Article 1. Nine basic events (BE-001 to BE-009), eight minimal cut sets, top-event probability ≈ 4.65×10⁻³ per train per year (after the wrong-side correction). If you haven't read Article 1, the short version: BE-001 is a wayside signal lamp wrong-side failure; BE-004 to BE-007 are the ATP and emergency-brake basic events behind a 2-of-2 AND defence; BE-008 and BE-009 are driver errors. The numbers and cut sets are tabulated again below where they're needed, so you don't have to refer back.

Step 1: The four measures, defined

All four measures take the same inputs — the fault tree's Boolean structure and a probability per basic event — and produce a number per basic event. They differ in what that number means, which is what determines whether the rank order is the right one to act on.

Birnbaum (Bi)

Bi = ∂P(TOP)/∂Pi = P(TOP|i=1) − P(TOP|i=0)

Asks: how structurally sensitive is the top event to this event's probability? Independent of how likely the event actually is — it's the slope of the top probability with respect to the i-th basic-event probability, evaluated at the current operating point.

Fussell-Vesely (FVi)

FVi = P(at least one min-cut containing i is true) / P(TOP)

Asks: what fraction of the current top-event probability runs through cut sets that contain this event? Bounded between 0 and 1. The natural reading is "share of the answer attributable to this event."

Risk Achievement Worth (RAWi)

RAWi = P(TOP | Pi=1) / P(TOP)

Asks: if this event were certain, how many times worse would the top become? A multiplier ≥ 1. Big RAW means losing this event would be catastrophic — it identifies the events whose continued reliability the answer depends on.

Risk Reduction Worth (RRWi)

RRWi = P(TOP) / P(TOP | Pi=0)

Asks: if this event were impossible, how many times better would the top become? A multiplier ≥ 1. Big RRW means perfecting this event would buy the most risk reduction — it identifies the candidates for upgrade investment.
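The four definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the tree is already reduced to minimal cut sets, that basic events are independent, and that P(TOP) is taken as the sum of cut-set probabilities capped at 1 (the rare-event approximation). The function names and the two-cut toy tree are hypothetical.

```python
from math import prod

def p_top(cut_sets, p):
    """Rare-event approximation: sum of cut-set probabilities, capped at 1."""
    return min(1.0, sum(prod(p[e] for e in cut) for cut in cut_sets))

def forced(p, event, value):
    """Copy of the probability map with one event forced to 0.0 or 1.0."""
    q = dict(p)
    q[event] = value
    return q

def birnbaum(cut_sets, p, i):
    # P(TOP | i = 1) - P(TOP | i = 0)
    return p_top(cut_sets, forced(p, i, 1.0)) - p_top(cut_sets, forced(p, i, 0.0))

def fussell_vesely(cut_sets, p, i):
    # Share of P(TOP) carried by the cut sets that contain i
    through_i = [cut for cut in cut_sets if i in cut]
    return p_top(through_i, p) / p_top(cut_sets, p)

def raw(cut_sets, p, i):
    # Multiplier on P(TOP) if event i were certain
    return p_top(cut_sets, forced(p, i, 1.0)) / p_top(cut_sets, p)

def rrw(cut_sets, p, i):
    # Multiplier gained on P(TOP) if event i were impossible
    residual = p_top(cut_sets, forced(p, i, 0.0))
    return float("inf") if residual == 0.0 else p_top(cut_sets, p) / residual

# Hypothetical toy tree: one single-event cut {A} and one AND cut {B, C}
toy_cuts = [frozenset({"A"}), frozenset({"B", "C"})]
toy_p = {"A": 0.01, "B": 0.1, "C": 0.2}

for event in toy_p:
    print(event,
          round(birnbaum(toy_cuts, toy_p, event), 4),
          round(fussell_vesely(toy_cuts, toy_p, event), 4),
          round(raw(toy_cuts, toy_p, event), 3),
          round(rrw(toy_cuts, toy_p, event), 3))
```

On the toy tree, Birnbaum for B comes out equal to the probability of its AND partner C (0.2), which is the same effect that shows up later for BE-008 and BE-009.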

Two structural facts before any numbers. First, Birnbaum is the only one of the four that doesn't depend on the basic event's own probability: it depends only on the tree's structure and on the probabilities of the other events. The other three are weighted by how likely the event actually is, which is why they can disagree with Birnbaum on small-probability events sitting in dominant cut sets. Second, RAW and RRW are mirror images of each other, but they're not symmetric: RAW pushes an event toward certain failure (asking what we're protected against), RRW pushes it toward perfection (asking what we'd gain). They give different rank orders precisely because real systems have asymmetric headroom in each direction.

One more piece of housekeeping: under the rare-event approximation used here (P(TOP) ≈ Σ P(cut)), there's an algebraic identity between F-V and Birnbaum, namely FVi = Bi · Pi / P(TOP). F-V is Birnbaum scaled by the event's own probability and normalised by the top. That's why F-V "downweights" structurally important but practically rare events relative to Birnbaum, which is exactly the difference we'll see in the SPAD numbers in Step 2.
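For readers who want to see where the F-V / Birnbaum relation comes from, here is a short derivation sketch. It assumes independent basic events and the rare-event approximation; the symbol C_k for the k-th minimal cut set is notation introduced here for illustration. Outside that approximation the right-hand quantity Bi · Pi / P(TOP) is usually reported under the separate name "criticality importance", and it tracks F-V closely only when cut-set probabilities are small.

```latex
% Rare-event approximation over the minimal cut sets C_k
P(\mathrm{TOP}) \approx \sum_{k} \prod_{j \in C_k} P_j
\quad\Longrightarrow\quad
B_i = \frac{\partial P(\mathrm{TOP})}{\partial P_i}
    \approx \sum_{k:\, i \in C_k} \; \prod_{j \in C_k,\, j \neq i} P_j

% Multiplying by P_i restores the missing factor in every cut set containing i
B_i \, P_i \approx \sum_{k:\, i \in C_k} \; \prod_{j \in C_k} P_j
\quad\Longrightarrow\quad
\frac{B_i \, P_i}{P(\mathrm{TOP})} \approx FV_i
```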

Step 2: Compute all four on the SPAD tree

For self-containment, here is the SPAD tree's input data — eight minimal cut sets and their per-train-per-year probabilities, taken straight from Article 1's wrong-side-corrected quantification:

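| # | Minimal cut set | P(cut) per train per year |
|---|---|---|
| 1 | {BE-001} | 4.37×10⁻³ |
| 2 | {BE-002} | 1.75×10⁻⁴ |
| 3 | {BE-003} | 8.76×10⁻⁵ |
| 4 | {BE-004, BE-006} | 7.65×10⁻⁶ |
| 5 | {BE-004, BE-007} | 3.83×10⁻⁶ |
| 6 | {BE-005, BE-006} | 4.59×10⁻⁶ |
| 7 | {BE-005, BE-007} | 2.30×10⁻⁶ |
| 8 | {BE-008, BE-009} | 1.00×10⁻⁷ |
| | P(TOP) ≈ Σ P(cut) | 4.65×10⁻³ |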
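As a quick sanity check on the total, the cut-set probabilities can be summed directly and compared with the min-cut upper bound, 1 − Π(1 − P(cut)). The sketch below uses only the numbers in the table; the variable names are illustrative, and the upper bound treats cut sets as if they were independent, which is itself an approximation when cuts share events.

```python
from math import prod

# Cut-set probabilities per train per year, copied from the table above
cut_probs = {
    "{BE-001}": 4.37e-3,
    "{BE-002}": 1.75e-4,
    "{BE-003}": 8.76e-5,
    "{BE-004, BE-006}": 7.65e-6,
    "{BE-004, BE-007}": 3.83e-6,
    "{BE-005, BE-006}": 4.59e-6,
    "{BE-005, BE-007}": 2.30e-6,
    "{BE-008, BE-009}": 1.00e-7,
}

# Rare-event approximation: plain sum of cut-set probabilities
rare_event = sum(cut_probs.values())

# Min-cut upper bound: 1 minus the probability that no cut set occurs
mcub = 1.0 - prod(1.0 - q for q in cut_probs.values())

print(f"rare-event sum      : {rare_event:.3e}")
print(f"min-cut upper bound : {mcub:.3e}")
```

Both figures round to 4.65×10⁻³, so at these probability levels the choice of approximation does not change the answer to three significant figures.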

One event computed in detail

Take BE-006 (brake-pipe pressure application failure, P = 1.75×10⁻³, in cuts 4 and 6). The four measures, longhand:
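A longhand sketch of that calculation, using the rare-event approximation and the rounded values tabulated in this guide (so the last digit may differ slightly from figures computed from unrounded data). With BE-006 forced to certain, cuts 4 and 6 collapse to BE-004 and BE-005 alone; with BE-006 forced to impossible, both cuts vanish.

```latex
B_{006} = P(\text{BE-004}) + P(\text{BE-005})
        = 4.37\times10^{-3} + 2.62\times10^{-3} = 6.99\times10^{-3}

FV_{006} = \frac{P(\text{cut 4}) + P(\text{cut 6})}{P(\mathrm{TOP})}
         = \frac{7.65\times10^{-6} + 4.59\times10^{-6}}{4.65\times10^{-3}}
         \approx 2.6\times10^{-3}

RAW_{006} = \frac{P(\mathrm{TOP}) - 1.22\times10^{-5} + 6.99\times10^{-3}}{P(\mathrm{TOP})}
          = \frac{1.163\times10^{-2}}{4.65\times10^{-3}} \approx 2.50

RRW_{006} = \frac{P(\mathrm{TOP})}{P(\mathrm{TOP}) - 1.22\times10^{-5}}
          = \frac{4.65\times10^{-3}}{4.638\times10^{-3}} \approx 1 + 2.6\times10^{-3}
```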

The asymmetry is the point: BE-006's worst-case impact (RAW ≈ 2.5) is substantial, but its best-case impact (RRW ≈ 1.002) is essentially nil. That's because BE-006 is sitting behind a 2-of-2 AND barrier — fixing one half of a redundancy buys very little, but losing one half exposes the other.

The full table

The same calculation, applied to every basic event in the tree:

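| BE | Pi | Birnbaum Bi | F-Vi | RAWi | RRWi |
|---|---|---|---|---|---|
| BE-001 | 4.37×10⁻³ | ≈ 1.00 | 0.940 | 215 | 16.6 |
| BE-002 | 1.75×10⁻⁴ | ≈ 1.00 | 0.038 | 215 | 1.039 |
| BE-003 | 8.76×10⁻⁵ | ≈ 1.00 | 0.019 | 215 | 1.019 |
| BE-004 | 4.37×10⁻³ | 2.63×10⁻³ | 2.5×10⁻³ | 1.56 | 1.0022 |
| BE-005 | 2.62×10⁻³ | 2.63×10⁻³ | 1.5×10⁻³ | 1.56 | 1.0013 |
| BE-006 | 1.75×10⁻³ | 6.99×10⁻³ | 2.6×10⁻³ | 2.50 | 1.0024 |
| BE-007 | 8.76×10⁻⁴ | 6.99×10⁻³ | 1.3×10⁻³ | 2.50 | 1.0011 |
| BE-008 | 1.00×10⁻³ | 1.00×10⁻⁴ | 2.2×10⁻⁵ | 1.022 | 1.00002 |
| BE-009 | 1.00×10⁻⁴ | 1.00×10⁻³ | 2.2×10⁻⁵ | 1.215 | 1.00002 |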
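For anyone who wants to regenerate these columns rather than take them on trust, the sketch below recomputes all four measures from the basic-event probabilities and the eight minimal cut sets listed in this guide. It assumes independent basic events and the rare-event approximation capped at 1; the names `P`, `CUTS` and `measures` are illustrative. Because it works from the rounded inputs shown here, the trailing digits can differ slightly from the tabulated ones.

```python
from math import prod

# Basic-event probabilities from the table above
P = {
    "BE-001": 4.37e-3, "BE-002": 1.75e-4, "BE-003": 8.76e-5,
    "BE-004": 4.37e-3, "BE-005": 2.62e-3, "BE-006": 1.75e-3,
    "BE-007": 8.76e-4, "BE-008": 1.00e-3, "BE-009": 1.00e-4,
}

# The eight minimal cut sets
CUTS = [
    {"BE-001"}, {"BE-002"}, {"BE-003"},
    {"BE-004", "BE-006"}, {"BE-004", "BE-007"},
    {"BE-005", "BE-006"}, {"BE-005", "BE-007"},
    {"BE-008", "BE-009"},
]

def p_top(p):
    """Sum of cut-set probabilities, capped at 1 (rare-event approximation)."""
    return min(1.0, sum(prod(p[e] for e in cut) for cut in CUTS))

def measures(event):
    base = p_top(P)
    certain = p_top({**P, event: 1.0})      # P(TOP | event certain)
    impossible = p_top({**P, event: 0.0})   # P(TOP | event impossible)
    through = sum(prod(P[e] for e in cut) for cut in CUTS if event in cut)
    return {
        "birnbaum": certain - impossible,
        "fv": through / base,
        "raw": certain / base,
        "rrw": base / impossible,
    }

table = {event: measures(event) for event in P}

for event, m in table.items():
    print(f"{event}  B={m['birnbaum']:.3g}  FV={m['fv']:.2g}  "
          f"RAW={m['raw']:.4g}  RRW={m['rrw']:.6g}")
```

The cap at 1 matters only for the single-event cuts: forcing BE-001, BE-002 or BE-003 to certain makes the uncapped sum exceed 1, which is not a probability.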

And the rank order each measure produces, top-to-bottom:

| Rank | By Birnbaum | By F-V | By RAW | By RRW |
|---|---|---|---|---|
| 1 | BE-001 = BE-002 = BE-003 (tied) | BE-001 | BE-001 = BE-002 = BE-003 (tied) | BE-001 |
| 2 | BE-006 = BE-007 | BE-002 | BE-006 = BE-007 | BE-002 |
| 3 | BE-004 = BE-005 | BE-003 | BE-004 = BE-005 | BE-003 |
| 4 | BE-009 | BE-006 | BE-009 | BE-006 |
| 5 | BE-008 | BE-004 | BE-008 | BE-004 |
| 6 | | BE-005 | | BE-005 |
| 7 | | BE-007 | | BE-007 |
| 8 | | BE-008 = BE-009 (tied) | | BE-008 = BE-009 (tied) |
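Turning a column of values into a rank order with ties is mechanical, and a sketch makes the tie handling explicit. The values below are the Birnbaum and F-V columns as tabulated (at table precision, which is what produces the ties); the helper name `rank_with_ties` is hypothetical.

```python
from itertools import groupby

# Birnbaum and F-V columns, copied at table precision
BIRNBAUM = {
    "BE-001": 1.00, "BE-002": 1.00, "BE-003": 1.00,
    "BE-004": 2.63e-3, "BE-005": 2.63e-3,
    "BE-006": 6.99e-3, "BE-007": 6.99e-3,
    "BE-008": 1.00e-4, "BE-009": 1.00e-3,
}
FV = {
    "BE-001": 0.940, "BE-002": 0.038, "BE-003": 0.019,
    "BE-004": 2.5e-3, "BE-005": 1.5e-3,
    "BE-006": 2.6e-3, "BE-007": 1.3e-3,
    "BE-008": 2.2e-5, "BE-009": 2.2e-5,
}

def rank_with_ties(values):
    """Return a list of tied groups, highest value first."""
    ordered = sorted(values.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        sorted(name for name, _ in group)
        for _, group in groupby(ordered, key=lambda kv: kv[1])
    ]

for rank, group in enumerate(rank_with_ties(BIRNBAUM), start=1):
    print("Birnbaum", rank, " = ".join(group))
for rank, group in enumerate(rank_with_ties(FV), start=1):
    print("F-V     ", rank, " = ".join(group))
```

Ties here are exact equality at the precision shown in the table. Ranking unrounded values would need an explicit tolerance instead.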

Three things stand out, and each is the kind of thing a reviewer will ask about:

Counter-intuitive: BE-009 ranks above BE-008 in Birnbaum / RAW

Both are in the same cut {BE-008, BE-009}. But Birnbaum of BE-009 = P(BE-008) = 10⁻³, whereas Birnbaum of BE-008 = P(BE-009) = 10⁻⁴: the "structural importance" of an event in an AND cut is the probability of its partner, not its own. RAW inherits the same effect. In F-V the two are tied (they share the same cut). This kind of inversion is why people who confuse Birnbaum and F-V can ship a wrong recommendation: focusing reliability budget on BE-009 because its Birnbaum is the higher of the two, when in fact the same proportional reduction in either event buys exactly the same risk reduction.
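The inversion is easy to reproduce numerically. The sketch below isolates the cut {BE-008, BE-009} and treats everything else in P(TOP) as a constant; that constant and the function name are illustrative simplifications, valid under the rare-event approximation because neither event appears in any other cut.

```python
import math

P8, P9 = 1.00e-3, 1.00e-4    # BE-008 and BE-009, from the table
REST = 4.65e-3 - P8 * P9     # contribution of every other cut set to P(TOP)

def p_top(p8, p9):
    """P(TOP) as a function of the two driver-error probabilities only."""
    return REST + p8 * p9

# Birnbaum: force the event to 1, then to 0, and take the difference
birnbaum_8 = p_top(1.0, P9) - p_top(0.0, P9)   # equals the partner's probability P9
birnbaum_9 = p_top(P8, 1.0) - p_top(P8, 0.0)   # equals the partner's probability P8

# Halving either event gives the same new top-event probability
halve_8 = p_top(P8 / 2, P9)
halve_9 = p_top(P8, P9 / 2)

print(f"Birnbaum BE-008 = {birnbaum_8:.1e}, Birnbaum BE-009 = {birnbaum_9:.1e}")
print(f"P(TOP) after halving BE-008: {halve_8:.6e}")
print(f"P(TOP) after halving BE-009: {halve_9:.6e}")
```

Birnbaum differs by a factor of ten between the two events, yet the same proportional improvement applied to either one lands on the same P(TOP).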

Step 3: Choose by question, not by habit

The question to ask isn't "which importance measure is best?": none of them dominates the others. The question is "what design or regulatory question am I trying to settle, and which measure answers it?" Four typical questions, with the right measure for each:

| The question you're asking | Measure that answers it | Why |
|---|---|---|
| Where should I spend reliability budget? That is, if I make one component better, where do I get the most lift? | RRW | RRW is exactly the multiplier you get on P(TOP) by perfecting an event. The event with the highest RRW offers the largest possible risk reduction (what it costs to get there is a separate engineering question). In our SPAD tree, BE-001 wrong-side at RRW = 16.6 says: eliminating the lamp's wrong-side failure would cut the top by a factor of 16.6, and even a 10× improvement on its rate cuts the top about 6.5×, because the remaining cut sets then set the floor. |
| What's the system's structural weak point? That is, independent of how likely individual events happen to be today. | Birnbaum | Birnbaum is the sensitivity, not the contribution. It identifies events whose probability swings would move the top dramatically, which is useful when basic-event probabilities themselves are uncertain or expected to drift. Single-event cuts always rank highest in Birnbaum, regardless of their current probability. |
| What's currently driving the risk? The regulator wants to know which cut sets matter now. | F-V | F-V reads as "share of the answer". It's bounded 0..1, so the values are immediately interpretable. NRC PRA submissions, ASME/ANS PRA standards, and most reviewer-facing documents lead with F-V. In the SPAD tree, F-V says BE-001 wrong-side accounts for 94% of the top; the rest of the model is a rounding error. |
| What's holding the risk down? That is, which events am I depending on staying reliable? | RAW | RAW pushes each event to certain failure and reports the multiplier. High RAW flags events whose current good behaviour is masking risk. In the SPAD tree, BE-001/002/003 each have RAW = 215: any one of them going wrong-side certain produces a SPAD with probability 1. This is the single-point-of-failure detector. |
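RRW is a ceiling: it describes a perfected event, which no real upgrade delivers. A small sketch shows how a partial improvement on BE-001 compares with that ceiling. It uses the rounded P(TOP) and BE-001 figures from this guide, relies on BE-001 being a single-event cut (so its probability adds directly to the top under the rare-event approximation), and the function name is illustrative.

```python
P_TOP = 4.65e-3      # top-event probability, from the cut-set table
P_BE001 = 4.37e-3    # BE-001, a single-event cut

def improvement_factor(k):
    """Factor by which P(TOP) falls when BE-001's probability is divided by k."""
    new_top = P_TOP - P_BE001 + P_BE001 / k
    return P_TOP / new_top

# RRW is the limit of improvement_factor(k) as k grows without bound
rrw_ceiling = P_TOP / (P_TOP - P_BE001)

for k in (2, 10, 100, 1000):
    print(f"BE-001 improved {k:>4}x -> P(TOP) improves {improvement_factor(k):.2f}x")
print(f"RRW ceiling: {rrw_ceiling:.1f}x")
```

A 10× better lamp buys roughly a 6.5× better top, and the returns flatten out toward 16.6× as the other cut sets take over.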

Two of these questions get conflated in practice. "Where should I spend reliability budget?" and "what's currently driving the risk?" sound similar — RRW and F-V — but they pull in subtly different directions when the same event sits in multiple cut sets at different orders. F-V counts the event's appearance in every cut; RRW counts only the marginal change from setting it to zero. For a tree with strong inclusion-exclusion overlap they can disagree by a factor of two or more. The tie-breaker is what you actually intend to do with the answer: diagnose the existing risk profile (F-V) or plan a specific upgrade (RRW).

The two-measure default for a defensible safety case

The pragmatic convention in NRC-style PRAs, IEC 61508 functional-safety submissions, and most ARP 4761 SSAs is to report F-V and RAW together. F-V tells the reader what's contributing today; RAW tells the reader what would happen if today's reliability assumptions failed. Together they bracket the conversation. Birnbaum and RRW are derived quantities the reviewer can compute from the same tree if they want to, and asking why the rankings differ is itself the most informative question to discuss in a review.

Step 4: Which standards ask for which

Different standards have different conventions, and "which importance measure does the standard mandate?" is one of those questions where the literal answer ("none — they ask for sensitivity analysis without naming a measure") is less useful than the practical answer ("here's what reviewers expect to see"). The mapping:

| Standard / context | Conventional measure(s) | What the reviewer is looking for |
|---|---|---|
| ISO 26262 (automotive) | Birnbaum-equivalent (via SPFM, LFM, PMHF metrics); F-V for cut-set ranking | Hardware-architectural-metrics (SPFM, LFM) and PMHF are themselves discrete importance measures over hardware faults: they ask "what fraction of the failure modes is detectable / safe / tolerated", which is structural in the Birnbaum sense. For cut-set-level FTA in support of an ASIL, F-V is the convention. |
| IEC 61508 / 61511 (functional safety / SIS) | RRW (via Risk Reduction Factor) | The Risk Reduction Factor of a SIF is algebraically RRW for the SIF's basic events relative to the demand. SIL upgrades are framed as "raising the RRF from 100 to 1000", which is a per-event RRW argument. |
| NRC PRA (Reg Guides 1.174 / 1.177) | F-V and RAW (both required) | The NRC's risk-informed framework explicitly defines categorisation thresholds in F-V and RAW (e.g. "high-safety-significant" if F-V > 0.005 or RAW > 2). This is the cleanest formal use of importance measures in any regulatory regime. |
| EN 50126 / 50128 / 50129 (rail RAMS) | F-V (CSM-RA convention); Birnbaum for sensitivity at SIL 3+ | The standards mandate sensitivity analysis but don't name a measure. Common Safety Method on Risk Assessment (CSM-RA, EU Regulation 402/2013) submissions lead with F-V; the SIL apportionment process at SIL 3 / 4 typically uses Birnbaum-style sensitivity to show robustness to data uncertainty. |
| ARP 4761 (aerospace) | F-V for cut-set contribution; Birnbaum for single-failure sensitivity | The SSA's "particular risk" and "common-cause" analyses are F-V driven (ranking cuts by contribution). The single-failure assessment ("show that no single failure causes a catastrophic event") is a Birnbaum / structural argument. |
| MIL-STD-882E (defence) | Not specified; F-V or RAW typical | The standard talks about risk drivers but leaves the measure to the analyst. Most submissions adopt F-V plus RAW out of NRC-PRA habit. |
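To make the threshold idea concrete, here is a sketch that screens the SPAD events against the example cut-offs quoted in the NRC row (F-V > 0.005 or RAW > 2). Applying nuclear-style thresholds to a rail tree is purely illustrative and is an assumption of this example, not something any standard prescribes; the F-V and RAW values are the ones tabulated in Step 2, and the names are hypothetical.

```python
# Example cut-offs quoted in the NRC row above; illustrative only
FV_THRESHOLD = 0.005
RAW_THRESHOLD = 2.0

# (F-V, RAW) per basic event, from the Step 2 table
SPAD = {
    "BE-001": (0.940, 215.0),
    "BE-002": (0.038, 215.0),
    "BE-003": (0.019, 215.0),
    "BE-004": (2.5e-3, 1.56),
    "BE-005": (1.5e-3, 1.56),
    "BE-006": (2.6e-3, 2.50),
    "BE-007": (1.3e-3, 2.50),
    "BE-008": (2.2e-5, 1.022),
    "BE-009": (2.2e-5, 1.215),
}

def screen(fv, raw):
    """Return which criteria an event trips: 'F-V', 'RAW', both, or none."""
    reasons = []
    if fv > FV_THRESHOLD:
        reasons.append("F-V")
    if raw > RAW_THRESHOLD:
        reasons.append("RAW")
    return reasons

flagged = {event: screen(fv, raw) for event, (fv, raw) in SPAD.items() if screen(fv, raw)}

for event, reasons in flagged.items():
    print(f"{event}: flagged by {' and '.join(reasons)}")
```

BE-006 and BE-007 are caught by RAW alone: their F-V is well under the cut-off, so an F-V-only screen would pass over them. That is the practical argument for reporting the two measures together.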

If you're operating in a domain not represented above and the standard doesn't pin down a measure, default to F-V and RAW reported together (the NRC convention). It tends to survive reviewer scrutiny in other domains because it answers the two questions every reviewer eventually asks: what's driving the risk now? and what's the system depending on?

Common misuses

Five mistakes that show up routinely in design reviews. Each is the kind of thing where a reviewer can ship a wrong recommendation if they don't notice — or where an analyst can defend a wrong recommendation by quoting the correct number for the wrong question.

Where to go next