
Importance measures · Comparative

Birnbaum, Fussell-Vesely, RAW, RRW — when to use each

Once you've got cut sets and a top-event probability, the next question is always the same: which basic event matters most? Four importance measures claim to answer that. They give different answers — sometimes wildly different — and the four are not interchangeable. Picking the wrong one for the question you're trying to settle is a common, expensive mistake. This guide defines all four, computes them side-by-side on a real tree, shows where they disagree, and maps each to the standard that asks for it.

≈ 16 min read · Worked tree: rail / SPAD (Article 1) · Standards: IEC 61025, ISO 26262, IEC 61508, NRC PRA

Why we need importance measures

The point of quantifying a fault tree isn't the top-event probability — that's just one number, and on its own it tells you whether the system meets target or doesn't. The actionable output is a rank order of basic events, ordered by how much they're driving the answer. The rank tells the design team where to spend the next pound: which component to make more reliable, which redundancy to add, which monitoring to tighten. Without a rank, FTA stops at "we passed" or "we failed" and the engineering judgement gets done somewhere else.

Importance is intuitively obvious — but the moment you try to formalise it, you find there are at least four reasonable formalisations and they're measuring different things:

Sensitivity: if I tweak this event's probability, how much does the top move?
Contribution: how much of the current top probability runs through cut sets containing this event?
Worst-case: if this event becomes certain, how much worse does the top get?
Best-case: if this event becomes impossible, how much better does the top get?

Birnbaum, Fussell-Vesely, RAW and RRW operationalise these four questions respectively. They will agree on the broad shape — events in the dominant cut sets always rank high — but they disagree at the margin, and the margin is where most design decisions live.

The example we'll use throughout: the same SPAD fault tree built from scratch in Article 1. Nine basic events (BE-001 to BE-009), eight minimal cut sets, top-event probability ≈ 4.65×10⁻³ per train per year (after the wrong-side correction). If you haven't read Article 1, the short version: BE-001 is a wayside signal lamp wrong-side failure; BE-002 and BE-003 are the other two single-event cuts; BE-004..BE-007 are the ATP and emergency-brake basic events behind a 2-of-2 AND defence; BE-008 and BE-009 are driver errors. The numbers and cut sets are tabulated again where they're needed below, so you don't have to refer back.

Step 1: The four measures, defined

All four measures take the same inputs — the fault tree's Boolean structure and a probability per basic event — and produce a number per basic event. They differ in what that number means, which is what determines whether the rank order is the right one to act on.

Birnbaum (B_i)

B_i = ∂P(TOP)/∂P_i = P(TOP|i=1) − P(TOP|i=0)

Asks: how structurally sensitive is the top event to this event's probability? Independent of how likely the event actually is — it's the slope of the top probability with respect to the i-th basic-event probability, evaluated at the current operating point.

Fussell-Vesely (FV_i)

FV_i = P(at least one min-cut containing i is true) / P(TOP)

Asks: what fraction of the current top-event probability runs through cut sets that contain this event? Bounded between 0 and 1. The natural reading is "share of the answer attributable to this event."

Risk Achievement Worth (RAW_i)

RAW_i = P(TOP | P_i=1) / P(TOP)

Asks: if this event were certain, how many times worse would the top become? A multiplier ≥ 1. Big RAW means losing this event would be catastrophic — it identifies the events whose continued reliability the answer depends on.

Risk Reduction Worth (RRW_i)

RRW_i = P(TOP) / P(TOP | P_i=0)

Asks: if this event were impossible, how many times better would the top become? A multiplier ≥ 1. Big RRW means perfecting this event would buy the most risk reduction — it identifies the candidates for upgrade investment.

Two structural facts before any numbers. First, Birnbaum is the only one of the four that doesn't depend on the basic event's own probability: it depends on the tree's structure and on the other events' probabilities, but not on P_i itself. The other three are weighted by how likely the event actually is, and that's why they can disagree with Birnbaum on small-probability events sitting in dominant cut sets. Second, RAW and RRW are mirror images of each other, but they're not symmetric: RAW pushes events toward certain failure (asking what we're protected against), RRW pushes them toward perfection (asking what we'd gain). They give different rank orders precisely because real systems have asymmetric headroom in each direction.

One more piece of housekeeping: F-V and Birnbaum are tied together. In the rare-event regime, FV_i ≈ B_i · P_i / P(TOP): F-V is Birnbaum scaled by the event's own probability and normalised by the top. That's why F-V "downweights" structurally important but practically rare events relative to Birnbaum, which is exactly the difference we'll see in the SPAD numbers in Step 2.
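The relation between F-V and Birnbaum can be derived in a few lines from the rare-event form of P(TOP). The sketch below assumes independent basic events and writes C_k for the k-th minimal cut set; that notation is introduced here for the derivation and is not used elsewhere in the guide.

```latex
% Rare-event approximation: P(TOP) is the sum of the cut-set probabilities.
P(\mathrm{TOP}) \;\approx\; \sum_{k} P(C_k) \;=\; \sum_{k} \prod_{j \in C_k} P_j

% Differentiating with respect to P_i keeps only the cuts that contain i,
% each with the factor P_i removed.
B_i \;=\; \frac{\partial P(\mathrm{TOP})}{\partial P_i}
    \;\approx\; \sum_{k \,:\, i \in C_k} \; \prod_{j \in C_k,\; j \neq i} P_j

% Multiplying back by P_i restores the full cut probabilities.
B_i \, P_i \;\approx\; \sum_{k \,:\, i \in C_k} P(C_k)
    \;\approx\; \mathrm{FV}_i \cdot P(\mathrm{TOP})
\quad\Longrightarrow\quad
\mathrm{FV}_i \;\approx\; \frac{B_i \, P_i}{P(\mathrm{TOP})}
```

The last step treats the probability that at least one cut containing i is true as the sum of those cuts' probabilities, which is the same rare-event assumption again.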

Step 2: Compute all four on the SPAD tree

For self-containment, here is the SPAD tree's input data — eight minimal cut sets and their per-train-per-year probabilities, taken straight from Article 1's wrong-side-corrected quantification:

#	Cut	P(cut) /yr
1	`{BE-001}`	4.37×10⁻³
2	`{BE-002}`	1.75×10⁻⁴
3	`{BE-003}`	8.76×10⁻⁵
4	`{BE-004, BE-006}`	7.65×10⁻⁶
5	`{BE-004, BE-007}`	3.83×10⁻⁶
6	`{BE-005, BE-006}`	4.59×10⁻⁶
7	`{BE-005, BE-007}`	2.30×10⁻⁶
8	`{BE-008, BE-009}`	1.00×10⁻⁷
P(TOP) ≈ Σ P(cut)		4.65×10⁻³
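The cut probabilities in the table can be rebuilt from the basic-event probabilities, and the rare-event sum can be checked against an exact enumeration. The following is a minimal sketch, assuming independent basic events; the names `P`, `CUTS` and `p_top_exact` are illustrative and do not come from any particular tool.

```python
from itertools import product
from math import prod

# Basic-event probabilities and minimal cut sets, copied from the tables in this guide.
P = {
    "BE-001": 4.37e-3, "BE-002": 1.75e-4, "BE-003": 8.76e-5,
    "BE-004": 4.37e-3, "BE-005": 2.62e-3, "BE-006": 1.75e-3,
    "BE-007": 8.76e-4, "BE-008": 1.00e-3, "BE-009": 1.00e-4,
}
CUTS = [
    {"BE-001"}, {"BE-002"}, {"BE-003"},
    {"BE-004", "BE-006"}, {"BE-004", "BE-007"},
    {"BE-005", "BE-006"}, {"BE-005", "BE-007"},
    {"BE-008", "BE-009"},
]

# Probability of each cut: product of its basic events (independence assumed).
cut_probs = [prod(P[e] for e in cut) for cut in CUTS]

# Rare-event approximation: P(TOP) is roughly the sum of the cut probabilities.
p_top_rare = sum(cut_probs)

def p_top_exact(p, cuts):
    """Exact P(TOP): enumerate every failed/working state of the basic events."""
    events = sorted(p)
    total = 0.0
    for state in product([False, True], repeat=len(events)):
        failed = {e for e, is_failed in zip(events, state) if is_failed}
        if any(cut <= failed for cut in cuts):  # at least one cut fully failed
            total += prod(p[e] if e in failed else 1.0 - p[e] for e in events)
    return total

p_top = p_top_exact(P, CUTS)
print(f"rare-event sum: {p_top_rare:.4e}")
print(f"exact:          {p_top:.4e}")
```

With nine basic events the enumeration covers only 512 states, so it is cheap here. On larger trees it grows exponentially, which is one reason the rare-event sum is the usual working approximation.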

One event computed in detail

Take BE-006 (brake-pipe pressure application failure, P = 1.75×10⁻³, in cuts 4 and 6). The four measures, longhand:

Birnbaum. B = P(TOP|BE-006=1) − P(TOP|BE-006=0). With BE-006 forced true, cuts 4 and 6 collapse to single-events {BE-004} and {BE-005}: P(TOP|=1) ≈ 4.37e-3 + 1.75e-4 + 8.76e-5 + 4.37e-3 + 2.62e-3 + 3.83e-6 + 2.30e-6 + 1e-7 ≈ 1.16e-2. With BE-006 forced false, cuts 4 and 6 vanish: P(TOP|=0) ≈ 4.64e-3. B(BE-006) ≈ 6.99×10⁻³. Equivalently — and faster — sum the partner-event probabilities across the cuts containing BE-006: P(BE-004) + P(BE-005) = 4.37e-3 + 2.62e-3 = 6.99e-3. The shortcut works in the rare-event regime.
Fussell-Vesely. FV = (P(cut 4) + P(cut 6)) / P(TOP) = (7.65e-6 + 4.59e-6) / 4.65e-3 ≈ 2.6×10⁻³.
RAW. RAW = P(TOP|=1) / P(TOP) = 1.16e-2 / 4.65e-3 ≈ 2.50. If the brake pipe were always failed, the SPAD risk would rise 2.5×.
RRW. RRW = P(TOP) / P(TOP|=0) = 4.651e-3 / 4.639e-3 ≈ 1.0026 (carrying one more digit than above, because this ratio is sensitive to rounding). If the brake pipe were perfect, the SPAD risk would fall by about 0.26%.

The asymmetry is the point: BE-006's worst-case impact (RAW ≈ 2.5) is substantial, but its best-case impact (RRW ≈ 1.003) is essentially nil. That's because BE-006 is sitting behind a 2-of-2 AND barrier: its cuts are already second-order, so removing them barely moves the top, but losing one half of the redundancy exposes the other.
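The longhand calculation for BE-006 can be reproduced in a few lines. This is a minimal sketch, assuming independent basic events and the rare-event sum, capped at 1 so that forcing a single-event cut true gives P(TOP) = 1; the names `top` and `measures` are illustrative, not part of any tool.

```python
from math import prod

# Basic-event probabilities and minimal cut sets, copied from the tables in this guide.
P = {
    "BE-001": 4.37e-3, "BE-002": 1.75e-4, "BE-003": 8.76e-5,
    "BE-004": 4.37e-3, "BE-005": 2.62e-3, "BE-006": 1.75e-3,
    "BE-007": 8.76e-4, "BE-008": 1.00e-3, "BE-009": 1.00e-4,
}
CUTS = [
    {"BE-001"}, {"BE-002"}, {"BE-003"},
    {"BE-004", "BE-006"}, {"BE-004", "BE-007"},
    {"BE-005", "BE-006"}, {"BE-005", "BE-007"},
    {"BE-008", "BE-009"},
]

def top(p):
    """Rare-event P(TOP): sum of cut probabilities, capped at 1 (an assumption)."""
    return min(1.0, sum(prod(p[e] for e in cut) for cut in CUTS))

def measures(event):
    """Return (Birnbaum, F-V, RAW, RRW) for one basic event."""
    p_top = top(P)
    p_hi = top({**P, event: 1.0})  # event forced true
    p_lo = top({**P, event: 0.0})  # event forced false
    own_cuts = sum(prod(P[e] for e in cut) for cut in CUTS if event in cut)
    return p_hi - p_lo, own_cuts / p_top, p_hi / p_top, p_top / p_lo

birnbaum, fv, raw, rrw = measures("BE-006")

# Shortcut from the text: sum the partner probabilities over the cuts containing BE-006.
shortcut = sum(prod(P[e] for e in cut - {"BE-006"}) for cut in CUTS if "BE-006" in cut)

print(f"B={birnbaum:.3e} (shortcut {shortcut:.3e})  FV={fv:.2e}  RAW={raw:.2f}  RRW={rrw:.4f}")
```

Calling `measures` on each key of `P` gives the per-event values for the rest of the tree.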

The full table

The same calculation, applied to every basic event in the tree:

BE	P_i	Birnbaum B_i	F-V_i	RAW_i	RRW_i
`BE-001`	4.37×10⁻³	≈ 1.00	0.940	215	16.6
`BE-002`	1.75×10⁻⁴	≈ 1.00	0.038	215	1.039
`BE-003`	8.76×10⁻⁵	≈ 1.00	0.019	215	1.019
`BE-004`	4.37×10⁻³	2.63×10⁻³	2.5×10⁻³	1.56	1.0025
`BE-005`	2.62×10⁻³	2.63×10⁻³	1.5×10⁻³	1.56	1.0015
`BE-006`	1.75×10⁻³	6.99×10⁻³	2.6×10⁻³	2.50	1.0026
`BE-007`	8.76×10⁻⁴	6.99×10⁻³	1.3×10⁻³	2.50	1.0013
`BE-008`	1.00×10⁻³	1.00×10⁻⁴	2.2×10⁻⁵	1.022	1.00002
`BE-009`	1.00×10⁻⁴	1.00×10⁻³	2.2×10⁻⁵	1.215	1.00002
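The relation FV_i ≈ B_i · P_i / P(TOP) from Step 1 gives a quick sanity check on a table like this one. The sketch below takes the published P_i, Birnbaum and F-V columns at face value and reports how far each row is from that relation. The 5% tolerance mentioned afterwards is an arbitrary allowance for the two-significant-figure rounding, not a standard threshold.

```python
P_TOP = 4.65e-3  # top-event probability from the cut-set table

# (P_i, Birnbaum, F-V) as published in the table above; "≈ 1.00" is entered as 1.00.
TABLE = {
    "BE-001": (4.37e-3, 1.00, 0.940),
    "BE-002": (1.75e-4, 1.00, 0.038),
    "BE-003": (8.76e-5, 1.00, 0.019),
    "BE-004": (4.37e-3, 2.63e-3, 2.5e-3),
    "BE-005": (2.62e-3, 2.63e-3, 1.5e-3),
    "BE-006": (1.75e-3, 6.99e-3, 2.6e-3),
    "BE-007": (8.76e-4, 6.99e-3, 1.3e-3),
    "BE-008": (1.00e-3, 1.00e-4, 2.2e-5),
    "BE-009": (1.00e-4, 1.00e-3, 2.2e-5),
}

def relative_gap(p_i, birnbaum, fv):
    """Relative difference between the published F-V and B_i * P_i / P(TOP)."""
    implied = birnbaum * p_i / P_TOP
    return abs(implied - fv) / fv

gaps = {name: relative_gap(*row) for name, row in TABLE.items()}
for name, gap in gaps.items():
    print(f"{name}: published F-V differs from B*P/P(TOP) by {gap:.1%}")
```

Every row should land within a few percent. A row that does not is usually a transcription error, or a sign that the rare-event approximation is breaking down for that event.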

And the rank order each measure produces, top-to-bottom:

Rank	By Birnbaum	By F-V	By RAW	By RRW
1	`BE-001 = BE-002 = BE-003` (tied)	`BE-001`	`BE-001 = BE-002 = BE-003` (tied)	`BE-001`
2	`BE-006 = BE-007`	`BE-002`	`BE-006 = BE-007`	`BE-002`
3	`BE-004 = BE-005`	`BE-003`	`BE-004 = BE-005`	`BE-003`
4	`BE-009`	`BE-006`	`BE-009`	`BE-006`
5	`BE-008`	`BE-004`	`BE-008`	`BE-004`
6	—	`BE-005`	—	`BE-005`
7	—	`BE-007`	—	`BE-007`
8	—	`BE-008 = BE-009` (tied)	—	`BE-008 = BE-009` (tied)
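Ranking with ties is easy to get wrong by hand, so it can help to generate the tiers mechanically. The sketch below ranks the published Birnbaum and F-V columns; the helper `rank_with_ties` is a hypothetical name, and exact equality of the rounded table values is what defines a tie here.

```python
from itertools import groupby

# Birnbaum and F-V per event, as published in the full table ("≈ 1.00" entered as 1.00).
BIRNBAUM = {
    "BE-001": 1.00, "BE-002": 1.00, "BE-003": 1.00,
    "BE-004": 2.63e-3, "BE-005": 2.63e-3,
    "BE-006": 6.99e-3, "BE-007": 6.99e-3,
    "BE-008": 1.00e-4, "BE-009": 1.00e-3,
}
FV = {
    "BE-001": 0.940, "BE-002": 0.038, "BE-003": 0.019,
    "BE-004": 2.5e-3, "BE-005": 1.5e-3,
    "BE-006": 2.6e-3, "BE-007": 1.3e-3,
    "BE-008": 2.2e-5, "BE-009": 2.2e-5,
}

def rank_with_ties(scores):
    """Return tiers of event names, highest score first; equal scores share a tier."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [[name for name, _ in group]
            for _, group in groupby(ordered, key=lambda kv: kv[1])]

by_birnbaum = rank_with_ties(BIRNBAUM)
by_fv = rank_with_ties(FV)

for rank, tier in enumerate(by_birnbaum, start=1):
    print(f"Birnbaum rank {rank}: {' = '.join(tier)}")
for rank, tier in enumerate(by_fv, start=1):
    print(f"F-V rank {rank}: {' = '.join(tier)}")
```

On unrounded values, exact float equality would rarely produce ties. In that case the scores would need rounding to a chosen number of significant figures before grouping.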

Three things stand out, and each is the kind of thing a reviewer will ask about:

Birnbaum and RAW give identical rankings here. That's expected: P(TOP) is linear in each P_i, so RAW_i − 1 = B_i · (1 − P_i) / P(TOP), and when every P_i is small the factor (1 − P_i) is close to 1 for all events. The two rankings can only diverge between events whose own probabilities are large enough for (1 − P_i) to differ materially.
F-V and RRW also give identical rankings. In the rare-event approximation, setting P_i = 0 removes exactly the cuts that contain i, so P(TOP|P_i=0) ≈ P(TOP) · (1 − FV_i) and RRW_i ≈ 1 / (1 − FV_i): RRW is a monotone function of F-V. Events that appear only in the same cut, such as BE-008 and BE-009, tie in both.
The Birnbaum/RAW rank disagrees with the F-V/RRW rank. Birnbaum and RAW say BE-002 and BE-003 are as important as BE-001 (they're all single points of failure — making any one certain produces a SPAD with probability 1). F-V and RRW say BE-001 swamps them by an order of magnitude or more, because BE-001's actual wrong-side probability is much higher.

Counter-intuitive: BE-009 ranks above BE-008 in Birnbaum / RAW. Both are in the same cut {BE-008, BE-009}. But Birnbaum of BE-009 = P(BE-008) = 10⁻³, whereas Birnbaum of BE-008 = P(BE-009) = 10⁻⁴: the "structural importance" of an event in an AND cut is the probability of its partner, not itself. RAW inherits the same effect. In F-V the two are tied (they share the same cut). This kind of inversion is why people who confuse Birnbaum and F-V can ship a wrong recommendation: focusing reliability budget on BE-009 because its Birnbaum is ten times BE-008's, when in fact improving either event in the cut by the same factor buys exactly the same risk reduction.
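The inversion is easy to confirm numerically. The sketch below lumps cuts 1 to 7 into a constant, because they do not involve BE-008 or BE-009, and compares the two events' Birnbaum values with the risk actually removed by halving each one. The lumping and the variable names are simplifications made for this illustration.

```python
P_008, P_009 = 1.00e-3, 1.00e-4
OTHER_CUTS = 4.65e-3 - 1.00e-7  # cuts 1-7 from the table, unaffected by either event

def top(p_008, p_009):
    """Rare-event P(TOP) with only the {BE-008, BE-009} cut left variable."""
    return OTHER_CUTS + p_008 * p_009

# Birnbaum: P(TOP | event = 1) - P(TOP | event = 0), partner held at its current value.
b_008 = top(1.0, P_009) - top(0.0, P_009)
b_009 = top(P_008, 1.0) - top(P_008, 0.0)

# Risk removed by halving each event's probability in turn.
gain_008 = top(P_008, P_009) - top(P_008 / 2, P_009)
gain_009 = top(P_008, P_009) - top(P_008, P_009 / 2)

print(f"Birnbaum: BE-008 = {b_008:.1e}, BE-009 = {b_009:.1e}")
print(f"Risk removed by halving: BE-008 = {gain_008:.1e}, BE-009 = {gain_009:.1e}")
```

The Birnbaum values differ by a factor of ten, while the two halving gains are identical, because the cut probability is a product and does not care which factor shrinks.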

Step 3: Choose by question, not by habit

The question to ask isn't "which importance measure is best?" — none of them dominates the others. The question is "what design or regulatory question am I trying to settle, and which measure answers that?". Four typical questions, with the right measure for each:

The question you're asking	Measure that answers it	Why
Where should I spend reliability budget? That is, if I make one component better, where do I get the most lift?	RRW	RRW is exactly the multiplier you get on P(TOP) by perfecting an event, so it is the ceiling on what any upgrade to that event can buy. The event with the highest RRW has the most headroom (engineering cost still has to be weighed separately). In our SPAD tree, BE-001 wrong-side at RRW = 16.6 says: a perfect lamp would cut the top by 16.6×, and a 10× improvement on the lamp's wrong-side rate would cut it by about 6.5× (from 4.65×10⁻³ to about 7.2×10⁻⁴).
What's the system's structural weak point? — independent of how likely individual events happen to be today.	Birnbaum	Birnbaum is the sensitivity, not the contribution. It identifies events whose probability swings would move the top dramatically — useful when basic-event probabilities themselves are uncertain or expected to drift. Single-event cuts always rank highest in Birnbaum, regardless of their current probability.
What's currently driving the risk? — the regulator wants to know which cut sets matter now.	F-V	F-V reads as "share of the answer". It's bounded 0..1, so the values are immediately interpretable. NRC PRA submissions, ASME/ANS PRA standards, and most reviewer-facing documents lead with F-V. In the SPAD tree, F-V says BE-001 wrong-side accounts for 94% of the top — the rest of the model is a rounding error.
What's holding the risk down? — i.e. which events am I depending on staying reliable?	RAW	RAW pushes each event to certain failure and reports the multiplier. High RAW flags events whose current good-behaviour is masking risk. In the SPAD tree, BE-001/002/003 each have RAW = 215 — any one of them going wrong-side certain produces a SPAD with probability 1. This is the single-point-of-failure detector.

Two of these questions get conflated in practice. "Where should I spend reliability budget?" and "what's currently driving the risk?" sound similar — RRW and F-V — but they pull in subtly different directions when the same event sits in multiple cut sets at different orders. F-V counts the event's appearance in every cut; RRW counts only the marginal change from setting it to zero. For a tree with strong inclusion-exclusion overlap they can disagree by a factor of two or more. The tie-breaker is what you actually intend to do with the answer: diagnose the existing risk profile (F-V) or plan a specific upgrade (RRW).

The two-measure default for a defensible safety case. The pragmatic convention in NRC-style PRAs, IEC 61508 functional-safety submissions, and most ARP 4761 SSAs is to report F-V and RAW together. F-V tells the reader what's contributing today; RAW tells the reader what would happen if today's reliability assumptions failed. Together they bracket the conversation. Birnbaum and RRW are derived quantities the reviewer can compute from the same tree if they want to, and asking why the rankings differ is itself the most informative question to discuss in a review.

Step 4: Which standards ask for which

Different standards have different conventions, and "which importance measure does the standard mandate?" is one of those questions where the literal answer ("none — they ask for sensitivity analysis without naming a measure") is less useful than the practical answer ("here's what reviewers expect to see"). The mapping:

Standard / context	Conventional measure(s)	What the reviewer is looking for
ISO 26262 (automotive)	Birnbaum-equivalent (via SPFM, LFM, PMHF metrics); F-V for cut-set ranking	Hardware-architectural-metrics (SPFM, LFM) and PMHF are themselves discrete importance measures over hardware faults — they ask "what fraction of the failure modes is detectable / safe / tolerated", which is structural in the Birnbaum sense. For cut-set-level FTA in support of an ASIL, F-V is the convention.
IEC 61508 / 61511 (functional safety / SIS)	RRW (via Risk Reduction Factor)	The Risk Reduction Factor of a SIF is algebraically RRW for the SIF's basic events relative to the demand. SIL upgrades are framed as "raising the RRF from 100 to 1000", which is a per-event RRW argument.
NRC PRA (Reg Guides 1.174 / 1.177)	F-V and RAW (both required)	The NRC's risk-informed framework explicitly defines categorisation thresholds in F-V and RAW (e.g. "high-safety-significant" if F-V > 0.005 or RAW > 2). This is the cleanest formal use of importance measures in any regulatory regime.
EN 50126 / 50128 / 50129 (rail RAMS)	F-V (CSM-RA convention); Birnbaum for sensitivity at SIL 3+	The standards mandate sensitivity analysis but don't name a measure. Common Safety Method on Risk Assessment (CSM-RA, EU Regulation 402/2013) submissions lead with F-V; the SIL apportionment process at SIL 3 / 4 typically uses Birnbaum-style sensitivity to show robustness to data uncertainty.
ARP 4761 (aerospace)	F-V for cut-set contribution; Birnbaum for single-failure sensitivity	The SSA's "particular risk" and "common-cause" analyses are F-V driven (ranking cuts by contribution). The single-failure assessment ("show that no single failure causes a catastrophic event") is a Birnbaum / structural argument.
MIL-STD-882E (defence)	Not specified; F-V or RAW typical	The standard talks about risk drivers but leaves the measure to the analyst. Most submissions adopt F-V plus RAW out of NRC-PRA habit.
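To see how a two-threshold screen of the kind quoted in the NRC row behaves, the sketch below applies F-V > 0.005 or RAW > 2 to the SPAD table values. Applying nuclear-style thresholds to a rail tree is purely illustrative, and the names `TABLE` and `screen` are hypothetical.

```python
FV_THRESHOLD = 0.005   # example threshold quoted in the NRC row above
RAW_THRESHOLD = 2.0    # example threshold quoted in the NRC row above

# (F-V, RAW) per basic event, as published in the Step 2 table.
TABLE = {
    "BE-001": (0.940, 215.0),
    "BE-002": (0.038, 215.0),
    "BE-003": (0.019, 215.0),
    "BE-004": (2.5e-3, 1.56),
    "BE-005": (1.5e-3, 1.56),
    "BE-006": (2.6e-3, 2.50),
    "BE-007": (1.3e-3, 2.50),
    "BE-008": (2.2e-5, 1.022),
    "BE-009": (2.2e-5, 1.215),
}

def screen(fv, raw):
    """Return the list of criteria that flag an event (empty if none do)."""
    reasons = []
    if fv > FV_THRESHOLD:
        reasons.append("F-V")
    if raw > RAW_THRESHOLD:
        reasons.append("RAW")
    return reasons

flags = {name: screen(fv, raw) for name, (fv, raw) in TABLE.items()}
for name, reasons in flags.items():
    print(f"{name}: {', '.join(reasons) if reasons else 'not flagged'}")
```

BE-006 and BE-007 are flagged by RAW alone. An F-V-only screen would have let them through, which is the practical argument for reporting the two measures together.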

If you're operating in a domain not represented above and the standard doesn't pin down a measure, default to F-V and RAW reported together (the NRC convention). It survives reviewer scrutiny in every other domain because it answers the two questions every reviewer eventually asks: what's driving the risk now, and what's the system depending on?

Common misuses

Five mistakes that show up routinely in design reviews. Each is the kind of thing where a reviewer can ship a wrong recommendation if they don't notice — or where an analyst can defend a wrong recommendation by quoting the correct number for the wrong question.

Reporting only F-V. F-V tells the reviewer what's driving risk now. It says nothing about what's holding risk down. An event with current probability of 10⁻⁶ in a single-event cut shows up at F-V ≈ 10⁻⁶ / P(TOP) — invisibly small. But its RAW is huge, because if its probability rose to 1, the top would too. F-V alone misses this entire category. Always pair with RAW.
Reporting only Birnbaum. Birnbaum is structural — it doesn't care about the current probability. An event with Birnbaum = 1 in a single-event cut cannot be improved at the tree level if its probability is already at the floor of physical achievability. Birnbaum highlights it; Birnbaum can't tell you whether spending money there will help.
Confusing importance with cost-effectiveness. The event with the highest RRW offers the largest possible risk reduction, but improvement isn't free. What you actually want to maximise is risk reduction per unit of engineering cost; reporting RRW alone hides the cost dimension. The right output of an importance analysis is "rank by RRW, then weigh each candidate against its engineering cost", never just "rank by RRW".
Comparing measures across different trees. RAW values aren't comparable between trees with different P(TOP). RAW = 215 in our SPAD tree means "this event going certain produces P(TOP) = 1, which is 215× the current value of 4.65×10⁻³". RAW = 215 in a different tree where P(TOP) is 10⁻⁶ would mean a top probability of 2.15×10⁻⁴, a very different absolute outcome. Birnbaum is also bounded 0..1, but it is an absolute slope rather than a share; F-V is the one measure that reads as a fraction of the tree's own P(TOP), which makes it the most natural to compare across trees.
Not tracking importance over time. Importance values are computed against current basic-event probabilities. If your data updates — field reliability gets better than predicted, or a new failure mode is discovered — the rankings can flip. Importance is a snapshot, not a permanent property of the event. The good practice is to recompute and compare on every safety-case revision; an event whose F-V quintupled between releases is the conversation to have.
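The cost point in particular is easier to see with numbers. In the sketch below the improvement factors and costs are invented purely for illustration; only the P_i and Birnbaum values come from the Step 2 table. The drop in P(TOP) uses the fact that P(TOP) is linear in each P_i with slope B_i.

```python
# (P_i, Birnbaum B_i) taken from the Step 2 table.
EVENTS = {
    "BE-001": (4.37e-3, 1.00),
    "BE-002": (1.75e-4, 1.00),
    "BE-006": (1.75e-3, 6.99e-3),
}

# HYPOTHETICAL upgrade options: (improvement factor on P_i, cost in arbitrary units).
OPTIONS = {
    "BE-001": (10.0, 2000.0),
    "BE-002": (10.0, 50.0),
    "BE-006": (10.0, 20.0),
}

def risk_removed(event):
    """Drop in P(TOP) from the upgrade: B_i times the change in P_i."""
    p_i, b_i = EVENTS[event]
    factor, _cost = OPTIONS[event]
    return b_i * (p_i - p_i / factor)

def risk_removed_per_cost(event):
    """Risk removed per unit of (hypothetical) cost."""
    return risk_removed(event) / OPTIONS[event][1]

by_reduction = sorted(EVENTS, key=risk_removed, reverse=True)
by_value = sorted(EVENTS, key=risk_removed_per_cost, reverse=True)

print("By risk removed:         ", by_reduction)
print("By risk removed per cost:", by_value)
```

With these made-up costs the lamp (BE-001) removes by far the most risk, yet BE-002 comes first once cost is included. The ordering depends entirely on the cost inputs, which is why they belong in the analysis rather than in a footnote.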

Where to go next

Compute the measures on your own tree. Open FTA Studio, build or import a tree, and the importance panel reports all four measures per basic event. The cut-set table exports to CSV alongside.
Add Monte Carlo uncertainty bands. Importance values inherit the uncertainty of the leaf probabilities. A point-estimate F-V of 0.94 on a basic event whose probability is itself uncertain by a factor of 3 is much less actionable than the point estimate suggests. Our browser-only Monte Carlo tool propagates lognormal uncertainty on each leaf and gives a 5th–95th percentile band on every importance value.
Read Article 1 if you haven't. The SPAD-from-scratch guide shows how the eight cut sets used here were derived from a top-event statement. The cut sets came from somewhere; the importance numbers are downstream of that derivation.
Match the measure to the standard. For ISO 26262 hardware metrics see the ISO 26262 reference page; for the NRC F-V/RAW threshold convention see Reg Guide 1.174.
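For readers who want to see the shape of the Monte Carlo step before opening any tool, here is a minimal standard-library sketch. It is not the implementation of the tool mentioned above. It assumes each point estimate is the median of a lognormal distribution and reads "uncertain by a factor of 3" as an error factor of 3 (95th percentile over median); both are assumptions made for this example.

```python
import math
import random
from math import prod

# Point estimates from this guide, treated here as lognormal medians (an assumption).
MEDIANS = {
    "BE-001": 4.37e-3, "BE-002": 1.75e-4, "BE-003": 8.76e-5,
    "BE-004": 4.37e-3, "BE-005": 2.62e-3, "BE-006": 1.75e-3,
    "BE-007": 8.76e-4, "BE-008": 1.00e-3, "BE-009": 1.00e-4,
}
CUTS = [
    {"BE-001"}, {"BE-002"}, {"BE-003"},
    {"BE-004", "BE-006"}, {"BE-004", "BE-007"},
    {"BE-005", "BE-006"}, {"BE-005", "BE-007"},
    {"BE-008", "BE-009"},
]

ERROR_FACTOR = 3.0                      # assumed ratio of 95th percentile to median
SIGMA = math.log(ERROR_FACTOR) / 1.645  # lognormal sigma implied by that error factor
N = 5000                                # number of Monte Carlo trials

def fussell_vesely(p, event):
    """Rare-event F-V: probability of cuts containing the event over all cuts."""
    cut_probs = [(cut, prod(p[e] for e in cut)) for cut in CUTS]
    own = sum(q for cut, q in cut_probs if event in cut)
    return own / sum(q for _, q in cut_probs)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = sorted(
    fussell_vesely(
        {e: min(1.0, rng.lognormvariate(math.log(m), SIGMA)) for e, m in MEDIANS.items()},
        "BE-001",
    )
    for _ in range(N)
)
low, median, high = samples[int(0.05 * N)], samples[N // 2], samples[int(0.95 * N)]
print(f"F-V(BE-001): 5th={low:.2f}  median={median:.2f}  95th={high:.2f}")
```

The width of the band, rather than the point estimate, is what tells a reviewer how much weight the headline F-V figure can carry.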