
Foundations · Worked example

How to build a fault tree from scratch — a worked SPAD example

Most FTA tutorials show you a finished tree and explain the symbols. That's backwards. The hard part isn't the notation — it's getting from a vague hazard ("the train shouldn't run a red light") to a top event you can actually quantify, then decomposing it without inventing branches that don't exist. This guide walks the full workflow on a real EN 50126 case: a railway Signal Passed at Danger (SPAD).

≈ 18 min read · Worked tree: rail / SPAD · Standards: IEC 61025, EN 50126

What we'll build, and why this example

A SPAD (Signal Passed at Danger) is when a train crosses a stop-aspect signal without authority. It's the dominant precursor to head-on and rear-end collisions on conventionally signalled railways, which is why it is tracked across Europe as an accident precursor in the Common Safety Indicators of the EU Railway Safety Directive. The hazard is concrete, the failure modes are well-understood across signalling, on-board ATP and human factors, and the outcome is severe enough that the analysis has to be defensible to a regulator. That makes it a good first tree: rich enough to exercise OR / AND composition and importance measures, but small enough to fit on one page.

By the end of this guide we'll have:

A precisely scoped top event (not "trains shouldn't crash" — something a regulator can sign off).
A three-branch structural decomposition by independent failure pathway.
Nine basic events with documented failure rates or per-demand probabilities.
The minimal cut sets from MOCUS, ranked by contribution.
A top-event probability we can defend, plus a frank discussion of where the model is weakest.
An importance ranking that tells the design team where the next pound of risk-reduction effort should go.

The finished tree is one of the eight templates that ship with FTA Studio. You can open it directly in the editor at any point — the live, interactive version is embedded further down — but you don't need to. Every number in this article is in the text.

Step 1 · Define the top event

This is the step everyone skips and everyone regrets later. The top event is a Boolean — at any given instant it is either true or false — so it has to be phrased in a way that, in principle, you could check. "Trains are unsafe" can't be checked. "Train passes a stop-aspect signal without movement authority" can.

Three properties to test the wording against:

Observable. A reasonable observer with access to the right data could decide whether it occurred. SPAD passes — there's a track-circuit interruption past a red signal, the signaller knows immediately. "Driver was inattentive" fails — inattention is a state of mind.
Single, specific. If the top event collapses two distinct hazards ("collision OR derailment"), you'll spend the rest of the analysis untangling them. Pick one. SPAD is one event; the downstream collision is a separate analysis (an event tree or a separate fault tree with SPAD as a basic event).
Independent of mitigation you haven't designed yet. Don't write "SPAD that the ATP fails to catch" as the top event — the ATP is part of the system you're analysing, not a precondition. Put it inside the tree as a branch.

Our top event, to the standard required by an EN 50126 hazard log:

Top event TOP-001: Train proceeds past a stop-aspect (red) signal without a movement authority that supersedes it.
Risk classification (per CENELEC R2A2): consequence Catastrophic; frequency target Improbable (<10⁻⁹/h per train) for a SIL-4 system.

Two phrasing choices worth flagging. We say "stop-aspect (red) signal" not "red signal" because some systems use a stop aspect that isn't red (e.g. a position-light dwarf signal). And we add "without a movement authority that supersedes it" to exclude the legitimate case where the signaller verbally authorises the driver past a failed signal — that's a different procedure with its own controls, not a SPAD.

The frequency target — improbable, <10⁻⁹/h — sets the bar the quantification has to clear. Write it down now so the analysis has something to compare to. If your final number comes back at 10⁻⁵/h, you don't have a model that's "wrong"; you have a system that doesn't meet the target and needs design changes.

Step 2 · First-level decomposition

The top event is true if and only if a train physically crosses a stop signal. Walk backwards one step: under what conditions does that happen? In a modern signalled railway there are three independent barriers, and the train only gets past the signal if at least one barrier path goes wrong. Each barrier corresponds to one branch of the tree.

G-001 · Signalling System Failure (OR)

The signalling infrastructure presents a proceed (or no) aspect when a stop aspect was due. The driver and the ATP both believe they have authority. Any one of several signalling components can cause this — hence OR.

G-002 · ATP fails to intervene (AND)

The signal correctly shows red and the driver hasn't reacted, but the on-board ATP fails to apply the emergency brake. This requires both the ATP logic chain to fail to issue the brake command and the emergency brake itself to fail to stop the train — hence AND.

G-005 · Driver Error (AND)

Driver fails to observe the stop aspect and fails to apply the brakes in time. The "and" matters: a momentary lapse caught and corrected is not a SPAD. Both miss-events have to coincide.

And the top gate is an OR of the three: the train passes the signal if signalling fails, or ATP-with-brake fails, or the driver double-misses. Any one path is enough.

Common pitfall: bottom-up tree-building. It is tempting to start by listing every component on a train and asking "how can it fail?" That gives an FMEA, not a fault tree. The deductive method (IEC 61025 §5) goes the other way: start at the top, ask "what immediately causes this?", and only descend when each immediate cause has been named. You stop adding branches when the next level would be invented to make the picture symmetrical, not because reality demands it.

Step 3 · Drill each branch to basic events

The top event has been resolved into three branches. Now resolve each branch into the basic events that an analyst can actually quantify — components with a published failure rate or a per-demand human-error probability backed by a recognised database.

Signalling branch (G-001, OR)

For the wayside signal to display the wrong aspect, one of three subsystems has to misbehave: the signal head itself (lamp/LED unit), the interlocking logic that drives it, or the cable run between them. These are physically and electrically independent, so they enter as parallel children of the OR.

BE-001 — Signal lamp / LED unit failure. Modelled as a single basic event; in a real submission you'd split it further by failure mode (lamp short, filament open, LED matrix dim) but for first-pass quantification the aggregate λ is enough.
BE-002 — Signal controller logic fault (vital relay or electronic interlocking output).
BE-003 — Cable / wiring fault between interlocking and head.

ATP branch (G-002, AND of two ORs)

This is the structurally interesting branch. The ATP "fails to intervene" only if both the ATP logic fails to command the brake and the emergency brake fails to act. So G-002 is an AND of two sub-gates, each of which is itself an OR over its own components:

G-003 — On-board ATP fails to issue stop command (OR of BE-004 balise reader and BE-005 on-board ATP computer).
G-004 — Emergency brake fails to stop the train (OR of BE-006 brake pipe pressure application and BE-007 compressed-air supply).

The AND between G-003 and G-004 captures exactly the safety-architecture intent: ATP and brakes are two layers of defence, and the train only loses both at once if both fail in the same window.

Driver branch (G-005, AND)

BE-008 — Driver fails to observe the stop aspect (per-demand probability — not a rate).
BE-009 — Driver, having observed the signal, fails to apply the brakes adequately in time (per-demand, conditional on observation).

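Treating these as independent is a known simplification, since fatigue or distraction degrades both. It is the usual default for a first-pass tree, but under an AND gate it is optimistic rather than conservative: any positive dependence makes the joint probability larger than the product. We'll revisit it in the closing section on what this guide left out.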
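The structure described in Steps 2 and 3 can be written down as a small data structure and evaluated as a Boolean function. This is a minimal sketch: the nested-tuple encoding and the `occurs` helper are illustrative assumptions, not the format FTA Studio uses.

```python
# The SPAD tree as nested tuples: (gate_type, [children]).
# Leaves are basic-event UIDs; gate layout follows the article.
SPAD_TREE = ("OR", [
    ("OR", ["BE-001", "BE-002", "BE-003"]),      # G-001 signalling
    ("AND", [                                     # G-002 ATP fails to intervene
        ("OR", ["BE-004", "BE-005"]),            # G-003 no stop command
        ("OR", ["BE-006", "BE-007"]),            # G-004 brake fails to stop train
    ]),
    ("AND", ["BE-008", "BE-009"]),               # G-005 driver double-miss
])

def occurs(node, failed):
    """True if the event at `node` is true, given the set of failed basic events."""
    if isinstance(node, str):                     # leaf: a basic event
        return node in failed
    gate, children = node
    results = (occurs(child, failed) for child in children)
    return any(results) if gate == "OR" else all(results)

print(occurs(SPAD_TREE, {"BE-002"}))             # True: one signalling fault is enough
print(occurs(SPAD_TREE, {"BE-004"}))             # False: the brake layer still holds
print(occurs(SPAD_TREE, {"BE-004", "BE-007"}))   # True: ATP reader and air supply both fail
```

Evaluating the tree against hand-picked failure sets like these is a cheap way to check that each gate type matches the intent stated in the prose before any data is attached.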

The full structure, drawn live in the embedded editor below, is exactly the SPAD template that ships with FTA Studio. Pan, zoom and click any node to see its description, failure mode, detection mechanism and λ/P value. (No data is sent anywhere — the renderer is offline.)

[Embedded live tree: interactive FTA Studio rail SPAD template; pan, zoom and click nodes]

Step 4 · Attach data to the basic events

Nine basic events, two kinds of data. The seven hardware components get a failure rate λ (failures per hour, exponential model, constant during the useful-life region of the bathtub curve). The two driver events get a per-demand probability: there is no "rate of driver-not-seeing-things", only a probability that, on any given approach to a stop signal, the driver fails to see it.

Indicative numbers, drawn from the typical EN 50126 / EIRENE / IEC TR 61511-3 ranges and aligned with the FTA Studio rail SPAD template. For a real submission you would substitute your own field-validated data and cite the source per basic event.

UID	Basic event	λ (per h) or P (per demand)	Detection / mitigation already in design
`BE-001`	Signal lamp / LED unit failure	λ = 5×10⁻⁵	Lamp proving — dark signal treated as most restrictive aspect by operating rule
`BE-002`	Signal controller logic fault	λ = 2×10⁻⁶	Vital relay cross-checking; double-output proving
`BE-003`	Signal cable / wiring fault	λ = 1×10⁻⁶	Lamp proving detects open / short / earth; cable route protection
`BE-004`	Balise / transponder reader fault	λ = 5×10⁻⁷	Telegram CRC check; two independent readers per train
`BE-005`	On-board ATP computer fault	λ = 3×10⁻⁷	Dual-processor comparison; watchdog; loss-of-output triggers service brake
`BE-006`	Brake pipe pressure application failure	λ = 2×10⁻⁷	Fail-safe pneumatic design — brakes apply on pressure loss
`BE-007`	Compressed-air supply failure	λ = 1×10⁻⁷	Low-reservoir pressure inhibits departure
`BE-008`	Driver fails to observe stop aspect	P = 1×10⁻³	Driver Reminder System (DRS); SPAD indicator boards; vigilance device
`BE-009`	Driver fails to apply brakes in time	P = 1×10⁻⁴	ATP back-up provides automatic intervention; reduced approach speed

The "detection / mitigation already in design" column is not decoration. It's where most analysts go wrong on the second pass, as the wrong-side correction in Step 6 shows.

To turn rates into the per-year probabilities the OR/AND arithmetic needs, use mission time T = 8760 h and the exponential reliability model: P = 1 − exp(−λT) ≈ λT when λT is small. For BE-001 the small-λT approximation breaks down (λT = 0.438), so we use the exact form: P = 1 − exp(−0.438) ≈ 0.354. For every other hardware basic event, λT < 0.02 and the linear approximation is accurate to better than 1%.
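The conversion is easy to check numerically. The sketch below applies the exact exponential form to the seven hardware rates from the table and reports how far the linear shortcut λT overshoots; the function and variable names are illustrative.

```python
import math

MISSION_HOURS = 8760.0   # one year, as in the article

def prob_from_rate(lam, t=MISSION_HOURS):
    """P = 1 - exp(-lam * t), computed with expm1 for accuracy at small lam * t."""
    return -math.expm1(-lam * t)

# Failure rates per hour from the Step 4 table
RATES = {
    "BE-001": 5e-5, "BE-002": 2e-6, "BE-003": 1e-6, "BE-004": 5e-7,
    "BE-005": 3e-7, "BE-006": 2e-7, "BE-007": 1e-7,
}

for uid, lam in RATES.items():
    exact = prob_from_rate(lam)
    linear = lam * MISSION_HOURS
    overshoot = (linear - exact) / exact      # relative error of the linear form
    print(f"{uid}: lambda*T = {linear:.5f}  exact P = {exact:.5f}  "
          f"linear overshoot = {overshoot:.2%}")
```

The relative overshoot of the linear form is roughly λT/2, which is why only BE-001 needs the exact expression.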

Why these particular numbers, and a candid warning: the λ for a signal lamp at 5×10⁻⁵/h includes all lamp failures, not just wrong-side ones. Most lamp failures produce a dark signal which, by operating rule, is treated as the most restrictive aspect: the driver stops, no SPAD. If you propagate the raw λ straight up the tree, you'll over-count the contribution of the signalling branch by roughly two orders of magnitude. We'll derive the cut sets first (Step 5), quantify the tree as-built and expose this exact distortion (Step 6), and then come back to fix it. Walking through the wrong answer once is the fastest way to learn why the correction matters.

Step 5 · Qualitative analysis: minimal cut sets

Before any numbers, the tree should be reduced to its minimal cut sets: the smallest combinations of basic events whose joint occurrence makes the top event true. The qualitative result tells you the structure of the failure space — how many single points of failure exist, where the redundancies actually live, and which combinations the design depends on. The standard algorithm is MOCUS (Method for Obtaining Cut Sets), which substitutes children into parents top-down and applies absorption.

Top-down substitution for our tree:

TOP = G-001 OR G-002 OR G-005
Expand G-001 (OR): BE-001 OR BE-002 OR BE-003 — three single-event cuts.
Expand G-002 (AND): G-003 · G-004. Then expand each: (BE-004 OR BE-005) · (BE-006 OR BE-007). Distribute the AND over OR — four two-event cuts.
Expand G-005 (AND): BE-008 · BE-009 — one two-event cut.

No cut is a superset of another, so absorption removes nothing. Eight minimal cut sets in total:

#	Cut set	Order	Branch
1	`{BE-001}`	1	Signalling — lamp
2	`{BE-002}`	1	Signalling — controller
3	`{BE-003}`	1	Signalling — cable
4	`{BE-004, BE-006}`	2	ATP reader · brake pipe
5	`{BE-004, BE-007}`	2	ATP reader · air supply
6	`{BE-005, BE-006}`	2	ATP computer · brake pipe
7	`{BE-005, BE-007}`	2	ATP computer · air supply
8	`{BE-008, BE-009}`	2	Driver double-miss

Three structural facts fall out of this table immediately, before a single probability is multiplied:

Three single points of failure. Cuts 1, 2 and 3 are all order-1 — any single signalling failure can cause a SPAD on its own. That is the design weakness the lamp-proving rule and vital-relay logic are there to compensate for. If those mitigations don't actually reduce the wrong-side rate, no amount of ATP redundancy will save the model.
The ATP layer genuinely buys two-fold defence. Cuts 4–7 are all order-2, and the two events in each cut belong to physically separate subsystems (on-board electronics vs pneumatics). That is what the AND in G-002 was supposed to encode, and the cut sets confirm it.
The driver path is order-2 and entirely human. The two human-error events sit in the same person at the same moment, which is the classical place to suspect dependence — we'll revisit this below.

What MOCUS just gave you for free: a list of every cut set in the design, ranked by order, with the single points of failure at the top. Even if you stop here and never quantify, the order-1 cuts are the conversation to have first with the design team; adding a redundant ATP channel does nothing about them.
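The top-down substitution in this step can be sketched in a few lines. This is a recursive expansion in the spirit of MOCUS rather than the matrix bookkeeping of the original algorithm, and the tuple encoding of the tree is an assumption made for illustration.

```python
from itertools import product

SPAD_TREE = ("OR", [
    ("OR", ["BE-001", "BE-002", "BE-003"]),                  # G-001
    ("AND", [("OR", ["BE-004", "BE-005"]),                   # G-003
             ("OR", ["BE-006", "BE-007"])]),                 # G-004, under G-002
    ("AND", ["BE-008", "BE-009"]),                           # G-005
])

def cut_sets(node):
    """Expand a (sub)tree into a list of cut sets (frozensets of basic events)."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_cuts = [cut_sets(child) for child in children]
    if gate == "OR":
        # OR: any child's cut set is a cut set of the parent
        return [cs for cuts in child_cuts for cs in cuts]
    # AND: one cut set from every child must occur together
    return [frozenset().union(*combo) for combo in product(*child_cuts)]

def minimise(cuts):
    """Absorption: drop duplicates and any cut that strictly contains another."""
    unique = set(cuts)
    minimal = [c for c in unique if not any(other < c for other in unique)]
    return sorted(minimal, key=lambda c: (len(c), sorted(c)))

minimal_cuts = minimise(cut_sets(SPAD_TREE))
for number, cut in enumerate(minimal_cuts, start=1):
    print(number, sorted(cut), "order", len(cut))
```

Because no basic event is shared between branches in this tree, the absorption step removes nothing, which matches the observation above the table.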

Step 6 · Quantitative evaluation and importance

Each cut set has a probability: for an AND cut, the product of its event probabilities (assuming independence); for a single-event cut, the event's probability directly. The top-event probability is the probability of at least one cut set occurring. The rare-event approximation P(TOP) ≈ Σ P(cut) is an upper bound on that probability. With small cut-set probabilities it is accurate to four or five decimal places; when a single cut is large, as the as-built lamp cut below is, it overstates the exact value by a few percent.
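Written out under the same independence assumption, the quantities in play look like this. The symbols C_k (the k-th minimal cut set) and p_i (the probability of basic event i) are notation introduced here for convenience.

```latex
\begin{align*}
P(C_k) &= \prod_{i \in C_k} p_i \\
P(\mathrm{TOP}) &= P\Bigl(\bigcup_k C_k\Bigr)
  = \sum_k P(C_k) \;-\; \sum_{k<l} P(C_k \cap C_l) \;+\; \dots \\
P(\mathrm{TOP}) &\le \sum_k P(C_k)
  \qquad \text{(rare-event approximation)}
\end{align*}
```

Each neglected pairwise term is the joint probability of two cuts occurring together, so the sum is close to the exact value only when every cut probability is small.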

As-built quantification

Plug in the per-year probabilities from Step 4 (P = 1 − exp(−λT), T = 8760 h):

#	Cut set	P(cut) per year
1	`{BE-001}`	3.54×10⁻¹
2	`{BE-002}`	1.74×10⁻²
3	`{BE-003}`	8.72×10⁻³
4	`{BE-004, BE-006}`	7.65×10⁻⁶
5	`{BE-004, BE-007}`	3.83×10⁻⁶
6	`{BE-005, BE-006}`	4.59×10⁻⁶
7	`{BE-005, BE-007}`	2.30×10⁻⁶
8	`{BE-008, BE-009}`	1.00×10⁻⁷
Σ (top-event probability per train per year)		≈ 3.80×10⁻¹

Divide through by mission time to recover a rate: 3.80×10⁻¹ / 8760 ≈ 4.3×10⁻⁵ SPADs per train per hour. The target was 10⁻⁹/h. We're more than four orders of magnitude over, and a single cut, {BE-001}, the signal lamp, is responsible for 93% of the answer.
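The as-built figures can be reproduced, and the summed result cross-checked against an exact calculation, with a short script. This is a sketch under the article's independence assumption; the brute-force enumeration over all 2⁹ basic-event states is an addition for checking purposes, not part of the article's method.

```python
import math
from itertools import product

T = 8760.0   # mission time, hours
RATES = {"BE-001": 5e-5, "BE-002": 2e-6, "BE-003": 1e-6, "BE-004": 5e-7,
         "BE-005": 3e-7, "BE-006": 2e-7, "BE-007": 1e-7}
P = {uid: -math.expm1(-lam * T) for uid, lam in RATES.items()}
P.update({"BE-008": 1e-3, "BE-009": 1e-4})       # driver events, per demand

CUTS = [("BE-001",), ("BE-002",), ("BE-003",),
        ("BE-004", "BE-006"), ("BE-004", "BE-007"),
        ("BE-005", "BE-006"), ("BE-005", "BE-007"),
        ("BE-008", "BE-009")]

cut_p = [math.prod(P[e] for e in cut) for cut in CUTS]
rare_event = sum(cut_p)                           # sum of cut probabilities

# Exact P(TOP): add up the probability of every state in which some cut is complete.
events = sorted(P)
exact = 0.0
for state in product([False, True], repeat=len(events)):
    failed = {e for e, is_failed in zip(events, state) if is_failed}
    if any(set(cut) <= failed for cut in CUTS):
        exact += math.prod(P[e] if e in failed else 1.0 - P[e] for e in events)

for cut, p in zip(CUTS, cut_p):
    print(cut, f"{p:.3e}")
print(f"sum of cuts = {rare_event:.4f}, exact = {exact:.4f}")
print(f"rate = {rare_event / T:.2e} per hour, lamp share = {cut_p[0] / rare_event:.0%}")
```

The gap between the two totals comes almost entirely from the lamp cut being too large for the sum to be a tight bound; it does not change the conclusion that the as-built model is dominated by BE-001.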

Two things can be true at this point: the model is wrong, or the system is unsafe. Both deserve checking, but the order-of-magnitude size of the gap is the tell. Real signalled railways are nowhere near 10⁻⁵/h SPAD rates per train. The model is the problem.

The wrong-side correction

The lamp's 5×10⁻⁵/h failure rate is for any failure, but only wrong-side failures (lamp shows proceed when stop was commanded) actually cause a SPAD. The lamp-proving circuit detects open / short / dark conditions and the operating rule treats a dark signal as most restrictive — so a dark lamp produces a stopped train, not a SPAD. The same argument applies to the controller logic (vital relay cross-checking catches most logic failures before they reach the head) and to the cable (lamp proving detects open and short faults).

Industry experience for vital signalling is a wrong-side fraction of roughly 1% — the rate at which failures escape the proving and produce a proceed-when-stop indication. Apply 1% to BE-001..BE-003 and re-quantify:

#	Cut set	P(cut) per year — corrected
1	`{BE-001}` wrong-side	4.37×10⁻³
2	`{BE-002}` wrong-side	1.75×10⁻⁴
3	`{BE-003}` wrong-side	8.76×10⁻⁵
4–7	ATP × brake combinations	1.84×10⁻⁵ (sum)
8	`{BE-008, BE-009}`	1.00×10⁻⁷
Σ (top-event probability per train per year)		≈ 4.65×10⁻³

Per hour: ≈ 5.3×10⁻⁷/h. Still 500× above target — but now the model is plausibly right, and the residual gap is a system question, not an arithmetic one. Closing it would mean modelling per-demand signal approaches (a train doesn't approach a stop signal continuously for 8760 h), ATP coverage fraction, and downstream barriers like track circuits and signaller intervention. Those refinements are out of scope for a first-pass tree but they're now well-defined questions, which is the point.
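The effect of the wrong-side fraction can be seen by making it a parameter. In this sketch the fraction scales the three signalling rates before conversion to a per-year probability; the function name and the way the fraction is applied are assumptions consistent with the corrected table above.

```python
import math

T = 8760.0                 # mission time, hours
TARGET_PER_HOUR = 1e-9     # frequency target from Step 1

SIGNALLING = {"BE-001": 5e-5, "BE-002": 2e-6, "BE-003": 1e-6}
OTHER_HW = {"BE-004": 5e-7, "BE-005": 3e-7, "BE-006": 2e-7, "BE-007": 1e-7}
DRIVER = {"BE-008": 1e-3, "BE-009": 1e-4}

CUTS = [("BE-001",), ("BE-002",), ("BE-003",),
        ("BE-004", "BE-006"), ("BE-004", "BE-007"),
        ("BE-005", "BE-006"), ("BE-005", "BE-007"),
        ("BE-008", "BE-009")]

def top_probability(wrong_side_fraction):
    """Sum of cut probabilities with signalling rates scaled by the wrong-side fraction."""
    p = {e: -math.expm1(-lam * wrong_side_fraction * T) for e, lam in SIGNALLING.items()}
    p.update({e: -math.expm1(-lam * T) for e, lam in OTHER_HW.items()})
    p.update(DRIVER)
    return sum(math.prod(p[e] for e in cut) for cut in CUTS)

for fraction in (1.0, 0.01):
    total = top_probability(fraction)
    rate = total / T
    print(f"fraction {fraction:<5} P(TOP) = {total:.3e}  rate = {rate:.2e}/h  "
          f"x{rate / TARGET_PER_HOUR:,.0f} target")
```

Sweeping the fraction over a plausible range is a quick way to see how sensitive the headline number is to a figure that is quoted here only as industry experience.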

Importance measures — where the next pound goes

The Fussell-Vesely importance of a basic event is the fraction of the top-event probability attributable to cut sets containing that event. Computed against the corrected numbers:

Basic event	F-V importance	Implication
`BE-001` (lamp wrong-side)	≈ 94%	Single dominant contributor. LED arrays with per-emitter monitoring, or independent secondary lamp confirmation, would directly attack this.
`BE-002` (controller wrong-side)	≈ 3.8%	Already mitigated by vital-relay cross-checking. Diminishing returns.
`BE-003` (cable wrong-side)	≈ 1.9%	Lamp proving covers the dominant failure modes.
`BE-004..007` (ATP layer)	≈ 0.4% combined	Adding more ATP redundancy buys negligible risk reduction at this point. The ATP is already over-engineered relative to the binding constraint.
`BE-008`, `BE-009` (driver)	< 0.01%	Driver path is dominated by other layers. DRS and SPAD indicator boards are already adequate at this level of resolution.

The rank order is the actionable output. It tells the design team that the next round of risk-reduction effort should target the wayside lamp's wrong-side failure mode — independent confirmation of aspect, redundant emitters, faster proving — and that an extra ATP channel would be money spent on the wrong barrier. That is the kind of statement a fault tree exists to make.
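The ranking is straightforward to compute from the cut-set probabilities. The sketch below uses the corrected per-cut values from the tables in this step (the four ATP cuts are unchanged by the wrong-side correction); the function name is illustrative.

```python
def fussell_vesely(cut_probabilities):
    """Share of the summed top-event probability carried by cuts containing each event."""
    total = sum(cut_probabilities.values())
    events = set().union(*cut_probabilities)      # every basic event in any cut
    return {
        event: sum(p for cut, p in cut_probabilities.items() if event in cut) / total
        for event in sorted(events)
    }

CORRECTED_CUTS = {
    frozenset({"BE-001"}): 4.37e-3,
    frozenset({"BE-002"}): 1.75e-4,
    frozenset({"BE-003"}): 8.76e-5,
    frozenset({"BE-004", "BE-006"}): 7.65e-6,
    frozenset({"BE-004", "BE-007"}): 3.83e-6,
    frozenset({"BE-005", "BE-006"}): 4.59e-6,
    frozenset({"BE-005", "BE-007"}): 2.30e-6,
    frozenset({"BE-008", "BE-009"}): 1.00e-7,
}

importance = fussell_vesely(CORRECTED_CUTS)
for event, value in sorted(importance.items(), key=lambda item: -item[1]):
    print(f"{event}: {value:.4%}")
```

Per-event values for BE-004 to BE-007 add up to about twice the layer's combined share, because every ATP cut contains two events and is counted once for each.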

Sanity-check: did the model agree with itself? The corrected per-train-per-year SPAD probability of 4.65×10⁻³ is in the right ballpark for the per-train-per-year SPAD rates published by European infrastructure managers (ORR data for GB rail, around 3×10⁻³ per train per year). That coincidence isn't proof — the underlying populations are different — but it's the kind of independent reality check that belongs in the safety case alongside the calculation.

What this guide deliberately left out

A first-pass tree is a deliberate simplification. The four most consequential things this one ignored, in roughly the order an EN 50126 reviewer would raise them:

Mission time as a per-hour proxy. A train doesn't approach a stop signal continuously for 8760 h; it makes some number of approaches per year. Re-cast as a per-demand calculation (signal failures per approach × approaches per year) and the signalling branch shrinks again. We left this out because it doesn't change the structure or the importance ranking — it changes the absolute number, which is the easiest thing to fix later.
ATP coverage fraction. The model assumes ATP is active on every signal. On most networks this is <100% — legacy track sections, fitment lag, on-board failures. The honest treatment is an INHIBIT gate above G-002 conditioned on "ATP active"; IEC 61025 §A.5 covers the symbol.
Dependence between BE-008 and BE-009. Fatigue, distraction or sighting issues are common causes of both driver failures. Treating them as independent (P = 10⁻⁷) is conservative-looking but actually under-counts the risk. The correct treatment is a β-factor common-cause link between the two events; with β = 0.1 the driver cut becomes ≈10⁻⁴ rather than 10⁻⁷ and rises in the importance ranking, though it remains dominated by signalling.
Repair and downtime. Constant-λ exponential modelling without repair assumes that a fault, once it occurs, persists for the rest of the mission time. In practice, lamp proving alarms produce a maintenance call within hours. A Markov or fault-tree-with-repair model would propagate mean-time-to-repair and reduce P(BE-001) accordingly. This is the single biggest legitimate refinement available.
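To give a feel for the size of the repair effect, here is a minimal two-state Markov sketch (working and failed, constant failure and repair rates). The 4-hour mean time to repair is a hypothetical value chosen for illustration, not a figure from this guide. Note that the result is an unavailability, the probability of being in the failed state at a given instant, which is a different quantity from the probability of at least one failure during the year; how it should enter the SPAD tree is a modelling decision this sketch does not settle.

```python
import math

def unavailability(lam, mttr_hours, t_hours):
    """Two-state Markov model: probability of being in the failed state at time t."""
    mu = 1.0 / mttr_hours                          # repair rate per hour
    steady_state = lam / (lam + mu)
    return steady_state * (1.0 - math.exp(-(lam + mu) * t_hours))

LAMBDA_LAMP = 5e-5     # BE-001 rate per hour, from the Step 4 table
MTTR_HOURS = 4.0       # hypothetical repair time, for illustration only
T = 8760.0             # mission time, hours

p_no_repair = 1.0 - math.exp(-LAMBDA_LAMP * T)     # at least one failure in the year
q_with_repair = unavailability(LAMBDA_LAMP, MTTR_HOURS, T)

print(f"no repair, failed at least once in the year: {p_no_repair:.3f}")
print(f"with repair, failed at a given instant:      {q_with_repair:.2e}")
```

For λ·MTTR much smaller than 1 the steady-state unavailability is approximately λ·MTTR, so the result scales almost linearly with whatever repair time is assumed.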

None of these change the qualitative shape of the answer — signalling wrong-side dominates, ATP is over-engineered relative to it, the driver path is structurally weak but quantitatively small. They tighten the absolute number. That's the right order to do refinement: get the structure right, get the importance ranking right, then chase decimals.

Where to go next

You've got a defendable first-pass tree, an importance ranking that points at the lamp wrong-side rate, and a list of four refinements that would tighten the number. Three things you might do from here:

Open the tree in the editor via the embed above (or launch FTA Studio and pick the rail SPAD template) to add hazard-register entries, run the built-in MOCUS, and export an IEC 61025-format JSON.
Add Monte Carlo uncertainty — every λ in this article is a point estimate. Lognormal distributions on each leaf with the same medians give you a 5th–95th percentile band on P(TOP), which is what most regulators now expect. Our browser Monte Carlo tool runs the calculation without any data leaving your machine.
Repeat the process for your own top event. The structural workflow — observable-Boolean top event, deductive decomposition by independent barriers, basic-event data with explicit detection/mitigation columns, MOCUS first then quantification, importance ranking before chasing absolute numbers — is identical for an automotive HARA, an aerospace SSA, or a process IPL credit claim. The standards differ; the workflow doesn't.
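For the Monte Carlo step, a minimal sketch of the idea using only the Python standard library is shown below. The error factor of 3 applied to every leaf is a hypothetical choice, and the medians are the wrong-side-corrected values used in Step 6; this is not the implementation of any particular tool.

```python
import math
import random
import statistics

T = 8760.0              # mission time, hours
ERROR_FACTOR = 3.0      # hypothetical spread: 95th percentile divided by median
SIGMA = math.log(ERROR_FACTOR) / 1.645    # lognormal shape parameter

# Medians: signalling rates after the 1% wrong-side correction, others as tabulated
MEDIAN_RATE = {"BE-001": 5e-7, "BE-002": 2e-8, "BE-003": 1e-8, "BE-004": 5e-7,
               "BE-005": 3e-7, "BE-006": 2e-7, "BE-007": 1e-7}
MEDIAN_DEMAND = {"BE-008": 1e-3, "BE-009": 1e-4}

CUTS = [("BE-001",), ("BE-002",), ("BE-003",),
        ("BE-004", "BE-006"), ("BE-004", "BE-007"),
        ("BE-005", "BE-006"), ("BE-005", "BE-007"),
        ("BE-008", "BE-009")]

def draw(median, rng):
    """One lognormal sample whose median equals `median`."""
    return rng.lognormvariate(math.log(median), SIGMA)

def sample_top(rng):
    """One Monte Carlo trial: sample every leaf, then sum the cut probabilities."""
    p = {e: -math.expm1(-draw(lam, rng) * T) for e, lam in MEDIAN_RATE.items()}
    p.update({e: min(1.0, draw(m, rng)) for e, m in MEDIAN_DEMAND.items()})
    return sum(math.prod(p[e] for e in cut) for cut in CUTS)

rng = random.Random(2024)                         # fixed seed so runs are repeatable
samples = [sample_top(rng) for _ in range(20_000)]
ventiles = statistics.quantiles(samples, n=20)    # cut points at 5%, 10%, ..., 95%
p05, p50, p95 = ventiles[0], ventiles[9], ventiles[18]
print(f"P(TOP) 5th = {p05:.2e}, median = {p50:.2e}, 95th = {p95:.2e}")
```

The width of the band is driven almost entirely by the spread assumed for BE-001, which is another way of reading the importance ranking.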
