
How to build a fault tree from scratch — a worked SPAD example

Most FTA tutorials show you a finished tree and explain the symbols. That's backwards. The hard part isn't the notation — it's getting from a vague hazard ("the train shouldn't run a red light") to a top event you can actually quantify, then decomposing it without inventing branches that don't exist. This guide walks the full workflow on a real EN 50126 case: a railway Signal Passed at Danger (SPAD).

Worked tree: rail / SPAD · Standards: IEC 61025, EN 50126

What we'll build, and why this example

A SPAD (Signal Passed at Danger) is when a train crosses a stop-aspect signal without authority. It's the dominant precursor to head-on and rear-end collisions on conventionally signalled railways, which is why it is reported across Europe as a Common Safety Indicator under the EU Railway Safety Directive. The hazard is concrete, the failure modes are well-understood across signalling, on-board ATP and human factors, and the outcome is severe enough that the analysis has to be defensible to a regulator. That makes it a good first tree: rich enough to exercise OR / AND composition and importance measures, but small enough to fit on one page.

By the end of this guide we'll have:

The finished tree is one of the eight templates that ship with FTA Studio. You can open it directly in the editor at any point — the live, interactive version is embedded further down — but you don't need to. Every number in this article is in the text.

Step 1: Define the top event

This is the step everyone skips and everyone regrets later. The top event is a Boolean — at any given instant it is either true or false — so it has to be phrased in a way that, in principle, you could check. "Trains are unsafe" can't be checked. "Train passes a stop-aspect signal without movement authority" can.

Three properties to test the wording against:

Our top event, to the standard required by an EN 50126 hazard log:

Top event TOP-001: Train proceeds past a stop-aspect (red) signal without a movement authority that supersedes it.
Risk classification (per CENELEC R2A2): consequence Catastrophic; frequency target Improbable (<10⁻⁹/h per train) for a SIL-4 system.

Two phrasing choices worth flagging. We say "stop-aspect (red) signal" not "red signal" because some systems use a stop aspect that isn't red (e.g. a position-light dwarf signal). And we add "without a movement authority that supersedes it" to exclude the legitimate case where the signaller verbally authorises the driver past a failed signal — that's a different procedure with its own controls, not a SPAD.

The frequency target (Improbable, <10⁻⁹/h) sets the bar the quantification has to clear. Write it down now so the analysis has something to compare to. If your final number comes back at 10⁻⁵/h, the gap has to be explained one way or the other: either the model overstates the risk, or the system doesn't meet the target and needs design changes.

Step 2: First-level decomposition

The top event is true if and only if a train physically crosses a stop signal. Walk backwards one step: under what conditions does that happen? In a modern signalled railway there are three independent barriers, and the train only gets past the signal if at least one barrier path goes wrong. Each barrier corresponds to one branch of the tree.

G-001 · Signalling System Failure (OR)

The signalling infrastructure presents a proceed (or no) aspect when a stop aspect was due. The driver and the ATP both believe they have authority. Any one of several signalling components can cause this — hence OR.

G-002 · ATP fails to intervene (AND)

The signal correctly shows red and the driver hasn't reacted, but the on-board ATP fails to apply the emergency brake. This requires both the ATP logic chain to fail to issue the brake command and the emergency brake itself to fail to stop the train — hence AND.

G-005 · Driver Error (AND)

Driver fails to observe the stop aspect and fails to apply the brakes in time. The "and" matters: a momentary lapse caught and corrected is not a SPAD. Both miss-events have to coincide.

And the top gate is an OR of the three: the train passes the signal if signalling fails, or ATP-with-brake fails, or the driver double-misses. Any one path is enough.

Common pitfall: bottom-up tree-building. It is tempting to start by listing every component on a train and asking "how can it fail?" That gives an FMEA, not a fault tree. The deductive method (IEC 61025 §5) goes the other way: start at the top, ask "what immediately causes this?", and only descend when each immediate cause has been named. You stop adding branches when the next level would be invented to make the picture symmetrical, not because reality demands it.

Step 3: Drill each branch to basic events

The top event has been resolved into three branches. Now resolve each branch into the basic events that an analyst can actually quantify — components with a published failure rate or a per-demand human-error probability backed by a recognised database.

Signalling branch (G-001, OR)

For the wayside signal to display the wrong aspect, one of three subsystems has to misbehave: the signal head itself (lamp/LED unit), the interlocking logic that drives it, or the cable run between them. These are physically and electrically independent, so they enter as parallel children of the OR.

ATP branch (G-002, AND of two ORs)

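This is the structurally interesting branch. The ATP "fails to intervene" only if both the ATP logic fails to command the brake and the emergency brake fails to act. So G-002 is an AND of two sub-gates, each of which is itself an OR over its own components:

G-003 (OR): ATP logic fails to command the brake, via BE-004 (balise / transponder reader fault) or BE-005 (on-board ATP computer fault).
G-004 (OR): emergency brake fails to act, via BE-006 (brake pipe pressure application failure) or BE-007 (compressed-air supply failure).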

The AND between G-003 and G-004 captures exactly the safety-architecture intent: ATP and brakes are two layers of defence, and the train only loses both at once if both fail in the same window.

Driver branch (G-005, AND)

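G-005 is an AND of the two driver basic events: BE-008 (driver fails to observe the stop aspect) and BE-009 (driver fails to apply the brakes in time). Treating these as independent is a known simplification, because fatigue or distraction degrades both. It is also an optimistic one: positive dependence makes the joint miss more likely than the product of the two probabilities suggests. It remains the usual starting point for a first-pass tree. We'll revisit it in Step 6.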
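The gate structure described in Steps 2 and 3 can be written down as a small lookup table and evaluated as a Boolean function. This is only a sketch: the gate and event IDs are the ones used in this guide, but the `TREE` layout and the `occurs` helper are illustrative names, not anything FTA Studio exposes.

```python
# Hypothetical encoding of the SPAD tree: gate ID -> (gate type, children).
# IDs follow the guide; the dict layout itself is an assumption.
TREE = {
    "TOP-001": ("OR", ["G-001", "G-002", "G-005"]),
    "G-001": ("OR", ["BE-001", "BE-002", "BE-003"]),  # signalling
    "G-002": ("AND", ["G-003", "G-004"]),             # ATP fails to intervene
    "G-003": ("OR", ["BE-004", "BE-005"]),            # ATP logic chain
    "G-004": ("OR", ["BE-006", "BE-007"]),            # emergency brake
    "G-005": ("AND", ["BE-008", "BE-009"]),           # driver double-miss
}


def occurs(node, failed):
    """Return True if `node` is true given the set of failed basic events."""
    if node not in TREE:  # not a gate, so it is a basic event
        return node in failed
    kind, children = TREE[node]
    results = [occurs(child, failed) for child in children]
    return any(results) if kind == "OR" else all(results)


print(occurs("TOP-001", {"BE-002"}))            # one signalling fault
print(occurs("TOP-001", {"BE-004"}))            # ATP reader fault alone
print(occurs("TOP-001", {"BE-004", "BE-007"}))  # ATP reader plus air supply
```

A single signalling fault is enough to make the top event true, while an ATP fault only does so when paired with a brake fault, which is the OR / AND asymmetry the tree is meant to capture.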

The full structure, drawn live in the embedded editor below, is exactly the SPAD template that ships with FTA Studio. Pan, zoom and click any node to see its description, failure mode, detection mechanism and λ/P value. (No data is sent anywhere — the renderer is offline.)


Step 4: Attach data to the basic events

Nine basic events, two kinds of data. The seven hardware components get a failure rate λ (failures per hour, exponential model, constant during the useful-life region of the bathtub curve). The two driver events get a per-demand probability: there is no "rate of driver-not-seeing-things"; there is a probability that, on any given approach to a stop signal, the driver fails to see it.

Indicative numbers, drawn from the typical EN 50126 / EIRENE / IEC TR 61511-3 ranges and aligned with the FTA Studio rail SPAD template. For a real submission you would substitute your own field-validated data and cite the source per basic event.

| UID | Basic event | λ (per h) or P (per demand) | Detection / mitigation already in design |
|---|---|---|---|
| BE-001 | Signal lamp / LED unit failure | λ = 5×10⁻⁵ | Lamp proving; dark signal treated as most restrictive aspect by operating rule |
| BE-002 | Signal controller logic fault | λ = 2×10⁻⁶ | Vital relay cross-checking; double-output proving |
| BE-003 | Signal cable / wiring fault | λ = 1×10⁻⁶ | Lamp proving detects open / short / earth; cable route protection |
| BE-004 | Balise / transponder reader fault | λ = 5×10⁻⁷ | Telegram CRC check; two independent readers per train |
| BE-005 | On-board ATP computer fault | λ = 3×10⁻⁷ | Dual-processor comparison; watchdog; loss-of-output triggers service brake |
| BE-006 | Brake pipe pressure application failure | λ = 2×10⁻⁷ | Fail-safe pneumatic design: brakes apply on pressure loss |
| BE-007 | Compressed-air supply failure | λ = 1×10⁻⁷ | Low-reservoir pressure inhibits departure |
| BE-008 | Driver fails to observe stop aspect | P = 1×10⁻³ | Driver Reminder System (DRS); SPAD indicator boards; vigilance device |
| BE-009 | Driver fails to apply brakes in time | P = 1×10⁻⁴ | ATP back-up provides automatic intervention; reduced approach speed |

The "detection / mitigation already in design" column is not decoration. It's where most analysts go wrong on the second pass — see the next step.

To turn rates into the per-year probabilities the OR/AND arithmetic needs, use mission time T = 8760 h and the exponential reliability model: P = 1 − exp(−λT) ≈ λT when λT is small. For BE-001 the small-λT approximation breaks down (λT = 0.438), so we use the exact form: P = 1 − exp(−0.438) ≈ 0.354. For every other hardware basic event, λT < 0.02 and the linear approximation is accurate to better than 1%.
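The rate-to-probability conversion is easy to check numerically. The sketch below compares the exact exponential form with the linear approximation for three of the rates from the table; the function names are illustrative.

```python
import math

T_HOURS = 8760.0  # one-year mission time, as used in the text


def prob_exact(lam, t=T_HOURS):
    """Probability of at least one failure in time t, exponential model."""
    return 1.0 - math.exp(-lam * t)


def prob_linear(lam, t=T_HOURS):
    """Small-lambda*t approximation of the same quantity."""
    return lam * t


for uid, lam in [("BE-001", 5e-5), ("BE-002", 2e-6), ("BE-007", 1e-7)]:
    exact, linear = prob_exact(lam), prob_linear(lam)
    rel_err = (linear - exact) / exact
    print(f"{uid}: exact={exact:.4g}  linear={linear:.4g}  rel.err={rel_err:.2%}")
```

For BE-001 the linear form overstates the probability by more than 20%, whereas for BE-002 (the next-largest λT) the error stays under 1%, which is why only the lamp needs the exact expression.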

Why these particular numbers, and a candid warning. The λ for a signal lamp at 5×10⁻⁵/h includes all lamp failures, not just wrong-side ones. Most lamp failures produce a dark signal which, by operating rule, is treated as the most restrictive aspect: the driver stops, no SPAD. If you propagate the raw λ straight up the tree, you'll over-count the contribution of the signalling branch by roughly two orders of magnitude. We'll quantify the tree as-built first (Step 5), expose this exact distortion in the importance ranking (Step 6), and then come back to fix it. Walking through the wrong answer once is the fastest way to learn why the correction matters.

Step 5: Qualitative analysis (minimal cut sets)

Before any numbers, the tree should be reduced to its minimal cut sets: the smallest combinations of basic events whose joint occurrence makes the top event true. The qualitative result tells you the structure of the failure space — how many single points of failure exist, where the redundancies actually live, and which combinations the design depends on. The standard algorithm is MOCUS (Method for Obtaining Cut Sets), which substitutes children into parents top-down and applies absorption.

Top-down substitution for our tree:

  1. TOP = G-001 OR G-002 OR G-005
  2. Expand G-001 (OR): BE-001 OR BE-002 OR BE-003 — three single-event cuts.
  3. Expand G-002 (AND): G-003 · G-004. Then expand each: (BE-004 OR BE-005) · (BE-006 OR BE-007). Distribute the AND over OR — four two-event cuts.
  4. Expand G-005 (AND): BE-008 · BE-009 — one two-event cut.

No cut is a superset of another, so absorption removes nothing. Eight minimal cut sets in total:

| # | Cut set | Order | Branch |
|---|---|---|---|
| 1 | {BE-001} | 1 | Signalling: lamp |
| 2 | {BE-002} | 1 | Signalling: controller |
| 3 | {BE-003} | 1 | Signalling: cable |
| 4 | {BE-004, BE-006} | 2 | ATP reader · brake pipe |
| 5 | {BE-004, BE-007} | 2 | ATP reader · air supply |
| 6 | {BE-005, BE-006} | 2 | ATP computer · brake pipe |
| 7 | {BE-005, BE-007} | 2 | ATP computer · air supply |
| 8 | {BE-008, BE-009} | 2 | Driver double-miss |

Three structural facts fall out of this table immediately, before a single probability is multiplied:

What MOCUS just gave you for free: the full list of minimal cut sets ranked by order, with every single point of failure in the design at the top. Even if you stop here and never quantify, the order-1 cuts are the conversation to have first with the design team; adding a redundant ATP channel does nothing about them.
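The top-down substitution in this step can be sketched in a few lines. This is a recursive expansion in the spirit of MOCUS rather than the matrix bookkeeping usually presented with it, and the `TREE` layout, `cut_sets` and `absorb` are illustrative names rather than any tool's API.

```python
from itertools import product

# Gate table as read from this guide; the dict layout is an assumption.
TREE = {
    "TOP-001": ("OR", ["G-001", "G-002", "G-005"]),
    "G-001": ("OR", ["BE-001", "BE-002", "BE-003"]),
    "G-002": ("AND", ["G-003", "G-004"]),
    "G-003": ("OR", ["BE-004", "BE-005"]),
    "G-004": ("OR", ["BE-006", "BE-007"]),
    "G-005": ("AND", ["BE-008", "BE-009"]),
}


def cut_sets(node):
    """Return the cut sets of `node` as a set of frozensets of basic events."""
    if node not in TREE:  # basic event: a single one-element cut
        return {frozenset([node])}
    kind, children = TREE[node]
    per_child = [cut_sets(child) for child in children]
    if kind == "OR":
        # OR: any child's cut set is a cut set of the parent
        return set().union(*per_child)
    # AND: take one cut set from each child and merge them
    return {frozenset().union(*combo) for combo in product(*per_child)}


def absorb(sets):
    """Absorption: drop any cut set that strictly contains another one."""
    return {s for s in sets if not any(other < s for other in sets)}


minimal_cuts = absorb(cut_sets("TOP-001"))
for cut in sorted(minimal_cuts, key=lambda s: (len(s), sorted(s))):
    print(len(cut), sorted(cut))
```

On this tree the absorption step removes nothing, and the output lists the three order-1 cuts followed by the five order-2 cuts from the table.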

Step 6: Quantitative evaluation and importance

Each cut set has a probability: for an AND cut, the product of its event probabilities (assuming independence); for a single-event cut, the event's probability directly. The top-event probability is the probability of at least one cut set occurring. When every cut-set probability is small, the rare-event approximation P(TOP) ≈ Σ P(cut) is a tight, slightly conservative upper bound. It loosens once any single cut probability approaches the 0.1 to 1 range, where the plain sum can overstate the result by a few percent.
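How close the plain sum is depends on how small the cut-set probabilities are. The sketch below compares it with the min-cut upper bound 1 − Π(1 − P(cut)), using the cut-set probabilities tabulated later in this step. Treating the cut sets as independent is an assumption (cuts 4 to 7 share basic events), so the second figure is itself a bound rather than an exact answer.

```python
import math


def rare_event(cut_probs):
    """Rare-event approximation: plain sum of cut-set probabilities."""
    return sum(cut_probs)


def min_cut_upper_bound(cut_probs):
    """1 - prod(1 - P(cut)); exact only if the cut sets are independent."""
    return 1.0 - math.prod(1.0 - p for p in cut_probs)


# Cut-set probabilities per year, copied from the tables in this step.
ATP_AND_DRIVER = [7.65e-6, 3.83e-6, 4.59e-6, 2.30e-6, 1.00e-7]
AS_BUILT = [3.54e-1, 1.74e-2, 8.72e-3] + ATP_AND_DRIVER
CORRECTED = [4.37e-3, 1.75e-4, 8.76e-5] + ATP_AND_DRIVER

for label, probs in [("as-built", AS_BUILT), ("corrected", CORRECTED)]:
    total = rare_event(probs)
    bound = min_cut_upper_bound(probs)
    gap = (total - bound) / bound
    print(f"{label}: sum={total:.4e}  bound={bound:.4e}  gap={gap:.2%}")
```

With the as-built numbers the sum runs a couple of percent above the bound because one cut sits near 0.35; with the corrected numbers the two agree to well under 0.1%.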

As-built quantification

Plug in the per-year probabilities from Step 4 (P = 1 − exp(−λT), T = 8760 h):

| # | Cut set | P(cut) per year |
|---|---|---|
| 1 | {BE-001} | 3.54×10⁻¹ |
| 2 | {BE-002} | 1.74×10⁻² |
| 3 | {BE-003} | 8.72×10⁻³ |
| 4 | {BE-004, BE-006} | 7.65×10⁻⁶ |
| 5 | {BE-004, BE-007} | 3.83×10⁻⁶ |
| 6 | {BE-005, BE-006} | 4.59×10⁻⁶ |
| 7 | {BE-005, BE-007} | 2.30×10⁻⁶ |
| 8 | {BE-008, BE-009} | 1.00×10⁻⁷ |
| Σ | Top-event probability per train per year | ≈ 3.80×10⁻¹ |

Divide through by mission time to recover a rate: 3.80×10⁻¹ / 8760 ≈ 4.3×10⁻⁵ SPADs per train per hour. The target was 10⁻⁹/h. We're four orders of magnitude over, and a single cut — {BE-001}, the signal lamp — is responsible for 93% of the answer.

Two things can be true at this point: the model is wrong, or the system is unsafe. Both deserve checking, but the order-of-magnitude size of the gap is the tell. Real signalled railways are nowhere near 10⁻⁵/h SPAD rates per train. The model is the problem.
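The as-built arithmetic can be reproduced from the Step 4 inputs with a short script. It is a sketch under the same assumptions as the text (independent basic events, rare-event sum, T = 8760 h), and it carries unrounded intermediate values, so the last digit can differ slightly from the table.

```python
import math

T_HOURS = 8760.0

# Failure rates per hour and per-demand probabilities from Step 4.
RATES = {
    "BE-001": 5e-5, "BE-002": 2e-6, "BE-003": 1e-6, "BE-004": 5e-7,
    "BE-005": 3e-7, "BE-006": 2e-7, "BE-007": 1e-7,
}
PER_DEMAND = {"BE-008": 1e-3, "BE-009": 1e-4}

# Minimal cut sets from Step 5.
CUT_SETS = [
    ("BE-001",), ("BE-002",), ("BE-003",),
    ("BE-004", "BE-006"), ("BE-004", "BE-007"),
    ("BE-005", "BE-006"), ("BE-005", "BE-007"),
    ("BE-008", "BE-009"),
]

# Convert rates to per-year probabilities; driver events are used as given.
prob = {uid: 1.0 - math.exp(-lam * T_HOURS) for uid, lam in RATES.items()}
prob.update(PER_DEMAND)

# An AND cut is the product of its event probabilities (independence assumed).
cut_prob = {cut: math.prod(prob[e] for e in cut) for cut in CUT_SETS}
top = sum(cut_prob.values())  # rare-event approximation
rate_per_hour = top / T_HOURS
lamp_share = cut_prob[("BE-001",)] / top

for cut, p in cut_prob.items():
    print(f"{{{', '.join(cut)}}}: {p:.2e}")
print(f"top per year = {top:.3f}, per hour = {rate_per_hour:.1e}, "
      f"lamp share = {lamp_share:.0%}")
```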

The wrong-side correction

The lamp's 5×10⁻⁵/h failure rate is for any failure, but only wrong-side failures (lamp shows proceed when stop was commanded) actually cause a SPAD. The lamp-proving circuit detects open / short / dark conditions and the operating rule treats a dark signal as most restrictive — so a dark lamp produces a stopped train, not a SPAD. The same argument applies to the controller logic (vital relay cross-checking catches most logic failures before they reach the head) and to the cable (lamp proving detects open and short faults).

Industry experience for vital signalling is a wrong-side fraction of roughly 1% — the rate at which failures escape the proving and produce a proceed-when-stop indication. Apply 1% to BE-001..BE-003 and re-quantify:

| # | Cut set | P(cut) per year (corrected) |
|---|---|---|
| 1 | {BE-001} wrong-side | 4.37×10⁻³ |
| 2 | {BE-002} wrong-side | 1.75×10⁻⁴ |
| 3 | {BE-003} wrong-side | 8.76×10⁻⁵ |
| 4–7 | ATP × brake combinations | 1.84×10⁻⁵ (sum) |
| 8 | {BE-008, BE-009} | 1.00×10⁻⁷ |
| Σ | Top-event probability per train per year | ≈ 4.65×10⁻³ |

Per hour: ≈ 5.3×10⁻⁷/h. Still 500× above target — but now the model is plausibly right, and the residual gap is a system question, not an arithmetic one. Closing it would mean modelling per-demand signal approaches (a train doesn't approach a stop signal continuously for 8760 h), ATP coverage fraction, and downstream barriers like track circuits and signaller intervention. Those refinements are out of scope for a first-pass tree but they're now well-defined questions, which is the point.
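To see how strongly the result depends on the wrong-side fraction, the quantification can be wrapped in a function of that one parameter. This is a sketch: `top_probability` and `TARGET_PER_HOUR` are illustrative names, and the 1% figure is the indicative value quoted above, not field data.

```python
import math

T_HOURS = 8760.0
TARGET_PER_HOUR = 1e-9  # frequency target written down in Step 1

SIGNALLING_RATES = {"BE-001": 5e-5, "BE-002": 2e-6, "BE-003": 1e-6}
ATP_RATES = {"BE-004": 5e-7, "BE-005": 3e-7}
BRAKE_RATES = {"BE-006": 2e-7, "BE-007": 1e-7}
DRIVER_DOUBLE_MISS = 1e-3 * 1e-4  # BE-008 x BE-009, per-demand values


def per_year(lam):
    """Exponential model: probability of failure within the mission time."""
    return 1.0 - math.exp(-lam * T_HOURS)


def top_probability(wrong_side_fraction):
    """Rare-event sum over the eight minimal cut sets.

    Only the wrong-side share of each signalling failure rate is counted.
    """
    signalling = sum(per_year(lam * wrong_side_fraction)
                     for lam in SIGNALLING_RATES.values())
    atp_and_brake = sum(per_year(a) * per_year(b)
                        for a in ATP_RATES.values()
                        for b in BRAKE_RATES.values())
    return signalling + atp_and_brake + DRIVER_DOUBLE_MISS


for fraction in (1.0, 0.1, 0.01):
    top = top_probability(fraction)
    rate = top / T_HOURS
    print(f"wrong-side {fraction:>5.0%}: top/yr = {top:.2e}, "
          f"rate/h = {rate:.1e}, x target = {rate / TARGET_PER_HOUR:,.0f}")
```

Running the three fractions side by side makes the point of the correction visible: the signalling term scales almost linearly with the wrong-side fraction, while the ATP and driver terms do not move.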

Importance measures — where the next pound goes

The Fussell-Vesely importance of a basic event is the fraction of the top-event probability attributable to cut sets containing that event. Computed against the corrected numbers:

| Basic event | F-V importance | Implication |
|---|---|---|
| BE-001 (lamp wrong-side) | ≈ 94% | Single dominant contributor. LED arrays with per-emitter monitoring, or independent secondary lamp confirmation, would directly attack this. |
| BE-002 (controller wrong-side) | ≈ 3.8% | Already mitigated by vital-relay cross-checking. Diminishing returns. |
| BE-003 (cable wrong-side) | ≈ 1.9% | Lamp proving covers the dominant failure modes. |
| BE-004..007 (ATP layer) | ≈ 0.4% combined | Adding more ATP redundancy buys negligible risk reduction at this point. The ATP is already over-engineered relative to the binding constraint. |
| BE-008, BE-009 (driver) | < 0.01% | Driver path is dominated by other layers. DRS and SPAD indicator boards are already adequate at this level of resolution. |

The rank order is the actionable output. It tells the design team that the next round of risk-reduction effort should target the wayside lamp's wrong-side failure mode — independent confirmation of aspect, redundant emitters, faster proving — and that an extra ATP channel would be money spent on the wrong barrier. That is the kind of statement a fault tree exists to make.
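The Fussell-Vesely figures follow directly from the corrected cut-set probabilities. A minimal sketch, using the rounded values from the tables in this step and the rare-event sum as the denominator (the dictionary layout and function name are illustrative):

```python
# Corrected cut-set probabilities per year, copied from the tables above.
CUT_PROB = {
    ("BE-001",): 4.37e-3,
    ("BE-002",): 1.75e-4,
    ("BE-003",): 8.76e-5,
    ("BE-004", "BE-006"): 7.65e-6,
    ("BE-004", "BE-007"): 3.83e-6,
    ("BE-005", "BE-006"): 4.59e-6,
    ("BE-005", "BE-007"): 2.30e-6,
    ("BE-008", "BE-009"): 1.00e-7,
}
TOP = sum(CUT_PROB.values())  # rare-event approximation of P(TOP)


def fussell_vesely(event):
    """Share of P(TOP) carried by the cut sets that contain `event`."""
    return sum(p for cut, p in CUT_PROB.items() if event in cut) / TOP


events = sorted({e for cut in CUT_PROB for e in cut})
for event in sorted(events, key=fussell_vesely, reverse=True):
    print(f"{event}: {fussell_vesely(event):.3%}")

# Layer-level figure: share of the four ATP x brake cut sets taken together.
atp_layer = sum(p for cut, p in CUT_PROB.items()
                if cut[0] in ("BE-004", "BE-005")) / TOP
print(f"ATP layer (cuts 4-7): {atp_layer:.2%}")
```

One detail worth noticing: the "combined" ATP figure is the share of cut sets 4 to 7 taken together. Adding up the per-event F-V values of BE-004 to BE-007 would count each of those cut sets twice, because every one of them contains two of the four events.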

Sanity check: did the model agree with itself? The corrected per-train-per-year SPAD probability of 4.65×10⁻³ is in the right ballpark for the per-train-per-year SPAD rates published by European infrastructure managers (ORR data for GB rail, around 3×10⁻³ per train per year). That coincidence isn't proof, since the underlying populations are different, but it's the kind of independent reality check that belongs in the safety case alongside the calculation.

What this guide deliberately left out

A first-pass tree is a deliberate simplification. The four most consequential things this one ignored, in roughly the order an EN 50126 reviewer would raise them:

None of these change the qualitative shape of the answer — signalling wrong-side dominates, ATP is over-engineered relative to it, the driver path is structurally weak but quantitatively small. They tighten the absolute number. That's the right order to do refinement: get the structure right, get the importance ranking right, then chase decimals.

Where to go next

You've got a defendable first-pass tree, an importance ranking that points at the lamp wrong-side rate, and a list of four refinements that would tighten the number. Three things you might do from here: