EN 50126 RAMS — running FTA at SIL 4
SIL 4 is the highest railway safety integrity level — Tolerable Hazard Rate ≤ 10⁻⁹ per hour, the bar at which a single national authority and an Independent Safety Assessor both have to be persuaded the system won't kill people. EN 50126's RAMS process threads fault tree analysis through the V-cycle in three distinct roles: SIL apportionment (top-down), design verification (bottom-up), and safety-acceptance evidence at validation. Cross-acceptance between national authorities, the ISA review, and CSM-RA's three risk-acceptance pillars all assume the FTAs hold up. This guide covers the framework, the apportionment math, a worked ETCS Movement Authority enforcement tree, and what an ISA actually checks.
Why rail SIL 4 is its own conversation
Rail safety cases differ from automotive (ISO 26262) and aerospace (ARP 4761) submissions in three structural ways that shape how the FTA is built:
- Cross-acceptance between national authorities. A signalling system designed for Deutsche Bahn must be re-acceptable to SBB, SNCF, NS, ADIF and a dozen other infrastructure managers, each with its own national notified body and its own historical preferences for risk demonstration. EU Regulation 402/2013 (the Common Safety Method on Risk Assessment, CSM-RA) is the harmonisation framework, but in practice each national authority still applies its own emphasis. An FTA that survives EBA scrutiny may be challenged on different grounds by ORR, even though both reference the same standards. The tree's structural conventions (cut-set ordering, level of decomposition, data-source citations) have to anticipate this multi-jurisdiction review.
- Independent Safety Assessment is mandatory at SIL 3 and SIL 4. EN 50129 §5.1.4 requires an ISA performed by an entity organisationally independent of the developer (an accredited NoBo or ISA — DEKRA, TÜV, RINA, Lloyd's Register, BV, Certifer). The ISA reads every FTA in the safety case and produces an independent opinion. ISAs are typically more sceptical than internal reviews; an FTA that passes internal verification routinely fails ISA on points the developer didn't consider contentious.
- CSM-RA offers three risk-acceptance pillars, and FTA underpins only one of them. The CSM-RA framework lets a hazard be accepted by (a) following an established Code of Practice, (b) similarity to a Reference System with an established safety record, or (c) Explicit Risk Estimation backed by analysis. FTA underpins (c). For new technologies (ETCS Level 3, autonomous train operation, FRMCS-based supervision) where Codes of Practice don't yet exist and Reference Systems are sparse, Explicit Risk Estimation, and with it FTA, is the only available pillar. SIL 4 systems built on novel architectures lean on FTA harder than systems with decades of operational pedigree.
A fourth structural difference, less specific to SIL 4 but pervasive in rail, is the lifetime: trains are in service 30–40 years, signalling systems often longer. The FTA has to survive that long, with operational data feeding back through the RAMS process at every modification. Phases 11–14 of EN 50126:2017 (Operation, Performance Monitoring, Modification, Decommissioning) all generate inputs that can revise the FTA, a revision cadence that rail engineers take for granted but that startles cross-domain colleagues.
Step 1. The RAMS V-cycle and where FTA enters
EN 50126:2017 specifies a 14-phase V-cycle covering the system from concept to disposal. FTA isn't called for in every phase, but it threads through several with different roles each time:
| RAMS phase | FTA role |
|---|---|
| Phase 3 — Risk analysis & evaluation | Preliminary fault trees identify top events for each hazardous functional failure. Quantification is order-of-magnitude — used to assign initial SIL targets via CENELEC R2A2 risk matrix or CSM-RA equivalent. |
| Phase 4 — System requirements specification | Top-event probability targets (Tolerable Hazard Rates, THRs) become safety requirements traced to the FHA-equivalent register. Each SIL 4 hazard's THR (typically 10⁻⁹/h to 10⁻⁸/h) is fixed at this point and doesn't change downstream without a re-justification cycle. |
| Phase 5 — Apportionment of system requirements | Primary FTA usage. Top-down decomposition of each SIL 4 hazard's THR into subsystem-level allocations. Each branch's basic-event THRs sum (or multiply, under AND gates) to the parent THR. Output: a per-subsystem Safety Requirement Specification with allocated THRs and SILs. |
| Phase 6 — Design and implementation | Primary FTA usage. Bottom-up verification: as each subsystem's design data becomes available, its actual THR is computed and compared to the Phase-5 allocation. Re-quantification with validated data; re-design when allocations are missed. |
| Phase 7 — Manufacturing | FMEA at component level, traceability of FMEA failure modes to FTA basic events. Reused from Phase 6 if the design hasn't changed. |
| Phase 8 — Installation | Installation-specific hazards (track geometry, climate, traffic profile) added as new basic events; the per-installation FTA may differ from the generic-product FTA. |
| Phase 9 — System validation | Cut-set verification against operational test data. ISA reviews the FTA artefacts at this phase. |
| Phases 11–13 — Operation, monitoring, modification | Operational reliability data feeds back; basic-event probabilities update; the FTA is re-quantified periodically (typically annually for SIL 4) and on every modification. |
The two phases where FTA is the primary deliverable are 5 (apportionment) and 6 (verification). The same tree structure is used for both, but the basic-event probabilities differ — Phase 5 uses allocations the engineering team is committing to, Phase 6 uses validated component data once the design is real. Reviewers compare the two versions: the gap between "what we allocated" and "what we actually achieved" is itself a piece of evidence about the maturity of the design process.
Step 2. SIL apportionment: the math that drives Phase 5
Apportionment is the inverse problem of FTA: instead of "what's the top-event probability given the basic events", it's "what THR must each basic event meet so the top satisfies its SIL target". The starting point is the EN 50129 SIL/THR mapping (Table A.1, mildly summarised):
| SIL | Tolerable Hazard Rate (per hour) |
|---|---|
| SIL 4 | 10⁻⁹ ≤ THR < 10⁻⁸ |
| SIL 3 | 10⁻⁸ ≤ THR < 10⁻⁷ |
| SIL 2 | 10⁻⁷ ≤ THR < 10⁻⁶ |
| SIL 1 | 10⁻⁶ ≤ THR < 10⁻⁵ |
The apportionment-through-the-tree rules follow Boolean algebra. For an OR gate, child THRs sum to the parent's. For an AND gate, child THRs multiply (with a mission-time / proof-test-interval factor). For k-of-N voting, the formulas in EN 50129 Annex E apply.
OR-gate apportionment — straightforward but unforgiving
If a SIL 4 hazard at THR ≤ 10⁻⁹/h is reached through three independent OR'd causes, the budget splits across them. A naive even split gives each child THR = 10⁻⁹/3 ≈ 3.3×10⁻¹⁰/h. That is tighter than the 10⁻⁹/h floor of the SIL 4 band, and since SIL 4 is the highest level, each child still has to be developed to SIL 4. The OR structure does not relax the per-branch SIL. The only way OR-decomposition lowers the per-branch burden is if the children have intrinsically different rates (one branch at 10⁻¹¹/h, two at just under 5×10⁻¹⁰/h, etc.), but the per-branch requirement stays at SIL 4 even when the apportionment is uneven.
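The OR-gate split can be sketched in a few lines of Python. The function names, the weighting scheme, and the convention that anything tighter than 10⁻⁸/h is reported as SIL 4 are illustrative assumptions, not taken from EN 50129:

```python
def sil_for_thr(thr_per_hour: float) -> int:
    """Map a THR to a SIL using the band edges from the table above.

    Assumption: rates tighter than 1e-8/h are reported as SIL 4 (the highest
    level), and rates of 1e-5/h or worse return 0 (no SIL).
    """
    if thr_per_hour < 1e-8:
        return 4
    if thr_per_hour < 1e-7:
        return 3
    if thr_per_hour < 1e-6:
        return 2
    if thr_per_hour < 1e-5:
        return 1
    return 0


def split_or_budget(parent_thr: float, weights: list[float]) -> list[float]:
    """Split a parent THR across OR'd children in proportion to the weights."""
    total = sum(weights)
    return [parent_thr * w / total for w in weights]


parent = 1e-9  # SIL 4 hazard budget, per hour
even = split_or_budget(parent, [1, 1, 1])
uneven = split_or_budget(parent, [1, 49.5, 49.5])  # one tight branch, two looser

for label, children in (("even", even), ("uneven", uneven)):
    sils = [sil_for_thr(c) for c in children]
    print(label, [f"{c:.2e}" for c in children], "SIL per branch:", sils)
```

Whatever weights are chosen, every child of an OR gate under a 10⁻⁹/h parent lands at SIL 4, which is the point the paragraph above makes.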
AND-gate apportionment — where SIL relaxation lives
Under AND, the math changes substantively. Two independent channels each with dangerous-undetected failure rate λDU proof-tested at interval T have combined system rate:
λ1oo2 ≈ (λDU,1 · λDU,2) · Tproof
Setting λ1oo2 = 10⁻⁹/h for SIL 4, with Tproof = 10⁵ hours (≈ 11 years between proof tests, typical for installed signalling) and assuming identical channels:
λDU ≤ √(10⁻⁹ / 10⁵) = √10⁻¹⁴ = 10⁻⁷ /h
Each channel only needs to clear λDU ≤ 10⁻⁷/h — the SIL 2 band. A SIL 4 system can be built from two SIL 2 channels under AND, given proven independence and the proof-testing programme. This is the rail equivalent of ASIL decomposition; the math is the same, the vocabulary differs.
Tighter proof-test intervals shift the apportionment further. At Tproof = 10⁴ h (roughly annual): λDU ≤ √10⁻¹³ ≈ 3.2×10⁻⁷/h, still SIL 2. At Tproof = 10³ h (roughly six-weekly): λDU ≤ √10⁻¹² = 10⁻⁶/h, the lower edge of the SIL 1 band. Annual proof testing buys a per-channel SIL relaxation of 1–2 levels; six-weekly proof testing buys 2–3.
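As a quick check of the figures above, the per-channel limit can be tabulated against the proof-test interval. This is a minimal sketch of the same approximation (identical, independent channels); the function name is hypothetical:

```python
import math


def max_channel_rate_1oo2(target_rate: float, t_proof_hours: float) -> float:
    """Largest per-channel lambda_DU satisfying lambda_DU**2 * T_proof <= target."""
    return math.sqrt(target_rate / t_proof_hours)


target = 1e-9  # SIL 4 system-level rate, per hour
for t_proof in (1e5, 1e4, 1e3):
    lam = max_channel_rate_1oo2(target, t_proof)
    print(f"T_proof = {t_proof:.0e} h -> lambda_DU <= {lam:.2e} /h")
```

Because the limit scales with the inverse square root of Tproof, each tenfold cut in the interval relaxes the per-channel rate by a factor of about 3.2.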
Voting — the third structural pattern
Most safety-critical signalling vital functions historically used 2-out-of-3 voting (the Westrace / SmartLock pattern). For three independent channels, system fails if at least 2 fail:
λ2oo3 ≈ 3 · λDU² · Tproof (rare-event approximation)
Setting λ2oo3 ≤ 10⁻⁹/h with Tproof = 10⁵ h: λDU ≤ √(10⁻⁹ / (3·10⁵)) ≈ 5.8×10⁻⁸/h. That is tighter than the 10⁻⁷/h of the 1oo2 case because three failure-pair combinations exist, and it drops just below the SIL 2 band into SIL 3 per the table above. Voting buys robustness against single-channel transients (any one channel can fail without service disruption) at the cost of tighter per-channel rates.
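The 2oo3 figure can be reproduced with the same rare-event approximation. The sketch below assumes identical, independent channels; the function names are hypothetical:

```python
import math

PAIRS_2OO3 = math.comb(3, 2)  # three distinct channel pairs can fail together


def rate_2oo3(lambda_du: float, t_proof_hours: float) -> float:
    """Rare-event approximation of the 2oo3 system rate, per hour."""
    return PAIRS_2OO3 * lambda_du ** 2 * t_proof_hours


def max_channel_rate_2oo3(target_rate: float, t_proof_hours: float) -> float:
    """Invert rate_2oo3 to get the per-channel limit for a system target."""
    return math.sqrt(target_rate / (PAIRS_2OO3 * t_proof_hours))


lam = max_channel_rate_2oo3(1e-9, 1e5)
print(f"2oo3 per-channel limit: {lam:.2e} /h")
print(f"round-trip system rate: {rate_2oo3(lam, 1e5):.2e} /h")
```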
Two structural points the apportionment math makes explicit:
- The proof-test interval is a first-class design parameter. Shrinking Tproof from 10 years to 1 year buys ~3× per-channel rate margin. Field operators sometimes resist tightening proof tests because they cost service time; the FTA is what shows them the SIL trade-off.
- Independence is again the load-bearing assumption. The (λDU)² term presupposes the two channels are stochastically independent. If they share a root cause — same vendor batch, same firmware, same maintenance team — the (λDU)² term gets joined by a β·λDU term that often dominates. Article 5's β-factor analysis is exactly the calculation rail teams run after the apportionment to verify the redundancy isn't illusory.
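To see how quickly a common-cause term takes over, the sketch below adds a β·λDU contribution to the 1oo2 rate. It assumes the simple β-factor model in which a fraction β of each channel's failures defeats both channels at once; the source only states that a β·λDU term joins the squared term, so the (1 − β) weighting and the β values are illustrative assumptions:

```python
def rate_1oo2_with_ccf(lambda_du: float, t_proof_hours: float, beta: float):
    """Return (independent term, common-cause term) of the 1oo2 system rate.

    Assumed model: a fraction beta of each channel's failures is common cause
    and fails both channels together; the remainder fails independently.
    """
    independent = ((1.0 - beta) * lambda_du) ** 2 * t_proof_hours
    common_cause = beta * lambda_du
    return independent, common_cause


lambda_du, t_proof = 1e-7, 1e5  # the 1oo2 figures used earlier in this step
for beta in (0.0, 0.01, 0.05, 0.10):
    ind, ccf = rate_1oo2_with_ccf(lambda_du, t_proof, beta)
    print(f"beta={beta:.2f}: independent={ind:.2e}  "
          f"common-cause={ccf:.2e}  total={ind + ccf:.2e} /h")
```

Under these assumptions even β = 1% roughly doubles the system rate, and at β = 5% the common-cause term alone is five times the 10⁻⁹/h budget.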
Step 3. Worked example: ETCS L2 Movement Authority enforcement
Take the failure condition: "Train overruns end of Movement Authority without intervention." ETCS Level 2 architecture, 200 km/h main-line operation. CSM-RA risk-matrix classification places this at Catastrophic / Frequent → Intolerable without mitigation; mitigation drives the THR target down to 10⁻⁹/h per train, which is SIL 4 in CENELEC terms.
Top-level decomposition into three independent failure modes — each capable of producing the top event on its own:
Train overruns end of MA
│
▼ (OR)
┌────┼────┐
SC-1 SC-2 SC-3
│ │ │
│ │ └─ SC-3: on-board enforcement failure
│ │ (EVC fails to command brake AND brake wrong-side)
│ └────── SC-2: GSM-R/FRMCS comms corrupts MA undetected
└─────────── SC-1: RBC trackside issues over-permissive MA
Phase 5 — apportionment (committed THRs)
Phase 5 is where the engineering team takes commitments. The 10⁻⁹/h TOP budget is split across the three OR branches; each subsystem THR becomes a contractual safety requirement on the subsystem supplier:
| Subsystem | Allocated THR /h | Implied SIL per channel | Architectural rationale |
|---|---|---|---|
| SC-1 — RBC trackside (Alstom Iconis, Siemens Trackguard) | 3×10⁻¹⁰ | SIL 4 (single-channel vital) | Vital interlocking compute, no decomposition; SIL 4 development across the lifecycle. |
| SC-2 — GSM-R / FRMCS comms link | 3×10⁻¹⁰ | SIL 4 protocol layer | EuroRadio safe-protocol layer (UNISIG SUBSET-037) provides cryptographic message authentication; failure rate is the residual undetected-corruption rate. |
| SC-3 — On-board EVC + brake (AND) | 4×10⁻¹⁰ | EVC: SIL 2 per channel (1oo2D); brake: SIL 4 (fail-safe pneumatic) | Dual-channel EVC with comparison + diagnostic permits per-channel SIL 2 development per Step 2 apportionment math. Brake is fail-safe wrong-side at intrinsic SIL 4. |
| TOP (sum, OR gate) | 1.0×10⁻⁹ | SIL 4 | Within the 10⁻⁹/h THR target by allocation. |
The on-board allocation of 4×10⁻¹⁰/h is split internally via the SC-3 AND gate. With the dual-channel EVC at λDU = 10⁻⁷/h per channel and Tproof = 10⁴ h, λEVC,1oo2 ≈ (10⁻⁷)² · 10⁴ = 10⁻¹⁰/h. The brake fail-safe rate is ≈ 4×10⁻¹⁰/h (the tightest constraint in the on-board chain). The AND product of the two, modulated by the demand profile, gives a combined SC-3 rate that clears the 4×10⁻¹⁰/h budget.
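The Phase 5 bookkeeping is simple enough to script. The sketch below only re-checks numbers quoted in the table and paragraph above; the dictionary keys and variable names are illustrative:

```python
import math

TOP_BUDGET = 1e-9  # SIL 4 THR target, per hour

allocations = {
    "SC-1 RBC trackside": 3e-10,
    "SC-2 comms link": 3e-10,
    "SC-3 on-board EVC + brake": 4e-10,
}

# OR gate: the branch allocations add up to the TOP rate.
top_allocated = sum(allocations.values())
# isclose guards against floating-point noise when the sum equals the budget.
within_budget = top_allocated <= TOP_BUDGET or math.isclose(top_allocated, TOP_BUDGET)

# Dual-channel EVC under AND: lambda_DU**2 * T_proof.
evc_1oo2 = (1e-7) ** 2 * 1e4

print(f"TOP allocated: {top_allocated:.2e} /h, within budget: {within_budget}")
print(f"EVC 1oo2 rate: {evc_1oo2:.2e} /h")
```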
Phase 6 — verification (validated THRs from operational + FMEDA data)
Phase 6 takes place after the design is real and component data is available — vendor reliability reports, FMEDA from the supplier, operational data from prior installations of the same RBC family, EuroRadio security analysis pinned to the actual MAC algorithm version. The verified rates routinely come in tighter than the Phase-5 allocations (because the allocations were chosen with margin) — but reviewers compare the two side-by-side:
| Subsystem | Phase 5 allocated /h | Phase 6 verified /h | Margin |
|---|---|---|---|
| SC-1 — RBC trackside | 3×10⁻¹⁰ | 1.2×10⁻¹⁰ | 2.5× under allocation |
| SC-2 — GSM-R safe-protocol residual | 3×10⁻¹⁰ | 5×10⁻¹¹ | 6× under allocation |
| SC-3 — On-board EVC + brake | 4×10⁻¹⁰ | 8×10⁻¹¹ | 5× under allocation |
| TOP (Σ verified) | 1.0×10⁻⁹ | 2.5×10⁻¹⁰ | 4× under SIL 4 target |
The verified TOP rate is 2.5×10⁻¹⁰/h against the SIL 4 target of 10⁻⁹/h, so the architecture passes Phase 6 with a 4× margin. The Phase-5/Phase-6 gap (allocations were uniformly conservative by 2.5–6×) is the kind of evidence ISAs read as "engineering team understood the headroom they were giving themselves", which is a positive signal.
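The side-by-side comparison reviewers make can be reproduced directly from the table. A minimal sketch, with variable names chosen for illustration:

```python
phase5 = {"SC-1": 3e-10, "SC-2": 3e-10, "SC-3": 4e-10}    # allocated, per hour
phase6 = {"SC-1": 1.2e-10, "SC-2": 5e-11, "SC-3": 8e-11}  # verified, per hour
sil4_target = 1e-9

# Margin = how many times the verified rate sits under its allocation.
margins = {name: phase5[name] / phase6[name] for name in phase5}
# Branches whose verified rate exceeds the allocation would need re-design.
missed = [name for name in phase5 if phase6[name] > phase5[name]]

top_verified = sum(phase6.values())  # OR gate: verified rates add
top_margin = sil4_target / top_verified

for name, margin in margins.items():
    print(f"{name}: {margin:.1f}x under allocation")
print(f"TOP verified: {top_verified:.2e} /h ({top_margin:.1f}x under target)")
print("missed allocations:", missed)
```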
Cut-set verification
The minimal cut-set list for the post-verification tree, ordered by contribution to TOP:
Order 1 cuts:
{RBC vital interlock failure} 1.2×10⁻¹⁰/h [48%]
{EuroRadio MAC residual corruption} 5×10⁻¹¹/h [20%]
Order 2 cuts (within SC-3):
{EVC channel-A wrong-side, EVC channel-B wrong-side} ~10⁻¹¹/h
{EVC dual-channel agreement on wrong output, brake wrong-side} ~10⁻¹³/h
... (multiple combinations under the on-board AND)
The two single-event (order-1) cuts are allowed because each is itself developed to SIL 4 by intrinsic-integrity argument (no decomposition relaxes them). The "no single failure causes catastrophic" structural check is therefore satisfied at the subsystem-developed-to-SIL-4 level rather than at the architectural level: the RBC and the EuroRadio layer are each SIL 4 single-channel components, so single-event cuts are expected, and the integrity claim rests on SIL 4 development rigour rather than on architectural redundancy. Aviation reviewers (cf. Article 8) reject this pattern; rail reviewers accept it because the component history is dispositive.
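The percentage contributions in the cut-set list follow from dividing each cut's rate by the verified TOP rate. The sketch below uses only the two order-1 cuts and the 2.5×10⁻¹⁰/h total quoted above, and attributes the remainder to the SC-3 higher-order cuts as a group; that grouping is an assumption made for illustration:

```python
top_rate = 2.5e-10  # verified TOP rate, per hour

# (basic events in the cut set, rate per hour)
order1_cuts = [
    (("RBC vital interlock failure",), 1.2e-10),
    (("EuroRadio MAC residual corruption",), 5e-11),
]

contributions = {events: rate / top_rate for events, rate in order1_cuts}
# Whatever the order-1 cuts do not explain comes from cuts of order 2 or more.
higher_order_share = 1.0 - sum(contributions.values())

for events, share in contributions.items():
    print(f"order {len(events)}: {', '.join(events)}: {share:.0%}")
print(f"order >= 2 (within SC-3): {higher_order_share:.0%}")
```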
Step 4. What the ISA actually checks at Phase 9
The Independent Safety Assessor's review at Phase 9 (System Validation) is the gate the safety case has to pass before commissioning. ISAs aren't auditing the developer's process — they're answering the question "would I sign that this system is safe to put into service?". Six things they probe systematically on every FTA in the safety case:
| What's checked | What "fails" looks like |
|---|---|
| 1. Phase 5 / Phase 6 traceability per branch. Every allocated THR in Phase 5 has a corresponding verified THR in Phase 6, with an explicit margin. Branches missing one or the other trigger investigation. | "SC-2 GSM-R allocation 3×10⁻¹⁰/h appears in Phase 5; no corresponding Phase 6 verification entry. Explain or supply." Common when comms-layer security is supplied by an external party and the ISA hasn't seen its safety case. |
| 2. CSM-RA pillar declaration. Each hazard accepted explicitly via one of three pillars: Code of Practice (e.g. existing TSI), Reference System (similar legacy operation), or Explicit Risk Estimation (FTA-backed). FTA-only acceptance carries the highest justification burden. | "Hazard H-1 declared accepted via CSM-RA Pillar 3 (Explicit Risk Estimation); FTA references Reference System data without showing equivalence justification per CSM-RA §2.4." Mixing pillars without naming the dominant one is the typical finding. |
| 3. Independence claims at every AND / 1oo2 / 2oo3 gate. Each redundancy claim has explicit β-factor analysis (cf. Article 5) plus a cross-link to the CCA artefact showing the diversity dimensions covered. | "On-board EVC dual-channel claimed independent; no β-factor analysis attached; both channels run identical Embedded Application code on identical Infineon AURIX hardware. CCF expected." Most common ISA finding; ~40% of first-pass reviews surface this. |
| 4. Operational data citations with provenance. Every basic-event THR cites a specific data source with revision number and population (fleet size × in-service hours × failure modes covered). | "Basic event 'EVC channel-A wrong-side' cited at λ = 8×10⁻⁸/h with reference 'Industry experience'. Specify fleet, duration, failure-mode coverage." Vague citations are flagged systematically. |
| 5. Cross-acceptance compatibility evidence. If the system is intended for multi-national deployment, the safety case shows alignment with each target authority's national specifics (TSI variants, national STM rules, language and units convention). | "Phase 9 safety case is written against ORR expectations only; intended deployment includes ProRail and Infrabel. Show equivalence to RailNet and FOD MV evaluation criteria." Catches a programme that planned for one authority and discovered three. |
| 6. Operational reliability programme committed. Phases 11-13 inputs (operational monitoring, modification, performance review) feed back into the FTA. The ISA wants the agreed cadence — typically annual re-quantification at SIL 4 — documented as a contractual commitment between the developer, the infrastructure manager, and the train operator. | "Operational reliability data feedback cycle not specified; FTA validity period claimed at 'design lifetime' with no review trigger." A SIL 4 system without an annual re-quant cycle isn't going to survive its own operational data drift. |
The single most-asked ISA question that catches developers off-guard: "if the verified rates in Phase 6 came out 10× tighter than the Phase 5 allocations, why was the allocation set so loose?" The right answer ("we wanted margin") is fine. The wrong answer ("we used handbook values") suggests Phase 5 wasn't grounded in real engineering, and the ISA will probe further.
Five pitfalls a rail-domain reviewer will catch
- Phase 5 / Phase 6 drift after architecture changes. A late-stage redesign (Hitachi pivots from a single-vendor RBC to a multi-vendor coalition; a comms-layer protocol switches from GSM-R to FRMCS) invalidates the Phase 5 apportionment. Trees that show the verified rates without the corresponding apportionment update produce "where did the allocations come from?" findings.
- β-factor analysis missing on redundant channels. Most common ISA finding by volume. The 1oo2D EVC architecture mathematically apportions to per-channel SIL 2 — but only if independence holds. Article 5's β-factor analysis is the artefact that defends independence; absence of it is interpreted as "the development team didn't think about it".
- Mixing per-flight-hour conventions into rail trees. Engineers crossing from aerospace sometimes import per-flight-hour rates without converting; rail uses per-operational-hour or per-train-hour. Per-train-hour is the convention for vehicle-mounted equipment; per-operational-hour for trackside. Mixing the two produces incomparable cut sets.
- SIL apportionment mathematically valid but architecturally unjustifiable. The math says "each channel at SIL 2 is fine" but the actual development was at SIL 1 rigour (no formal V&V, no fault-injection campaign, no FMEDA). The ISA cross-checks the apportionment SIL against the Phase 6 development evidence; a mismatch is fatal.
- Operational data citation laundering. "Fleet experience" without naming the fleet. "Industry data" without naming the industry source. "Manufacturer-supplied" without the manufacturer's reliability prediction methodology. Each of these is a flag; the correct citation form is "ProRail GSM-R fleet, 2018-2023, 12.3M train-hours, 2 wrong-side incidents, 95% confidence one-sided λ ≤ 5.1×10⁻⁷/h".
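A citation of that form implies a calculation the reviewer can repeat. The sketch below derives a one-sided upper confidence bound on a failure rate from a failure count and an exposure, assuming failures follow a Poisson process with constant rate. That statistical model and the function names are assumptions, since the source does not say how such bounds are obtained:

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """P(X <= n) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for i in range(1, n + 1):
        term *= mu / i
        total += term
    return total


def upper_rate_bound(failures: int, exposure_hours: float,
                     confidence: float = 0.95) -> float:
    """One-sided upper bound on the failure rate, per hour.

    Finds the mean count mu at which seeing `failures` or fewer events has
    probability 1 - confidence, then divides by the exposure.
    """
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while poisson_cdf(failures, hi) > alpha:  # bracket the root
        hi *= 2.0
    for _ in range(200):                      # bisection on mu
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi / exposure_hours


bound = upper_rate_bound(failures=2, exposure_hours=12.3e6)
print(f"95% one-sided upper bound: {bound:.2e} /h")
```

With zero observed failures the same function reduces to the familiar "rule of three" bound of about 3 divided by the exposure hours.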
Where to go next
- Build the apportionment tree. Open FTA Studio — the rail SPAD template (Article 1's worked example) is a SIL 4 hazard at smaller scale that exercises the same apportionment math without the ETCS complexity.
- For β-factor on redundant rail channels, Article 5 covers the math; the IEC 61508-6 Annex D scoring sheet that EN 50129 §B.4 references is the artefact ISAs expect to see.
- For the cross-domain context, Article 6 (automotive ASIL decomposition) and Article 8 (aerospace ARP 4761) cover the equivalent regulatory frameworks. The structural patterns repeat with vocabulary variations.
- For the standards-side overview, the EN 50126 reference page covers RAMS at the standards level.