Monte Carlo for FTA — lognormal vs uniform leaf distributions
A point-estimate fault tree gives one number for P(TOP). Real engineering data isn't a point estimate — it's a published median with a confidence interval, or a manufacturer-supplied range, or an extrapolation from a limited fleet sample. Monte Carlo propagates the leaf-event uncertainty through the Boolean structure of the tree and produces a percentile band on the top, which is what regulators in NRC PRA, ARP 4761 SSA at DAL-A, and increasingly ISO 26262 ASIL-D submissions want to see. The leaf distribution you pick — lognormal, uniform, triangular, beta — shapes the band, and picking the wrong one produces a defensible-looking number that's wrong by an order of magnitude. This guide covers the distribution choice, the sampling mechanics, and a worked SPAD tree end-to-end.
Why a point estimate isn't enough
Suppose your fault tree's basic events come from MIL-HDBK-217F. The handbook publishes a failure rate for each component class — say, λ = 5×10⁻⁷/h for a vital signal lamp — but the underlying data has a confidence interval. The actual rate, in the field, on this exact production batch, in this exact climate, is uncertain. Even handbook-published values typically come with an error factor of 3–5: the true rate plausibly lies anywhere in [λ/5, 5λ]. Plug the median into a Boolean cut-set sum and you get a single number for P(TOP) — but that number's own uncertainty is a function of every leaf's uncertainty propagated through the tree's structure.
Three reasons this matters in practice:
- The point estimate can be far from the median of the band. Cut-set sums of lognormal-distributed leaves are themselves heavy-tailed; the mean of P(TOP) is typically larger than the median by a factor that grows with the number of cuts and the leaf-distribution spread. A safety case that quotes only the median can be 2–3× under the actual mean — and it's the mean that determines tolerable-risk comparisons in most regulatory frameworks.
- The 95th-percentile band tells the design team where the risk lives if the data is pessimistic. A 5th–95th band that spans two orders of magnitude on P(TOP) is a different kind of safety case from one that spans 30%. The first warrants design margin; the second is "we've nailed it". Both can have the same point estimate.
- Regulators increasingly require Monte Carlo bands. NRC PRA submissions have used percentile bands since the 1980s. ASME/ANS RA-Sa-2009 PRA Standard codifies the requirement. ARP 4761 SSAs at DAL-A typically include them. ISO 26262 doesn't mandate them but reviewers ask "what's the uncertainty on this PMHF figure?" routinely at ASIL D.
Step 1Which leaf distribution — and why lognormal is the default
Four distributions cover essentially every basic event in practice. The choice per leaf depends on what the data source actually published.
| Distribution | When to use | Parameters | Tail behaviour |
|---|---|---|---|
| Lognormal | Default. Most reliability data is published with a median and an error factor (EF), where EF = √(95th/5th). Lognormal is the maximum-entropy distribution given those two summaries. NUREG-CR-6823 §6 makes lognormal the default for nuclear PRA. | μ (median), σ (log-space scale). Often expressed as median + EF: σ = ln(EF)/1.645. | Heavy right tail; mean > median; ratio mean/median ≈ exp(σ²/2). |
| Uniform | When the data source publishes a min and max only ("the rate is between 10⁻⁷ and 10⁻⁵"), with no median or shape information. Maximum-entropy on a bounded interval. Honest about lack of knowledge. | min, max. | Symmetric, bounded; mean = (min+max)/2. |
| Triangular | When the data source gives min / mode / max (commonly from elicited expert opinion in Bayesian PRA). Mode is the engineer's best guess; min/max bracket where the engineer would be surprised. | min, mode, max. | Bounded; mean = (min + mode + max)/3 (not mode-centred). |
| Beta | For per-demand probabilities (sensor-test PFD, human-error rates) when bounded between 0 and 1 and the data source has a posterior from Bayesian update on operational counts. | α, β shape parameters from successes and failures observed. | Bounded [0,1]; flexible shape; conjugate prior to binomial likelihood. |
Why lognormal dominates in practice
Two reasons lognormal is the default for component failure rates, and uniform / triangular only get used when lognormal can't be honestly justified:
- Reliability data is multiplicative, not additive. Variation in component failure rates is driven by stress factors (temperature, vibration, voltage) that compound multiplicatively. Multiplicative noise produces lognormal shapes by the same Central Limit Theorem argument that produces normal shapes from additive noise. Empirically: industry CCF databases (NUREG-CR-6268 et al.) fit lognormal across virtually every component class.
- Cut-set sums of lognormals are nearly lognormal. P(TOP) = Σ P(cut) for OR-summed cut sets. If the cuts are themselves lognormal (which they are when the leaves are independent lognormals and each cut is a product of leaves — products of lognormals are exactly lognormal), the sum is approximately lognormal for typical PRA tree structures (Fenton-Wilkinson approximation). Working in log-space throughout lets analysts reason about percentile bands without a numerical Monte Carlo run for first-pass estimates.
Uniform and triangular are valid choices but each has a cost: uniform overstates the probability mass at the extremes (the actual rate is rarely uniformly distributed across two orders of magnitude — it's almost always concentrated near the median). Triangular's mode-centring is honest if you genuinely have a mode estimate but produces unnaturally sharp peaks if the elicitation was actually "the rate is somewhere in this range and I'm not sure where". Beta is correct for bounded per-demand probabilities and wrong for failure rates (which can in principle exceed 1/h).
Step 2Sample size, convergence, and correlated leaves
Once leaf distributions are chosen, three operational questions determine whether the Monte Carlo run is defensible: how many samples are enough, how do you know the run has converged, and what to do when basic events aren't independent.
Sample size and the 1/√N convergence law
The standard error of any Monte Carlo estimate scales as 1/√N. Halving the standard error costs 4× the samples; reducing it by 10× costs 100×. For typical PRA-scale fault trees the practical sample sizes are:
| N | Use case | Approximate precision |
|---|---|---|
| 10⁴ | First-pass exploration; design-review presentation | ±5–10% on median; ±20% on 5th/95th percentiles |
| 10⁵ | Safety-case standard run; what most regulators expect to see for SIL 3 / DAL B claims | ±2% on median; ±5–10% on 5th/95th percentiles |
| 10⁶ | Tight 99% CI estimation; rare-event quantification (PMHF at ASIL D, THR at SIL 4) | ±0.5% on median; ±2% on 99th percentile |
| 10⁷ | Research-grade convergence verification; rarely needed in production safety cases | ±0.2% on percentile statistics |
A useful pattern: run 10⁵ samples for the official safety-case number, plus a second independent 10⁵ run with a different random seed to confirm that the first-decimal-place percentile values agree. The two runs together act as a convergence diagnostic without committing to 10⁶ samples.
Variance reduction — Latin Hypercube Sampling
Naive Monte Carlo draws each leaf-event independently. Latin Hypercube Sampling (LHS) stratifies each leaf's distribution into N equal-probability bins and ensures each sample uses one bin from each leaf — guaranteeing the input space is covered uniformly rather than relying on randomness to fill it. For typical fault tree structures (50–500 leaves, OR/AND mix), LHS reduces the standard error by roughly 5–10× for the same sample count. NUREG-CR-6823 §6.3 documents the gain on representative PRA trees; the cost is implementation complexity and a slight loss of independence between samples that has to be checked at the analysis end.
For a 10⁵-sample LHS run, the precision is comparable to a 10⁶-sample naive Monte Carlo run, while finishing in ~10% of the wall time. Production PRA tools (SAPHIRE, Riskman, FTA Studio's lognormal Monte Carlo at /tools/lognormal-monte-carlo) use LHS by default; analyst-built scripts often use naive MC and give up the variance-reduction win without realising.
Correlated leaves — the trap that wrecks precision claims
Drawing every leaf event independently is the implicit assumption of basic Monte Carlo. It's wrong whenever basic events share a vendor, a manufacturing batch, a firmware build, a maintenance crew, or an environmental zone. In those cases the events are positively correlated — when one fails high, the other tends to fail high too — and treating them as independent overstates the precision of P(TOP) substantially.
Two ways to model correlation in the Monte Carlo:
- Multivariate lognormal sampling. Specify a covariance matrix with off-diagonal entries reflecting the correlation between paired leaves. For positive correlation ρ between two lognormals, draw from a bivariate normal in log-space with correlation ρ, then exponentiate. ASME/ANS PRA Standard §HLR-QU permits this and most production tools implement it. The honest cost: the covariance matrix has to come from somewhere — typically expert elicitation, which is itself a defendable artefact.
- Common-factor decomposition. Replace two correlated basic events with three: a shared common-cause event (sampled once per Monte Carlo iteration) AND each component's own residual independent failure. This is the same structure as Article 5's β-factor model in reverse — β-factor is the analytical version of what common-factor MC sampling does numerically. The common-factor approach is easier to audit because it makes the shared cause an explicit basic event, with its own data source.
For most PRA-scale trees, the common-factor approach is preferred because it integrates with the existing β-factor / CCF analysis. The common-cause basic event already exists in the tree (it was added during CCF modelling); Monte Carlo just samples it normally. The only thing to verify is that the residual independent rates are themselves drawn independently — a sanity-check on the implementation, not a different analysis.
Step 3Worked SPAD tree with Monte Carlo bands
Pulling the SPAD tree from Article 1 back into view: 8 minimal cut sets, point-estimate P(TOP) = 4.65×10⁻³ per train-year. Now treat each leaf as lognormal rather than a point. Per the standard rail-data-source convention (IEC TR 62380 publishes signal-lamp rates with EF ≈ 3, FIDES 2009 publishes electronic-component rates with EF ≈ 2–4), declare error factor EF = 3 on every basic event:
EF = √(95th / 5th) = 3 σlog = ln(EF) / 1.645 = ln(3) / 1.645 ≈ 0.668
Each leaf's sampling distribution is then lognormal with the article's median rate as the median and σlog = 0.668 as the log-space scale. Run 10⁵ Latin Hypercube samples through the tree's Boolean structure.
Percentile band on P(TOP)
P(TOP) is itself approximately lognormal (Fenton-Wilkinson) because the dominant cut {BE-001} contributes 94% of the median and is itself lognormal. The empirical distribution of the 10⁵ samples produces:
| Statistic | P(TOP) per train-year | Equivalent per train-hour |
|---|---|---|
| 5th percentile | 1.55×10⁻³ | 1.77×10⁻⁷ /h |
| 50th percentile (median) | 4.65×10⁻³ | 5.31×10⁻⁷ /h |
| Mean | 5.81×10⁻³ | 6.63×10⁻⁷ /h |
| 95th percentile | 1.39×10⁻² | 1.59×10⁻⁶ /h |
Three observations the table makes obvious that the point estimate hid:
- The 5th-to-95th band spans about 9×. The "best case" (5th percentile, your data was conservative) gives 1.55×10⁻³/yr; the "worst case" (95th percentile, your data was optimistic) gives 1.39×10⁻²/yr. Same model, same architecture, same cut sets — different presumption about leaf-data accuracy.
- The mean sits above the median by 25%. exp(σ²/2) = exp(0.223) ≈ 1.25 for σlog = 0.668. If the corporate tolerable-risk comparison uses the mean (which CSM-RA, NRC PRA, and most ASME/ANS standards prefer), the relevant number is 5.81×10⁻³, not the 4.65×10⁻³ point estimate. The point estimate is structurally optimistic for any heavy-tailed distribution.
- If the target is "95th percentile under tolerable", the answer changes. A point estimate of 4.65×10⁻³ that "comfortably meets" a 5×10⁻³ target masks a 95th percentile of 1.39×10⁻², which fails by 2.8×. Whether to report the point estimate, the mean, or the 95th percentile is itself a regulatory-convention question — and reviewers have explicit preferences (NRC: mean; ARP 4761: typically mean with 95th as supplemental; IEC 61511: PFDavg which is the mean).
F-V importance under uncertainty
Importance measures from Article 2 become bands too. Compute F-V per basic event for each Monte Carlo iteration; the empirical distribution of those F-V values across 10⁵ iterations gives the importance band:
| Basic event | F-V 5th | F-V median | F-V 95th | Rank stable? |
|---|---|---|---|---|
BE-001 (lamp wrong-side) | 0.85 | 0.94 | 0.99 | Yes — always rank 1 |
BE-002 (controller wrong-side) | 0.005 | 0.038 | 0.12 | Mostly — sometimes flips with BE-003 at 5th |
BE-003 (cable wrong-side) | 0.003 | 0.019 | 0.08 | Mostly — sometimes flips with BE-002 |
BE-004..BE-009 | < 0.001 each | 0.001–0.003 | 0.005–0.015 | Yes — always near bottom |
The headline result is that BE-001 is robustly the dominant contributor across the entire band — making any reliability-budget allocation toward signalling lamps a defensible architectural decision regardless of leaf-data uncertainty. The BE-002 vs BE-003 ranking flips occasionally at low percentiles, but neither is meaningfully actionable at the 1–4% F-V level. The rank order's structural conclusion ("address the lamp first") survives the uncertainty propagation without modification.
This is the kind of statement Monte Carlo enables that the point estimate cannot. "F-V importance ranks BE-001 first" is a defensible safety-case claim only if the rank holds under the data's plausible variation. The Monte Carlo answers that question explicitly.
What the band tells you about design margin
If the corporate tolerable threshold is 5×10⁻³ /train-year:
- Median 4.65×10⁻³ meets target by 7%.
- Mean 5.81×10⁻³ exceeds target by 16% — fails on the most-likely-required metric.
- 95th percentile 1.39×10⁻² exceeds target by 2.8× — fails on the worst-case data interpretation.
The point-estimate analysis would have reported "passes". The Monte Carlo reports "passes by median, fails by mean, fails by 95th". The design conclusion changes: either tighten leaf-data uncertainty (more proof testing, more operational data), tighten the architecture (additional barriers per Article 1's wrong-side correction), or negotiate the metric basis with the regulator (median is sometimes acceptable if the leaf data has an audit-defended low EF).
Five pitfalls a Monte Carlo reviewer will catch
- Independent sampling on declared CCF groups. Default tool settings sample every leaf independently, silently breaking the correlation that the CCF group encoded. Reviewers' standard probe: "show me how the Monte Carlo respects the β-factor groups declared in your FTA". The right answer is an explicit common-cause-event sampling architecture (Step 2's common-factor decomposition); "the leaves are sampled from their lognormals" is the wrong answer that signals an unfixed bug.
- Error factor declared without provenance. EF = 3 cited without naming the data source ("typical industry value", "engineering judgement") gets flagged the same way handbook citations without revision numbers do. The defensible form: "EF = 3 per IEC TR 62380:2004 §B.2 for signal-lamp class components, with cross-reference to FIDES 2009 §7.1 confirming similar spread". Without provenance, EF is a knob the analyst can tune to make the band narrow enough to pass.
- Naive Monte Carlo where Latin Hypercube would be appropriate. A 10⁵-sample naive MC has the precision of a ~10⁴-sample LHS run, but takes 10× the wall time. More importantly, naive MC's random variability between independent runs is larger, which makes the convergence-diagnostic test (run twice, compare) less informative. NRC PRA tools default to LHS; analyst-built scripts in NumPy / Python often use naive sampling without realising. Specify the sampling strategy in the safety case methodology section.
- Reporting median when regulator wants mean. The 25% gap between mean and median for typical EF values is enough to flip a safety case from "passes" to "fails". Most regulatory standards (CSM-RA, NRC RG 1.174, ASME/ANS RA-Sa-2009) compare against the mean. IEC 61511 PFDavg is structurally a mean. Reporting the median to a mean-comparing regulator is itself a finding even when the calculation is otherwise correct. Carry both numbers in the safety case; let the regulator pick.
- Truncated MOCUS cuts interact with Monte Carlo. If the FTA used MOCUS with a probability cut-off (cf. Article 3), every truncated cut contributes zero to every Monte Carlo iteration's P(TOP) — uniformly biasing the band downward. The 95th percentile is also under-counted. The fix is either re-run MOCUS with a tighter truncation (10⁻¹⁵ rather than 10⁻¹²) or switch to BDD for the Monte Carlo backbone. Reviewers ask "how does your truncation affect the percentile estimates?"; the right answer cites the convergence study showing percentiles stable to within target precision below the chosen truncation level.
Where to go next
- Run Monte Carlo on your own tree. Open the lognormal Monte Carlo tool — browser-only, free, supports lognormal / uniform / triangular / beta per leaf, LHS sampling, 10⁵ default sample count, percentile + mean + variance reporting. Output is auditable; no data leaves the browser.
- For the CCF group setup that Monte Carlo respects, Article 5 covers the β-factor and MGL models. The common-factor MC sampling described in Step 2 of this article is the numerical implementation of those models.
- For importance bands under uncertainty, Article 2 covers the four measures. Step 3 of this article shows how each gets its own Monte Carlo band; the rank-stability question becomes a percentile-bandwidth question.
- For the regulatory context, the IEC 61025 reference page covers the standards-side overview; Article 8 covers what aviation reviewers expect to see in the percentile reporting; Article 9 covers rail conventions.