Hardware Safety Metrics (SPFM, LFM, PMHF) - Functional Safety (ISO 26262)

ISO 26262 Part 5 Hardware Metrics Overview

Metric	Full Name	What It Measures	ASIL-D Target
SPFM	Single-Point Fault Metric	Share of the safety-related failure rate that does not end up as single-point or residual faults	≥ 99%
LFM	Latent Fault Metric	Share of the remaining failure rate that does not stay latent (detected by a safety mechanism or perceived by the driver)	≥ 90%
PMHF	Probabilistic Metric for Hardware Failures	Residual random hardware failure rate at item level	< 10 FIT (1×10⁻⁸/h)

Single-Point Fault Metric (SPFM)

SPFM Calculation

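  SPFM = 1 - (Σ residual SPF rate) / (Σ total SPF rate)

  Where:
  - Single-point fault (SPF): fault in ONE element that directly causes a safety
    goal violation (no redundancy; not a latent fault in a redundant channel)
  - Residual SPF rate = fault_rate × (1 - DC)
    DC = diagnostic coverage of the safety mechanism for this fault
    (ISO 26262 calls this uncovered part of a covered fault a "residual fault";
    an SPF in the strict sense has no safety mechanism, i.e. DC = 0)
  - Total SPF rate = summed failure rate of the elements considered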

  Example: EPS Torque Sensor (ASIL-D)

  Element             Failure Rate   DC      Residual
  Torque sensor Ch-A  50 FIT         97%     1.5 FIT    (range check)
  Torque sensor Ch-B  50 FIT         97%     1.5 FIT    (range check)
  ADC converter       20 FIT         95%     1.0 FIT    (ADC self-test)
  Safety monitor SW   10 FIT         99%     0.1 FIT    (monitor-of-monitor)
  MCU power supply    30 FIT         95%     1.5 FIT    (supply monitor)
  ──────────────────────────────────────────────────────
  Total SPF rate:     160 FIT        
  Residual SPF rate:  5.6 FIT

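  SPFM = 1 - 5.6/160 = 96.5%  ← FAILS ASIL-D target (≥ 99%)
  → A 99% SPFM allows at most 1.6 FIT residual (1% of 160 FIT). Raising the
    torque sensor DC alone cannot achieve this: the ADC, safety monitor and
    power supply already leave 2.6 FIT. DC must improve across all elements
    (≥ 99% on every element is sufficient), or an additional safety mechanism
    must be added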
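
The table arithmetic can be reproduced with a short script. This is a minimal sketch: the `elements` list restates the example values above, and the helper name `spfm` is illustrative, not taken from any tool or from the standard.

```python
# Sketch: recompute the SPFM for the EPS torque sensor example.
# (name, failure rate in FIT, diagnostic coverage as a fraction)
elements = [
    ("Torque sensor Ch-A", 50, 0.97),
    ("Torque sensor Ch-B", 50, 0.97),
    ("ADC converter",      20, 0.95),
    ("Safety monitor SW",  10, 0.99),
    ("MCU power supply",   30, 0.95),
]

def spfm(rows):
    """Return (total rate, residual rate, SPFM) for a list of rows."""
    total = sum(rate for _, rate, _ in rows)
    residual = sum(rate * (1 - dc) for _, rate, dc in rows)
    return total, residual, 1 - residual / total

total, residual, metric = spfm(elements)
budget = total * (1 - 0.99)  # residual rate allowed by the ASIL-D target

print(f"total={total} FIT  residual={residual:.1f} FIT  SPFM={metric:.1%}")
print(f"residual budget for SPFM >= 99%: {budget:.1f} FIT")
```

The budget line makes the gap explicit: only 1.6 FIT of residual rate is allowed, against the 5.6 FIT present in the example.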

Latent Fault Metric (LFM)

LFM Calculation

  LFM = 1 - (Σ residual LF rate) / (Σ total LF rate)

  Latent fault: fault in a redundant element (safety mechanism or second channel)
  that is not immediately detected; only causes safety violation when COMBINED
  with a second fault (the primary fault it was meant to protect against)

  Example: EPS dual-channel architecture

  Latent fault:              Rate  DC_latent   Residual_latent
  Monitor SW has silent bug   5 FIT  60% (periodic test)   2 FIT
  Secondary sensor dead      50 FIT  80% (periodic check)  10 FIT
  Watchdog disabled by bug   10 FIT  70% (WDG self-test)    3 FIT
  ──────────────────────────────────────────────────────────────
  Total LF rate:             65 FIT
  Residual LF rate:          15 FIT

  LFM = 1 - 15/65 = 76.9%  ← FAILS ASIL-D target (≥ 90%)
  → Residual latent rate must fall from 15 FIT to ≤ 6.5 FIT (10% of 65 FIT):
    add higher-coverage or more frequent diagnostic tests for latent faults
    (e.g., run periodic sensor plausibility check every 10 min)

  Key insight: latent faults require DIFFERENT diagnostic mechanisms than SPFs
  - SPF diagnostics: continuous runtime checks (CRC, range, watchdog)
  - Latent fault diagnostics: periodic tests that exercise the safety mechanism itself
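
The LFM example can be checked the same way. The sketch below only restates the table values; the variable names are illustrative.

```python
# Sketch: recompute the LFM for the dual-channel EPS example.
# (latent fault, rate in FIT, latent-fault diagnostic coverage as a fraction)
latent_faults = [
    ("Monitor SW has silent bug", 5,  0.60),
    ("Secondary sensor dead",     50, 0.80),
    ("Watchdog disabled by bug",  10, 0.70),
]

total_lf = sum(rate for _, rate, _ in latent_faults)
residual_lf = sum(rate * (1 - dc) for _, rate, dc in latent_faults)
lfm = 1 - residual_lf / total_lf

# Largest residual latent rate that still meets LFM >= 90%
allowed_residual = total_lf * (1 - 0.90)

print(f"LFM = {lfm:.1%}")
print(f"residual {residual_lf:.1f} FIT vs. allowed {allowed_residual:.1f} FIT")
```

Because the secondary sensor contributes 10 of the 15 FIT residual, it is the first place to look for better latent-fault coverage.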

PMHF Calculation

Python: pmhf_calc.py

#!/usr/bin/env python3
# PMHF (Probabilistic Metric for Hardware Failures) calculation
# ISO 26262 Part 5 Annex C

# PMHF = sum of all residual failure rates contributing to safety violations
# Units: FIT = Failures In Time = 10^-9 per hour
# ASIL-D target: < 10 FIT (< 1e-8/h)

# FIT data from component FMEA + safety mechanism DC values

elements = [
    # name, rate_FIT, fault_type, DC_percent
    # fault_type: 'SPF'=single-point, 'MPF_R'=residual fault of an element covered
    #             by a safety mechanism (counted with SPF), 'MPF_L'=latent MPF
    ("Torque sensor Ch-A",    50, "SPF",   97.0),
    ("Torque sensor Ch-B",    50, "SPF",   97.0),
    ("ADC converter",         20, "SPF",   95.0),
    ("MCU CPU core",          15, "MPF_R", 99.0),  # redundant with watchdog
    ("MCU power supply",      30, "SPF",   95.0),
    ("Safety monitor SW",     10, "MPF_L", 80.0),  # latent fault in monitor
    ("CAN transceiver",        8, "SPF",   90.0),
    ("Watchdog circuit",      12, "MPF_L", 70.0),  # latent fault
]

# Running totals per fault category (all in FIT)
spf_total = 0
spf_residual = 0
mpf_r_total = 0
mpf_r_residual = 0
mpf_l_total = 0
mpf_l_residual = 0

for name, rate, ftype, dc in elements:
    residual = rate * (1 - dc / 100)
    if ftype == "SPF":
        spf_total += rate
        spf_residual += residual
    elif ftype == "MPF_R":
        mpf_r_total += rate
        mpf_r_residual += residual
    else:
        mpf_l_total += rate
        mpf_l_residual += residual
    print(f"  {name:30s}  {rate:4.0f} FIT  DC={dc:4.1f}%  residual={residual:.2f} FIT")

# SPFM and LFM
all_residual_spf = spf_residual + mpf_r_residual
all_total_spf    = spf_total    + mpf_r_total
SPFM = 1 - all_residual_spf / all_total_spf if all_total_spf else 1
LFM  = 1 - mpf_l_residual / mpf_l_total     if mpf_l_total  else 1

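# PMHF (simplified): residual SPF + residual MPF_R + residual MPF_L.
# Adding the full latent residual is a pessimistic shortcut; a complete
# analysis weights latent faults by the chance that a second fault occurs
# within the exposure window.
EXPOSURE_HOURS = 4380  # 0.5 year exposure in hours
# Approximate probability that a residual latent fault arises within the window
# (informational only; not used in the simplified sum below)
mpf_l_pmhf = mpf_l_residual * EXPOSURE_HOURS * 1e-9

PMHF_FIT = all_residual_spf + mpf_l_residual  # simplified (see ISO 26262 Annex C)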

print(f"\nSPFM = {SPFM*100:.1f}%  (ASIL-D target: ≥ 99%)")
print(f"LFM  = {LFM*100:.1f}%  (ASIL-D target: ≥ 90%)")
print(f"PMHF = {PMHF_FIT:.2f} FIT  (ASIL-D target: < 10 FIT)")
if SPFM >= 0.99 and LFM >= 0.90 and PMHF_FIT < 10:
    print("✓ All ASIL-D hardware metric targets met")
else:
    print("✗ Hardware metric targets NOT met — design improvements required")

Summary

The three hardware metrics — SPFM, LFM, and PMHF — provide quantitative evidence that the hardware architecture provides sufficient protection against random failures. SPFM targets are met by high-coverage runtime diagnostics (range checks, CRC, watchdog). LFM targets are met by periodic tests that exercise the safety mechanism itself (the mechanism that is monitoring for primary faults must itself be tested periodically). PMHF gives the absolute residual failure rate: even with 99% SPFM and 90% LFM, the residual 1% and 10% of uncovered faults contribute to PMHF and must be shown to sum to below 10 FIT for ASIL-D.

🔬 SPFM, LFM, and PMHF — Calculation Methodology

ISO 26262 Part 5 defines three hardware safety metrics that are evaluated for the hardware architecture of each safety goal. The targets are requirements for ASIL C and D and recommendations for ASIL B; the standard sets no metric targets for ASIL A:

SPFM (Single-Point Fault Metric): Fraction of the safety-related hardware failure rate that is not left as single-point or residual faults. Formula: SPFM = 1 − (λ_SPF + λ_RF) / λ_Total_relevant, where λ_SPF is the failure rate of single-point faults (faults that directly cause a safety goal violation with no safety mechanism in place) and λ_RF is the failure rate of residual faults (the part of a covered fault that the safety mechanism misses). Target: ASIL D ≥ 99%, ASIL C ≥ 97%, ASIL B ≥ 90%.
LFM (Latent Fault Metric): Fraction of the remaining failure rate that does not stay latent. Latent faults are multiple-point faults that are neither detected by a safety mechanism nor perceived by the driver, so they do not immediately cause a hazard but may combine with another fault later. Formula: LFM = 1 − λ_latent_not_covered / (λ_Total_relevant − λ_SPF − λ_RF). Target: ASIL D ≥ 90%, ASIL C ≥ 80%, ASIL B ≥ 60%. LFM is typically harder to achieve than SPFM because latent faults require diagnostic mechanisms that run even while the system appears to be working normally.
PMHF (Probabilistic Metric for Hardware Failures): The residual failure rate per hour of the overall hardware architecture that can cause the safety goal violation. PMHF is essentially the residual risk quantification. The formula sums over all fault categories: single-point, residual, and latent faults weighted by their non-covered fractions. Target: ASIL D < 10⁻⁸/h (10 FIT), ASIL C < 10⁻⁷/h (100 FIT), ASIL B < 10⁻⁷/h (100 FIT). (1 FIT = 10⁻⁹/h)
Failure rate data sources: Component failure rates (λ) are taken from handbooks such as SN 29500, MIL-HDBK-217, or IEC 62380, or from supplier-provided data, and are evaluated for the item's mission profile. The choice of database significantly impacts PMHF: SN 29500 gives different values than IEC 62380 for the same component type. Document which database is used in the Hardware Safety Assessment.
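
The architectural metrics and their targets can be sketched as a small check. This is an illustrative sketch, not a tool implementation: the function names and the example rates are assumptions, and the LFM uses the ISO 26262-5 denominator (total rate minus single-point and residual rate).

```python
# Sketch: architectural metric targets quoted in the text (as fractions).
# ASIL A is omitted because no targets are given for it.
TARGETS = {
    "D": {"spfm": 0.99, "lfm": 0.90},
    "C": {"spfm": 0.97, "lfm": 0.80},
    "B": {"spfm": 0.90, "lfm": 0.60},
}

def architectural_metrics(lam_total, lam_spf_rf, lam_latent):
    """Compute (SPFM, LFM) from failure rates in FIT.

    lam_total  -- total rate of the safety-related hardware elements
    lam_spf_rf -- rate of single-point plus residual faults
    lam_latent -- rate of latent multiple-point faults
    """
    spfm = 1 - lam_spf_rf / lam_total
    lfm = 1 - lam_latent / (lam_total - lam_spf_rf)
    return spfm, lfm

def meets_targets(asil, spfm, lfm):
    """True if both architectural metrics reach the targets for this ASIL."""
    target = TARGETS[asil]
    return spfm >= target["spfm"] and lfm >= target["lfm"]

# Hypothetical rates, chosen only to exercise the functions
spfm, lfm = architectural_metrics(lam_total=1000, lam_spf_rf=8, lam_latent=50)
print(f"SPFM={spfm:.2%}  LFM={lfm:.2%}  ASIL D ok: {meets_targets('D', spfm, lfm)}")
```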

🏭 Practical Metric Calculation Examples

ECU power supply supervisor: A voltage monitoring IC monitors the MCU supply voltage. Its failure rate (λ = 50 FIT from SN 29500) contributes to the safety architecture. If it fails open (no detection), a subsequent MCU voltage fault goes undetected, which makes the supervisor failure a latent fault. If it generates a false alarm (MCU reset without fault), the result is normally a loss of availability; it counts as an SPF only where availability is itself a safety goal. Any safety mechanism that monitors the supervisor itself must be counted for LFM.
ASIL D MCU with hardware lockstep: The Infineon TC397 provides CPU cores that run in lockstep with a checker core (outputs compared every cycle). Assume λ_MCU = 1000 FIT for illustration. With lockstep, the SPF fraction is dramatically reduced, because most single-point failures in the CPU are detected by the lockstep comparator. The SPFM contribution of the MCU alone may then exceed 99.5%, which meets the ASIL D target, provided the claimed coverage is backed by evidence.
PMHF budget allocation: If a safety goal has ASIL D PMHF target of 10 FIT, and there are 5 hardware elements in the safety path, each element is typically budgeted 2 FIT of residual failure rate. This budget-driven approach guides hardware component selection during design.
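
The budget split can be sketched in a few lines. The equal split matches the example above; the proportional split is an alternative added here for comparison, and the element names and base rates are hypothetical.

```python
# Sketch: split a PMHF target across hardware elements.
def equal_budget(target_fit, names):
    """Give every element the same share of the target."""
    share = target_fit / len(names)
    return {name: share for name in names}

def weighted_budget(target_fit, base_rates):
    """Share the target in proportion to each element's base failure rate."""
    total = sum(base_rates.values())
    return {name: target_fit * rate / total for name, rate in base_rates.items()}

# Hypothetical element names and base rates (FIT), for illustration only
base_rates = {"sensor": 100, "adc": 20, "mcu": 500, "supply": 30, "driver": 50}

equal = equal_budget(10, list(base_rates))
weighted = weighted_budget(10, base_rates)

for name in base_rates:
    print(f"{name:8s} equal={equal[name]:.2f} FIT  weighted={weighted[name]:.2f} FIT")
```

An equal split is easy to communicate, while a proportional split avoids demanding the same residual rate from a simple driver stage as from a complex MCU.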

⚠️ Hardware Safety Metric Pitfalls

Counting non-relevant failure modes in λ_Total: Not all failure modes of a component contribute to a safety goal. A CAN transceiver's 'transmit stuck dominant' failure may be irrelevant to a braking safety goal. Only failure modes that could contribute to the hazardous event should be counted as violating faults. Counting irrelevant modes as violations makes SPFM and PMHF look worse than reality, while padding only the denominator with elements that are not safety-related makes SPFM look better than it is.
Claiming diagnostic coverage without evidence: A safety mechanism must actually detect the claimed failure mode to contribute to coverage. Claiming 99% diagnostic coverage for an ADC based on a plausibility check that only catches 60% of realistic failure modes is a safety assessment error. Coverage estimates require analysis evidence (FTA, FMEA, or testing).
Ignoring common-cause failures: Two MCU cores in lockstep on the same die can both fail from a single common cause, such as a shared clock or supply disturbance, or a radiation event that upsets both. The hardware metrics assume independent faults, so a common-cause failure (CCF) analysis must show that such shared causes are adequately controlled; otherwise the SPFM, LFM, and PMHF figures are optimistic.
Mission profile not matching operational use: Using an industrial temperature profile (−25°C to +85°C, 10 years) for an underbonnet ECU (−40°C to +125°C, 15 years) underestimates failure rates. IEC 62380 temperature acceleration factors are highly non-linear above 85°C.
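
To give a feel for the temperature sensitivity behind the mission profile pitfall, here is a sketch using a generic Arrhenius model. It is not the IEC 62380 model itself, and the 0.7 eV activation energy is an assumed, illustrative value; real values depend on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_factor(t_ref_c, t_use_c, activation_energy_ev=0.7):
    """Acceleration of a thermally activated failure rate from t_ref_c to t_use_c.

    Temperatures are in degrees Celsius. The 0.7 eV default is an assumed,
    illustrative activation energy, not a value from any handbook.
    """
    t_ref_k = t_ref_c + 273.15
    t_use_k = t_use_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1 / t_ref_k - 1 / t_use_k)
    return math.exp(exponent)

factor = arrhenius_factor(85, 125)
print(f"Failure rate at 125 °C is about {factor:.1f}x the rate at 85 °C")
```

Under these assumptions the rate grows by roughly an order of magnitude between 85 °C and 125 °C, which is why a mismatched profile can invalidate a PMHF result.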

📊 Industry Note

Most hardware safety assessments at Tier-1 suppliers are done using specialised tools: IEC 61508 Workbench, PTC Integrity Safety, or the LDRA tool suite. The PMHF calculation spreadsheet can have thousands of rows for a complex ECU. Manual calculation errors are a common cause of safety assessment rework, so always cross-validate results with an independent calculation.

🧠 Knowledge Check

❓ What is the difference between a Single-Point Fault and a Residual Fault in ISO 26262 hardware safety analysis?

✅ A Single-Point Fault (SPF) is a hardware fault that directly violates the safety goal with no other fault required — and there is NO safety mechanism covering it. A Residual Fault is also a hardware fault that directly violates the safety goal, but there IS a safety mechanism covering it — however the mechanism has imperfect diagnostic coverage (< 100%). The uncovered fraction of a covered fault becomes a residual fault. SPFs are the most dangerous — no protection at all. Residual faults reflect the limits of diagnostic coverage.

❓ Why is LFM typically harder to achieve than SPFM in an ASIL D design?

✅ SPFM considers immediate safety-relevant faults — the safety mechanism just needs to detect them before harm occurs. LFM considers latent faults — failures that don't directly cause a hazard but accumulate silently and may combine with another fault to cause one. To achieve high LFM, the design must include mechanisms that actively test for latent faults even when everything appears to be working normally. This requires diagnostic tests that run during normal operation (e.g., periodic RAM BIST, end-to-end CRC monitoring, watchdog testing) — more design effort and CPU overhead.

❓ An ASIL D safety goal has a PMHF target of 10 FIT. The hardware architecture has two components in a safety path: MCU (500 FIT base rate) and a sensor IC (200 FIT base rate). Is it feasible to meet the PMHF target, and what is required?

✅ Yes, it is feasible, but it requires very high coverage. The combined base rate is 700 FIT. To keep the residual below 10 FIT, the overall residual fraction must be below 10/700 ≈ 1.4%, so safety mechanisms must cover more than 98.6% of all relevant failure modes (assuming, pessimistically, that every failure mode is safety-relevant). For the MCU, hardware lockstep + SW runtime monitors can achieve >99% coverage, leaving about 5 FIT. The sensor IC must then fit into the remaining 5 FIT, which needs about 97.5% coverage; 95% would leave 10 FIT from the sensor alone and miss the target. Analog monitoring (out-of-range plausibility, CRC on digital output) + end-to-end protection would have to be shown to reach that level. The combined PMHF calculation must be formally documented with fault mode breakdown and coverage evidence for each component.
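
The arithmetic in this answer can be checked with a short sketch. It assumes that every failure mode is safety-relevant and that residual rate is simply rate × (1 − coverage); the function names are illustrative.

```python
# Sketch: residual rate for the two-component example (MCU 500 FIT, sensor 200 FIT).
TARGET_FIT = 10

def residual_fit(components):
    """Sum rate * (1 - coverage) over (rate_FIT, coverage) pairs."""
    return sum(rate * (1 - dc) for rate, dc in components)

def required_coverage(rate, other_residual, target=TARGET_FIT):
    """Coverage one component needs so the total residual equals the target."""
    return 1 - (target - other_residual) / rate

mcu = (500, 0.99)

# MCU at 99 % and sensor at 95 % coverage
both = residual_fit([mcu, (200, 0.95)])
print(f"residual with 99 % / 95 % coverage: {both:.1f} FIT")

# Sensor coverage needed when the MCU already uses 5 FIT of the budget
sensor_dc = required_coverage(200, residual_fit([mcu]))
print(f"sensor coverage needed with MCU at 99 %: {sensor_dc:.1%}")
```

Identifying safe faults (failure modes that cannot violate the safety goal) would relax these numbers, since they drop out of the residual sum.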