Statistical Evidence & Confidence Levels - SOTIF (ISO 21448)

Statistical Evidence in SOTIF

Why Statistics?

SOTIF residual risk evaluation requires a claim that the probability of hazardous behaviour is below an acceptable threshold. A finite set of passing test scenarios cannot prove this claim outright; it can only support it at a stated confidence level, which requires statistical inference. ISO 21448 Cl. 9 specifies that residual risk evaluation must be supported by evidence from both targeted testing (scenarios built around specific triggering conditions, TCs) and statistical testing (sufficient exposure to support confidence claims).

The core statistical question: "How many test scenarios without failure are needed to claim that the failure probability is below P with confidence C?"
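
The question has a closed-form answer under a simple model. As a sketch, assume the n scenarios are independent and each has the same unknown failure probability p (an idealization that a real scenario catalogue only approximates). The symbols n and p are introduced here for illustration; P and C are the threshold and confidence from the question.

```latex
\begin{align*}
\Pr(\text{0 failures in } n \text{ scenarios} \mid p) &= (1 - p)^{n} \\
\text{if } p \ge P:\quad (1 - p)^{n} &\le (1 - P)^{n} \\
\text{require}\quad (1 - P)^{n} &\le 1 - C \\
\Rightarrow\quad n &\ge \frac{\ln(1 - C)}{\ln(1 - P)}
\end{align*}
```

In words: if the true failure probability were P or higher, a run of n failure-free scenarios would occur with probability at most 1 - C, so observing such a run supports the claim that the failure probability is below P at confidence C. The inequality flips in the last step because ln(1 - P) is negative.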

Confidence Level Calculation (Binomial)

Python: sotif_statistics.py

"""Statistical confidence calculations for SOTIF residual risk."""
import math

def required_scenarios_no_failure(target_prob: float,
                                   confidence: float) -> int:
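    """
    Calculate N scenarios needed with zero failures to claim
    failure probability <= target_prob at given confidence.
    Uses the one-sided binomial bound for zero observed failures,
    assuming independent scenarios with the same failure probability.
    Formula: N = ln(1 - confidence) / ln(1 - target_prob)
    """
    # Both inputs must lie strictly between 0 and 1, otherwise log() fails.
    if not (0 < target_prob < 1 and 0 < confidence < 1):
        raise ValueError("target_prob and confidence must be in (0, 1)")
    N = math.log(1 - confidence) / math.log(1 - target_prob)
    return math.ceil(N)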

def confidence_after_n_no_failure(n_scenarios: int,
                                   target_prob: float) -> float:
    """
    Calculate achieved confidence after n scenarios with zero failures.
    C = 1 - (1 - target_prob)^n
    """
    return 1 - (1 - target_prob) ** n_scenarios

# AEB example: target P(miss) < 0.1% = 0.001 at 95% confidence
target_failure_prob = 0.001  # max 1 miss per 1000 scenarios
confidence_target   = 0.95

n_required = required_scenarios_no_failure(target_failure_prob,
                                            confidence_target)
print(f"Target: P(failure) < {target_failure_prob} at C={confidence_target}")
print(f"Required scenarios with zero failures: {n_required}")
print()

# How does confidence grow with more tests?
for n in [500, 1000, 2000, 3000]:
    c = confidence_after_n_no_failure(n, target_failure_prob)
    print(f"  {n:5d} scenarios: achieved confidence = {c:.3f}")
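
The formulas above cover only the zero-failure case. If a test campaign does record failures, one common extension is the exact one-sided binomial (Clopper-Pearson) upper bound on the failure probability. The lesson does not prescribe this method; the sketch below is an assumption about how the same reasoning could be carried further, and the names `binom_cdf` and `upper_bound_failure_prob` are hypothetical. It uses only the standard library and is intended for small failure counts.

```python
"""Sketch: upper bound on failure probability when k >= 0 failures occur."""
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term.

    Intended for small k; very large k can overflow float conversion.
    """
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k + 1))


def upper_bound_failure_prob(n_scenarios: int, n_failures: int,
                             confidence: float) -> float:
    """
    Smallest p for which observing <= n_failures in n_scenarios has
    probability <= 1 - confidence (exact one-sided binomial bound).
    Found by bisection, because binom_cdf decreases as p increases.
    """
    if n_failures >= n_scenarios:
        return 1.0  # every scenario failed: no bound below 1 is supported
    lo, hi = 0.0, 1.0
    for _ in range(100):  # 100 halvings reach double precision
        mid = (lo + hi) / 2
        if binom_cdf(n_failures, n_scenarios, mid) > 1 - confidence:
            lo = mid  # bound is still too optimistic, move up
        else:
            hi = mid
    return hi


# Illustrative campaign sizes (assumed values, not from a real project)
for failures in (0, 1, 2):
    bound = upper_bound_failure_prob(2995, failures, 0.95)
    print(f"{failures} failures in 2995 scenarios: "
          f"P(failure) <= {bound:.5f} at 95% confidence")
```

With zero failures the bound reduces to 1 - (1 - C)^(1/n), which matches the zero-failure formula. A single failure in 2,995 scenarios pushes the 95% bound above 0.001, so the original claim would need additional failure-free scenarios to be re-established.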

Combining Evidence Sources

Evidence Source	Weight	Role in Residual Risk Claim
Simulation (known scenarios)	High for known TCs	Covers all Q1/Q2 scenarios; quantitative pass/fail
Simulation (adversarial)	Medium (model fidelity limits)	Explores Q3/Q4; bounded by simulation accuracy
Closed-track testing	High (real sensors)	TC-specific validation; sensor model validation
Public road testing	High (real scenarios)	Statistical exposure; organic scenario discovery
Fleet data (post-launch)	Highest (real fleet)	Long-term statistical evidence; continuous monitoring

Summary

The binomial confidence calculation reveals why SOTIF evidence requires large test volumes: achieving 95% confidence that the failure probability is below 0.1% (1 per 1000 scenarios) requires 2,995 scenarios with zero failures. For rare triggering conditions, this volume is realistically achievable only through simulation -- generating roughly 3,000 heavy-rain scenarios in real-world testing would require years of driving in the right weather conditions. The practical approach combines the two: simulation provides the statistical volume, while real-world testing validates that the simulation accurately models sensor behaviour in the relevant conditions.
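
To see how quickly the required volume grows, the zero-failure formula can be evaluated over a small grid of targets and confidence levels. This is a minimal sketch: the grid values other than 0.001 at 95% are illustrative assumptions, and `required_scenarios` and `rule_of_three` are hypothetical helper names. The 3/P figure is the common "rule of three" approximation for 95% confidence, which follows from -ln(0.05) being close to 3.

```python
"""Sketch: scaling of the zero-failure scenario count."""
import math


def required_scenarios(target_prob: float, confidence: float) -> int:
    """N = ceil(ln(1 - C) / ln(1 - P)), the zero-failure binomial formula."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - target_prob))


def rule_of_three(target_prob: float) -> float:
    """Approximation for 95% confidence: N ~= 3 / P."""
    return 3 / target_prob


# Illustrative grid (assumed values); 0.001 at 95% is the lesson's AEB example
for p in (1e-2, 1e-3, 1e-4):
    counts = [required_scenarios(p, c) for c in (0.90, 0.95, 0.99)]
    print(f"P={p:g}: N(90%)={counts[0]}, N(95%)={counts[1]}, "
          f"N(99%)={counts[2]}, 3/P={rule_of_three(p):.0f}")
```

The required count scales roughly inversely with the target probability, so each tenfold tightening of the target costs about ten times as many failure-free scenarios, while raising the confidence level adds a much smaller factor.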

🔬 Deep Dive — Core Concepts Expanded

This section builds on the foundational concepts covered above with additional technical depth, edge cases, and configuration nuances that separate competent engineers from experts. When working on production ECU projects, the details covered here are the ones most commonly responsible for integration delays and late-phase defects.

Key principles to reinforce:

Configuration over coding: In AUTOSAR and automotive middleware environments, correctness is largely determined by ARXML configuration, not application code. A correctly implemented algorithm can produce wrong results due to a single misconfigured parameter.
Traceability as a first-class concern: Every configuration decision should be traceable to a requirement, safety goal, or architecture decision. Undocumented configuration choices are a common source of regression defects when ECUs are updated.
Cross-module dependencies: In tightly integrated automotive software stacks, changing one module's configuration often requires corresponding updates in dependent modules. Always perform a dependency impact analysis before submitting configuration changes.

🏭 How This Topic Appears in Production Projects

Project integration phase: The concepts covered in this lesson are most commonly encountered during ECU integration testing — when multiple software components from different teams are combined for the first time. Issues that were invisible in unit tests frequently surface at this stage.
Supplier/OEM interface: This is a topic that frequently appears in technical discussions between Tier-1 ECU suppliers and OEM system integrators. Engineers who can speak fluently about these details earn credibility and are often brought into critical design review meetings.
Automotive tool ecosystem: Vector CANoe/CANalyzer, dSPACE tools, and ETAS INCA are the standard tools used to validate and measure the correct behaviour of the systems described in this lesson. Familiarity with these tools alongside the conceptual knowledge dramatically accelerates debugging in real projects.

⚠️ Common Mistakes and How to Avoid Them

Assuming default configuration is correct: Automotive software tools ship with default configurations that are designed to compile and link, not to meet project-specific requirements. Every configuration parameter needs to be consciously set. 'It compiled' is not the same as 'it is correctly configured'.
Skipping documentation of configuration rationale: In a 3-year ECU project with team turnover, undocumented configuration choices become tribal knowledge that disappears when engineers leave. Document why a parameter is set to a specific value, not just what it is set to.
Testing only the happy path: Automotive ECUs must behave correctly under fault conditions, voltage variations, and communication errors. Always test the error handling paths as rigorously as the nominal operation. Many production escapes originate in untested error branches.
Version mismatches between teams: In a multi-team project, the BSW team, SWC team, and system integration team may use different versions of the same ARXML file. Version management of all ARXML files in a shared repository is mandatory, not optional.
