Post-Mortem Debugging - Debugging & Tracing

Core Dump Architecture

Post-Mortem Data Collection Pipeline

  Fault occurs at runtime (HardFault, ProtectionHook, assertion fail)
       │
       ▼
  Fault Handler (< 1 ms):
  ├── Save register context to g_faultContext (SRAM, 128 bytes)
  ├── Save last 1 kB of stack to g_stackSnapshot (SRAM)
  ├── Append fault record to NVM ring buffer (DFlash, 64-byte entry)
  └── Trigger ECU reset (NVIC_SystemReset or WatchdogReset)
       │
       ▼
  Next power-on:
  ├── EcuM startup checks NVM fault ring buffer
  ├── Copies fault data to RAM diagnostic buffer
  └── Makes available via UDS 0x19 (ReadDTCInformation) + 0x22 (ReadDataByIdentifier) DID 0xF1A0

  In field:
  ├── Workshop tool reads DID 0xF1A0: raw fault context + stack snapshot
  ├── Engineer loads snapshot in TRACE32 with matching ELF
  └── Reconstructs full call stack and variable state at fault time
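
On the workshop side, the raw bytes returned for DID 0xF1A0 have to be turned back into named fields before they are of any use. The following is a minimal sketch of that step, assuming the payload is exactly one 64-byte record laid out like the FaultLogEntry_t shown in the next section and stored little-endian. The names FaultRecord_Decode, DecodedFault_t and read_le32 are hypothetical and are not part of any tool or AUTOSAR API.

```c
#include <stddef.h>
#include <stdint.h>

#define FAULT_RECORD_SIZE   64u
#define FAULT_RECORD_MAGIC  0xFADE1234u

/* Decoded view of one record; field order follows FaultLogEntry_t. */
typedef struct {
    uint32_t fault_type;
    uint32_t pc;
    uint32_t lr;
    uint32_t cfsr;
    uint32_t bfar;
    uint32_t sp;
    uint32_t stm_timestamp;
    uint32_t odometer_km;
    uint8_t  stack_snapshot[28];
} DecodedFault_t;

/* Read a 32-bit little-endian word without relying on host endianness or alignment. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the payload is too short or the magic does not match. */
int FaultRecord_Decode(const uint8_t *raw, size_t len, DecodedFault_t *out)
{
    size_t i;

    if ((raw == NULL) || (out == NULL) || (len < FAULT_RECORD_SIZE)) {
        return -1;
    }
    if (read_le32(&raw[0]) != FAULT_RECORD_MAGIC) {
        return -1;  /* erased or corrupted slot */
    }
    out->fault_type    = read_le32(&raw[4]);
    out->pc            = read_le32(&raw[8]);
    out->lr            = read_le32(&raw[12]);
    out->cfsr          = read_le32(&raw[16]);
    out->bfar          = read_le32(&raw[20]);
    out->sp            = read_le32(&raw[24]);
    out->stm_timestamp = read_le32(&raw[28]);
    out->odometer_km   = read_le32(&raw[32]);
    for (i = 0u; i < sizeof(out->stack_snapshot); i++) {
        out->stack_snapshot[i] = raw[36u + i];  /* 28 bytes copied from the fault SP */
    }
    return 0;
}
```

Decoding byte by byte instead of casting the buffer to a struct keeps the tool independent of the host compiler's padding and byte order.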

NVM Fault Log Implementation

fault_log.c

/* NVM fault ring buffer: 16 entries × 64 bytes in Aurix DFlash */
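#include <stdint.h>  /* uint32_t, uint8_t, uintptr_t */
#include <string.h>  /* memcpy */
#include "Fee.h"     /* Flash EEPROM Emulation */
#include "NvM.h"     /* NvM_WriteBlock */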

#define FAULT_LOG_MAX_ENTRIES  16u
#define FAULT_LOG_ENTRY_SIZE   64u

typedef struct {
    uint32_t magic;          /* 0xFADE1234: valid entry marker */
    uint32_t fault_type;     /* HardFault=1, ProtectionHook=2, etc. */
    uint32_t pc_at_fault;    /* faulting instruction address */
    uint32_t lr_at_fault;    /* LR at fault (identifies calling context) */
    uint32_t cfsr;           /* Cortex-M CFSR */
    uint32_t bfar;           /* Bus Fault Address Register */
    uint32_t sp_at_fault;    /* SP at time of fault */
    uint32_t stm_timestamp;  /* Aurix STM0 tick at fault */
    uint32_t odometer_km;    /* vehicle odometer for field correlation */
    uint8_t  stack_snapshot[28]; /* partial stack at fault SP */
} FaultLogEntry_t;

static FaultLogEntry_t s_pendingEntry;

void FaultLog_SaveEntry(const FaultContext_t *ctx, uint32_t fault_type) {
    s_pendingEntry.magic         = 0xFADE1234u;
    s_pendingEntry.fault_type    = fault_type;
    s_pendingEntry.pc_at_fault   = ctx->pc;
    s_pendingEntry.lr_at_fault   = ctx->lr;
    s_pendingEntry.cfsr          = ctx->cfsr;
    s_pendingEntry.bfar          = ctx->bfar;
    s_pendingEntry.sp_at_fault   = ctx->sp_at_fault;
    s_pendingEntry.stm_timestamp = (uint32_t)MODULE_STM0.TIM0.U;
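    /* odometer_km is not written here, so it keeps its previous value (0 after startup) */
    /* stack_snapshot: copy 28 bytes starting at the fault SP */
    memcpy(s_pendingEntry.stack_snapshot,
           (const void *)(uintptr_t)ctx->sp_at_fault,
           sizeof(s_pendingEntry.stack_snapshot));
    /* NvM_WriteBlock only queues the request; the write to DFlash via Fee completes
     * in later NvM/Fee main-function cycles, so the reset must wait for that job. */
    (void)NvM_WriteBlock(NVM_BLOCK_FAULT_LOG, &s_pendingEntry);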
}
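
The listing above writes a single pending entry; the 16-slot ring behaviour itself is not shown. A minimal host-side sketch of how the slot handling might look follows, assuming a RAM array as a stand-in for the DFlash block and a write index that is kept elsewhere. FaultRing_Append, FaultRing_CountValid, FaultRing_Newest, s_ring and s_writeIndex are hypothetical names. The compile-time check confirms that the struct really is 64 bytes (9 × 4 bytes plus 28 bytes of stack).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAULT_LOG_MAX_ENTRIES  16u
#define FAULT_LOG_MAGIC        0xFADE1234u

typedef struct {
    uint32_t magic;
    uint32_t fault_type;
    uint32_t pc_at_fault;
    uint32_t lr_at_fault;
    uint32_t cfsr;
    uint32_t bfar;
    uint32_t sp_at_fault;
    uint32_t stm_timestamp;
    uint32_t odometer_km;
    uint8_t  stack_snapshot[28];
} FaultLogEntry_t;

/* Fail the build if padding ever changes the 64-byte entry size. */
_Static_assert(sizeof(FaultLogEntry_t) == 64u, "FaultLogEntry_t must be 64 bytes");

/* Stand-in for the DFlash block (assumption: on the ECU this would be an NvM RAM mirror). */
static FaultLogEntry_t s_ring[FAULT_LOG_MAX_ENTRIES];
static uint32_t s_writeIndex;  /* hypothetical: next slot to overwrite */

/* Store one entry in the next slot, mark it valid, and wrap after the last slot. */
void FaultRing_Append(const FaultLogEntry_t *entry)
{
    s_ring[s_writeIndex] = *entry;
    s_ring[s_writeIndex].magic = FAULT_LOG_MAGIC;
    s_writeIndex = (s_writeIndex + 1u) % FAULT_LOG_MAX_ENTRIES;
}

/* Count slots whose magic marks them as valid, as a startup scan would. */
uint32_t FaultRing_CountValid(void)
{
    uint32_t i;
    uint32_t count = 0u;

    for (i = 0u; i < FAULT_LOG_MAX_ENTRIES; i++) {
        if (s_ring[i].magic == FAULT_LOG_MAGIC) {
            count++;
        }
    }
    return count;
}

/* Return the most recently written entry, or NULL if nothing has been logged. */
const FaultLogEntry_t *FaultRing_Newest(void)
{
    uint32_t idx = (s_writeIndex + FAULT_LOG_MAX_ENTRIES - 1u) % FAULT_LOG_MAX_ENTRIES;

    return (s_ring[idx].magic == FAULT_LOG_MAGIC) ? &s_ring[idx] : NULL;
}
```

In this sketch the write index lives in RAM and would be lost on reset. A real implementation would have to persist it or recover it from the stored entries, for example by adding a sequence counter to each record.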

TRACE32 Post-Mortem Analysis from Core Dump

postmortem_analysis.cmm

// Load core dump file and reconstruct fault context
// Core dump = binary dump of RAM regions saved by fault handler

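// Step 1: Attach to the target WITHOUT resetting it
SYStem.RESet                 // resets the debugger's SYStem settings only, not the target
SYStem.CPU TC397
SYStem.JtagClock 10MHz
SYStem.Mode Attach           // attach to the running target; SYStem.Up would reset it
Break                        // halt the core so RAM and registers can be read
// Do NOT use SYStem.Up or any other target reset: we want the RAM contents from the last run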

// Step 2: Load matching ELF (same build as the crashed firmware)
Data.LOAD.Elf build/app_v3.2.elf /RELPATH /NOCODE
// /NOCODE: load symbols only; do not overwrite flash with ELF code sections

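// Step 3: Restore register context from saved g_faultContext
// Register names follow the Cortex-M style fields of g_faultContext; on a TriCore
// core such as the TC397 the return address and stack pointer are A11 and A10.
Register.Set PC Var.VALUE(g_faultContext.pc)
Register.Set LR Var.VALUE(g_faultContext.lr)
Register.Set SP Var.VALUE(g_faultContext.sp_at_fault)
Register.Set R0 Var.VALUE(g_faultContext.r0)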

// Step 4: Reconstruct call stack
Frame.view /Locals /Caller    // shows call chain using restored SP + DWARF info

// Step 5: Inspect variables at fault time
Var.View g_vehicleSpeed_mps g_faultContext

// Optionally: load RAM dump from file (if RAM was saved to binary file)
// Data.LOAD.Binary ram_dump.bin 0x70000000
// Then use Var.View / Frame.view as above
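
Before walking the call stack, it often helps to classify the fault from the saved cfsr word. The sketch below assumes that the cfsr and bfar fields hold Cortex-M CFSR and BFAR values, as the comments in FaultLogEntry_t state. The bit positions are those of the Cortex-M CFSR (MMFSR in bits 0-7, BFSR in bits 8-15, UFSR in bits 16-31). The function and enum names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    FAULT_CLASS_NONE,
    FAULT_CLASS_MEMMANAGE,
    FAULT_CLASS_BUS,
    FAULT_CLASS_USAGE
} FaultClass_t;

/* Sub-register masks within the 32-bit CFSR. */
#define CFSR_MMFSR_MASK   0x000000FFu
#define CFSR_BFSR_MASK    0x0000FF00u
#define CFSR_UFSR_MASK    0xFFFF0000u

/* A few individual status bits that are commonly checked first. */
#define CFSR_PRECISERR    (1u << 9)   /* precise data bus error */
#define CFSR_BFARVALID    (1u << 15)  /* BFAR holds the faulting address */
#define CFSR_UNDEFINSTR   (1u << 16)  /* undefined instruction */
#define CFSR_DIVBYZERO    (1u << 25)  /* integer divide by zero */

/* Map a CFSR value to a coarse fault class; if several groups are set,
 * MemManage is reported before BusFault, and BusFault before UsageFault. */
FaultClass_t Cfsr_Classify(uint32_t cfsr)
{
    if ((cfsr & CFSR_MMFSR_MASK) != 0u) {
        return FAULT_CLASS_MEMMANAGE;
    }
    if ((cfsr & CFSR_BFSR_MASK) != 0u) {
        return FAULT_CLASS_BUS;
    }
    if ((cfsr & CFSR_UFSR_MASK) != 0u) {
        return FAULT_CLASS_USAGE;
    }
    return FAULT_CLASS_NONE;
}

/* The saved bfar value is only meaningful when BFARVALID was set at fault time. */
int Cfsr_BfarIsValid(uint32_t cfsr)
{
    return ((cfsr & CFSR_BFARVALID) != 0u) ? 1 : 0;
}
```

Checking BFARVALID before reading bfar avoids chasing a stale address in records where the bus fault was imprecise.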

Summary

Post-mortem debugging is the primary fault analysis method for field issues where attaching a debugger at the time of failure is impossible. The architecture requires three components: a fault handler that captures the register context plus a partial stack snapshot (< 1 ms overhead), an NVM ring buffer in DFlash that persists across resets, and a matching ELF file from the exact same build. With all three, TRACE32 can reconstruct the full call stack and variable state from a device that crashed days ago in a customer vehicle, with no reproduction required.

🔬 Deep Dive — Core Concepts Expanded

This section builds on the foundational concepts covered above with additional technical depth, edge cases, and configuration nuances that separate competent engineers from experts. When working on production ECU projects, the details covered here are the ones most commonly responsible for integration delays and late-phase defects.

Key principles to reinforce:

Configuration over coding: In AUTOSAR and automotive middleware environments, correctness is largely determined by ARXML configuration, not application code. A correctly implemented algorithm can produce wrong results due to a single misconfigured parameter.
Traceability as a first-class concern: Every configuration decision should be traceable to a requirement, safety goal, or architecture decision. Undocumented configuration choices are a common source of regression defects when ECUs are updated.
Cross-module dependencies: In tightly integrated automotive software stacks, changing one module's configuration often requires corresponding updates in dependent modules. Always perform a dependency impact analysis before submitting configuration changes.

🏭 How This Topic Appears in Production Projects

Project integration phase: The concepts covered in this lesson are most commonly encountered during ECU integration testing — when multiple software components from different teams are combined for the first time. Issues that were invisible in unit tests frequently surface at this stage.
Supplier/OEM interface: This is a topic that frequently appears in technical discussions between Tier-1 ECU suppliers and OEM system integrators. Engineers who can speak fluently about these details earn credibility and are often brought into critical design review meetings.
Automotive tool ecosystem: Vector CANoe/CANalyzer, dSPACE tools, and ETAS INCA are the standard tools used to validate and measure the correct behaviour of the systems described in this lesson. Familiarity with these tools alongside the conceptual knowledge dramatically accelerates debugging in real projects.

⚠️ Common Mistakes and How to Avoid Them

Assuming default configuration is correct: Automotive software tools ship with default configurations that are designed to compile and link, not to meet project-specific requirements. Every configuration parameter needs to be consciously set. 'It compiled' is not the same as 'it is correctly configured'.
Skipping documentation of configuration rationale: In a 3-year ECU project with team turnover, undocumented configuration choices become tribal knowledge that disappears when engineers leave. Document why a parameter is set to a specific value, not just what it is set to.
Testing only the happy path: Automotive ECUs must behave correctly under fault conditions, voltage variations, and communication errors. Always test the error handling paths as rigorously as the nominal operation. Many production escapes originate in untested error branches.
Version mismatches between teams: In a multi-team project, the BSW team, SWC team, and system integration team may use different versions of the same ARXML file. Version management of all ARXML files in a shared repository is mandatory, not optional.

📊 Industry Note

Engineers who master both the theoretical concepts and the practical toolchain skills covered in this course are among the most sought-after professionals in the automotive software industry. The combination of AUTOSAR standards knowledge, safety engineering understanding, and hands-on configuration experience commands premium salaries at OEMs and Tier-1 suppliers globally.