Development

Debugging & Tracing

Master automotive ECU debugging and runtime tracing. Learn Lauterbach TRACE32, iSYSTEM, and trace analysis for AUTOSAR-based multi-core systems.

24 chapters
17.5 hrs reading
5 modules

Overview

Debugging automotive ECUs requires specialized techniques and tools far beyond standard software debugging. Real-time constraints, limited observability, and safety-critical requirements make it a discipline in its own right.

This course covers professional debugging workflows using Lauterbach TRACE32 and iSYSTEM - the two most widely used debugger platforms in the automotive industry - including hardware trace, runtime analysis, and AUTOSAR-aware debugging.

You'll learn to debug complex issues including stack overflows, timing violations, multi-core race conditions, and memory corruption - the problems that consume the most engineering time in real projects.

Course Modules

1
Debugging Fundamentals
5 chapters • 3.2 hrs reading
Debugger Architecture & JTAG/SWD (FREE PREVIEW) 40 min read
▸ JTAG (IEEE 1149.1) signal lines: TDI (Test Data In), TDO (Test Data Out), TCK (Test Clock, max 50 MHz for Aurix), TMS (Test Mode Select state machine controller), TRST (optional async reset); TAP (Test Access Port) state machine: 16 states driven by TMS - Reset → Run-Test/Idle → DR-Scan (capture/shift/update data registers) → IR-Scan (select instruction register); daisy-chain multiple ECUs on same JTAG bus using multi-ICE or Lauterbach PowerDebug with JTAG chain configuration
▸ SWD (Serial Wire Debug, ARM CoreSight): 2-wire interface replacing JTAG for ARM Cortex - SWDIO (bidirectional data) + SWCLK (clock); lower pin count (2 vs 4-5 for JTAG); SWD transaction format: 8-bit request header (start bit, AP/DP select, R/W, A[2:3] address, parity, stop, park) + 3-bit ACK response from the target + 32-bit data + 1 parity bit; SW-DP (Serial Wire Debug Port) connects to AHB-AP (Access Port) → accesses system memory-mapped resources and core debug registers (DCB, DWT, ITM)
▸ Debug Probe hardware types: Lauterbach TRACE32 PowerDebug (universal, supports JTAG/SWD/Nexus/MIPI, hardware trace capture, scripting API); iSYSTEM BlueBox with winIDEA IDE (supports TC3xx MCDS, Nexus Class 4, up to 128 MB trace buffer); Segger J-Link (SWD/JTAG, primarily ARM, RTT for non-halt logging, fast flash programming); PEEDI (multi-target, open JTAG protocol); probe connects via USB3 or GigE to host PC and via JTAG/SWD to ECU debug connector (typically 20-pin ARM Cortex or Infineon TC3xx JTAG pin header on PCB)
▸ CoreSight components in ARM-based ECUs: ETM (Embedded Trace Macrocell) - instruction trace source; ITM (Instrumentation Trace Macrocell) - printf-style software trace output via the SWO pin; DWT (Data Watchpoint and Trace) - hardware data access watchpoints, PC sampling, cycle counter (DWT_CYCCNT register); TPIU (Trace Port Interface Unit) - serializes trace data to SWO pin (single-wire output, 1–80 Mbps) or full parallel trace port (4-bit TRACEDATA at up to 400 MHz); all components accessible via memory-mapped registers in CoreSight ROM table at 0xE00FF000
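The SWD request-header layout described above can be sketched in C. The bit order and parity rule follow the ARM Debug Interface description; the function below is an illustrative host-side helper, not probe firmware:

```c
#include <stdint.h>

/* Build an 8-bit SWD request header as it appears on the wire (LSB first).
 * ap_ndp: 0 = DP access, 1 = AP access
 * rnw:    0 = write, 1 = read
 * addr:   register address; bits [3:2] go into the header
 * Layout: Start(1) APnDP RnW A[2] A[3] Parity Stop(0) Park(1) */
uint8_t swd_request(uint8_t ap_ndp, uint8_t rnw, uint8_t addr)
{
    uint8_t a2 = (addr >> 2) & 1u;
    uint8_t a3 = (addr >> 3) & 1u;
    /* even parity over APnDP, RnW, A[2], A[3] */
    uint8_t parity = (uint8_t)((ap_ndp ^ rnw ^ a2 ^ a3) & 1u);
    return (uint8_t)(0x01u              /* Start = 1 */
                   | (ap_ndp << 1)
                   | (rnw    << 2)
                   | (a2     << 3)
                   | (a3     << 4)
                   | (parity << 5)
                                        /* Stop  = 0 */
                   | 0x80u);            /* Park  = 1 */
}
```

As a sanity check, `swd_request(0, 1, 0x00)` produces 0xA5, the well-known DP read-IDCODE request byte.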
Breakpoints, Watchpoints & Stepping (FREE PREVIEW) 35 min read
▸ Hardware breakpoints vs software breakpoints: hardware BPs use dedicated FPB (Flash Patch and Breakpoint) unit (Cortex-M: 8 BPs typical) - zero overhead, work in ROM/Flash, set via FPB_COMP registers; software BPs overwrite instruction with BKPT opcode (Thumb: 0xBExx, 16-bit) - unlimited count but only work in RAM (not Flash in production), cause undefined behavior if hit in interrupt context without debugger; Infineon Aurix: up to 32 hardware BPs via BRKD (Breakpoint Debug) registers, supports conditional BPs based on core ID or data address range
▸ Watchpoints (data breakpoints): trigger halt when specific memory address is read or written (or both); ARM DWT_COMP/MASK/FUNCTION registers - DWT_FUNCTION bits [3:0]: 0b0101=read, 0b0110=write, 0b0111=read+write; mask register enables watching address ranges (MASK=3 → watch 8-byte-aligned region); TRACE32 syntax: Break.Set 0x20001000 /Write /Size 4 to halt when 4 bytes at address 0x20001000 are written; useful for detecting stack corruption - set watchpoint on last 4 bytes of task stack and break immediately when overwritten
▸ Stepping modes in TRACE32: Step (single instruction step, Step.Single); Go.Return (execute until current function returns - equivalent to step-out); Step.Over (execute through function call without entering it - useful to skip AUTOSAR OS scheduler calls); Step.Into (enter function call); Step.Out (execute until return from current function); for AUTOSAR multi-core: use CORE.select 0 to switch active core context between steps; non-intrusive stepping via ETM trace replay (Step.Emu) avoids halting real-time tasks
▸ Conditional breakpoints and triggers: TRACE32 PRACTICE script: Break.Set MyFunc /PROGRAM /IF (R0>0x100 && R1==0) - halts only when function entry condition met; reduces false stops for high-frequency ISRs; hit count BP: Break.Set ISR_Handler /PROGRAM /COUNT 100 - halts on 100th occurrence; complex trigger with Lauterbach PowerTrace: trigger when address 0x20001000 written with value 0xDEAD AND Core 1 is executing - hardware cross-trigger via CORESIGHT CTI (Cross Trigger Interface) linking Core0 DBG_HALT to Core1 trace start event
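The DWT address-mask arithmetic behind range watchpoints (Cortex-M3/M4 style comparators, as in the watchpoint bullet above) is simple enough to sketch; helper names are hypothetical:

```c
#include <stdint.h>

/* DWT_MASK holds the number of low address bits the comparator ignores,
 * so MASK = n watches a 2^n-byte, 2^n-aligned region around DWT_COMP
 * (e.g., MASK = 3 -> 8-byte-aligned 8-byte region). */
uint32_t dwt_region_size(uint32_t mask)
{
    return 1u << mask;
}

uint32_t dwt_region_base(uint32_t addr, uint32_t mask)
{
    return addr & ~((1u << mask) - 1u);   /* clear the don't-care bits */
}

int dwt_region_hits(uint32_t comp, uint32_t mask, uint32_t access_addr)
{
    /* an access matches when comparator and address agree above the masked bits */
    return dwt_region_base(access_addr, mask) == dwt_region_base(comp, mask);
}
```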
Memory View & Register Inspection 40 min read
▸ TRACE32 memory window commands: Data.dump (hex view), Data.List (disassembly), Var.View (C symbol browser using ELF + DWARF debug info); read-modify-write memory while running without halt using memory access via DAP: Data.Set 0x20001000 %Long 0xDEADBEEF - write 32-bit value to RAM; Data.Load.Elf firmware.elf /nocode loads DWARF symbols (function names, variable addresses, types) from ELF file for symbolic debugging; Var.Tab shows all global variables with current values, refreshed at configurable interval
▸ ARM Cortex register set inspection: General-purpose registers R0–R12 (data/address), R13/SP (stack pointer - MSP main stack, PSP process stack), R14/LR (link return address), R15/PC (program counter); Special: xPSR (Program Status Register - N/Z/C/V/Q flags + execution state bits), CONTROL (privilege level, stack select, FPU active), PRIMASK/FAULTMASK/BASEPRI (interrupt masking); TRACE32: Register.view shows all registers live; in hard fault: read LR value - if 0xFFFFFFF9 = from thread mode using MSP, 0xFFFFFFFD = from thread using PSP
▸ Aurix TC3xx register inspection: 16 general-purpose Data Registers (D0–D15) + 16 Address Registers (A0–A15); PSW (Program Status Word) - SAV/AV/SV/V flags + IO privilege level (0=supervisor, 1=user-0, 2=user-1); PCXI register (Previous Context Information) links to saved context chain in stack; SYSCON register controls memory protection system enable; in TRACE32: Register.view /core 0 shows Core 0 registers; R /core 1 A10 reads stack pointer of Core 1; compare A10 against configured stack base in Os.xml to determine stack fill level
▸ Live variable watch without halting: TRACE32 Var.Watch (by symbol or address) for non-stop memory read at configurable rate (100 ms default); AUTOSAR measurement via Lauterbach RAM Analyzer - periodically sample variables and log to .cmm file; compare with XCP: for production code use XCP (ASAM MCD-1 XCP) for non-intrusive variable read without debug probe; for safety monitors: watch OS stack patterns - the OS startup code fills unused stack with the 0xA5A5A5A5 pattern, search from stack top to first non-0xA5 byte to measure peak stack usage
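The 0xA5 fill-pattern measurement can also run in target code. A minimal sketch, assuming the startup code fills the whole unused stack with 0xA5 and the stack is full-descending (grows downward from the highest address):

```c
#include <stdint.h>
#include <stddef.h>

/* Scan from the stack limit (lowest address) upward: bytes still holding the
 * 0xA5 fill were never touched; everything above the first non-0xA5 byte has
 * been used at least once. Returns peak usage in bytes. A return value equal
 * to `size` means the stack overflowed (fill destroyed down to the limit). */
size_t stack_peak_usage(const uint8_t *base, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && base[untouched] == 0xA5u)
        untouched++;                      /* count intact fill bytes */
    return size - untouched;
}
```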
Call Stack Analysis 30 min read
▸ ARM Cortex-M call stack structure: each stack frame (function call) pushes automatically on exception entry: xPSR, PC (return address), LR, R12, R3, R2, R1, R0 (8 registers × 4 bytes = 32 bytes minimum); FPU adds S0–S15 + FPSCR (additional 68 bytes) if lazy FP stacking enabled; manually-saved registers (R4–R11) pushed by callee prologue; TRACE32 Frame.view shows decoded call stack from current SP backward - each frame shows function name (from DWARF), return address, and local variables in scope
▸ Reading a crash call stack: at hard fault, first decode LR (EXC_RETURN) to find the active stack: 0xFFFFFFF1 = Handler mode using MSP, 0xFFFFFFF9 = Thread mode using MSP, 0xFFFFFFFD = Thread mode using PSP (CONTROL bit 1 set → PSP active in thread mode); memory at SP+24 = saved PC (return address pointing at or just after the faulting instruction); SP+20 = LR; SP+28 = saved xPSR; use saved PC to identify the faulting function in disassembly; TRACE32: Frame.view /task shows stack for a specific OSEK/AUTOSAR task using saved stack pointer in TCB
▸ Stack unwinding for deeply nested calls: DWARF .debug_frame or .eh_frame section provides Call Frame Information (CFI) directives telling debugger how to restore CFA (Canonical Frame Address) and register values at each PC; without CFI (e.g., assembly ISRs or -fomit-frame-pointer optimized code): use heuristic unwinder - scan stack memory for values matching code segment addresses (PC-like patterns in 0x08000000–0x08080000 range for STM32) to reconstruct approximate call chain; TRACE32: Frame.view /EXT enables extended unwinding mode with CFI support
▸ AUTOSAR OS task context and call stack: AUTOSAR OS (OSEK) saves task context on task switch to TCB (Task Control Block); TRACE32 OS.view shows all tasks with status (RUNNING/READY/WAITING/SUSPENDED) and stack usage; click task → Frame.view shows task's frozen call stack at last suspension point; detect recursive call overflow: watch for task stack usage approaching OsStackSize limit (configurable in OsTask container in ARXML) - increase stack size in DaVinci Configurator or optimize call depth
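The EXC_RETURN and stacked-frame reading from this chapter can be condensed into a small decoder. The values follow the ARMv7-M exception-return encodings without FPU or security extensions; names are illustrative:

```c
#include <stdint.h>

/* Decode the EXC_RETURN value found in LR inside a fault handler:
 *   0xFFFFFFF1 -> Handler mode, MSP
 *   0xFFFFFFF9 -> Thread mode,  MSP
 *   0xFFFFFFFD -> Thread mode,  PSP
 * Bit 3 selects Thread vs Handler mode, bit 2 selects PSP vs MSP. */
typedef struct { int thread_mode; int used_psp; } exc_return_info;

exc_return_info decode_exc_return(uint32_t lr)
{
    exc_return_info i;
    i.thread_mode = (int)((lr >> 3) & 1u);   /* 1 = Thread mode */
    i.used_psp    = (int)((lr >> 2) & 1u);   /* 1 = PSP was active */
    return i;
}

/* Hardware-stacked frame: R0,R1,R2,R3,R12,LR,PC,xPSR at SP+0..SP+28.
 * Given the active SP at fault entry, the faulting PC sits at word 6. */
uint32_t stacked_pc(const uint32_t *sp)   { return sp[6]; }
uint32_t stacked_xpsr(const uint32_t *sp) { return sp[7]; }
```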
Hands-On: First Debug Session 50 min read
▸ Hardware setup: connect Lauterbach TRACE32 PowerDebug USB3 to host PC via USB3; connect JTAG pod to ECU (Infineon Aurix TC397 evaluation board) 20-pin debug connector; launch TRACE32 PowerView; in PRACTICE startup script: SYStem.CPU TC397; SYStem.Mode Up - probe connects, halts all 6 cores; verify connection: Register.view shows PC at reset vector 0xA0000000; load ELF: Data.Load.Elf build/firmware.elf /nocode (load symbols only, keep flash content)
▸ Setting up first breakpoint: identify target function - e.g., App_EngineControl() in engine_control.c; in TRACE32: Break.Set App_EngineControl /PROGRAM - sets hardware BP on function entry; Go - resume all cores; wait for BP hit (green bar in Program.view turns red at break); TRACE32 halts all 6 cores simultaneously; Var.view local shows local variables including engine_speed_rpm and torque_demand_Nm with current values; Step into helper function, inspect, then Frame.view to see call chain (SchM → RTE → SWC runnable → App_EngineControl)
▸ Watchpoint exercise: identify suspected corrupt global: g_torque_output (type: uint16_t at address 0x70001234); Break.Set 0x70001234 /Write /Size 2 - hardware watchpoint on write; Go; on watchpoint hit: Frame.view shows which function wrote the value; Register.view shows R1 (write data = 0xFFFF, overflow indicator) - reveals that ISR RxHandler wrote unvalidated CAN signal directly into g_torque_output without bounds check; fix: add saturation clamp, retest to confirm watchpoint no longer triggers with invalid values
▸ Session automation with PRACTICE script: create init.cmm script: SYStem.CPU TC397 ; SYStem.Mode Up ; Data.Load.Elf firmware.elf /nocode ; Break.Set App_EngineControl /PROGRAM ; WinCLEAR ; Var.Watch g_torque_output g_engine_speed g_coolant_temp ; Go; run via t32rem.exe (command-line TRACE32 remote) for CI integration: t32rem.exe localhost port=20000 "CD.DO init.cmm" - enables automated debug session in Jenkins pipeline for regression testing with hardware-in-loop
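The saturation-clamp fix from the watchpoint exercise could look like the sketch below; TORQUE_MAX_NM and the function name are hypothetical, the point is that the raw CAN value is bounded before it reaches g_torque_output:

```c
#include <stdint.h>

#define TORQUE_MAX_NM 500u   /* hypothetical application limit */

/* Clamp the unvalidated CAN signal instead of storing it directly:
 * 0xFFFF from the bus can no longer overflow downstream torque math. */
uint16_t clamp_torque(uint16_t raw_can_value)
{
    return (raw_can_value > TORQUE_MAX_NM) ? (uint16_t)TORQUE_MAX_NM
                                           : raw_can_value;
}
```

After this change, the write watchpoint on g_torque_output should only ever observe values in [0, TORQUE_MAX_NM].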
2
Lauterbach TRACE32
5 chapters • 4.2 hrs reading
TRACE32 Setup & Configuration 45 min read
▸ TRACE32 installation and license: install TRACE32 PowerView (PRACTICE IDE + debugger GUI); license file t32.license placed in installation directory; license types - node-locked (hardware dongle), floating (license server via RLM); for CI: use headless mode via t32rem.exe (remote API) or TRACE32 API (C library/Python ctypes bindings) to script automated test sessions without GUI; enable the remote API in config.t32 (e.g., PORT=20000, PACKLEN=1024 in the API section, HOST=localhost on the client side)
▸ Startup CMM script structure: title "My ECU Debug" ; SYStem.CPU TC397 sets target CPU (selects register map, memory map, debug unit specifics) ; SYStem.JtagClock 10MHz (set JTAG clock - start slow, increase after connection stable) ; SYStem.Mode Up (power on debug subsystem and connect without reset) or SYStem.Mode Go (connect + run) ; TrOnChip.Cfg MultiCore - enables all 6 Aurix cores in TRACE32 view ; MAP.DenyAccess 0x0 0x7FFFFFFF (protect peripheral space from accidental writes during memory scan)
▸ Memory map configuration for Aurix TC397: configure TRACE32 MAP for correct access method - MAP.DenyAccess for peripheral SFRs (0xF0000000–0xFFFFFFFF) to prevent debugger accidental reads of write-clear status registers; MAP.BusAperture for local RAM (DSPR Core0 at 0x70000000, uncached mirror at 0xD0000000) - ensures debugger reads uncached copy; configure FLASH algorithm: FLASH.Create defines flash device parameters (sector size, erase command, timing); FLASH.TARGET specifies on-chip RAM address for flash programming algorithm download
▸ Multi-core configuration: TrOnChip.Cfg MultiCore ; CORE.select 0 → commands apply to Core 0; CORE.select 1 → commands apply to Core 1; SMP mode: all cores halted together via CTI (Cross Trigger Interface) - Break.Halt halts all simultaneously; AMP mode: each core debugged independently; TRACE32 supports simultaneous display of all 6 core program windows - useful for debugging cross-core interactions; PowerTrace hardware (separate trace hardware module) required for simultaneous 6-core ETM trace capture at >100 MHz
PowerView Scripting (PRACTICE) 50 min read
▸ PRACTICE script language syntax: subroutines with GOSUB/RETURN; variables declared with LOCAL &var; integer math with &val; string operations with STRing.Mid(), STRing.LENgth(); conditional branches: IF &condition GOTO label; loops: RePeaT count (GOTO label); command execution: PRACTICE embeds all TRACE32 commands directly - no escape syntax needed; script entry point: called by DO scriptname.cmm [arguments]; arguments received as &arg1, &arg2; debug scripts stored as .cmm files in project directory
▸ Automated test script pattern: PROC RunTest(testName) ; LOCAL &result ; Break.Set &testName /PROGRAM ; Go ; WAIT !STATE.RUN() 5.0s ; IF STATE.RUN() (PRINT "TIMEOUT: " &testName ; RETURN 1) ; Var.View local ; &result=Var.VALUE(g_test_result) ; Break.Delete ; IF &result==0 (PRINT "PASS: " &testName) ELSE (PRINT "FAIL: " &testName) ; RETURN &result ; ENDPROC - pattern for driving hardware tests from CI via TRACE32 remote API
▸ Memory dump and analysis scripts: Data.SAVE.Binary "coredump.bin" 0x70000000++0xFFFF - save 64KB of Core 0 DSPR to binary file; Data.SAVE.S3Record "flash.s3" 0x80000000++0x1FFFFF - save 2MB flash to Motorola S-record; PRACTICE reads ELF symbol table: &addr=ADDRESS.OFFSET(myVariable) - gets runtime address from DWARF; loop to log variable values to CSV: OPEN #1 "log.csv" /Create ; RePeaT 1000 (WRITE #1 %d. &val WAIT 10ms) ; CLOSE #1; post-process CSV in Python for timing analysis
▸ TRACE32 Python API (alternative to PRACTICE): import t32 ; t32.connect("localhost", 20000) ; t32.cmd("SYStem.CPU TC397") ; t32.cmd("SYStem.Mode Up") ; t32.write_var("g_enable_flag", 1) - write variable by name ; value = t32.read_var("g_engine_speed_rpm") ; breakpoints: t32.cmd("Break.Set App_EngineControl /PROGRAM") ; t32.cmd("Go") ; t32.wait_halt(timeout=5.0) - returns when BP hit or timeout; register read: t32.read_register("PC") ; enables full test orchestration from pytest framework with TRACE32 hardware backend
Flash Programming via TRACE32 40 min read
▸ TRACE32 flash programming sequence: FLASH.Create defines flash device (address range, sector size, bus width, flash type: 1=Intel, 2=AMD, 3=custom algorithm); FLASH.TARGET 0x70000000 0x70002000 0x1000 ~~/demo/tricore/flash/tc3xx.bin - loads flash programming routine into DSPR (target SRAM) and allocates 4KB stack; FLASH.Erase 0x80000000--0x8001FFFF - erase sectors; FLASH.Program build/firmware.hex - program; FLASH.CRC32 0x80000000 0x8001FFFF compares CRC against host-computed value to verify integrity
▸ Aurix TC3xx flash specifics: Program Flash (PFlash): 6 banks × 2 MB each (PF0–PF5) at 0x80000000–0x80BFFFFF; Data Flash (DFlash0, DFlash1): EEPROM emulation at 0xAF000000; UCB (User Configuration Block) at 0xAF100000 - UCB_BMHD for boot mode header (Startup Software reads BMI bits to determine boot source); FLASH.Erase PF0 - erases entire PFlash bank 0 (2 MB, ~1.5 s); write speed ~500 KB/s via JTAG @50 MHz; use FLASH.AUTO for intelligent program+verify skipping matching sectors
▸ Calibration data partition flashing: separate CALIB partition at end of PFlash (e.g., PF5) holds calibration maps from INCA/CDM dataset; flash calibration only (without reflashing application): FLASH.Program /noclear calib_dataset.hex - programs only calibration sector without erasing application; DAM (Data Area Map) in AUTOSAR maps CHARACTERISTIC addresses into DFlash for EEPROM-backed calibration; TRACE32 can update individual calibration parameters: Data.Set 0xAF000100 %Long 0x00001F40 - writes new MAP value directly to DFlash
▸ Boot mode headers and recovery flashing: if UCB corrupted (bad BMI → ECU won't boot), Aurix enters Startup Software Loader (SWDL) mode via HWCFG pins (bootstrap loader via CAN/FlexRay/Ethernet); alternatively TRACE32 can access PFlash via JTAG even when CPU is held in reset: SYStem.Mode Attach (no reset) → flash UCB_BMHD with valid BMI → reset to recover; failsafe: Aurix read/write protection via UCB_RSTA/UCB_WSTA - if write protection locked without matching password, must do destructive erase via UCB_RCFG destructive write to reset protection
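For the CRC verification step, a host-side reference implementation can be compared against the debugger-reported value. The sketch below is the common reflected CRC-32 (polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF); confirm against the tool documentation which exact variant FLASH.CRC32 implements before relying on a match:

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise reflected CRC-32 over a flash image buffer. Slow but dependency-
 * free; table-driven versions are the usual production choice. */
uint32_t crc32_image(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)(-(int32_t)(crc & 1u)));
    }
    return ~crc;
}
```

The standard check value for this variant is crc32("123456789") = 0xCBF43926.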
OS-Aware Debugging (AUTOSAR/OSEK) 55 min read
▸ TRACE32 OS awareness plugins: TRACE32 ships OSEK/AUTOSAR OS awareness scripts (menu.cmm + os_aware.cmm) - load via DO ~~/demo/tricore/autosar/os_aware.cmm; plugin reads RTOS kernel internals from target memory (task list base address, TCB structure offsets) - these must match the exact AUTOSAR OS vendor and version (Vector MICROSAR, EB tresos); configure OS.Awareness.ListSize for number of tasks; TRACE32 then provides: OS.TASK (task list), OS.ALARM (alarm status), OS.EVENT (event flags), OS.RESOURCE (resource ownership), OS.COUNTER (counter values)
▸ AUTOSAR task debugging workflow: in OS.TASK view: see all tasks (name, state READY/RUNNING/SUSPENDED/WAITING, priority, activation count, preemption count, stack fill %); click RUNNING task → Frame.view shows current call stack at breakpoint; click SUSPENDED task → Frame.view shows stack at last suspension point (useful for deadlock analysis - task suspended waiting for resource while holding another); OS.RESOURCE shows resource-holding task → identify priority inversion (low-priority task holds resource needed by high-priority task)
▸ Timing hook instrumentation: AUTOSAR OS OsHook_TaskActivateTaskUser / OsHook_TaskStartUser / OsHook_TaskTerminateUser hooks - implement these in user code to toggle GPIO pin or write timestamp to circular buffer; toggle GPIO on task start/end → observe on oscilloscope or logic analyzer for actual task timing; TRACE32 AUTOSAR awareness: OS.TIMINGAnalyzer records OS hook timestamps (via ETM trace or breakpoint-based sampling) and displays task Gantt chart showing preemption, blocking, and CPU utilization per task
▸ AUTOSAR OS error debugging: ErrorHook(StatusType Error) - AUTOSAR OS calls ErrorHook for any OS API error (E_OS_RESOURCE, E_OS_LIMIT, E_OS_CALLEVEL); set breakpoint in ErrorHook implementation; on hit: read OSErrorGetServiceId() for which API was called, OSError_ActivateTask_TaskID() for task ID; common errors: E_OS_LIMIT (task activated beyond MaxActivations) → increase MaxActivations in ARXML or find runaway activation trigger; E_OS_CALLEVEL (OS API called from wrong context, e.g., WaitEvent() from ISR) → trace call stack to find ISR calling sleeping API
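A minimal ErrorHook sketch that logs error codes into a ring buffer for post-mortem inspection from the debugger; StatusType and the E_OS_* values are stubbed here so the sketch is self-contained (real projects take them from the generated OS headers):

```c
#include <stdint.h>

typedef uint8_t StatusType;             /* stub for the OSEK type */
#define E_OS_CALLEVEL ((StatusType)2)   /* stub values per OSEK OS */
#define E_OS_LIMIT    ((StatusType)4)

#define ERRLOG_DEPTH 8u
StatusType errlog[ERRLOG_DEPTH];        /* last ERRLOG_DEPTH error codes */
uint32_t   errlog_count;                /* total errors since startup */

/* Called by the OS on any API error. Keeping the hook tiny and lock-free
 * matters: it may run in any context, including ISRs. */
void ErrorHook(StatusType Error)
{
    errlog[errlog_count % ERRLOG_DEPTH] = Error;   /* overwrite oldest */
    errlog_count++;
    /* A breakpoint here plus Var.View errlog exposes the recent error
     * history even for intermittent faults. */
}
```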
Hands-On: AUTOSAR Stack Debugging 60 min read
▸ Scenario setup: AUTOSAR-based application on Aurix TC397 with 4 tasks (10ms engine control, 20ms body, 50ms diagnosis, 100ms calibration); intentionally introduced bug: Task_Engine reads CAN signal before Com_MainFunctionRx() runs → stale value used; load firmware, load OSEK awareness plugin, connect TRACE32; open OS.TASK - confirm all 4 tasks visible and switching correctly at expected rates
▸ Bug hunt phase 1 - trace RTE signal call: set breakpoint in Task_Engine at RTE call Rte_Read_EngineSpeed(&speed); Go; on hit: Var.view shows speed = 0 (stale initial value); check if Com_MainFunctionRx already ran by inspecting ComRxCounter global (timestamp of last Com_MainFunctionRx call); OS.TIMINGAnalyzer shows Task_Engine (priority 10) runs before Task_ComMain (priority 5) - lower priority task not yet scheduled; fix: increase Com_MainFunctionRx task priority or move it to same task as engine control
▸ Bug hunt phase 2 - NVM write corruption: intermittent issue where NVM_WriteAll() corrupts adjacent data block; set data watchpoint on NVM block boundary address (0x70003FFC, 4 bytes); run 10 ignition cycles; watchpoint hits on cycle 7 - Frame.view shows NVM_WriteBlock() writing 6 bytes to a 4-byte block (length parameter off-by-one from mismatched struct size between caller and NVM driver config); fix: update NVM block size in NvM block descriptor ARXML to match actual data type sizeof()
▸ Results documentation: capture screenshots of OS.TASK timing view (show task preemption pattern), Frame.view at each bug location (show call stack), Var.view showing before/after variable values; export trace to TRACE32 PDF report: Report.Create "debug_session.pdf" - generates timestamped session report; file in Jira ticket with root cause, fix description, and verification evidence (watchpoint no longer triggers after fix, data integrity confirmed over 50 ignition cycles)
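The phase-2 root cause (configured block length disagreeing with the application struct size) suggests a compile-time guard in addition to the ARXML fix. A sketch with hypothetical names; the generated configuration constant would come from the NvM block descriptor:

```c
#include <stdint.h>

/* Hypothetical application data type behind NvM block 42. */
typedef struct {
    uint16_t odometer_km_x10;
    uint16_t battery_soc_pct;
} NvmBlock42_T;

/* Stand-in for the generated NvM block-length constant. */
#define NVM_BLOCK_42_LENGTH 4u

/* C11 static assert: a size mismatch becomes a build error instead of a
 * runtime watchpoint hunt (pre-C11 toolchains need the negative-array trick). */
_Static_assert(sizeof(NvmBlock42_T) == NVM_BLOCK_42_LENGTH,
               "NvM block 42 length does not match application struct size");
```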
3
Runtime Trace & Analysis
5 chapters • 3.8 hrs reading
Hardware Trace (ETM, MCDS) 50 min read
▸ ARM ETM (Embedded Trace Macrocell) architecture: ETM generates compressed instruction trace packets (P-header packets for direct branch addresses, I-sync periodically for absolute PC anchors, context ID for RTOS task tracking); trace buffer options: ETB (Embedded Trace Buffer, on-chip SRAM, typ. 4–32 KB ≈ tens to hundreds of thousands of instructions at ~1 bit/instruction compressed) vs. off-chip via TPIU → external capture hardware (Lauterbach PowerTrace III, 256 MB DDR buffer); ETM configuration: ETM_CTRL register enables branch broadcast, data trace, cycle counting; enable via TRACE32: Trace.Method CortexM /ETM ; Trace.Size 0x100000 ; Trace.ON
▸ Infineon Aurix MCDS (Multi-Core Debug Solution): Aurix-specific trace hardware capturing instruction trace from all 6 TriCore cores simultaneously at full CPU speed (300 MHz); MCDS FIFO outputs trace data via 4-bit DDR MCDS port at up to 1.2 Gbps to Lauterbach PowerTrace; MCDS_TC registers configure per-core trace enable, address range filters (trace only specific code regions), trigger conditions (trace start/stop on program counter value); filter example: trace only scheduler function 0x80012000–0x80012FFF to reduce trace volume while capturing all scheduling decisions
▸ Instruction trace post-processing in TRACE32: after capture, Trace.List shows instruction execution trace in time order (address, instruction, cycles, branch targets); Trace.STATISTIC.FUNC shows per-function execution count and CPU cycles (identifies hot functions) - sort by Total_Cycles descending to find optimization targets; Trace.Chart.FUNC displays Gantt-like timeline showing which function executed when; instruction trace replay: Step.Emu steps backwards through recorded trace without running target - enables deterministic debugging of intermittent bugs captured in trace buffer
▸ Data trace with ETM/MCDS: DWT data trace (ARM): DWT_COMPn configures address to trace; DWT_FUNCTIONn sets trace type (data read, write, or both) + size; trace records contain address, data value, and timestamp; MCDS data trace: configure MCDS_WP (watchpoint) module to generate trace event on memory access to range [0x70001000–0x70001FFF]; post-process to reconstruct variable value over time without halting; identify write sequence causing corruption - full address + data + timestamp timeline in Trace.List for data accesses
Runtime Measurement & Profiling 45 min read
▸ DWT cycle counter for runtime measurement (ARM Cortex-M): enable DWT_CTRL.CYCCNTENA = 1 → DWT_CYCCNT increments every CPU clock cycle (200 MHz = 5 ns resolution); measurement pattern: uint32_t t_start = DWT->CYCCNT; /* function */ uint32_t elapsed = DWT->CYCCNT - t_start; uint32_t elapsed_ns = elapsed * 5; - measures execution time in nanoseconds; use for AUTOSAR task WCET measurement; compare against AUTOSAR OsTask timing monitoring (OsTaskTimingMonitor) budget to verify no budget exceeded
▸ Aurix performance counters (CCTRL/CCNTL): TC3xx Core Counter: CCTRL.CE=1 enables counting; CCNTL[0] counts CPU clock cycles; CCNTL[1] configurable to count pipeline stalls, cache misses, branch mispredictions; read via TriCore machine instructions MTCR/MFCR; use iSYSTEM MCDS performance monitor: configure MCDS_TC.PERF to count events (instruction stalls, data cache miss rate); display in winIDEA ProView as live bar graph; target: data cache miss rate < 5% for real-time predictability
▸ TRACE32 profiling: Perf.STATISTIC.FUNC - statistical profiling via periodic PC sampling (100 µs sample interval, no ETM required); after 1 million samples: sort by Sample_% descending; functions with >5% CPU reveal algorithmic hotspots; alternative: ETM-based exact profiling (Trace.STATISTIC.FUNC) - cycle-accurate but requires ETM trace hardware; compare statistical vs exact results to identify functions where sampling misrepresents due to caching effects; export profiling data to CSV for comparison across software versions
▸ Task CPU load measurement and budget analysis: AUTOSAR OsResource CPU load: each task reports its actual execution time via OsHooks (OsHook_TaskStart increments cycle counter, OsHook_TaskTerminate records delta); CPU load formula: Load_% = (Sum_task_exec_time / scheduling_period) × 100; for 10ms task running 800µs each cycle: Load = 8%; total CPU budget: AUTOSAR Multi-core - Core 0: 70% max for safety tasks, 30% for BSW overhead; alert threshold: any task exceeding 90% of its timing budget triggers DEM event (DemEventParameter mapped to Task_Budget_Exceeded)
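The load formula and the 90% budget alert above reduce to integer arithmetic; helper names are hypothetical:

```c
#include <stdint.h>

/* Per-task CPU load in whole percent from hook-measured execution time:
 * Load_% = exec_time / period * 100 (both in microseconds). */
uint32_t task_load_percent(uint32_t exec_time_us, uint32_t period_us)
{
    return (exec_time_us * 100u) / period_us;
}

/* Alert when a task exceeds 90% of its timing budget, using integer math
 * to avoid floating point in monitoring code: exec > 0.9 * budget. */
int budget_alert(uint32_t exec_time_us, uint32_t budget_us)
{
    return (exec_time_us * 10u) > (budget_us * 9u);
}
```

For the 10 ms task running 800 µs per cycle from the text, `task_load_percent(800, 10000)` gives 8.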
Task Timing Analysis & Visualization 40 min read
▸ Task Gantt chart in TRACE32: Trace.TASK.Chart displays horizontal bar chart - X axis = time (µs), Y axis = each AUTOSAR task; colors: green=RUNNING, yellow=READY (preempted), gray=WAITING; zoom to µs level to see preemption points; identify: Task_Engine preempted by ISR_CAN at t=345µs for 12µs - too long, increases Task_Engine jitter; ISR execution time > 10µs is flag for investigation; identify: Task_Diag takes 2.1ms (budget: 2.0ms) → 5% budget overrun flagged
▸ OSEK/AUTOSAR response time analysis: Worst-Case Response Time (WCRT) = WCET_self + sum of blocking times from higher-priority tasks/ISRs; calculate WCRT for Task_Engine (priority 5, period 10ms): WCET_Task_Engine = 800µs + blocking from ISR_CAN (priority INF, WCET_ISR = 15µs × max activations in 10ms window = 15µs × 10 = 150µs) + resource blocking from Resource_CAN_Lock held by Task_Body (WCET_critical_section = 20µs); WCRT = 800 + 150 + 20 = 970µs < 10000µs → schedulable with margin
▸ iSYSTEM TAST (Task Analysis & Statistics Tool): alternative to TRACE32 for timing visualization on Aurix MCDS; TAST displays task switch timeline with 1-cycle resolution (5 ns at 200 MHz) from MCDS hardware trace; per-task statistics: min/max/avg execution time, activation jitter, response time across 1000 task cycles; export to Excel for WCET documentation in software architecture specification; TAST integrates with AUTOSAR OS timing hook output for software-only measurement fallback when MCDS hardware not available
▸ Jitter analysis and root cause: activation jitter = variation in time between successive task activations (ideal: exactly 10.000 ms for 10ms task); measure jitter from ETM trace - timestamp each Task_Engine activation, compute inter-arrival time histogram; sources of jitter: (1) ISR preemption of scheduler (add interrupt lock around scheduler wake-up), (2) OS timer resolution (Aurix STM resolution = 1/200MHz = 5ns → negligible), (3) bus contention on Aurix LMB (Local Memory Bus) between cores; for jitter > 50µs: AUTOSAR OS TimingMonitorJitter parameter triggers DEM event for scheduler health monitoring
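The WCRT example in this chapter can be written as a first-pass calculation; a full response-time analysis would iterate the interference window to a fixed point, but a single pass matches the hand calculation for Task_Engine. Types and names are illustrative:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t wcet_us;       /* worst-case execution time of one activation */
    uint32_t activations;   /* max activations inside the response window */
} isr_t;

/* WCRT = own WCET + worst resource-blocking critical section
 *        + sum over higher-priority ISRs of (WCET * activations). */
uint32_t wcrt_first_pass(uint32_t own_wcet_us,
                         const isr_t *isrs, size_t n_isrs,
                         uint32_t max_blocking_us)
{
    uint32_t r = own_wcet_us + max_blocking_us;
    for (size_t i = 0; i < n_isrs; i++)
        r += isrs[i].wcet_us * isrs[i].activations;
    return r;
}
```

Plugging in the Task_Engine numbers (800 µs WCET, ISR_CAN 15 µs x 10, 20 µs resource blocking) reproduces the 970 µs result.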
Cache & Pipeline Analysis 35 min read
▸ Aurix TC3xx cache architecture: each TriCore core has: PCACHE (Program Cache, 16 KB, 4-way set-associative, 32-byte cache lines) + DCACHE (Data Cache, 8 KB for Core 0/1/2, disabled by default for safety - DCEN bit in CCTRL); PCACHE miss penalty: 4–12 cycles for flash access (PFlash read latency 1–3 wait states at 200 MHz with buffered flash); cache miss analysis via MCDS performance counter: configure MCDS_TC.PERF bit 3 = Program Cache Miss event → CCNTL[1] counts misses; if miss rate >10% → investigate function locality and instruction density
▸ TriCore pipeline analysis: 5-stage pipeline (Fetch, Decode, Execute, Memory, Write-back); pipeline stall sources: (1) data hazard - result of instruction N needed by instruction N+1 (2-cycle stall if no bypass); (2) control hazard - branch taken, pipeline flushed (4-cycle penalty for unconditional branch, 2-cycle for conditional mispredicted); (3) memory hazard - uncached DSPR access completes in 1 cycle, cached DFlash in 1–5 cycles, uncached LMU (Local Memory Unit) in 4 cycles; Aurix compiler hint: __builtin_expect(condition, 0) gives branch predictor hint to reduce mispredictions in ISR code
▸ Instruction cache optimization techniques: function placement using linker script - place ISRs and frequently-called functions in PFLASH_NEAR region (first 64KB of PFlash, highest cache locality); use #pragma section ".fast_code" for hot functions → linker puts them in specific flash sectors; verify with MAP file that critical functions don't overlap cache-conflicting addresses (cache set aliasing: two functions at addresses mod(cache_size/ways) compete for same cache set, causing excessive evictions); TRACE32: analyze cache set occupancy using Trace.STATISTIC to identify set conflicts
▸ Data cache safety trade-off (safety vs performance): AUTOSAR functional safety (ISO 26262 ASIL-D): data cache disabled by default for deterministic WCET - cache miss in safety-critical path could violate timing guarantees; for ASIL-B and below: enable DCACHE with cache flush before critical reads from shared memory regions (DSYNC instruction); cache coherency in multi-core: Aurix has no hardware coherency between core DCACHEs for DSPR - software must ensure cache invalidation (DSYNC + ISYNC) after cross-core memory writes; never enable data cache on global shared RAM (LMURAM) used by multiple cores
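The set-aliasing condition from the optimization bullet reduces to index arithmetic. A sketch using the PCACHE geometry given above (16 KB, 4-way, 32-byte lines); two code addresses that map to the same set compete for only 4 ways:

```c
#include <stdint.h>

#define LINE_SIZE   32u
#define CACHE_SIZE  (16u * 1024u)
#define WAYS        4u
#define NUM_SETS    (CACHE_SIZE / (LINE_SIZE * WAYS))   /* 128 sets */

/* Set index of an address: line number modulo the number of sets. */
uint32_t cache_set_index(uint32_t addr)
{
    return (addr / LINE_SIZE) % NUM_SETS;
}

/* Hot functions whose entry addresses collide here evict each other;
 * addresses that differ by a multiple of NUM_SETS * LINE_SIZE (4 KB for
 * this geometry) always land in the same set. */
int same_cache_set(uint32_t a, uint32_t b)
{
    return cache_set_index(a) == cache_set_index(b);
}
```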
Hands-On: Timing Violation Investigation 55 min read
▸ Symptom: DEM event DemEventParam_Task10ms_BudgetExceeded fires occasionally; OsTaskTimingMonitor detects Task_Engine executing >10ms on 2% of cycles; reproduce by connecting TRACE32 PowerTrace with MCDS, enabling MCDS full trace, running 100 ignition cycles; filter Trace.TASK.Chart to show only Task_Engine execution; identify the outlier cycles where execution exceeds 10ms boundary
▸ Root cause identification: zoom MCDS trace to outlier cycle - Task_Engine takes 11.3ms; drill into per-function breakdown using Trace.STATISTIC.FUNC filtered to that time window; top offender: Rte_Read_BatteryCurrent() takes 1.2ms (normal: 0.05ms) - anomalous spike; check what Rte_Read_BatteryCurrent calls: traces into ADC driver which calls HAL_ADC_PollForConversion() with 1200ms timeout waiting for ADC to complete - polling function unexpectedly took 1.1ms extra due to ADC conversion busy from a previous diagnostic request from Task_Diag
▸ Fix and verification: replace polling HAL_ADC_PollForConversion() with interrupt-driven ADC conversion - ADC ISR sets flag, Rte_Read returns buffered result from last completed conversion in 1µs instead of polling; rerun MCDS trace for 200 cycles; Task_Engine timing: max 0.92ms, min 0.78ms, mean 0.85ms - all within 10ms budget with 9.1ms margin; DEM event DemEventParam_Task10ms_BudgetExceeded no longer fires; document fix and verification trace screenshots in engineering change request (ECR) system
▸ Preventive measures: enforce ADC access serialization via AUTOSAR Resource mechanism (OsResource_ADC_Access) - Task_Engine acquires resource before ADC read, Task_Diag waits; configure AUTOSAR OsResource ceiling priority = max(Task_Engine, Task_Diag) priority to prevent priority inversion; add WatchdogManager window monitoring (WdgM_CheckpointReached) at start and end of Task_Engine to detect timing violations at system level independently of TRACE32; set WdgM threshold to 9.5ms as early warning before OS timing monitor fires at 10ms
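The interrupt-driven replacement for the polling read can be sketched as follows; the names are illustrative stand-ins, not the project's real RTE/MCAL symbols:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the fix: the ADC end-of-conversion ISR publishes the latest
 * result; the RTE read returns the buffered value in O(1) instead of
 * polling the hardware. */
static volatile uint16_t adc_result_buf;   /* last completed conversion */
static volatile uint8_t  adc_result_valid; /* set once first conversion done */

/* Called from the ADC end-of-conversion interrupt. */
void Adc_ConversionCompleteIsr(uint16_t raw)
{
    adc_result_buf   = raw;
    adc_result_valid = 1u;
}

/* Non-blocking read used by the 10 ms task: never waits on the ADC. */
uint8_t Rte_Read_BatteryCurrent_Buffered(uint16_t *out)
{
    if (!adc_result_valid)
        return 1u;          /* E_NOT_OK until the first conversion lands */
    *out = adc_result_buf;
    return 0u;              /* E_OK */
}
```

The task now consumes the most recent completed sample (at most one conversion period old) rather than stalling mid-cycle, which is what restored the timing margin in the verification run above.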
4
Multi-Core Debugging
4 chapters • 3.2 hrs reading
Symmetric & Asymmetric Multi-Core Debug 45 min read
▸ Aurix TC397 multi-core topology: 6 TriCore cores (CPU0–CPU5); four of them are lockstep cores - each main core is shadowed by a hardware checker core executing the same instructions, with a comparator flagging any divergence (ASIL-D); the checker is invisible to software, so each lockstep core still appears as a single logical core; each core has private DSPR (Data Scratch Pad RAM) and PSPR (Program Scratch Pad RAM), plus shared LMU RAM for inter-core communication; memory map: CPU0 DSPR at 0x70000000, CPU1 DSPR at 0x60000000, CPU2 DSPR at 0x50000000; LMU RAM at 0x90000000 (cacheable view) / 0xB0000000 (non-cached view)
▸ TRACE32 SMP vs AMP debugging mode: SMP (Symmetric Multi-Processing) mode: all cores halted simultaneously when any core hits breakpoint via CTI cross-halt; useful for freeze-frame analysis of cross-core state at exact same instant; AMP (Asymmetric) mode: each core debugged independently - Core0 can run while Core1 is halted; TRACE32 command: CORE.LOCK on (SMP), CORE.LOCK off (AMP); for AUTOSAR multi-core BSW with core-affinity (QM on Core2, ASIL tasks on Core0/1): use AMP to debug QM code while safety cores run continuously
▸ Lock-step core debug: in lock-step, the checker core mirrors the main core exactly - it executes the same instructions and must produce the same results; a hardware comparator checks the outputs every cycle and raises an alarm on mismatch; debug implication: only the main core is visible in TRACE32 (the checker core is an invisible shadow); lock-step fault injection testing: inject an error into the comparison path (e.g., via the SMU alarm fault-injection mechanism) and verify the hardware alarm fires within the specified detection latency; lock-step can be disabled for specific tests via the SCU lockstep control registers (LCLCONx)
▸ AUTOSAR multi-core OS debug: AUTOSAR multi-core OS assigns each task to a core via OsTask.OsTaskCoreAffinity parameter; Core0 runs OS (Master Core), Core1 and Core2 run as Slave cores initialized by Master; debug Slave core startup: in TRACE32, CORE.select 1 → set BP at start of _start() slave entry function → resume Core0 only (Slave boots when Master calls StartCore()); common issue: Slave core starts before Global Shared RAM initialized by Master → data race at startup; fix: use OsSpinlock on shared init flag
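The Slave-starts-before-shared-RAM-is-ready race and its flag-based fix can be sketched with C11 atomics standing in for the OsSpinlock-protected init flag (names illustrative; on target the flag lives in shared RAM and the slave would use the OS spinlock/barrier API):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static uint32_t    g_shared_config;     /* lives in global shared RAM */
static atomic_uint g_shared_init_done;  /* 0 until the Master finishes init */

/* Master core: initialize shared data, then publish the flag.  The release
 * store guarantees all prior writes are visible before the flag is seen. */
void Master_InitSharedRam(void)
{
    g_shared_config = 0xCAFE0001u;
    atomic_store_explicit(&g_shared_init_done, 1u, memory_order_release);
}

/* Slave core entry: spin until the Master signals completion before
 * touching any shared data (on target: OsSpinlock or a WFE-style wait). */
uint32_t Slave_WaitAndReadConfig(void)
{
    while (atomic_load_explicit(&g_shared_init_done,
                                memory_order_acquire) == 0u)
        ;  /* busy-wait; bounded in real code */
    return g_shared_config;
}
```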
Cross-Core Breakpoints & Sync 40 min read
▸ CTI (CoreSight Cross Trigger Interface) for synchronized halt: each ARM Cortex core has CTI input/output channels; TRACE32 automatically configures CTI for SMP halt - when Core0 hits BP: CTI_TRIGGER[0] fires → CTI cross-triggers Core1 and Core2 halt via their CTI_APPHALT inputs; result: all 3 cores halt within 1 clock cycle of each other; TRACE32 command: Break.Set MyFunc /PROGRAM /SYNCH - sets synchronized cross-core BP that halts all cores when any one hits it; use to capture precise cross-core state snapshot
▸ Aurix MCDS cross-core trigger: MCDS trigger network connects all 6 cores; configure MCDS_TCTRIG (trigger network) to propagate trace trigger from Core0 event (e.g., specific address executed) to all cores → all 6 cores simultaneously stop trace recording (post-trigger capture); enables capturing the exact system state (all cores) at the moment of cross-core bug manifestation without any core running ahead; TRACE32 MCDS trigger config: TrOnChip.AutoArm 0x80012000 - arms trigger when Core0 executes address 0x80012000, then halts all cores
▸ Spinlock debugging in Aurix: AUTOSAR Os_GetSpinlock implementations on TriCore are built on the CMPSWAP.W atomic compare-and-swap instruction (atomically set the lock word 0 → 1 only if it is currently free); debug spinlock deadlock: TRACE32 SMP halt (all cores frozen) → Var.view Spinlock_MyLock - shows the lock value (1 = taken) and, if the OS stores it, the owner core ID; then inspect what the owning core is executing in Frame.view - identifies which task holds the lock indefinitely
▸ Cross-core event synchronization debug: AUTOSAR multi-core task communication via Inter-OS-Application signals (IOC - Inter-OS-Application Communicator) or RTE port connections; debug IOC write latency: set BP on IOC_Write() on sender core, set second BP on IOC_Read() on receiver core; use TRACE32 SMP halt - both BPs armed simultaneously; measure time between BPs using ETM timestamps: sender writes at t=0, receiver reads at t=47µs - latency caused by cross-core cache flush (DSYNC) taking 40µs for 4KB region; optimize by reducing IOC data size or pre-flushing cache before write
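The spinlock-with-owner pattern described in this chapter can be sketched using C11 atomics in place of the TriCore atomic instruction (illustrative, not the real AUTOSAR OS implementation; recording the owner core ID is what makes the Var.view deadlock inspection possible):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Minimal spinlock: lock word plus a debug-visible owner field. */
typedef struct {
    atomic_uint lock;        /* 0 = free, 1 = taken */
    uint32_t    owner_core;  /* valid only while lock == 1 */
} Spinlock_t;

Spinlock_t g_demo_lock;      /* zero-initialized: free */

/* Try to acquire: succeeds only if the lock transitions 0 -> 1 atomically
 * (compare-and-swap, as CMPSWAP.W would do on TriCore). */
int Spinlock_TryGet(Spinlock_t *s, uint32_t core_id)
{
    unsigned int expected = 0u;
    if (atomic_compare_exchange_strong(&s->lock, &expected, 1u)) {
        s->owner_core = core_id;   /* record owner for debugger inspection */
        return 1;
    }
    return 0;                      /* contended: caller spins or backs off */
}

void Spinlock_Release(Spinlock_t *s)
{
    atomic_store(&s->lock, 0u);
}
```

A real GetSpinlock would spin on TryGet; the sketch keeps the acquire non-blocking so contention is observable in a single-threaded demo.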
Shared Memory & Race Condition Detection 50 min read
▸ Race condition definition in multi-core embedded: two cores read-modify-write the same memory variable without synchronization - result depends on execution order (undefined behavior per C11 §5.1.2.4); detection method 1: MCDS dual-core watchpoint - configure MCDS_WP0 for Core0 write to shared var address + MCDS_WP1 for Core1 write/read to same address; trigger cross-core capture if both events occur within 100-cycle window - indicates concurrent access race; TRACE32 records which instruction on which core accessed address at which timestamp
▸ Helgrind-style race detection for embedded: compile with ThreadSanitizer (-fsanitize=thread, for host-based unit testing with POSIX threads mimicking multi-core behavior); TSan detects: simultaneous access without a lock, lock order violations (Core0 locks A then B; Core1 locks B then A → deadlock risk); for on-target race detection without TSan: add exclusive-access guards using AUTOSAR SchM_Enter_&lt;Module&gt;_&lt;ExclusiveArea&gt;() around all shared variable accesses - SchM enforces interrupt-level locking per the AUTOSAR SchM spec, preventing race conditions in same-core ISR/task contexts
▸ AUTOSAR IOC race detection: IOC (Inter-OS-Application Communicator) is the standard mechanism for cross-core data sharing - provides atomic copy semantics (AUTOSAR SWS_Os IOC_Read/IOC_Write use OS-specific atomic mechanisms); improper usage: application bypasses IOC and directly accesses shared RAM pointer → race; detect: MCDS watchpoint on shared RAM region, trigger if access occurs outside IOC function address range (i.e., direct access); log all violating PC addresses; most common cause: developer uses memcpy() on cross-core buffer thinking it's equivalent to IOC
▸ Fix pattern for shared memory race: AUTOSAR Multi-Core shared variable protection options: (1) OsSpinlock - fast (3–5 cycles acquisition), CPU spinning; use for critical sections < 1µs; (2) OsSuspendAllInterrupts() - suspends all ISRs on calling core only (not inter-core protection); (3) IOC queued - sends data via OS-managed queue with guaranteed atomicity; (4) double-buffer with generation counter: writer increments counter, writes data, increments counter again; reader retries read if generation count changes mid-read (non-blocking, zero overhead when no contention); verify fix by re-running MCDS watchpoint test with 10,000 iterations - zero concurrent access events required
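Option (4), the generation-counter (seqlock-style) protection, can be sketched as follows. This is a single-threaded host demo of the protocol; on target the writer and reader run on different cores and appropriate memory barriers would be required:

```c
#include <assert.h>
#include <stdint.h>

static volatile uint32_t g_gen;    /* generation counter: odd = write active */
static volatile uint16_t g_speed;  /* the shared value */

/* Writer: bump the counter before and after the update.  An odd counter
 * value tells readers a write is in progress. */
void Speed_Write(uint16_t v)
{
    g_gen++;       /* now odd: write in progress */
    g_speed = v;
    g_gen++;       /* even again: write complete */
}

/* Reader: retry if the counter changed mid-read or was odd - guarantees
 * a consistent snapshot without blocking the writer. */
uint16_t Speed_Read(void)
{
    uint32_t before, after;
    uint16_t v;
    do {
        before = g_gen;
        v      = g_speed;
        after  = g_gen;
    } while ((before != after) || (before & 1u));
    return v;
}
```

When there is no contention the reader pays only two extra loads, which is why this pattern is attractive for high-rate shared variables.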
Hands-On: Multi-Core Issue Resolution 60 min read
▸ Scenario: Aurix TC397 3-core AUTOSAR application - intermittent corruption of g_VehicleSpeed_kph (uint16_t in LMURAM at 0x90000100); Core0 (safety task) reads it every 10ms; Core2 (BSW Com task) writes it on CAN receive ISR; issue appears only under high CAN bus load (>80%); reproduce: connect TRACE32 with MCDS, set dual-core watchpoint on 0x90000100 (Core0 read + Core2 write in same window), run CAN load generator at 90% utilization on Vector VN1610 interface
▸ Investigation: MCDS captures the concurrent access - Core2 ISR_CAN_Rx writes 0x90000100 at t=0µs; Core0 Task_Safety reads 0x90000100 at t=0.5µs (mid-write); the compiler emits the update as a non-atomic multi-instruction sequence, so the store completes byte-wise rather than as a single atomic 16-bit write; Core0 reads a torn value: high byte from the new value, low byte from the old value - old 0x0050 (80 km/h) and new 0x0100 (256 km/h) combine to 0x0150 = 336 km/h (impossible value, trips the safety monitor); root cause: unprotected cross-core access
▸ Fix implementation: wrap Core2 write and Core0 read with AUTOSAR SpinlockType Os_Spinlock_VehicleSpeed; in Core2 ISR: GetSpinlock(Os_Spinlock_VehicleSpeed); g_VehicleSpeed_kph = new_val; ReleaseSpinlock(Os_Spinlock_VehicleSpeed); in Core0 task: GetSpinlock(Os_Spinlock_VehicleSpeed); local_speed = g_VehicleSpeed_kph; ReleaseSpinlock(Os_Spinlock_VehicleSpeed); spinlock acquisition time = 3 cycles on Aurix (uncontended) - negligible overhead; alternatively use IOC: define IoC_VehicleSpeed queue in ARXML, use Ioc_Write in Core2 and Ioc_Read in Core0 - fully OS-managed
▸ Verification: re-run 1-hour CAN load test with MCDS watchpoint; zero concurrent access events detected (spinlock prevents simultaneous access); Core0 safety monitor no longer fires spurious speed violation DTCs; measure spinlock contention rate from MCDS trace - average contention < 0.1% (spinlock acquisition blocked < 1 in 1000 attempts) → acceptable overhead; document resolution in Jira: root cause (non-atomic multi-core shared variable access), fix (OsSpinlock), verification (MCDS trace zero-violation evidence + 1000-cycle stress test pass)
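How a non-atomic byte-wise update produces an impossible torn value can be reproduced arithmetically (values here are illustrative, not the ones from the trace):

```c
#include <assert.h>
#include <stdint.h>

/* Models the torn read: the reader observes the new high byte combined
 * with the old low byte of a 16-bit value updated non-atomically. */
uint16_t torn_read(uint16_t old_val, uint16_t new_val)
{
    return (uint16_t)((new_val & 0xFF00u) | (old_val & 0x00FFu));
}
```

For example, an update from 100 km/h (0x0064) to 300 km/h (0x012C) can be observed as 0x0164 = 356 km/h - a value neither the old nor the new state ever held, which is exactly the kind of plausibility violation a safety monitor should trap.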
5
Advanced Debugging Techniques
5 chapters • 3.8 hrs reading
Stack Overflow & Memory Corruption 45 min read
▸ Stack overflow detection mechanisms: (1) Stack fill pattern: AUTOSAR OS fills each task stack with 0xA5A5A5A5 at startup; scan the descending stack from its limit toward the base - the first non-0xA5 word marks the high-water mark; TRACE32: Data.Find 0xA5A5A5A5 0x70005000 0x70005FFF finds the pattern boundary for a task stack at 0x70005000; (2) MPU guard page: configure the Aurix Memory Protection Unit (MPU) with a zero-access region at the stack bottom - any overflow into the guard page triggers an MPU trap (Class 1 TRAP, TIN=1 Protection Error) → jump to a known exception handler; (3) Stack pointer monitor: the Aurix A10 address register holds the current stack pointer; a periodic OS timer ISR reads it and compares it against the configured stack limit
▸ Memory corruption detection techniques: guard word check: insert 32-bit magic word (0xDEADBEEF) before and after heap-allocated structures; periodically verify guards intact; on corruption: which guard changed → identify overflowing structure; AddressSanitizer shadow memory (host-based): 1 shadow byte per 8 application bytes tracks allocation state; on out-of-bounds access: ASan reports exact violating address, access type (read/write), size, and allocation/deallocation call stacks; for embedded without ASan: AUTOSAR MemProtection + MPU region configuration per application OS-application protection domain - cross-application memory write triggers MPU trap
▸ Root cause workflow for stack overflow: after MPU TRAP in ErrorHook: read A10 (SP) value - if SP < OsStackBase address for the violating task, overflow confirmed; read Task ID from Os_CurrentTask pointer; check OsTask ARXML: OsStacksize = 0x800 (2KB) - too small; enable OS.TASK view to see stack usage % (TRACE32 reads fill pattern); identify deepest call: from ETM trace find maximum frame.view depth during task execution; common cause: recursion or large local array on stack (local uint8_t buf[1024] consumes 1KB of task stack in one function call); fix: increase OsStacksize or move array to static/global allocation
▸ Heap corruption debugging (if dynamic allocation used in non-AUTOSAR code): jemalloc or TLSF allocator with guard metadata; on double-free: allocator detects freed chunk returning to free list with invalid previous-free flag → triggers abort(); on heap overflow: corrupted next-chunk metadata causes segfault on next malloc - use Valgrind (host) or allocator debug mode with poison bytes (fill allocated memory with 0xCD, freed memory with 0xDD → any use-after-free reads 0xDD bytes, detectable via watchpoint on memory region); in TRACE32: scan heap for corrupted metadata: Data.FIND 0xDDDDDDDD in heap address range reveals freed-then-accessed blocks
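The fill-pattern high-water-mark scan from mechanism (1) can be sketched as a host-side check (descending stack assumed, as on TriCore and Cortex-M; `stack[0]` is the lowest address, i.e., the overflow end):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u

/* Scan from the low (limit) end toward the base: words still holding the
 * fill pattern were never touched; everything above the first non-pattern
 * word has been used at some point since startup. */
size_t stack_bytes_used(const uint32_t *stack, size_t words)
{
    size_t i = 0;
    while (i < words && stack[i] == STACK_FILL)
        i++;                                /* untouched fill pattern */
    return (words - i) * sizeof(uint32_t);  /* high-water mark in bytes */
}
```

If the returned value equals the full stack size, the pattern is exhausted and an overflow must be assumed even if no MPU trap fired - the task was one call away from corrupting its neighbor.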
Hard Fault & Exception Analysis 40 min read
▸ ARM Cortex-M HardFault escalation path: exceptions cascade to HardFault when: (1) BusFault/MemManage disabled (SHCSR bit not set); (2) fault occurs during fault handler execution; (3) Vector table read fails; on HardFault entry: CPU pushes context frame {R0,R1,R2,R3,R12,LR,PC,xPSR} onto PSP or MSP depending on LR (EXC_RETURN); read HFSR (HardFault Status Register at 0xE000ED2C): FORCED bit = escalated configurable fault; VECTTBL bit = vector table read error; DEBUGEVT bit = debug event caused HF; read CFSR (Configurable Fault Status Register 0xE000ED28) for original fault cause
▸ CFSR decoding for root cause: MMFSR (MemManage Fault, byte 0): IACCVIOL (instruction fetch from non-executable region), DACCVIOL (data access to protected region), MMARVALID + MMFAR (0xE000ED34 = faulting address); BFSR (BusFault, byte 1): PRECISERR + BFAR (0xE000ED38 = exact faulting address for precise faults), IMPRECISERR (write buffer fault, faulting address unknown), IBUSERR (instruction fetch error); UFSR (UsageFault, byte 2): UNDEFINSTR (undefined instruction - ARM/Thumb interworking error or FPCA not enabled with FPU instruction), UNALIGNED (unaligned memory access with CCR.UNALIGN_TRP=1), DIVBYZERO (integer division by zero with CCR.DIV_0_TRP=1)
▸ HardFault handler implementation for maximum diagnostic info: void HardFault_Handler(void) { uint32_t *sp = (uint32_t *)__get_PSP(); /* or MSP, per EXC_RETURN */ uint32_t pc = sp[6]; uint32_t lr = sp[5]; uint32_t hfsr = SCB->HFSR; uint32_t cfsr = SCB->CFSR; uint32_t mmfar = SCB->MMFAR; uint32_t bfar = SCB->BFAR; /* log to NVM circular buffer with DEM: */ DEM_ReportFault(DemEvent_HardFault, pc, cfsr); /* capture stack trace: */ CrashDump_SaveStackTrace(sp, 16); /* attempt reset after save: */ NVIC_SystemReset(); } - saves diagnostic info to NVM before reset for post-reset analysis via UDS 0x22 DID
▸ Aurix TC3xx TRAP analysis: Aurix uses TRAP classes instead of ARM exceptions; Class 1 (Memory Protection) TIN 1–4 covers code/data/peripheral/stack protection violations; Class 2 (Internal Protection) covers SFR access violations; the TRAP handler receives the TIN in register D15 on entry; in TRACE32: TRAP info is visible in Register.view as the TIN value after halt; each core traps independently, so check the halted core's trap registers to confirm which core triggered the TRAP; MCDS can log all TRAP events non-intrusively with full register context and memory snapshot; configure an MCDS trigger on Class 1 TIN 1 to capture 10µs of pre-fault trace history
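The CFSR bit decoding above can be captured in a small lookup helper (bit positions per the ARMv7-M register layout: MMFSR in byte 0, BFSR in byte 1, UFSR in bytes 2–3; only the most common causes are handled here):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>  /* strcmp, for the usage checks */

/* Decode the Cortex-M CFSR (0xE000ED28) into a human-readable cause. */
const char *cfsr_cause(uint32_t cfsr)
{
    if (cfsr & (1u << 0))  return "MemManage: IACCVIOL (exec from XN region)";
    if (cfsr & (1u << 1))  return "MemManage: DACCVIOL (data access violation)";
    if (cfsr & (1u << 9))  return "BusFault: PRECISERR (see BFAR)";
    if (cfsr & (1u << 10)) return "BusFault: IMPRECISERR (write buffer)";
    if (cfsr & (1u << 16)) return "UsageFault: UNDEFINSTR";
    if (cfsr & (1u << 24)) return "UsageFault: UNALIGNED";
    if (cfsr & (1u << 25)) return "UsageFault: DIVBYZERO";
    return "unknown/other";
}
```

Called from the HardFault handler (or from a post-mortem script on the saved CFSR value), this turns a raw hex dump into an actionable first hypothesis.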
Post-Mortem Debugging 35 min read
▸ Core dump collection in AUTOSAR: implement crash dump handler that saves to NVM (DFlash) before reset - dump includes: register snapshot (all GPRs + PC + xPSR), stack region of each active task (2KB × 4 tasks = 8KB), global variables of interest (list defined in crash_dump_config.h), ETM trace buffer if available (last 32KB of instructions), DEM event snapshot (all stored DTC with most recent occurrence counter); NVM dump accessed post-reset via UDS ReadDataByIdentifier 0x22 DID_CRASH_DUMP; crash dump format: binary struct CrashDump_t with magic + CRC-32 + timestamp
▸ ELF + core dump analysis on host: save the crash dump binary from the ECU via UDS; load into GDB: gdb firmware.elf, then (gdb) target remote :&lt;port&gt; (live via a JTAG GDB server) or serve crash_dump.bin through a custom GDB stub that maps the dump contents to GDB remote protocol memory regions; GDB's backtrace command reconstructs the call stack from the saved SP + PC in the dump; inspect locals per frame with frame and info locals; alternatively load the dump in TRACE32: Data.Load.Binary crash_dump.bin 0x20000000 (load at the SRAM address) → then Frame.view reads the stack as if connected to a live target; a PRACTICE script auto-detects the dump format and populates all register aliases
▸ Freeze frame data for field returns: OBD-required freeze frame stores sensor data at the moment an emissions-related DTC is set (DEM DTC with DemDtcAttributeClass PrimaryMemory, linked DemFreezeFrameClass); for field failure analysis: expand the freeze frame with custom debug data - engine speed, coolant temp, CAN error counter, task CPU load % at fault time; stored in DEM extended data records (DemExtendedDataRecordClass); read via UDS 0x19 0x06 (reportDTCExtDataRecordByDTCNumber) for DTC 0x123456, record 0x01 → returns 64 bytes of snapshot data for forensic analysis without the vehicle on a bench
▸ Watchdog reset analysis: distinguish intentional reset (software WDG trigger after detected error) from spurious WDG timeout (task overrun or infinite loop); on startup, read reset reason register (RSTSTAT on Aurix: bit 3 = SW reset, bit 4 = WDG reset, bit 5 = power-on reset); if WDG reset: load last crash dump from DFlash; check which WdgM checkpoint was last reached (stored in NVM: WdgM_LastCheckpoint variable) → task that missed its deadline identified; compare against CPU load historical log in DFlash circular buffer - isolate if specific task or ISR spike caused the overrun leading to WDG reset
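The magic + CRC-32 dump framing mentioned above can be sketched as follows (struct layout and magic value are illustrative; the CRC is the standard CRC-32/ISO-HDLC used by zlib and Ethernet, reflected polynomial 0xEDB88320):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRASH_MAGIC 0x43445001u  /* illustrative "CDP" + version tag */

/* Minimal crash-dump record: all-uint32_t layout avoids padding, so the
 * CRC can safely cover everything up to the crc field itself. */
typedef struct {
    uint32_t magic;
    uint32_t timestamp;
    uint32_t pc, lr, sp, cfsr;   /* minimal register snapshot */
    uint32_t crc;                /* CRC-32 over all preceding fields */
} CrashDump_t;

/* Bitwise CRC-32 (ISO-HDLC variant): slow but tiny - fine for a one-shot
 * integrity stamp written just before reset. */
uint32_t crc32_le(const uint8_t *p, size_t n)
{
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        c ^= p[i];
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return c ^ 0xFFFFFFFFu;
}

/* Post-reset: accept the NVM dump only if magic and CRC both check out,
 * so a partially written dump is never reported over UDS. */
int crashdump_valid(const CrashDump_t *d)
{
    return d->magic == CRASH_MAGIC &&
           d->crc == crc32_le((const uint8_t *)d,
                              offsetof(CrashDump_t, crc));
}
```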
Remote Debugging & CI Integration 40 min read
▸ TRACE32 remote debugging over GigE: the TRACE32 PowerDebug GigE model connects the probe to the ECU via JTAG locally and to the host PC via 1GbE LAN; configure config.t32: HOST=192.168.1.10 PORT=20000; launch the t32marm (or t32mtc for Aurix TriCore) daemon on the debug server; the engineer connects from a remote PC: t32rem.exe 192.168.1.10 port=20000 "DO init.cmm" - runs the debug session remotely; TRACE32 GigE supports trace data streaming (ETM/MCDS) at up to 400 MB/s - sufficient for real-time instruction trace over LAN; VPN latency < 20ms is acceptable for interactive debugging, while sustained bandwidth, not latency, is the limiting factor for trace streaming
▸ Jenkins CI pipeline integration with TRACE32: Jenkins stage "Hardware Regression": sh "t32rem.exe ${PROBE_IP} port=20000 'DO /ci/regression_test.cmm ${FIRMWARE_ELF}'" ; regression_test.cmm flashes ELF, runs test sequence, writes PASS/FAIL to file; sh "cat test_results.txt" → Jenkins reads results; on FAIL: Jenkins marks build UNSTABLE, attaches TRACE32 error log; probe shared between CI builds via mutex (Jenkins lockable resources plugin); one probe per 2 CI agents using round-robin scheduling; stack trace from failed test automatically attached to Jira ticket via Jenkins-Jira integration
▸ GDB Remote Protocol (GDB stub): open-source alternative to TRACE32 for simple embedded targets; OpenOCD acts as GDB server: openocd -f interface/ftdi/minimodule.cfg -f target/stm32f4x.cfg -c "gdb_port 3333"; GDB connects: (gdb) target remote localhost:3333; Flash: (gdb) load firmware.elf; Set BP: (gdb) break App_Main; (gdb) continue; inspect: (gdb) info registers; (gdb) backtrace; (gdb) x/16xw 0x20000000 - examine 16 words of RAM; limitations vs TRACE32: no ETM trace, limited multi-core support, slower flash programming; suitable for STM32/NXP LPC targets in open-source projects or when TRACE32 license budget unavailable
▸ Hardware-in-the-loop debug automation: pytest + TRACE32 Python API framework; test fixture: powers ECU via controllable PSU (Rohde & Schwarz NGM202, command via SCPI), controls CAN bus via python-can + PEAK PCAN-USB, controls fault injection via NI DAQ GPIO; test function: psu.set_voltage(12.0); t32.connect(); t32.flash("firmware.elf"); psu.power_on(); t32.cmd("Go"); can_bus.send(msg_engine_speed_100kph); time.sleep(0.01); result = t32.read_var("g_torque_output_Nm"); assert 150 <= result <= 200; psu.power_off(); full test suite runs in CI nightly on dedicated debug server with physical ECU attached - provides hardware regression coverage impossible to replicate with SIL simulation
Hands-On: Complex Bug Investigation 60 min read
▸ Scenario: intermittent ECU reset during long-duration endurance test (5-hour drive cycle); occurs ~3 times per 100-hour run; reset reason register shows WDG timeout; WdgM_LastCheckpoint log shows last checkpoint was Task_Diag (50ms task); no DEM events logged before reset (reset too fast to log); reproduce: enable full MCDS trace (all 6 cores, all events), run CI overnight test, collect trace after third WDG reset; 90% probability of capturing the reset in 256 MB trace buffer
▸ Trace analysis phase: load MCDS trace in TRACE32; search for WDG service calls (Task_Diag calls WdgM_MainFunction every 50ms → triggers WDG hardware service); in Trace.TASK.Chart: find last WDG service at t=-14.7s before reset; Task_Diag next activation at t=-14.65s (on time) but Trace.STATISTIC.FUNC shows Task_Diag running for 14.7s continuously without returning - infinite loop; drill into what Task_Diag was executing for 14.7s: Rte_Call_PersistentStorage_WriteBlock() → loops waiting for NVM write complete (never returns); NVM write called with invalid block ID 0xFF → NVM driver enters infinite retry loop
▸ Root cause and fix: NVM block ID 0xFF is passed to Nvm_WriteBlock() from Task_Diag - trace the origin: Rte_IWrite_Task_Diag_NvmBlockId(&blockId) - the data comes from CAN message signal NvmDiag_BlockId, which is not range-validated before use; the CAN message was sent with corrupt data during a specific fault-injection test pattern at ~hour 3; fix: add range validation before the NVM call (if (blockId > NVM_MAX_BLOCK_ID) return E_NOT_OK;); report the invalid block ID via Det_ReportError; add an NVM timeout: if the NVM write takes > 100ms, abort and return an error (bounded retry counter in the NVM driver loop); WdgM: add a checkpoint WdgM_CheckpointReached(WdgM_CheckpointId_TaskDiag_LoopEnd) that Task_Diag must reach after the NVM call, so a hang inside NVM shows up as a missed checkpoint before the hardware WDG fires
▸ Verification and reporting: re-run 200-hour endurance test with fix applied; zero WDG resets in 200 hours (vs 6 resets expected in same duration before fix); verify NVM invalid block ID error now logged via DEM DemEvent_NvmInvalidBlockId (CVSS N/A, safety severity S1 operational impact) for field diagnostics; MCDS trace from new run confirms Task_Diag completes in < 45ms on every activation; document root cause analysis, fix, verification evidence in engineering change request; TARA team updates TS-NVM-01 threat scenario residual risk with new countermeasure (input validation + timeout) reducing feasibility from Medium to Very Low
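The two-part fix - range validation plus a bounded wait - can be sketched like this (names and limits are illustrative; the busy-poll count parameter stands in for the real NVM status query so the logic is testable on the host):

```c
#include <assert.h>
#include <stdint.h>

#define NVM_MAX_BLOCK_ID  0x3Fu   /* illustrative upper bound */
#define NVM_RETRY_LIMIT   1000u   /* ~100 ms worth of polls, assumed */
#define E_OK              0u
#define E_NOT_OK          1u

/* busy_polls models how many times the NVM hardware still reports "busy";
 * a corrupt request that never completes maps to a very large count. */
uint8_t Diag_WriteNvmBlock(uint8_t blockId, uint32_t busy_polls)
{
    if (blockId > NVM_MAX_BLOCK_ID)
        return E_NOT_OK;          /* reject corrupt CAN-supplied block ID */

    uint32_t retries = 0;
    while (busy_polls > 0) {      /* bounded wait replaces infinite retry */
        busy_polls--;
        if (++retries > NVM_RETRY_LIMIT)
            return E_NOT_OK;      /* timeout: report instead of hanging */
    }
    return E_OK;
}
```

Either guard alone would have prevented the 14.7s hang: the range check rejects the bad input at the boundary, and the timeout converts any residual driver hang into a reportable error that WdgM and DEM can see.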

What You'll Learn

Configure and use TRACE32 for professional ECU debugging
Perform runtime trace analysis for timing verification
Debug multi-core AUTOSAR systems effectively
Diagnose stack overflows, memory corruption, and hard faults
Write automation scripts for repetitive debug tasks
Integrate debugging into CI/CD workflows

Prerequisites

C programming proficiency
Basic understanding of microcontroller architecture
Familiarity with AUTOSAR or RTOS concepts

This course includes:

24 detailed documentation chapters
Downloadable resources
Searchable text documentation
Code snippets & technical diagrams
Hands-on exercises
Lifetime access
Certificate of completion