Hands-On: AUTOSAR Stack Debugging - Debugging & Tracing

Lab Setup: AUTOSAR BSW Stack

Component	Detail
AUTOSAR BSW	RTA-OS (ETAS) or Tresos (EB); OsTask_10ms, OsTask_100ms, OsTask_1ms defined
ORTI file	gen/Os_Debug.orti; load with TASK.ORTI
Lab ELF	autosar_lab.elf — compiled with Os debug hooks and -g for full symbols
Exercises	Three progressively harder bugs; each has a 'hint' script in lab_hints/

Exercise 1: Task Executing Longer Than Its Period

CMM: lab_ex1_task_overrun.cmm

// Exercise 1: OsTask_10ms is taking > 10ms; system shows timing violations
// Goal: find which runnable is causing the overrun

// Step 1: Set breakpoints at task entry and exit (OS scheduler hooks)
Break.Set OsTask_10ms_Entry /Program    // first instruction of 10ms task body
Break.Set OsTask_10ms_Exit  /Program    // last instruction before ChainTask/TerminateTask

// Step 2: Use the STM timer to measure wall-clock task duration
// This lab assumes Aurix STM0 ticks at 300 MHz (3.33 ns per tick); confirm the
// actual STM clock in your clock configuration before trusting the numbers
LOCAL &t_start &t_end &duration_us
Go
WAIT !STATE.RUN() 1s      // halt at entry
&t_start=Data.Long(SFR:STM0.TIM0)
Go
WAIT !STATE.RUN() 1s      // halt at exit
&t_end=Data.Long(SFR:STM0.TIM0)
// Mask to 32 bits so a TIM0 wrap between the two reads still yields the right delta
&duration_us=((&t_end-&t_start)&0xFFFFFFFF)/300.    // 300 ticks per microsecond (integer result)
PRINT "OsTask_10ms duration: " FORMAT.Decimal(1.,&duration_us) " µs"
IF &duration_us>10000.
    PRINT %ERROR "OVERRUN: task took " FORMAT.Decimal(1.,&duration_us) " µs (budget: 10000)"

// Step 3: Profile which function is using the most time
// Run with ETM trace enabled; use runtime-measurement-profiling for full profile
// Quick method: set breakpoints inside each runnable; measure STM between calls
// Expected finding: Can_MainFunction has unexpected loop iterating all 128 mailboxes
//                   instead of only active ones — linear scan O(n) vs O(1) expected
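
The expected finding can be illustrated in C, the language of the ECU code under test. This is a minimal sketch, not the lab's Can_MainFunction: MailboxMask, scan_all_mailboxes, scan_active_mailboxes and log_index are hypothetical names, and it assumes the driver keeps one pending bit per mailbox. It contrasts a scan over all 128 mailboxes with a walk over only the set bits.

```c
#include <assert.h>
#include <stdint.h>

#define MAILBOX_COUNT 128u
#define WORD_BITS     32u
#define MASK_WORDS    (MAILBOX_COUNT / WORD_BITS)

/* Hypothetical pending bitmap: bit (n % 32) of word (n / 32) marks mailbox n as active. */
typedef struct { uint32_t pending[MASK_WORDS]; } MailboxMask;

typedef void (*MailboxHandler)(unsigned index, void *ctx);

/* Hypothetical handler used for illustration: records the indices it is given. */
typedef struct { unsigned indices[MAILBOX_COUNT]; unsigned count; } IndexLog;

void log_index(unsigned index, void *ctx)
{
    IndexLog *log = (IndexLog *)ctx;
    log->indices[log->count++] = index;
}

/* Index of the lowest set bit; the caller must pass a non-zero word. */
unsigned lowest_set_bit(uint32_t word)
{
    unsigned bit = 0u;
    assert(word != 0u);
    while ((word & 1u) == 0u) {
        word >>= 1;
        ++bit;
    }
    return bit;
}

/* Slow variant: checks every mailbox on every call. Returns the number of checks. */
unsigned scan_all_mailboxes(const MailboxMask *m, MailboxHandler handle, void *ctx)
{
    unsigned checks = 0u;
    for (unsigned i = 0u; i < MAILBOX_COUNT; ++i) {
        if ((m->pending[i / WORD_BITS] >> (i % WORD_BITS)) & 1u) {
            handle(i, ctx);
        }
        ++checks;
    }
    return checks;
}

/* Faster variant: visits only mailboxes whose pending bit is set. */
unsigned scan_active_mailboxes(const MailboxMask *m, MailboxHandler handle, void *ctx)
{
    unsigned checks = 0u;
    for (unsigned w = 0u; w < MASK_WORDS; ++w) {
        uint32_t bits = m->pending[w];
        while (bits != 0u) {
            handle(w * WORD_BITS + lowest_set_bit(bits), ctx);
            bits &= bits - 1u;  /* clear the lowest set bit */
            ++checks;
        }
    }
    return checks;
}
```

With three active mailboxes, the full scan performs 128 checks per call while the bitmap walk performs 3. On a real target the lowest-set-bit search can often be replaced by a bit-scan instruction or compiler intrinsic.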

Exercise 2: Resource Deadlock Between Tasks

CMM: lab_ex2_deadlock.cmm

// Exercise 2: System appears frozen; OsTask_10ms and OsTask_100ms both waiting
// Circular-wait deadlock on OS resources. A correctly configured OSEK priority
// ceiling protocol prevents this, so also check the configured resource ceilings

// Step 1: Check all task states
TASK.List    // expected: both tasks in WAITING state simultaneously
             // basic (non-extended) OSEK tasks never enter WAITING,
             // so this points to a deadlock

// Step 2: Check which resources are held
// OSEK resource occupation stored in task control block
TASK.select "OsTask_10ms"
Var.View p_TaskControlBlock_10ms.resource_mask    // which resources held?

TASK.select "OsTask_100ms"
Var.View p_TaskControlBlock_100ms.resource_mask

// Step 3: Trace back to the offending GetResource call
TASK.select "OsTask_10ms"
Frame.view /Caller    // which function called GetResource?

// Expected finding:
// OsTask_10ms holds OsResource_ComBuffer and waits for OsResource_CalTable
// OsTask_100ms holds OsResource_CalTable and waits for OsResource_ComBuffer
// -> circular wait -> deadlock
// Fix: establish consistent resource acquisition order across all tasks
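
The circular wait described in the expected finding can be checked mechanically once the masks have been read out of the TCBs. The following C sketch works under stated assumptions: TaskResources and has_circular_wait are hypothetical names, each resource is assumed to occupy one bit in the mask, and the `wanted` mask (the resources a task is blocked on) is an assumption, since the lab only shows `resource_mask`. It builds a wait-for graph and looks for a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 8u

/* Hypothetical snapshot of one task, as read from its TCB in the debugger. */
typedef struct {
    uint32_t held;    /* one bit per resource the task currently owns */
    uint32_t wanted;  /* one bit per resource the task is blocked on (assumed field) */
} TaskResources;

/* Depth-first search over the wait-for graph.
 * state: 0 = unvisited, 1 = on the current path, 2 = fully explored.
 * Recursion depth is bounded by the task count (at most MAX_TASKS). */
static bool visit(const TaskResources *tasks, unsigned n, unsigned i, uint8_t *state)
{
    state[i] = 1u;
    for (unsigned j = 0u; j < n; ++j) {
        /* Edge i -> j exists when task i waits for something task j holds. */
        if (j == i || (tasks[i].wanted & tasks[j].held) == 0u) {
            continue;
        }
        if (state[j] == 1u) {
            return true;  /* reached a task already on the path: cycle */
        }
        if (state[j] == 0u && visit(tasks, n, j, state)) {
            return true;
        }
    }
    state[i] = 2u;
    return false;
}

/* Returns true when the tasks form a circular wait, i.e. a deadlock. */
bool has_circular_wait(const TaskResources *tasks, unsigned n)
{
    uint8_t state[MAX_TASKS] = {0u};
    assert(tasks != NULL && n <= MAX_TASKS);
    for (unsigned i = 0u; i < n; ++i) {
        if (state[i] == 0u && visit(tasks, n, i, state)) {
            return true;
        }
    }
    return false;
}
```

If OsResource_ComBuffer is bit 0 and OsResource_CalTable is bit 1 (an assumed numbering), the lab state is `{held 0x1, wanted 0x2}` and `{held 0x2, wanted 0x1}`, which the function reports as a cycle. Acquiring resources in one fixed order in every task removes the second edge and therefore the cycle.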

Exercise 3: OS ProtectionHook Triggered

CMM: lab_ex3_errorhook.cmm

// Exercise 3: ProtectionHook fires with E_OS_STACKFAULT — find the root cause

// Step 1: Break at ProtectionHook entry
Break.Set Os_ProtectionHook /Program

// Step 2: Run to hook
Go
WAIT !STATE.RUN() 5s

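// Step 3: Inspect error parameters
// Note: in OSEK the service-ID and parameter accessors are macros valid only inside
// ErrorHook; if the lab OS does not export them as functions, read the hook's
// status argument with Frame.view /Args instead
Var.View OSError_GetServiceId()      // which OS service was being called?
Var.View OSError_GetParam1()         // first parameter
TASK.List                            // which task triggered the protection?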

// Step 4: Switch to the faulting task context
TASK.select "OsTask_1ms"             // assume this is the faulting task
Frame.view /Caller                   // what was the task executing?

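// Step 5: Inspect stack state (the stack grows downward)
// Assumes OsTask_1ms_Stack is the stack array symbol and that SP still holds
// the faulting task's stack pointer rather than the hook's
LOCAL &base &size &used
&base=Var.ADDRESS(OsTask_1ms_Stack)      // lowest address of the stack area
&size=Var.SIZEOF(OsTask_1ms_Stack)       // stack size in bytes
&used=&base+&size-Register(SP)
PRINT "Stack used: " FORMAT.Decimal(1.,&used) " bytes"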

// Expected: recursive CRC function with no recursion depth limit
// Input from unchecked NVM causes 500+ recursion levels -> stack exhaustion
// Fix: convert recursive CRC to iterative; or add depth limit guard
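
The fix named in the expected finding can be sketched in C. This is an illustration under assumptions, not the lab's CRC routine: the polynomial (plain CRC-8, 0x07, initial value 0), the function names, and the CRC_MAX_LEN bound are all hypothetical choices. The recursive form uses one stack frame per input byte in a debug build, so a length read from unchecked NVM translates directly into stack depth; the iterative form uses constant stack and rejects implausible lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC_MAX_LEN 256u  /* hypothetical upper bound for lengths read from NVM */

/* Fold one byte into a CRC-8 (polynomial 0x07, MSB first). */
static uint8_t crc8_update(uint8_t crc, uint8_t byte)
{
    crc ^= byte;
    for (unsigned bit = 0u; bit < 8u; ++bit) {
        crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u) : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Problematic pattern: recursion depth equals len, with no limit. */
uint8_t crc8_recursive(const uint8_t *data, size_t len, uint8_t crc)
{
    assert(data != NULL || len == 0u);
    if (len == 0u) {
        return crc;
    }
    return crc8_recursive(data + 1, len - 1u, crc8_update(crc, data[0]));
}

/* Iterative replacement with a length guard.
 * Returns 0 and writes the CRC to *out on success, -1 if the input is rejected. */
int crc8_iterative(const uint8_t *data, size_t len, uint8_t *out)
{
    uint8_t crc = 0u;
    if (data == NULL || out == NULL || len > CRC_MAX_LEN) {
        return -1;
    }
    for (size_t i = 0u; i < len; ++i) {
        crc = crc8_update(crc, data[i]);
    }
    *out = crc;
    return 0;
}
```

Both functions produce the same CRC for valid input, so the iterative version can replace the recursive one without changing stored checksums. Only the stack usage and the handling of oversized lengths differ.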

Summary

Three exercises covering the most common AUTOSAR OS debugging scenarios: task timing overrun (diagnosed with STM timer measurements between task entry and exit breakpoints), resource deadlock (visible in the task states and the resource masks in the TCBs), and a ProtectionHook call with E_OS_STACKFAULT (stack overflow traced back to unbounded recursion via Frame.view in the faulting task context). All three are hard to diagnose with printf debugging: they require the debugger's ability to halt the system at the moment of failure and inspect OS-internal state.

🔬 Deep Dive — Core Concepts Expanded

This section builds on the foundational concepts covered above with additional technical depth, edge cases, and configuration nuances that separate competent engineers from experts. When working on production ECU projects, the details covered here are the ones most commonly responsible for integration delays and late-phase defects.

Key principles to reinforce:

Configuration over coding: In AUTOSAR and automotive middleware environments, correctness is largely determined by ARXML configuration, not application code. A correctly implemented algorithm can produce wrong results due to a single misconfigured parameter.
Traceability as a first-class concern: Every configuration decision should be traceable to a requirement, safety goal, or architecture decision. Undocumented configuration choices are a common source of regression defects when ECUs are updated.
Cross-module dependencies: In tightly integrated automotive software stacks, changing one module's configuration often requires corresponding updates in dependent modules. Always perform a dependency impact analysis before submitting configuration changes.

🏭 How This Topic Appears in Production Projects

Project integration phase: The concepts covered in this lesson are most commonly encountered during ECU integration testing — when multiple software components from different teams are combined for the first time. Issues that were invisible in unit tests frequently surface at this stage.
Supplier/OEM interface: This is a topic that frequently appears in technical discussions between Tier-1 ECU suppliers and OEM system integrators. Engineers who can speak fluently about these details earn credibility and are often brought into critical design review meetings.
Automotive tool ecosystem: Vector CANoe/CANalyzer, dSPACE tools, and ETAS INCA are the standard tools used to validate and measure the correct behaviour of the systems described in this lesson. Familiarity with these tools alongside the conceptual knowledge dramatically accelerates debugging in real projects.

⚠️ Common Mistakes and How to Avoid Them

Assuming default configuration is correct: Automotive software tools ship with default configurations that are designed to compile and link, not to meet project-specific requirements. Every configuration parameter needs to be consciously set. 'It compiled' is not the same as 'it is correctly configured'.
Skipping documentation of configuration rationale: In a 3-year ECU project with team turnover, undocumented configuration choices become tribal knowledge that disappears when engineers leave. Document why a parameter is set to a specific value, not just what it is set to.
Testing only the happy path: Automotive ECUs must behave correctly under fault conditions, voltage variations, and communication errors. Always test the error handling paths as rigorously as the nominal operation. Many production escapes originate in untested error branches.
Version mismatches between teams: In a multi-team project, the BSW team, SWC team, and system integration team may use different versions of the same ARXML file. Version management of all ARXML files in a shared repository is mandatory, not optional.
