
TRACE32 Per-Function Cycle Count

function_profiling.cmm (TRACE32 PRACTICE script)
/* Profile individual functions inside Dem_MainFunction */
/* Requires ETM (Embedded Trace Macrocell) hardware */

TRACE.METHOD ETM
TRACE.ON

/* Set profiling scope to Dem_MainFunction */
PERF.GATE Dem_MainFunction
Go
WAIT 1.s
PERF.LISTFUNC  /* sorted by CPU time */

/* Example output:
   Function                    | Calls | Total (us) | Self (us) | %
   Dem_MainFunction            |   100 |  5000      |    200    | 4%
   Dem_EventDebounce           |  3200 |  2000      |   2000    | 40%  ← hot
   Dem_DtcStorageHandling      |   100 |  1500      |   1500    | 30%
   Dem_ClearDTCFilter          |    10 |   800      |    800    | 16%

   Finding: Dem_EventDebounce is called 32x per Dem_MainFunction cycle
   Action: increase DemDebounceCounterBasedStepSize to skip unnecessary calls */

Preemption Chain Analysis

Task Preemption Chain Example
  Time →  0ms    1ms    2ms    3ms    4ms    5ms    6ms    7ms    8ms    9ms   10ms
  ─────────────────────────────────────────────────────────────────────────────────
  Task_SafetyCtrl_10ms  ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░████
                       start ↑                                      ↑ end (gross=10ms)
  Task_2ms              ░░██░░░░██░░░░██░░░░██░░░░               net=0.8ms each
                                                                   preempts Safety 4×
  Task_1ms              ░██░░██░░██░░██░░██░░██░░██░░██░░██░░██░  preempts Safety 8×

  Safety task net time = 3.5ms; gross time = 10ms
  Preemption overhead = 6.5ms of gross time
  → Safety task is not overloaded; gross > net is expected and correct
  → Concern would be: net > 10ms (execution budget exceeded)

💡 Net vs Gross Time

Net time is the actual CPU time consumed by the task. Gross time is wall-clock time from task activation to termination, including all preemptions by higher-priority tasks. A safety task with net=3.5ms and gross=10ms on a 10ms period is healthy. A safety task with net=10.5ms on a 10ms period has exceeded its execution budget — this is what AUTOSAR OS timing protection catches as an OsTaskExecutionBudget violation (OsTaskTimeFrame, by contrast, bounds the task's arrival rate).

Generated Code Efficiency: Signal Getter Inlining

verify_signal_inline.sh (shell)
#!/bin/bash
# Verify COM signal getters compile to direct struct access, not function calls
arm-none-eabi-objdump -d ECU_EPS.elf | grep -A5 "Com_GetSignal_VehicleSpeed"

# BAD: function call overhead (~10 cycles on Cortex-M7)
# 0800A100: push {lr}
# 0800A102: bl   0800B000 
# 0800A106: pop  {pc}

# GOOD: inlined direct memory access (~3 cycles)
# 0800A100: ldr  r0, [r4, #0x14]  ; direct struct field access
# 0800A102: uxth r0, r0            ; zero-extend uint16

# If not inlined: add INLINE keyword to Com_GetSignal wrapper in Com_Cfg.h
# Or enable -O2 and verify __attribute__((always_inline)) is present in generated header 

Data Cache Optimization: COM Buffer Layout

cache_aligned_com.c (C)
/* Place frequently-accessed COM signal buffers in same cache line */
/* Cortex-R5 L1 D-cache: 32-byte cache lines */

/* BEFORE: signal buffers scattered in .bss — multiple cache misses per task */
uint16 Com_Sig_VehicleSpeed;   /* .bss offset 0x100 */
uint16 Com_Sig_EngineRPM;      /* .bss offset 0x240 — different cache line */
uint16 Com_Sig_GearPosition;   /* .bss offset 0x380 — different cache line */

/* AFTER: co-locate in linker script section */
/* In linker scatter file: */
/* .com_hot_signals 0x20001000 : { *(.com_hot_signals) } */

/* In Com_Cfg.c (definitions — the header should only carry extern
   declarations, or the section attribute causes multiple definitions): */
uint16 Com_Sig_VehicleSpeed   __attribute__((section(".com_hot_signals")));
uint16 Com_Sig_EngineRPM      __attribute__((section(".com_hot_signals")));
uint16 Com_Sig_GearPosition   __attribute__((section(".com_hot_signals")));
/* All three fit in one 32-byte cache line → 3 signals for price of 1 cache miss */

Summary

Runtime performance profiling combines function-level ETM profiling (to find hot BSW functions like Dem_EventDebounce), preemption chain analysis (to distinguish legitimate gross/net time gaps from actual overload), and generated code inspection (to catch non-inlined signal getters). Data cache optimization is the highest-leverage technique on Cortex-R5/M7 ECUs where L1 cache miss latency (4–10 cycles) dominates tight BSW main function execution time.

🔬 Deep Dive — Core Concepts Expanded

This section builds on the foundational concepts covered above with additional technical depth, edge cases, and configuration nuances that separate competent engineers from experts. When working on production ECU projects, the details covered here are the ones most commonly responsible for integration delays and late-phase defects.

Key principles to reinforce:

  • Configuration over coding: In AUTOSAR and automotive middleware environments, correctness is largely determined by ARXML configuration, not application code. A correctly implemented algorithm can produce wrong results due to a single misconfigured parameter.
  • Traceability as a first-class concern: Every configuration decision should be traceable to a requirement, safety goal, or architecture decision. Undocumented configuration choices are a common source of regression defects when ECUs are updated.
  • Cross-module dependencies: In tightly integrated automotive software stacks, changing one module's configuration often requires corresponding updates in dependent modules. Always perform a dependency impact analysis before submitting configuration changes.

🏭 How This Topic Appears in Production Projects

  • Project integration phase: The concepts covered in this lesson are most commonly encountered during ECU integration testing — when multiple software components from different teams are combined for the first time. Issues that were invisible in unit tests frequently surface at this stage.
  • Supplier/OEM interface: This is a topic that frequently appears in technical discussions between Tier-1 ECU suppliers and OEM system integrators. Engineers who can speak fluently about these details earn credibility and are often brought into critical design review meetings.
  • Automotive tool ecosystem: Vector CANoe/CANalyzer, dSPACE tools, and ETAS INCA are the standard tools used to validate and measure the correct behaviour of the systems described in this lesson. Familiarity with these tools alongside the conceptual knowledge dramatically accelerates debugging in real projects.

⚠️ Common Mistakes and How to Avoid Them

  1. Assuming default configuration is correct: Automotive software tools ship with default configurations that are designed to compile and link, not to meet project-specific requirements. Every configuration parameter needs to be consciously set. 'It compiled' is not the same as 'it is correctly configured'.
  2. Skipping documentation of configuration rationale: In a 3-year ECU project with team turnover, undocumented configuration choices become tribal knowledge that disappears when engineers leave. Document why a parameter is set to a specific value, not just what it is set to.
  3. Testing only the happy path: Automotive ECUs must behave correctly under fault conditions, voltage variations, and communication errors. Always test the error handling paths as rigorously as the nominal operation. Many production escapes originate in untested error branches.
  4. Version mismatches between teams: In a multi-team project, the BSW team, SWC team, and system integration team may use different versions of the same ARXML file. Version management of all ARXML files in a shared repository is mandatory, not optional.

📊 Industry Note

Engineers who master both the theoretical concepts and the practical toolchain skills covered in this course are among the most sought-after professionals in the automotive software industry. The combination of AUTOSAR standards knowledge, safety engineering understanding, and hands-on configuration experience commands premium salaries at OEMs and Tier-1 suppliers globally.
