Capstone: Profile and Optimise a CAN Signal Processing Module

| Phase | Goal | Tool |
|---|---|---|
| Baseline | Measure execution time of CAN decode loop | STM cycle counter or ETM trace |
| Identify hotspot | Find which sub-function consumes most cycles | Trace-based or sampling flat profile (e.g. gprof, vendor trace tools) |
| Optimise | Apply table lookup / loop unroll / memory placement | Code changes + rebuild |
| Verify | Confirm speedup; run MISRA check; run unit tests | PC-lint + Cppcheck + test runner |

Exercise 1: Baseline Profiling
/* Profile CAN signal decode loop using STM hardware counter */
#include <stdint.h>
#define STM_TICKS_PER_US 300u
#define PROFILE_ITERATIONS 1000u
typedef struct {
    uint32_t min_ticks;
    uint32_t max_ticks;
    uint64_t total_ticks;
    uint32_t count;
} PerfProfile_t;

static PerfProfile_t g_decode_profile;
void profile_can_decode(void)
{
    uint8_t test_frame[8] = {0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0};
    CanSignals_t signals;

    g_decode_profile = (PerfProfile_t){UINT32_MAX, 0u, 0u, 0u};

    for (uint32_t i = 0u; i < PROFILE_ITERATIONS; i++) {
        uint32_t t_start = STM0_TIM0;            /* free-running STM counter register */
        Can_DecodeSignals(test_frame, &signals); /* function under test */
        uint32_t elapsed = STM0_TIM0 - t_start;  /* unsigned subtraction is wraparound-safe */

        if (elapsed < g_decode_profile.min_ticks) { g_decode_profile.min_ticks = elapsed; }
        if (elapsed > g_decode_profile.max_ticks) { g_decode_profile.max_ticks = elapsed; }
        g_decode_profile.total_ticks += elapsed;
        g_decode_profile.count++;
    }
    uint32_t avg_ticks = (uint32_t)(g_decode_profile.total_ticks / PROFILE_ITERATIONS);
    uint32_t avg_us = avg_ticks / STM_TICKS_PER_US;

    /* Report via DID or SWO trace:
       min=%u µs, avg=%u µs, max=%u µs
       (tick values divided by STM_TICKS_PER_US, i.e. 300 ticks per µs) */
    (void)avg_us; /* consumed by the reporting path, not shown here */
}

Exercise 2: Optimise with LUT and Loop Unrolling
/* Before: decode each signal with a float scaling multiply */
float decode_engine_speed_slow(const uint8_t *data)
{
    uint16_t raw = ((uint16_t)data[3] << 8u) | (uint16_t)data[4];
    return (float)raw * 0.25f; /* float multiply: tens of cycles under software float emulation */
}
/* After: integer scaling that avoids float entirely. (A pre-scaled
   fixed-point LUT is an alternative when the scaling is nonlinear
   and the input range is small enough to table.) */
uint16_t decode_engine_speed_fast_rpm4(const uint8_t *data)
{
    /* Result in units of 0.25 rpm to avoid float — caller scales if needed */
    uint16_t raw = ((uint16_t)data[3] << 8u) | (uint16_t)data[4];
    return raw; /* raw is already in 0.25 rpm units; no multiply needed */
}
/* Bulk decode all signals in one pass (cache-friendly: one pass over data) */
void Can_DecodeSignals_Optimised(const uint8_t *data, CanSignals_t *out)
{
    /* Process all signals in byte order: temporal locality for CPU cache */
    out->engine_speed_rpm4 = ((uint16_t)data[3] << 8u) | (uint16_t)data[4];
    out->vehicle_speed_ms8 = (uint16_t)data[2] | ((uint16_t)data[3] << 8u);
    out->coolant_temp_raw  = data[1];
    out->throttle_pct      = data[0];
    /* Single pass through 8 bytes: all signals decoded with minimal branching */
}

Exercise 3: Regression Verification
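The correctness half of the verification is an equivalence check: the optimised decode must produce bit-identical results to the per-signal reference for every test vector. A sketch of such a unit-test helper — the `CanSignals_t` fields are taken from the snippet above, and `decode_matches_reference` is a hypothetical name, not part of the project API:

```c
#include <stdint.h>

/* Signal struct as assumed by the exercise snippets */
typedef struct {
    uint16_t engine_speed_rpm4;
    uint16_t vehicle_speed_ms8;
    uint8_t  coolant_temp_raw;
    uint8_t  throttle_pct;
} CanSignals_t;

void Can_DecodeSignals_Optimised(const uint8_t *data, CanSignals_t *out)
{
    out->engine_speed_rpm4 = (uint16_t)(((uint16_t)data[3] << 8u) | (uint16_t)data[4]);
    out->vehicle_speed_ms8 = (uint16_t)((uint16_t)data[2] | ((uint16_t)data[3] << 8u));
    out->coolant_temp_raw  = data[1];
    out->throttle_pct      = data[0];
}

/* Returns 1 when the optimised decode matches a per-signal reference
   decode of the same frame, 0 otherwise. */
int decode_matches_reference(const uint8_t *frame)
{
    CanSignals_t fast;
    Can_DecodeSignals_Optimised(frame, &fast);

    uint16_t ref_rpm4 = (uint16_t)(((uint16_t)frame[3] << 8u) | (uint16_t)frame[4]);
    return (fast.engine_speed_rpm4 == ref_rpm4)
        && (fast.coolant_temp_raw  == frame[1])
        && (fast.throttle_pct      == frame[0]);
}
```

Running this over a corpus of recorded frames (plus boundary patterns such as all-0x00 and all-0xFF) is what gives the "0 failures" gate in Step 1 below its teeth.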
#!/bin/bash
# Verification pipeline: optimisation must not break correctness or MISRA compliance
set -e
echo "=== Step 1: Build and run unit tests ==="
cmake --build build --target unit_tests
./build/unit_tests --reporter compact # must be 0 failures
echo "=== Step 2: MISRA compliance check ==="
cppcheck --addon=misra.py --misra-c=2012 src/can_decode.c 2>&1 | tee misra_results.txt
# Note: grep -c prints the count even when it is 0 (and then exits non-zero),
# so guard against set -e without appending a second "0" via echo.
MANDATORY=$(grep -c "mandatory" misra_results.txt 2>/dev/null || true)
MANDATORY=${MANDATORY:-0}
[ "$MANDATORY" -eq "0" ] || { echo "FAIL: mandatory MISRA violations"; exit 1; }
echo "=== Step 3: Code size check (flash must not grow by >5%) ==="
BASELINE_SIZE=98304 # bytes from before optimisation
CURRENT_SIZE=$(arm-none-eabi-size build/app.elf | awk 'NR==2{print $1 + $2}') # text + data = flash footprint
MAX_SIZE=$((BASELINE_SIZE * 105 / 100))
[ "$CURRENT_SIZE" -le "$MAX_SIZE" ] || { echo "FAIL: code grew too large"; exit 1; }
echo "=== Step 4: Performance benchmark ==="
# Run on target via JTAG; check profile DID via CAN
# python3 ci/read_profile_did.py --can vcan0 --did 0xFF01 --max-wcet-us 5
echo "All checks PASSED"
echo "Baseline: $BASELINE_SIZE bytes  Current: $CURRENT_SIZE bytes"

Summary
The profiling capstone captures the complete optimisation workflow: measure (what is slow?), identify (why is it slow?), optimise (fix the root cause), verify (same results, MISRA-clean, no size regression). The most common findings in automotive signal decode code are float arithmetic where integer equivalents work (a 5–10× speedup on MCUs relying on software float emulation) and repeated passes over the same data bytes where a single-pass decode is possible. The verification pipeline is as important as the optimisation itself: a speedup that introduces a MISRA violation, fails a unit test, or exceeds the flash budget is not a net improvement.
🔬 Deep Dive — Core Concepts Expanded
This section adds technical depth, edge cases, and configuration nuances to the foundational concepts covered above. On production ECU projects, these details are among the most common causes of integration delays and late-phase defects.
Key principles to reinforce:
- Configuration over coding: In AUTOSAR and automotive middleware environments, correctness is largely determined by ARXML configuration, not application code. A correctly implemented algorithm can produce wrong results due to a single misconfigured parameter.
- Traceability as a first-class concern: Every configuration decision should be traceable to a requirement, safety goal, or architecture decision. Undocumented configuration choices are a common source of regression defects when ECUs are updated.
- Cross-module dependencies: In tightly integrated automotive software stacks, changing one module's configuration often requires corresponding updates in dependent modules. Always perform a dependency impact analysis before submitting configuration changes.
🏭 How This Topic Appears in Production Projects
- Project integration phase: The concepts covered in this lesson are most commonly encountered during ECU integration testing — when multiple software components from different teams are combined for the first time. Issues that were invisible in unit tests frequently surface at this stage.
- Supplier/OEM interface: This is a topic that frequently appears in technical discussions between Tier-1 ECU suppliers and OEM system integrators. Engineers who can speak fluently about these details earn credibility and are often brought into critical design review meetings.
- Automotive tool ecosystem: Vector CANoe/CANalyzer, dSPACE tools, and ETAS INCA are the standard tools used to validate and measure the correct behaviour of the systems described in this lesson. Familiarity with these tools alongside the conceptual knowledge dramatically accelerates debugging in real projects.
⚠️ Common Mistakes and How to Avoid Them
- Assuming default configuration is correct: Automotive software tools ship with default configurations that are designed to compile and link, not to meet project-specific requirements. Every configuration parameter needs to be consciously set. 'It compiled' is not the same as 'it is correctly configured'.
- Skipping documentation of configuration rationale: In a 3-year ECU project with team turnover, undocumented configuration choices become tribal knowledge that disappears when engineers leave. Document why a parameter is set to a specific value, not just what it is set to.
- Testing only the happy path: Automotive ECUs must behave correctly under fault conditions, voltage variations, and communication errors. Always test the error handling paths as rigorously as the nominal operation. Many production escapes originate in untested error branches.
- Version mismatches between teams: In a multi-team project, the BSW team, SWC team, and system integration team may use different versions of the same ARXML file. Version management of all ARXML files in a shared repository is mandatory, not optional.
📊 Industry Note
Engineers who master both the theoretical concepts and the practical toolchain skills covered in this course are among the most sought-after professionals in the automotive software industry. The combination of AUTOSAR standards knowledge, safety engineering understanding, and hands-on configuration experience commands premium salaries at OEMs and Tier-1 suppliers globally.