Code Size & Speed Optimization - Embedded C/C++ for Automotive

Compiler Optimisation Flags

Flag	Effect	Use Case
-O0	No optimisation; fastest compile	Debug builds: full debuggability
-O1	Basic optimisation; some variables may be optimised out	Development: some optimisation, usually still debuggable
-O2	Nearly all optimisations that do not trade size for speed	Production: good balance of speed and size
-O3	Aggressive: adds auto-vectorisation and more inlining; code can grow	High-performance compute; check code-size growth and re-run timing and regression tests
-Os	Optimise for size	Flash-constrained systems; may be slower than -O2
-Og	Optimise for debugging experience	Debug builds: more optimisation than -O0 while the debugger still works
-flto	Link-time optimisation (pass at both compile and link)	Cross-translation-unit inlining and dead-code removal; gains vary, often larger on bigger codebases
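Because the optimisation level changes code size, timing and debuggability, it can help to make it visible in the build itself. The sketch below assumes GCC or Clang, which predefine __OPTIMIZE__ in every optimising compilation and additionally __OPTIMIZE_SIZE__ under -Os. RELEASE_BUILD is a hypothetical project macro, used here to reject a release image accidentally built at -O0.

```c
#include <assert.h>
#include <stdio.h>

/* Describe how this translation unit was compiled, using the GCC/Clang
   predefined macros __OPTIMIZE__ and __OPTIMIZE_SIZE__. */
static const char *build_opt_level(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size-optimised (-Os)";
#elif defined(__OPTIMIZE__)
    return "optimised (-O1, -O2, -O3 or -Og)";
#else
    return "unoptimised (-O0)";
#endif
}

/* Hypothetical project guard: refuse to build a release image at -O0. */
#if defined(RELEASE_BUILD) && !defined(__OPTIMIZE__)
#error "RELEASE_BUILD requires an optimisation flag such as -O2 or -Os"
#endif

/* Print the build configuration, e.g. into a host-side test log. */
static int print_build_info(FILE *out)
{
    assert(out != NULL);
    return fprintf(out, "build: %s\n", build_opt_level());
}
```

Since the macros are evaluated per translation unit, a mixed build (for example one file compiled at -O0 for debugging) reports differently from file to file.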

Code Size Reduction Techniques

code_size.c

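#include <stdint.h>

/* Project hooks assumed to be defined elsewhere (hypothetical names) */
extern void Log_Error(uint32_t error_code);
extern void DEM_ReportError(uint32_t error_code);

/* Technique 1: Function attributes to guide linker/compiler */
/* __attribute__((noinline)): prevent inlining of rarely-called functions */
__attribute__((noinline)) void Error_Handler(uint32_t error_code)
{
    /* complex error handling: keep one out-of-line copy instead of inlining it at every call site */
    Log_Error(error_code);
    DEM_ReportError(error_code);
}

/* __attribute__((always_inline)): force inlining of hot small functions */
/* PORT(port)->IN: device-header accessor for the GPIO input data register (assumed) */
static inline __attribute__((always_inline)) uint32_t
Gpio_ReadPin(uint8_t port, uint8_t pin)
{
    return (PORT(port)->IN >> pin) & 0x01u;  /* a few instructions: inlined code is no larger than a call */
}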

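/* Technique 2: Avoid printf/scanf in production (often the largest single contributor to code size) */
/* printf + format strings + float support: can add tens of kB of flash, depending on the C library */
/* Replace with: UART output using a custom itoa, or SWO/ITM trace; keep semihosting for debug
   builds only, because its breakpoint call faults or hangs when no debugger is attached */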

/* Technique 3: Linker garbage collection */
/* gcc: -ffunction-sections -fdata-sections -Wl,--gc-sections
   Each function/variable gets its own section; the linker removes sections that are never referenced
   Often saves 10–30% flash on large codebases with many helper functions
   Sections reached only by hardware (e.g. the vector table) need KEEP() in the linker script */

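/* Technique 4: Section attributes to control placement */
/* Both sections below must be defined in the linker script; names are project-specific */
__attribute__((section(".ccmram")))  /* Core-Coupled Memory (e.g. STM32F3/F4): zero-wait data access */
static uint32_t g_hot_data[64];      /* startup code must zero this section explicitly, like .bss;
                                        on STM32F4, CCM is not reachable by DMA */

__attribute__((section(".fastcode"))) /* startup code copies this from flash to RAM before first use */
void __attribute__((noinline)) Fast_CriticalISR(void)
{
    /* executed from RAM: zero-wait instruction fetch, provided that RAM is executable
       (e.g. ITCM on Cortex-M7; STM32F4 CCM is data-only) */
}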
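As a replacement for printf-based logging, a minimal sketch of the "UART with custom itoa" approach could look like this. The writer callback type, log_u32() and the capture buffer are hypothetical illustrations: on target, the callback would feed the project's UART driver, and only unsigned 32-bit values are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output hook: on target this would push bytes into the UART TX FIFO. */
typedef void (*uart_write_fn)(const char *buf, size_t len);

/* Convert value to decimal ASCII without printf.
   buf must hold at least 11 bytes (10 digits + NUL). Returns the number of digits. */
static size_t u32_to_dec(uint32_t value, char *buf)
{
    char tmp[10];
    size_t n = 0u;
    do {
        tmp[n++] = (char)('0' + (value % 10u));  /* least-significant digit first */
        value /= 10u;
    } while (value != 0u);
    for (size_t i = 0u; i < n; i++) {
        buf[i] = tmp[n - 1u - i];                /* reverse into reading order */
    }
    buf[n] = '\0';
    return n;
}

/* Emit "<msg><value>\n" through the supplied writer. */
static void log_u32(uart_write_fn out, const char *msg, uint32_t value)
{
    char digits[11];
    size_t n;
    assert(out != NULL && msg != NULL);
    n = u32_to_dec(value, digits);
    out(msg, strlen(msg));
    out(digits, n);
    out("\n", 1u);
}

/* Host-test stand-in for the UART driver: appends output to a NUL-terminated buffer. */
static char   g_capture[64];
static size_t g_capture_len;

static void capture_write(const char *buf, size_t len)
{
    for (size_t i = 0u; i < len && g_capture_len < sizeof g_capture - 1u; i++) {
        g_capture[g_capture_len++] = buf[i];
    }
    g_capture[g_capture_len] = '\0';
}
```

Only the formats the project actually needs get linked in, which is where the saving over a general-purpose printf comes from.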

Speed Optimisation Techniques

speed_opt.c

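#include <stdint.h>

/* Assumed to exist elsewhere: the per-byte transform and a 256-entry table
   holding its precomputed results, i.e. lut[x] == compute_expensive(x) */
extern uint8_t compute_expensive(uint8_t x);
extern const uint8_t lut[256];

/* Technique 1: Loop optimisation */
/* Inefficient: function call overhead per element */
void process_slow(const uint8_t *in, uint8_t *out, uint16_t len)
{
    for (uint16_t i = 0u; i < len; i++) {
        out[i] = compute_expensive(in[i]);  /* function call per byte */
    }
}

/* Faster: replace the call with a table lookup and unroll by 4 (less loop overhead) */
void process_fast(const uint8_t *in, uint8_t *out, uint16_t len)
{
    uint16_t i = 0u;
    for (; i + 3u < len; i += 4u) {   /* process 4 at a time */
        out[i+0u] = lut[in[i+0u]];    /* table lookup: one load instead of a call/return */
        out[i+1u] = lut[in[i+1u]];
        out[i+2u] = lut[in[i+2u]];
        out[i+3u] = lut[in[i+3u]];
    }
    for (; i < len; i++) { out[i] = lut[in[i]]; }  /* remainder */
}
/* Compilers may unroll simple loops themselves (e.g. GCC with -funroll-loops); measure before hand-unrolling */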

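/* Technique 2: DSPR scratchpad placement (Aurix TC3xx) */
/* DSPR: core-local data SRAM, seen by its own core at the local address 0xD0000000; single-cycle access.
   Size and global address differ per core and derivative (check the device user manual) */
/* Compare: LMU RAM and PFlash take several cycles per access (roughly 3–10, depending on clock ratios and cache hits) */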

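/* Technique 3: Cortex-M SIMD via ARM DSP extension (Cortex-M4/M7) */
/* __SADD16: two signed 16-bit additions in parallel on the halfwords of a and b;
   results wrap on overflow (__QADD16 is the saturating variant usually wanted for Q15) */
/* Useful for Q15 vector arithmetic, FIR filter coefficient application */
/* Build for a DSP-capable core (e.g. -mcpu=cortex-m4); cmsis_gcc.h is normally pulled in via the device header */
#include <stdint.h>
#include "cmsis_gcc.h"
uint32_t simd_add(uint32_t a, uint32_t b) { return __SADD16(a, b); }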
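process_fast() only matches process_slow() if lut[x] really equals compute_expensive(x) for all 256 inputs. A host-side sketch, assuming compute_expensive() is a pure function of one byte (replaced here by a hypothetical bit-reversal stand-in), fills the table at start-up and checks both versions against each other for every remainder length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for compute_expensive(): reverses the bit order of a byte.
   Any pure function of a single input byte can be tabulated the same way. */
static uint8_t compute_expensive(uint8_t x)
{
    uint8_t r = 0u;
    for (uint8_t bit = 0u; bit < 8u; bit++) {
        r = (uint8_t)((r << 1) | ((x >> bit) & 1u));
    }
    return r;
}

/* RAM table filled at start-up; on target, a const table generated offline
   would live in flash and skip this init step. */
static uint8_t lut[256];

static void lut_init(void)
{
    for (uint16_t x = 0u; x < 256u; x++) {
        lut[x] = compute_expensive((uint8_t)x);
    }
}

static void process_slow(const uint8_t *in, uint8_t *out, uint16_t len)
{
    for (uint16_t i = 0u; i < len; i++) {
        out[i] = compute_expensive(in[i]);
    }
}

static void process_fast(const uint8_t *in, uint8_t *out, uint16_t len)
{
    uint16_t i = 0u;
    for (; i + 3u < len; i += 4u) {
        out[i + 0u] = lut[in[i + 0u]];
        out[i + 1u] = lut[in[i + 1u]];
        out[i + 2u] = lut[in[i + 2u]];
        out[i + 3u] = lut[in[i + 3u]];
    }
    for (; i < len; i++) {
        out[i] = lut[in[i]];
    }
}

/* Host unit test: both versions must agree for every length, which exercises
   the unrolled loop and all remainder cases (0 to 3 leftover bytes). */
static void process_self_test(void)
{
    enum { N = 23 };
    uint8_t in[N], slow[N], fast[N];
    for (uint16_t i = 0u; i < N; i++) {
        in[i] = (uint8_t)(i * 37u);
    }
    lut_init();
    for (uint16_t len = 0u; len <= N; len++) {
        process_slow(in, slow, len);
        process_fast(in, fast, len);
        assert(memcmp(slow, fast, len) == 0);
    }
}
```

Testing every length up to a few multiples of the unroll factor is a cheap way to catch off-by-one errors in the remainder loop.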
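Code that calls __SADD16 cannot run in a PC-hosted unit test. One option, sketched below under the assumption that the intrinsics follow the Arm SADD16/QADD16 lane semantics, is a plain C reference model: host tests use it directly, and a target-side test can compare it against the real intrinsic. The names sadd16_ref, qadd16_ref, SIMD_ADD16 and simd_self_test are hypothetical; __ARM_FEATURE_DSP is the ACLE macro defined when the DSP instructions are available.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 16 bits of h as a signed halfword (no implementation-defined casts). */
static int32_t s16(uint32_t h)
{
    h &= 0xFFFFu;
    return (h & 0x8000u) ? (int32_t)h - 65536 : (int32_t)h;
}

/* Reference model of SADD16: two independent 16-bit adds, wrapping modulo 2^16.
   The GE flags set by the real instruction are not modelled. */
static uint32_t sadd16_ref(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0xFFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* Clamp to the signed 16-bit range and return the two's-complement bit pattern. */
static uint32_t sat16(int32_t v)
{
    if (v > 32767)  { v = 32767; }
    if (v < -32768) { v = -32768; }
    return (uint32_t)v & 0xFFFFu;
}

/* Reference model of QADD16: the same lane-wise add, saturating instead of wrapping. */
static uint32_t qadd16_ref(uint32_t a, uint32_t b)
{
    uint32_t lo = sat16(s16(a) + s16(b));
    uint32_t hi = sat16(s16(a >> 16) + s16(b >> 16));
    return (hi << 16) | lo;
}

/* On a DSP-capable core use the CMSIS intrinsic; elsewhere fall back to the model. */
#if defined(__ARM_FEATURE_DSP) && (__ARM_FEATURE_DSP == 1)
#include "cmsis_gcc.h"
#define SIMD_ADD16(a, b) __SADD16((a), (b))
#else
#define SIMD_ADD16(a, b) sadd16_ref((a), (b))
#endif

/* Cross-check on edge values: meaningful on target, trivially true on the host. */
static void simd_self_test(void)
{
    static const uint32_t v[] = {0x00000000u, 0x00007FFFu, 0x00008000u, 0x0000FFFFu,
                                 0x7FFF8000u, 0x80007FFFu, 0xFFFFFFFFu, 0x12345678u};
    const unsigned n = (unsigned)(sizeof v / sizeof v[0]);
    for (unsigned i = 0u; i < n; i++) {
        for (unsigned j = 0u; j < n; j++) {
            assert(SIMD_ADD16(v[i], v[j]) == sadd16_ref(v[i], v[j]));
        }
    }
}
```

The wrap/saturate distinction is the main thing the model makes explicit: 0x7FFF + 1 becomes 0x8000 (-32768) with SADD16 but stays at 0x7FFF with QADD16.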

Summary

The golden rule of optimisation: measure first, then optimise the measured hotspot. A profiler (MCDS trace on Aurix emulation devices, the DWT cycle counter or ETM trace on Cortex-M, or GNU gprof on native host tests) identifies the small part of the code, often quoted as 10%, that consumes most of the execution time, often quoted as 90%. Optimising the rest gives diminishing returns and increases maintenance risk. For code size, -ffunction-sections -fdata-sections -Wl,--gc-sections combined with removing unused code paths typically saves 10–30% flash for little effort. For speed, the highest-impact techniques are: table lookups instead of computation, placement in fast memory (DSPR/CCM), and avoiding function call overhead in tight loops.
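For the cycle-counting step on Cortex-M, a minimal measurement harness might look like the sketch below. It assumes an ARMv7-M core (Cortex-M3/M4/M7) for the DWT cycle counter and falls back to the C library's clock() in a host build, which measures processor time rather than cycles. sample_workload() is a hypothetical stand-in for the code under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
/* Cortex-M3/M4/M7: DWT cycle counter (ARMv7-M architectural register addresses).
   Some Cortex-M7 devices also require unlocking the DWT lock access register first. */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

static void cycles_init(void)
{
    DEMCR |= (1u << 24);   /* TRCENA: enable the DWT/ITM blocks */
    DWT_CYCCNT = 0u;
    DWT_CTRL |= 1u;        /* CYCCNTENA: start the cycle counter */
}

static uint32_t cycles_now(void) { return DWT_CYCCNT; }
#else
/* Host fallback for native unit tests: C-library processor time, not CPU cycles */
#include <time.h>

static void cycles_init(void) { }

static uint32_t cycles_now(void) { return (uint32_t)clock(); }
#endif

typedef void (*bench_fn)(void);

/* Elapsed count for one call of fn; unsigned subtraction tolerates one counter wrap. */
static uint32_t measure(bench_fn fn)
{
    uint32_t start;
    assert(fn != NULL);
    start = cycles_now();
    fn();
    return cycles_now() - start;
}

/* Hypothetical hotspot candidate; the volatile sink keeps the result observable. */
static volatile uint32_t g_sink;

static void sample_workload(void)
{
    uint32_t acc = 0u;
    for (uint32_t i = 0u; i < 100000u; i++) {
        acc += i;
    }
    g_sink = acc;
}
```

In practice, repeat the measurement several times and subtract the cost of measuring an empty function; interrupts that fire inside the window also inflate the count.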

🔬 Deep Dive — Core Concepts Expanded

This section builds on the foundational concepts covered above with additional technical depth, edge cases, and configuration nuances that separate competent engineers from experts. When working on production ECU projects, the details covered here are the ones most commonly responsible for integration delays and late-phase defects.

Key principles to reinforce:

Configuration over coding: In AUTOSAR and automotive middleware environments, correctness is largely determined by ARXML configuration, not application code. A correctly implemented algorithm can produce wrong results due to a single misconfigured parameter.
Traceability as a first-class concern: Every configuration decision should be traceable to a requirement, safety goal, or architecture decision. Undocumented configuration choices are a common source of regression defects when ECUs are updated.
Cross-module dependencies: In tightly integrated automotive software stacks, changing one module's configuration often requires corresponding updates in dependent modules. Always perform a dependency impact analysis before submitting configuration changes.

🏭 How This Topic Appears in Production Projects

Project integration phase: The concepts covered in this lesson are most commonly encountered during ECU integration testing — when multiple software components from different teams are combined for the first time. Issues that were invisible in unit tests frequently surface at this stage.
Supplier/OEM interface: This is a topic that frequently appears in technical discussions between Tier-1 ECU suppliers and OEM system integrators. Engineers who can speak fluently about these details earn credibility and are often brought into critical design review meetings.
Automotive tool ecosystem: Vector CANoe/CANalyzer, dSPACE tools, and ETAS INCA are the standard tools for validating and measuring ECU behaviour at the system level; for the timing and code-size work described in this lesson they are typically complemented by trace debuggers such as Lauterbach TRACE32. Familiarity with these tools alongside the conceptual knowledge dramatically accelerates debugging in real projects.

⚠️ Common Mistakes and How to Avoid Them

Assuming default configuration is correct: Automotive software tools ship with default configurations that are designed to compile and link, not to meet project-specific requirements. Every configuration parameter needs to be consciously set. 'It compiled' is not the same as 'it is correctly configured'.
Skipping documentation of configuration rationale: In a 3-year ECU project with team turnover, undocumented configuration choices become tribal knowledge that disappears when engineers leave. Document why a parameter is set to a specific value, not just what it is set to.
Testing only the happy path: Automotive ECUs must behave correctly under fault conditions, voltage variations, and communication errors. Always test the error handling paths as rigorously as the nominal operation. Many production escapes originate in untested error branches.
Version mismatches between teams: In a multi-team project, the BSW team, SWC team, and system integration team may use different versions of the same ARXML file. Version management of all ARXML files in a shared repository is mandatory, not optional.

📊 Industry Note

Engineers who master both the theoretical concepts and the practical toolchain skills covered in this course are among the most sought-after professionals in the automotive software industry. The combination of AUTOSAR standards knowledge, safety engineering understanding, and hands-on configuration experience commands premium salaries at OEMs and Tier-1 suppliers globally.