Programming

Embedded C/C++ for Automotive

Master embedded C and C++ programming for automotive ECUs. Covers memory management, real-time programming, MISRA-C compliance, peripheral drivers, and RTOS fundamentals.

36 chapters
27.5 hrs reading
7 modules

Overview

Embedded C/C++ is the foundation skill for every automotive software engineer. This course goes beyond general programming to focus specifically on patterns, techniques, and constraints unique to automotive ECU development.

You'll learn memory-mapped I/O, bit manipulation for register access, interrupt handling, RTOS task design, and strict MISRA-C compliance - all within the context of real automotive microcontrollers (AURIX, RH850, S32K).

The course includes extensive hands-on exercises using both simulators and real target hardware examples, preparing you for actual ECU software development at any OEM or Tier-1.

Course Modules

1
C Fundamentals for Embedded
6 chapters • 4.2 hrs reading
Data Types, Qualifiers & Storage Classes FREE PREVIEW 40 min read
▸ C99 fixed-width types from <stdint.h>: uint8_t, int16_t, uint32_t, uint64_t - exact widths wherever the implementation provides them (the exact-width typedefs are optional in the C standard, but mainstream automotive toolchains define them); AUTOSAR Std_Types.h aliases: uint8, uint16, uint32, sint8, sint16, sint32, Std_ReturnType=uint8 (E_OK=0, E_NOT_OK=1); char vs uint8_t: char signedness is implementation-defined - MISRA-C:2012 Directive 4.6 recommends typedefs that state size and signedness, and the essential type model (Rule 10.1) restricts plain char to character data; uintptr_t for safe pointer-to-integer conversion; size_t for sizeof and array index results; ptrdiff_t for pointer arithmetic differences
▸ Type qualifiers: volatile - prevents compiler from caching value in register (mandatory for memory-mapped SFR access and ISR-shared variables); without volatile the optimizer may eliminate repeated reads of hardware status registers in polling loops; const - marks data read-only, placed in .rodata (flash) saving RAM; const volatile - read-only hardware status register (e.g., volatile const uint32_t *STATUS_REG = (volatile const uint32_t*)0xF0001000; the cast must carry the same qualifiers as the pointer); restrict (C99) - pointer non-aliasing hint that lets the compiler reorder and vectorize loads/stores in memcpy-like MCAL functions
▸ Storage classes: static at function scope - persistent across calls, allocated in .data/.bss instead of stack (use for state machines, ring buffers); static at file scope - internal linkage only (module-private, MISRA Rules 8.7 and 8.8); extern - references variable defined elsewhere; __attribute__((section(".noinit"))) - GCC extension that places a variable in a named section, which startup code skips during zero-init provided the linker script keeps that section out of .bss (used for ProgConditions RAM surviving reset); __attribute__((used)) - tells the compiler to emit an object even if it looks unreferenced (e.g., interrupt vector table entries); pair it with KEEP() in the linker script so that --gc-sections does not discard the section; __attribute__((weak)) - allows override in unit tests without linker errors
▸ Struct padding and alignment: GCC aligns each member to its natural alignment (uint32_t to 4 bytes, uint16_t to 2 bytes); struct {char a; uint32_t b; char c;} = 12 bytes (3-byte pad after a, 3-byte pad after c); __attribute__((packed)) removes padding - use cautiously since unaligned 32-bit access causes DTRAP on Aurix TC3xx DSPR; __attribute__((aligned(32))) required for Cortex-A DMA buffers needing cache-line alignment; verify: static_assert(sizeof(FrameStruct)==8, "unexpected padding") enforced at compile time; ARM AAPCS specifies 8-byte stack alignment
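The padding figures quoted above can be checked at compile time. The following is a minimal sketch, assuming a mainstream ABI where uint32_t is 4-byte aligned; the struct and function names are illustrative only and do not come from the course material.

```c
#include <assert.h>   /* static_assert macro (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Narrow members around a wide one: padding after a and after c. */
struct PaddedA { char a; uint32_t b; char c; };

/* Same members, widest first: only tail padding remains. */
struct PaddedB { uint32_t b; char a; char c; };

/* Assumption: uint32_t has 4-byte alignment on the target ABI. */
static_assert(sizeof(struct PaddedA) == 12, "unexpected layout for PaddedA");
static_assert(sizeof(struct PaddedB) == 8, "unexpected layout for PaddedB");
static_assert(offsetof(struct PaddedA, b) == 4, "b should follow 3 pad bytes");

/* Padding = total size minus the sum of the member sizes. */
static size_t padding_bytes_a(void)
{
    return sizeof(struct PaddedA) - (sizeof(char) + sizeof(uint32_t) + sizeof(char));
}

static size_t padding_bytes_b(void)
{
    return sizeof(struct PaddedB) - (sizeof(uint32_t) + sizeof(char) + sizeof(char));
}
```

If a static_assert fires on a new compiler or target, the build stops before a mismatched frame layout can reach the bus.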
Pointers, Arrays & Memory Layout FREE PREVIEW 50 min read
▸ ARM Cortex-M memory map: 0x00000000–0x1FFFFFFF = Code (Flash); 0x20000000–0x3FFFFFFF = SRAM; 0x40000000–0x5FFFFFFF = Peripheral (MMIO); 0xE0000000–0xE00FFFFF = Private Peripheral Bus (NVIC/SysTick/SCB/debug); Aurix TC3xx: PFlash 0xA0000000, DSPR0 0x70000000, core-local DSPR view at 0xD0000000, which typically also holds the CSA (Context Save Area) list; linker script (.ld) defines memory regions and places .text in FLASH, .data in FLASH (load address) and SRAM (run address), .bss in SRAM; startup code copies .data from LMA to VMA and zeroes .bss before main()
▸ Pointer mechanics: pointer arithmetic increments by sizeof(pointed type) - uint32_t *p; p++ advances by 4 bytes; array name decays to pointer to first element (arr == &arr[0]); difference: sizeof(arr) gives total array bytes, sizeof(ptr) gives pointer width (4 or 8 bytes); pointer-to-const: const uint8_t *p (can't write via p, can change p); const pointer: uint8_t * const p (can write via p, can't change p); const pointer-to-const: const uint8_t * const p; void* for generic memory (memcpy signature); NULL pointer = address 0x00000000; on many Cortex-M parts address 0 is readable (the vector table is mapped or aliased there), so a NULL read returns data silently rather than faulting unless an MPU region is configured to block it
▸ Stack vs heap vs static allocation: stack grows downward from _estack symbol (linker defined); each function call pushes frame (locals + return address + saved registers); stack overflow → silent corruption unless MPU guard page configured; heap: malloc/free forbidden in safety-critical AUTOSAR (MISRA Rule 21.3, Directive 4.12); all dynamic allocation replaced with static pools (e.g., AUTOSAR MemMap.h section attributes); static global arrays: uint8_t Buffer[1024] declared at file scope - placed in .bss (zero-init) or .data (with initializer); AUTOSAR MemMap: #pragma ghs section bss=".Os_Vars_Noinit_START" for OS-specific sections
▸ ELF binary analysis with GNU tools: arm-none-eabi-objdump -h app.elf → shows all sections (name, LMA, VMA, size); arm-none-eabi-nm --size-sort app.elf → lists all symbols by size (identify large buffers); arm-none-eabi-size app.elf → text + data + bss totals; arm-none-eabi-readelf -S app.elf → detailed section headers with alignment; linker map file (.map) shows exact address of every variable and function; TRACE32: Data.dump A:0x20000000 %Long - inspect SRAM contents live; Var.Global → shows all global variables with current values; use to verify startup copy of .data completed correctly and .bss is zero-filled
Structures, Unions & Bit Fields 45 min read
▸ Struct as hardware register overlay: volatile struct overlaying memory-mapped register block; example - Aurix TC3xx CAN message object: typedef struct { volatile uint32_t MOFCR; volatile uint32_t MOFGPR; volatile uint32_t MOIPR; volatile uint32_t MOAMR; volatile uint8_t MOData[8]; volatile uint32_t MOAR; volatile uint32_t MOCTR; } Can_MsgObjType; mapped via Can_MsgObjType *pMsgObj = (Can_MsgObjType*)0xF0200000; bitfield within MOFCR gives direct access to DLC, DIR, MSGVAL fields; MISRA-C:2012 Rule 19.2 (Advisory; it was Rule 18.4 in MISRA-C:2004) discourages the union keyword, while struct overlays are not restricted by that rule
▸ Bit fields for register bit access: typedef struct { uint32_t MMC:4; uint32_t GDFS:1; uint32_t IDC:1; uint32_t DLCC:1; uint32_t OVSC:1; uint32_t :24; } __attribute__((packed)) CanMofcr_t; embedded in union with uint32_t raw for full-word access: union { CanMofcr_t bits; uint32_t word; }; caution: bit field allocation order (LSB-first vs MSB-first) is compiler + endianness dependent - MISRA Rule 6.1 requires an appropriate bit field type (explicitly signed or unsigned integer types, plus _Bool in C99), never plain int or char; Aurix GCC: bit fields fill from LSB in little-endian words; add static_assert(sizeof(CanMofcr_t)==4, "...") for the size, and a startup or unit test that sets a field and compares the union word against the expected bit pattern (union member values are not constant expressions, so static_assert cannot check them)
▸ Unions for type punning: CAN frame raw bytes to structured access: union { uint8_t bytes[8]; struct { uint16_t id; uint8_t dlc; uint8_t data[5]; } frame; }; avoids strict aliasing violations (prefer memcpy for aliasing-safe type punning in C99); union for floating-point bit inspection: union { float f; uint32_t u; } fp; fp.f = 3.14f; uint32_t bits = fp.u; (reads IEEE 754 bit pattern); CAN signal packing: CAN frame payload often packed with multiple signals at arbitrary bit offsets - requires shift+mask extraction rather than struct overlay to avoid endianness/alignment issues
▸ Struct initialization patterns for embedded: designated initializers (C99): Can_ConfigType cfg = {.baudrate = 500000, .mode = CAN_MODE_NORMAL, .rxFifoEnabled = FALSE}; guarantees zero for unspecified fields; compound literals for temporary struct passing: Dcm_ExtendedDataRecordType rec = (Dcm_ExtendedDataRecordType){.recordNumber=0x01, .dataClass=DEM_EXTENDED_DATA_OCCURRENCE_CNT}; flexible array member (C99): struct Buffer { uint16_t length; uint8_t data[]; }; allocated with malloc-replacement in statically-sized memory pool; AUTOSAR pattern: all module config structs defined as const and placed in .rodata by linker - zero RAM overhead for configuration data
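The type-punning and initialization points above can be tried on a host compiler. This is a small sketch, assuming float is a 32-bit IEEE 754 type; DemoCanConfig and the function names are hypothetical stand-ins, not AUTOSAR types.

```c
#include <stdint.h>
#include <string.h>

/* Union-based inspection of a float's bit pattern (defined in C99 and later). */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } fp;
    fp.f = f;
    return fp.u;
}

/* memcpy-based variant: aliasing-safe, usually optimized to a plain move. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical config struct used only to show designated initializers. */
typedef struct {
    uint32_t baudrate;
    uint8_t  mode;
    uint8_t  rxFifoEnabled;
} DemoCanConfig;

static DemoCanConfig demo_default_config(void)
{
    /* Fields not named in the initializer are zero-initialized. */
    DemoCanConfig cfg = { .baudrate = 500000U };
    return cfg;
}
```

Both inspection functions return the same value; the memcpy form is the one that stays valid if the code is ever compiled as C++.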
Function Pointers & Callbacks 40 min read
▸ Function pointer syntax: return_type (*name)(arg_types); example: typedef void (*Can_NotifFnc_t)(uint8_t controllerId, Can_ReturnType result); declaration: Can_NotifFnc_t TxCallback; assignment: TxCallback = App_TxComplete; invocation: TxCallback(0, CAN_OK); typedef is mandatory in AUTOSAR for readability and MISRA Rule 11.1 compliance (conversions between function pointers and other types prohibited); storing in const struct: static const CanIf_CtrlCfgType CtrlCfg = {.TxCancelNotifFnc = App_TxCancel, .BusOffNotifFnc = App_BusOff}; function pointer tables placed in .rodata (flash)
▸ Dispatch tables (function pointer arrays): AUTOSAR RTE uses generated dispatch arrays for SWC port calls; example: typedef Std_ReturnType (*ServiceHandler)(const uint8_t *req, uint8_t *resp, uint16_t *len); static const ServiceHandler ServiceTable[256] = {[0x10]=Dcm_HandleSessionCtrl, [0x22]=Dcm_HandleRDBI, [0x2E]=Dcm_HandleWDBI}; SID lookup: if(ServiceTable[sid] != NULL) ServiceTable[sid](req, resp, &len); NULL check mandatory before call (defensive check in the spirit of MISRA Directive 4.1, since unlisted entries are NULL); vs switch-case: table lookup is always O(1), while a switch over a sparse SID space may compile to a compare chain (dense switches usually become jump tables too); AUTOSAR OS hooks (PreTaskHook, PostTaskHook, ErrorHook) registered as function pointers in OS configuration struct
▸ AUTOSAR callback patterns: the Can driver (MCAL) notifies CanIf of a completed transmission via CanIf_TxConfirmation(PduIdType CanTxPduId) - called from Can_MainFunction_Write in polling mode or from the TX interrupt; CanIf forwards the confirmation to the upper layer through the callback registered in its CanIf_CallbacksType config; development errors are reported separately through Det_ReportError (DET, Default Error Tracer); SWC client-server: Rte_Call_Port_Interface_Op() → calls server runnable via function pointer table generated by RTE; Com_RxCallout: application registers per-PDU receive callback to post-process signal values before storage; MISRA Directive 4.1 (run-time failures shall be minimized): function pointers in a const ROM table always have defined values, whereas dynamic registration may leave NULL entries
▸ State machine implementation using function pointers: typedef void (*StateFnc)(void); static StateFnc StateTable[] = {State_Init, State_Ready, State_Active, State_Error}; static uint8_t currentState = STATE_INIT; main loop: StateTable[currentState](); each state function sets currentState to transition; advantage over switch: adding new state requires no changes to dispatch loop; AUTOSAR BswM (BSW Mode Manager) uses mode-transition tables with function pointers for mode notification callbacks; null safety: initialize all state table entries with a default handler that logs an error rather than leaving uninitialized entries as NULL
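The dispatch-table pattern described in this chapter can be sketched in a few lines. The handlers below (Demo_HandleSessionCtrl, Demo_HandleRdbi) and Demo_Dispatch are hypothetical placeholders rather than real Dcm functions, and Std_ReturnType is redefined locally so the sketch compiles without AUTOSAR headers.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t Std_ReturnType;            /* local stand-in for Std_Types.h */
#define E_OK     ((Std_ReturnType)0U)
#define E_NOT_OK ((Std_ReturnType)1U)

typedef Std_ReturnType (*ServiceHandler)(const uint8_t *req, uint8_t *resp, uint16_t *len);

/* Hypothetical handler: echo the sub-function with a positive response SID. */
static Std_ReturnType Demo_HandleSessionCtrl(const uint8_t *req, uint8_t *resp, uint16_t *len)
{
    resp[0] = (uint8_t)(req[0] + 0x40U);
    resp[1] = req[1];
    *len = 2U;
    return E_OK;
}

/* Hypothetical handler: echo the two DID bytes. */
static Std_ReturnType Demo_HandleRdbi(const uint8_t *req, uint8_t *resp, uint16_t *len)
{
    resp[0] = (uint8_t)(req[0] + 0x40U);
    resp[1] = req[1];
    resp[2] = req[2];
    *len = 3U;
    return E_OK;
}

/* const table lives in .rodata; entries not listed are NULL. */
static const ServiceHandler ServiceTable[256] = {
    [0x10] = Demo_HandleSessionCtrl,
    [0x22] = Demo_HandleRdbi,
};

/* Look up the handler by SID (first request byte) and guard against NULL. */
static Std_ReturnType Demo_Dispatch(const uint8_t *req, uint8_t *resp, uint16_t *len)
{
    Std_ReturnType result = E_NOT_OK;
    const ServiceHandler handler = ServiceTable[req[0]];

    if (handler != NULL) {
        result = handler(req, resp, len);
    } else {
        *len = 0U;
    }
    return result;
}
```

Because the SID is a uint8_t and the table has 256 entries, the index can never run out of bounds; only the NULL check is needed.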
Preprocessor Macros & Conditional Compilation 35 min read
▸ Macro pitfalls: object-like #define BUFFER_SIZE 256 - safe; function-like #define DOUBLE(x) x*2 - unsafe (DOUBLE(a+b) = a+b*2); correct: #define DOUBLE(x) ((x)*2); double-evaluation: #define MAX(a,b) ((a)>(b)?(a):(b)) evaluates a and b twice - dangerous with side effects (MAX(i++, j++)); C99 inline functions are the safe alternative: static inline uint32_t Max(uint32_t a, uint32_t b) {return a>b?a:b;} - type-safe, single evaluation, optimizer produces equivalent code; MISRA-C:2012 Directive 4.9: function-like macros should be replaced with inline functions where possible
▸ Conditional compilation for hardware variants: #ifdef CPU_TC397B → TC397-specific register definitions; #if (MCU_CLOCK_FREQ == 300000000) → 300MHz-specific clock divider; #ifndef UNIT_TEST → exclude hardware register access in host simulation; AUTOSAR Compiler.h: INLINE, LOCAL_INLINE, FUNC(rettype, memclass), P2CONST(ptrtype, ptrclass, memclass) macros abstract compiler-specific attributes for portability across GCC, Green Hills, Tasking, IAR; Platform_Types.h: CPU_TYPE_8/16/32, CPU_BIT_ORDER, CPU_BYTE_ORDER - auto-selected per target; #error directive: #if !defined(MCU_TYPE) #error "MCU_TYPE must be defined" #endif - enforces required build definitions
▸ X-macro pattern for DID/DTC tables: #define DID_TABLE X(0xF190, VIN, 17, DID_READ_ONLY) X(0xF18A, HW_NR, 10, DID_READ_WRITE) X(0xF101, VARIANT, 1, DID_READ_WRITE) - names are bare identifiers rather than string literals, so that ## and # can operate on them; generate enum: enum DID_IDs { #define X(id, name, len, rw) DID_##name, DID_TABLE #undef X }; generate string table: const char *DID_Names[] = { #define X(id, name, len, rw) #name, DID_TABLE #undef X }; eliminates manual synchronization of parallel arrays; used in AUTOSAR generated code (DcmDspDid tables) and DEM event ID enumerations
▸ Token pasting and stringification: ## operator: #define REG(n) REG_##n → REG(STATUS) = REG_STATUS; # operator: #define STRINGIFY(x) #x → STRINGIFY(BUFFER_SIZE) = "BUFFER_SIZE", because # suppresses macro expansion of its operand; add a second level, #define XSTRINGIFY(x) STRINGIFY(x), to obtain XSTRINGIFY(BUFFER_SIZE) = "256"; used in AUTOSAR DET module: DET_REPORTERROR(MODULE_ID, INSTANCE_ID, API_ID, ERROR_ID) macro compiles to Det_ReportError() call with literal constants; static_assert (C11) in embedded headers: _Static_assert(sizeof(uint32_t)==4, "uint32_t size mismatch") - catches cross-compiler type size differences at compile time; __LINE__ and __FILE__ macros in DET and error logging macros; __func__ (C99) for current function name in assert messages
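A compilable sketch of the X-macro and stringification techniques from this chapter follows. It uses a variant in which the per-entry macro is passed as a parameter to DID_TABLE, and the entry names are bare identifiers so that ## and # can act on them. The table contents mirror the DIDs named above; the lookup helper and array names are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Single source of truth: id, name (bare identifier), length in bytes. */
#define DID_TABLE(X)          \
    X(0xF190, VIN,     17)    \
    X(0xF18A, HW_NR,   10)    \
    X(0xF101, VARIANT,  1)

/* Expansion 1: enum of table indices, DID_COUNT is the number of entries. */
enum DidIndex {
#define X_ENUM(id, name, len) DID_##name,
    DID_TABLE(X_ENUM)
#undef X_ENUM
    DID_COUNT
};

/* Expansion 2..4: parallel arrays that stay in sync automatically. */
#define X_NAME(id, name, len) #name,
static const char *const DID_Names[DID_COUNT] = { DID_TABLE(X_NAME) };
#undef X_NAME

#define X_ID(id, name, len) (uint16_t)(id),
static const uint16_t DID_Ids[DID_COUNT] = { DID_TABLE(X_ID) };
#undef X_ID

#define X_LEN(id, name, len) (uint8_t)(len),
static const uint8_t DID_Lengths[DID_COUNT] = { DID_TABLE(X_LEN) };
#undef X_LEN

/* Two-level stringification: XSTR expands its argument first, STR does not. */
#define STR(x)  #x
#define XSTR(x) STR(x)
#define BUFFER_SIZE 256

/* Linear search by name; returns the table index or -1 if not found. */
static int did_index_by_name(const char *name)
{
    int found = -1;
    for (int i = 0; i < (int)DID_COUNT; i++) {
        if (strcmp(DID_Names[i], name) == 0) {
            found = i;
            break;
        }
    }
    return found;
}
```

Adding a DID means adding one line to DID_TABLE; the enum and all three arrays pick it up on the next build.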
Hands-On: Memory Layout Analysis 50 min read
▸ Lab exercise: compile a C source file for ARM Cortex-M4 using arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O2 -c module.c -o module.o; link with a minimal linker script defining FLASH at 0x08000000 (512KB) and RAM at 0x20000000 (128KB); run arm-none-eabi-objdump -h app.elf to list sections: .text (code in FLASH), .rodata (const strings/tables in FLASH), .data (initialized globals - load=FLASH, run=RAM), .bss (zero-init globals in RAM), .stack (RAM top); arm-none-eabi-size app.elf prints text+data+bss totals → text+data = FLASH used, data+bss = RAM used at runtime
▸ Symbol table inspection: arm-none-eabi-nm --size-sort --print-size app.elf | tail -20 → shows 20 largest symbols; 'T' = .text (code), 't' = local code, 'D' = initialized data, 'd' = local data, 'B' = BSS, 'b' = local BSS, 'R' = read-only data; identify large buffers by 'B' symbols; arm-none-eabi-objdump -d app.elf → disassembly with Thumb2 instructions; arm-none-eabi-readelf -s app.elf → full symbol table with addresses and sizes; linker map file (-Wl,-Map=app.map): shows exact address of every function and global variable - use to detect section overlap or unexpected large objects
▸ Struct padding analysis: write a C program that prints sizeof() for several structs with different member orderings: struct A {char a; uint32_t b; char c;} → 12 bytes; struct B {uint32_t b; char a; char c;} → 8 bytes (4 bytes saved by reordering); verify: printf("sizeof(A)=%zu, sizeof(B)=%zu\n", sizeof(struct A), sizeof(struct B)); (sizeof yields size_t, so use %zu, or cast to unsigned if the embedded libc lacks %zu); run on host x86 and cross-compiled ARM target - compare results to see platform differences; add offsetof() inspection: printf("offset b=%zu\n", offsetof(struct A, b)) → shows padding inserted; experiment with __attribute__((packed)) and note size reduction vs potential unaligned access risk
▸ TRACE32 live memory analysis: after flashing to target, use TRACE32: Var.Global → shows all global variables with types, addresses, values; Data.dump A:0x20000000 %Long %Decimal → hex+decimal view of SRAM; verify startup: check .bss range is zero-filled (Data.dump showing 0x00000000 for uninitialized globals); check .data was copied correctly (compare LMA value in FLASH with VMA value in RAM using Data.compare command); measure actual stack usage: fill stack with 0xA5 pattern in startup → run application → use Data.dump to find deepest non-0xA5 address → compute high-water mark; cross-reference with static worst-case stack analysis (e.g., GCC -fstack-usage output) for safety margin
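The stack-painting measurement from the last step can also be done in software. The sketch below works on an ordinary array so that it runs on a host; on a target, base would be the lowest stack address taken from linker symbols (an assumption, since symbol names differ per linker script), and the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xA5A5A5A5UL   /* fill pattern written before the stack is used */

/* Fill the whole region with the pattern (startup code would do this before main). */
static void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0U; i < words; i++) {
        base[i] = (uint32_t)STACK_PAINT;
    }
}

/*
 * The stack grows downward, so untouched words sit at the low end of the region.
 * Count them from the bottom; everything above the first modified word counts as used.
 * Returns the high-water mark in bytes.
 */
static size_t stack_high_water_bytes(const uint32_t *base, size_t words)
{
    size_t untouched = 0U;
    while ((untouched < words) && (base[untouched] == (uint32_t)STACK_PAINT)) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

The result is a lower bound on real usage: a frame that happens to store the pattern value, or a path not exercised during the test run, will not show up.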
2
Embedded-Specific C Patterns
6 chapters • 4.5 hrs reading
Memory-Mapped I/O & Register Access 50 min read
▸ Memory-mapped register access pattern: #define GPIO_PORTA_BASE 0x40004000UL; #define GPIO_PORTA_DATA (*((volatile uint32_t*)(GPIO_PORTA_BASE + 0x3FC))); direct read: uint32_t val = GPIO_PORTA_DATA; write: GPIO_PORTA_DATA = 0xFF; volatile mandatory - without it compiler may cache register value in CPU register across accesses, causing stale reads; UL suffix ensures no integer overflow on 32-bit address constants; cast pattern: volatile uint32_t * const GPIOA_DATA = (volatile uint32_t *)0x400040FC; alternatively define struct-based SFR map matching chip data sheet register layout
▸ Read-Modify-Write (RMW) operations: SET bit: REG |= (1U << bit); CLEAR bit: REG &= ~(1U << bit); TOGGLE bit: REG ^= (1U << bit); READ bit: (REG >> bit) & 1U; non-atomic RMW problem on multi-core (or between task and ISR): Core0 reads GPIO, Core1 writes GPIO in between, Core0 writes back its stale copy - corrupts Core1's change; ARM Cortex-M solution: LDREX/STREX exclusive access (ARMv7-M) for shared data; Aurix TC3xx: port output modification registers (Pn_OMR, Pn_OMSR, Pn_OMCR) set or clear individual pins with a single write and no RMW; the TriCore LDMST instruction provides an atomic load-modify-store for other locations
▸ CMSIS (Cortex Microcontroller Software Interface Standard) register definitions: ARM provides CMSIS-Device header per MCU family; access pattern: GPIOA->ODR |= GPIO_ODR_OD5; GPIOA->MODER &= ~GPIO_MODER_MODE5; core system registers via CMSIS: SCB->CFSR (Configurable Fault Status Register for HardFault analysis); NVIC->ISER[0] = (1UL << irq_number) enables IRQ (for irq_number < 32; NVIC_EnableIRQ() handles the general case); SysTick->LOAD = reload_value; CMSIS intrinsics: __disable_irq(), __enable_irq(), __DSB() (Data Synchronization Barrier), __ISB() (Instruction Synchronization Barrier) - required after MPU/cache configuration changes
▸ Access width enforcement: some SFRs require specific access width - 8-bit, 16-bit, or 32-bit only; accessing 32-bit register with 8-bit pointer may split into multiple bus transactions causing undefined hardware behavior; solution: always match access width to hardware requirement: volatile uint32_t * const SFR32 = (volatile uint32_t*)addr - forces 32-bit Thumb2 LDR/STR; Aurix TC3xx: peripherals sit on the System Peripheral Bus (SPB), and some registers require word-aligned 32-bit access; violating the access width produces a bus error that the CPU reports as a trap; AUTOSAR Reg.h: REG32(addr), REG16(addr), REG8(addr) macros enforce width explicitly
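The set/clear/toggle and field-access idioms from this chapter can be wrapped in small inline helpers. This sketch takes the register address as a pointer parameter so that it can be exercised on a host with an ordinary variable standing in for the register; the helper names are illustrative, not part of CMSIS or AUTOSAR.

```c
#include <stdint.h>

/* All helpers use 32-bit volatile access, matching a 32-bit-only SFR. */
static inline void reg_set_bit(volatile uint32_t *reg, uint32_t bit)
{
    *reg |= ((uint32_t)1U << bit);
}

static inline void reg_clear_bit(volatile uint32_t *reg, uint32_t bit)
{
    *reg &= ~((uint32_t)1U << bit);
}

static inline void reg_toggle_bit(volatile uint32_t *reg, uint32_t bit)
{
    *reg ^= ((uint32_t)1U << bit);
}

static inline uint32_t reg_read_bit(const volatile uint32_t *reg, uint32_t bit)
{
    return (*reg >> bit) & 1U;
}

/* Mask with `width` low bits set; width 32 is handled without shifting by 32. */
static inline uint32_t field_mask(uint32_t width)
{
    return (width >= 32U) ? 0xFFFFFFFFU : (((uint32_t)1U << width) - 1U);
}

static inline uint32_t reg_read_field(const volatile uint32_t *reg, uint32_t low, uint32_t width)
{
    return (*reg >> low) & field_mask(width);
}

/* Single read and single write of the register: read, modify locally, write back. */
static inline void reg_write_field(volatile uint32_t *reg, uint32_t low, uint32_t width, uint32_t val)
{
    const uint32_t mask = field_mask(width);
    uint32_t tmp = *reg;
    tmp = (tmp & ~(mask << low)) | ((val & mask) << low);
    *reg = tmp;
}
```

These helpers are still read-modify-write sequences, so the atomicity caveats described above apply whenever the register is shared with an ISR or another core.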
Volatile, Const & Compiler Barriers 45 min read
▸ Volatile semantics: every read/write to volatile variable generates an actual memory access instruction; prevents four compiler optimizations: (1) caching in register, (2) eliminating seemingly redundant reads, (3) reordering across other volatile accesses, (4) merging consecutive writes; for flag shared between ISR and main loop: volatile uint8_t IsrFlag; main reads: if(IsrFlag) {IsrFlag=0; process();} - without volatile, optimizer may hoist IsrFlag read outside the loop; volatile does NOT provide atomicity: aligned 8/16/32-bit loads and stores are single-copy atomic on Cortex-M, but 64-bit variables and read-modify-write sequences (counter++, flags |= mask) are not - protect them with a __disable_irq()/access/__enable_irq() critical section
▸ Const placement and memory sections: const uint8_t LookupTable[256] = {...} - placed in .rodata (flash), never copied to RAM; saves 256 bytes of precious SRAM; const inside function: const uint32_t mask = 0xFF - compiler may optimize to immediate operand, no memory access; pointer to const: const uint8_t *p (reads only); const pointer: uint8_t * const p (fixed address); const pointer to const: const uint8_t * const p (typical for hardware register base address); AUTOSAR MemMap attribute: #define CDD_START_SEC_CONST_32 - places config tables in flash section; verify with objdump that const arrays land in .rodata not .data
▸ Compiler memory barriers: GCC inline assembly barrier: asm volatile ("" ::: "memory") - tells compiler that memory state is clobbered, preventing reordering of memory accesses across this point; equivalent to compiler fence (software barrier only, no hardware serialization); for hardware memory ordering on ARM: __DMB() (Data Memory Barrier) - ensures all memory accesses before barrier are visible before subsequent accesses; __DSB() (Data Synchronization Barrier) - stronger, waits for completion; __ISB() (Instruction Synchronization Barrier) - flushes pipeline, required after writing to the CONTROL register or changing MPU settings so that following instructions see the new configuration; CMSIS provides these as intrinsic functions
▸ Optimization interaction pitfalls: GCC -O2 or higher may: (1) reorder non-volatile memory accesses across the boundaries of a critical section when no compiler barrier is present, (2) inline functions leading to larger code contradicting size goals, (3) eliminate delay loops: for(i=0;i<1000;i++); optimize to nothing - use volatile counter or __NOP() chain; -fno-strict-aliasing: disables strict aliasing rules (allow type punning via union/pointer cast), sometimes needed in legacy embedded drivers; -fno-inline: prevent inlining for stack depth analysis; separate compilation units per AUTOSAR BSW module and use -Os for size-optimized BSW, -O2 for performance-critical SWC; GCC stack usage report: -fstack-usage generates .su files per function
Bit Manipulation Techniques 40 min read
▸ Core operations with portable macros: SET bit n: reg |= (1UL << n); CLEAR bit n: reg &= ~(1UL << n); TOGGLE: reg ^= (1UL << n); READ bit n: (reg >> n) & 1UL; extract bitfield [high:low]: (reg >> low) & ((1UL << (high-low+1))-1); insert bitfield: reg = (reg & ~(mask << low)) | ((val & mask) << low); always use UL suffix to prevent shift-by-more-than-width UB on 16-bit int; MISRA Rule 10.1: shift operands must be unsigned; MISRA Rule 12.2: shift count < bit width of type - static_assert(n < 32, "shift overflow")
▸ CAN signal extraction (DBC-style): 8-byte CAN payload packed with signals at arbitrary bit positions; big-endian (Motorola) vs little-endian (Intel) byte order extraction differs; Intel format signal at startBit=8, length=12: rawVal = (payload[1] | (payload[2] << 8)) & 0x0FFF; scaled value = rawVal * 0.1 - 40.0 (factor and offset as given in the Vector CANdb++ DBC file); AUTOSAR Com signal configuration: ComBitPosition, ComBitSize, ComSignalEndianness; Com_SendSignal() packs the signal bits into the I-PDU buffer and Com_ReceiveSignal() extracts them; the COM-based Transformer (ComXf) offers the same packing as generated code in the RTE transformer chain and handles endianness automatically
▸ Power-of-2 arithmetic tricks: check power of 2: (n != 0) && ((n & (n-1)) == 0) (the n != 0 test is needed because 0 also passes the mask test); round up to next power of 2: n--; n|=n>>1; n|=n>>2; n|=n>>4; n|=n>>8; n|=n>>16; n++; align address to boundary: addr = (addr + (align-1)) & ~(align-1) (align must be a power of 2); used in AUTOSAR MemMap section alignment; count set bits (popcount): uint32_t v=n; v=v-((v>>1)&0x55555555); v=(v&0x33333333)+((v>>2)&0x33333333); v=(v+(v>>4))&0x0F0F0F0F; return (v*0x01010101)>>24; (the inner parentheses in the first step matter, since - binds tighter than &); or use __builtin_popcount(n) GCC intrinsic; find first set bit: __builtin_ctz(n) (count trailing zeros = position of LSB, undefined for n=0); Aurix CLZ/CTZ instructions via intrinsic
▸ CRC computation via bit manipulation: CRC-8 (polynomial 0x2F, CRC8H2F, used in AUTOSAR E2E Profile 2; E2E Profile 1 uses the SAE J1850 polynomial 0x1D): uint8_t crc=0xFF; for each byte: crc^=byte; for(i=0;i<8;i++) crc=(uint8_t)((crc&0x80U)?((crc<<1)^0x2FU):(crc<<1)); return (uint8_t)~crc; (the casts truncate the int result of the promoted shift back to 8 bits); lookup table optimization: precompute 256-entry table, reduce to one XOR + one table lookup per byte; CRC-32 (0x04C11DB7, Ethernet/ZIP): hardware CRC unit on Aurix TC3xx (FCE module) and vendor CRC peripherals on many Cortex-M4/M7 MCUs (e.g., STM32); AUTOSAR Crc library (Crc_CalculateCRC8, Crc_CalculateCRC8H2F, Crc_CalculateCRC32P4) wraps hardware acceleration with software fallback; E2E library uses CRC for protected communication between SWCs across bus boundaries
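The bit tricks and the bitwise CRC from this chapter can be collected into a small host-testable file. The sketch below assumes the CRC parameters stated above (polynomial 0x2F, initial value 0xFF, final inversion); the function names are illustrative and are not the AUTOSAR Crc library API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True only for 1, 2, 4, 8, ...; zero is rejected explicitly. */
static bool is_pow2(uint32_t n)
{
    return (n != 0U) && ((n & (n - 1U)) == 0U);
}

/* Round addr up to the next multiple of align; align must be a power of 2. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + (align - 1U)) & ~(align - 1U);
}

/* Parallel bit count: sums bits in 2-, 4-, then 8-bit groups, then adds the bytes. */
static uint32_t popcount32(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555U);
    v = (v & 0x33333333U) + ((v >> 2) & 0x33333333U);
    v = (v + (v >> 4)) & 0x0F0F0F0FU;
    return (v * 0x01010101U) >> 24;
}

/* Bitwise CRC-8, polynomial 0x2F, init 0xFF, result inverted, MSB first. */
static uint8_t crc8_h2f(const uint8_t *data, size_t len)
{
    uint8_t crc = 0xFFU;
    for (size_t i = 0U; i < len; i++) {
        crc ^= data[i];
        for (uint8_t bit = 0U; bit < 8U; bit++) {
            if ((crc & 0x80U) != 0U) {
                crc = (uint8_t)((uint8_t)(crc << 1) ^ 0x2FU);
            } else {
                crc = (uint8_t)(crc << 1);
            }
        }
    }
    return (uint8_t)(crc ^ 0xFFU);
}
```

A table-driven version would replace the inner loop with one lookup per byte at the cost of 256 bytes of flash.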
Endianness & Data Serialization 35 min read
▸ Endianness fundamentals: little-endian (ARM Cortex-M, x86) - LSB at lowest address; big-endian (network byte order, Motorola CAN signals, some PowerPC) - MSB at lowest address; some architectures are bi-endian (selectable at reset or by configuration), whereas Aurix TC3xx TriCore operates little-endian only; detection at runtime: uint16_t x=0x0102; uint8_t *p=(uint8_t*)&x; if(p[0]==0x01) bigEndian else littleEndian; compile-time via __BYTE_ORDER__ macro (GCC): #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__; AUTOSAR Platform_Types.h: CPU_BYTE_ORDER = LOW_BYTE_FIRST (little) or HIGH_BYTE_FIRST (big)
▸ Byte-swap operations: manual: uint16_t swap16(uint16_t v){return (v<<8)|(v>>8);} uint32_t swap32(uint32_t v){return ((v&0xFF)<<24)|((v&0xFF00)<<8)|((v>>8)&0xFF00)|((v>>24)&0xFF);} GCC built-in: __builtin_bswap16(v), __builtin_bswap32(v), __builtin_bswap64(v) - compile to single REV instruction on ARM Thumb2; network byte order: htons()/ntohs() for uint16, htonl()/ntohl() for uint32 (from <arpa/inet.h> on Linux host or manually implemented on embedded); DoIP uses big-endian for header fields - ECU must bswap before processing
▸ Safe serialization without type punning: memcpy-based approach avoids strict aliasing UB: uint32_t val=0x12345678; uint8_t buf[4]; memcpy(buf, &val, 4); reads bytes in target endian order; explicit serialization for portable network data: buf[0]=(val>>24)&0xFF; buf[1]=(val>>16)&0xFF; buf[2]=(val>>8)&0xFF; buf[3]=val&0xFF; (big-endian); deserialization: val=((uint32_t)buf[0]<<24)|((uint32_t)buf[1]<<16)|((uint32_t)buf[2]<<8)|buf[3]; AUTOSAR Transformer (ComXf) generates serialization code from I-PDU Signal layout in ARXML - handles endianness, scaling, and offset automatically
▸ Practical issues in automotive ECUs: CAN DBC signals use Motorola (big-endian) or Intel (little-endian) byte order per signal; mixing both in same CAN frame is common - each signal must be extracted with its own byte-order logic; UDS request/response multi-byte values: always big-endian per ISO 14229 (e.g., DID 0xF190 = 0xF1 0x90, memorySize in 0x34 = MSB first); Ethernet/DoIP: headers big-endian (UDP/TCP/IP standard); SOME/IP services: payload encoding defined in ARXML interface description; AUTOSAR SomeIpXf Transformer handles serialization; common bug: reading a uint16_t value that the peripheral or protocol supplies in big-endian order directly on little-endian ARM - the two bytes appear swapped (0x1234 reads as 0x3412)
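The byte-swap, explicit serialization, and Intel-format signal extraction described in this chapter can be sketched as portable C that does not depend on the host's endianness. The function names are illustrative assumptions; the signal layout (startBit=8, length=12) is the example used in the bit manipulation chapter.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value; casts undo integer promotion. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((uint16_t)(v << 8) | (uint16_t)(v >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFU) << 24) | ((v & 0x0000FF00U) << 8) |
           ((v >> 8) & 0x0000FF00U)  | ((v >> 24) & 0x000000FFU);
}

/* Serialize to big-endian (network order, UDS, DoIP headers). */
static void put_u32_be(uint8_t *buf, uint32_t val)
{
    buf[0] = (uint8_t)(val >> 24);
    buf[1] = (uint8_t)(val >> 16);
    buf[2] = (uint8_t)(val >> 8);
    buf[3] = (uint8_t)val;
}

/* Deserialize from big-endian; works the same on any host byte order. */
static uint32_t get_u32_be(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  | (uint32_t)buf[3];
}

/* Intel (little-endian) CAN signal: startBit=8, length=12. */
static uint16_t can_get_intel_sb8_len12(const uint8_t *payload)
{
    const uint16_t raw = (uint16_t)((uint16_t)payload[1] | (uint16_t)((uint16_t)payload[2] << 8));
    return (uint16_t)(raw & 0x0FFFU);
}
```

The shift-based put/get pair is preferable to memcpy whenever the wire format has a fixed byte order, because memcpy copies whatever order the CPU uses.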
Fixed-Point Arithmetic 40 min read
▸ Fixed-point notation: Qm.n format - m integer bits + n fractional bits; conventions differ on whether m counts the sign bit, so always state the word length as well; Q1.15 (16-bit signed, sign bit counted): range -1.0 to +0.99997, resolution 1/32768; Q16.16 (32-bit signed, written Q15.16 when the sign bit is not counted): range -32768.0 to +32767.99998, resolution 1/65536; conversion: float→Q1.15: int16_t fp = (int16_t)(floatVal * 32768.0f) (saturate first, since +1.0 does not fit); Q1.15→float: float fv = (float)fp / 32768.0f; AUTOSAR uses Q-format for calibration parameters: e.g., Kp gain stored as unsigned Q8.8 in a uint16_t in NvM; Simulink/Embedded Coder generates Q-format code via Fixed-Point Designer toolbox with explicit wordlength and fraction length annotations
▸ Fixed-point arithmetic operations: addition: result (same Q-format) = a + b - no scaling needed; subtraction same; multiplication of Q1.15 × Q1.15 = Q2.30 in 32-bit intermediate → right-shift by 15 to get Q1.15 result: int32_t result = ((int32_t)a * b) >> 15; division: result = ((int32_t)a << 15) / b; overflow in addition: use saturation - if(sum > INT16_MAX) sum=INT16_MAX; ARM Cortex-M4 DSP extensions: QADD16, QSUB16 (saturating 16-bit add/sub), SMULL (32×32→64-bit multiply), SMLAL (multiply-accumulate with 64-bit accumulator) - accessible via CMSIS DSP library arm_math.h
▸ AUTOSAR calibration data and fixed-point: AUTOSAR Software Component Description (SwcDesc ARXML) specifies CompuMethod for each port signal: CompuMethod category LINEAR with coefficients (offset, factor, unit) mapping raw uint16 to physical float; RTE generated code handles conversion in Rte_IWrite/Rte_IRead; Simulink fixed-point design: annotate signal with fixdt(1,16,8) (signed, 16-bit word, 8-bit fraction = Q7.8 when the sign bit is not counted); code generation with embedded-coder produces explicit Q-format operators with saturation and rounding modes matching hardware behavior; hardware: the TriCore cores in Aurix TC3xx do include a single-precision FPU, but fixed-point remains common for control loops that need bit-exact, reproducible results and sub-millisecond determinism, and for double-precision math that the FPU does not cover on all cores
▸ Precision and overflow guard patterns: always promote to wider type before multiply: int32_t product = (int32_t)q15_a * q15_b (prevents 16-bit overflow); use round-half-up before final shift: product += (1 << (shift-1)); result = product >> shift; saturation after arithmetic: if(result > Q15_MAX) result=Q15_MAX; if(result < Q15_MIN) result=Q15_MIN; overflow detection: check that signs match expected (positive×positive should be positive); common bug in PID controllers: I-term windup - accumulator grows unbounded when actuator saturates; fix: conditional integration (stop integrating when output clamped); MISRA-C:2012 Rule 12.2: the shift count shall lie between 0 and one less than the bit width of the left operand's essential type; right-shifting a negative signed value is implementation-defined, so document the compiler's arithmetic-shift behavior
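The widen, round, shift, and saturate sequence described above can be written as a handful of Q1.15 helpers. This is a minimal sketch with illustrative names; it assumes the compiler implements >> on negative signed values as an arithmetic shift, which is what GCC and Clang document.

```c
#include <stdint.h>

#define Q15_MAX ((int32_t)INT16_MAX)   /*  32767, about +0.99997 */
#define Q15_MIN ((int32_t)INT16_MIN)   /* -32768, exactly -1.0   */

/* Clamp a 32-bit intermediate into the Q1.15 range. */
static int16_t q15_saturate(int32_t x)
{
    if (x > Q15_MAX) {
        x = Q15_MAX;
    } else if (x < Q15_MIN) {
        x = Q15_MIN;
    } else {
        /* value already in range */
    }
    return (int16_t)x;
}

/* Saturating add: widen first so the sum cannot overflow. */
static int16_t q15_add_sat(int16_t a, int16_t b)
{
    return q15_saturate((int32_t)a + (int32_t)b);
}

/* Q1.15 x Q1.15 -> Q2.30 intermediate, round half up, shift back to Q1.15. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* at most 2^30, fits in int32_t */
    product += ((int32_t)1 << 14);               /* rounding constant: half an LSB */
    /* Assumption: arithmetic right shift for negative values. */
    return q15_saturate(product >> 15);
}
```

The only input pair that needs the saturation in q15_mul is -1.0 times -1.0, whose exact result +1.0 is not representable in Q1.15.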
Hands-On: GPIO & Timer Driver in C 60 min read
▸ GPIO driver implementation for STM32F4 (or QEMU-simulated): define register structs for GPIO: typedef struct { volatile uint32_t MODER; volatile uint32_t OTYPER; volatile uint32_t OSPEEDR; volatile uint32_t PUPDR; volatile uint32_t IDR; volatile uint32_t ODR; volatile uint32_t BSRR; volatile uint32_t LCKR; volatile uint32_t AFR[2]; } GPIO_TypeDef; define base addresses: #define GPIOA ((GPIO_TypeDef*)0x40020000); enable clock via RCC: RCC->AHB1ENR |= (1U<<0) (GPIOA clock); configure pin as output: GPIOA->MODER = (GPIOA->MODER & ~(3U<<(pin*2))) | (1U<<(pin*2)) - clear both mode bits first, then set mode 01=output (a bare |= leaves a previously set upper bit in place); set pin high: GPIOA->BSRR = (1U<<pin); clear pin: GPIOA->BSRR = (1U<<(pin+16))
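▸ SysTick timer driver for 1ms tick: SysTick_Config(SystemCoreClock/1000) sets reload value = (168000000/1000-1) = 167999; tick counter and ISR: volatile uint32_t msTick; void SysTick_Handler(void){msTick++;} delay function: void delay_ms(uint32_t ms){uint32_t start=msTick; while((msTick-start)<ms);} LED blink: while(1){GPIOA->ODR^=(1U<<5); delay_ms(500);} general-purpose timer (TIMx): TIM2->PSC = 83 (prescaler: with a 168MHz core clock and the usual bus dividers, TIM2 on APB1 runs from an 84MHz timer clock, so 84MHz/(83+1)=1MHz tick; check the clock tree of the actual configuration); TIM2->ARR = 999 (auto-reload: 1000 ticks = 1ms period); TIM2->CR1 |= TIM_CR1_CEN; poll TIM2->SR & TIM_SR_UIF for update event; clear flag: TIM2->SR = ~TIM_SR_UIF (status flags are cleared by writing 0, so a plain write avoids the RMW window in which &= could wipe a flag set in between)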
▸ AUTOSAR-style GPIO abstraction layer: define Dio_ChannelType as uint8_t, Dio_LevelType as uint8_t (STD_HIGH/STD_LOW); function prototypes: Dio_LevelType Dio_ReadChannel(Dio_ChannelType channelId); void Dio_WriteChannel(Dio_ChannelType channelId, Dio_LevelType level); implementation maps channelId to (port, pin) via config table: Dio_ChannelCfgType[channelId] = {.port=GPIOA_BASE, .pin=5}; abstraction allows same code on STM32, Aurix, S32K by swapping config table; Port_Init() configures direction/mode/pullup using similar abstraction over MCAL Port driver; verify: unit test with mock register struct, inject GPIO reads, verify Dio_ReadChannel returns correct level
▸ Common pitfalls and debug checklist: clock not enabled → GPIO register writes are ignored and reads return 0 - always enable peripheral clock via RCC/SCG/PCC before accessing registers; wrong pin mode - MODER bits must be set after clock enable; BSRR vs ODR: prefer BSRR (single-write atomic bit-set/reset) over ODR RMW (non-atomic); SysTick priority: the reset default is 0 (highest), while CMSIS SysTick_Config() sets it to the lowest priority - choose the level deliberately with NVIC_SetPriority(SysTick_IRQn, ...) so the tick neither starves nor blocks other critical ISRs; wrap-around in msTick: uint32_t arithmetic handles 0xFFFFFFFF→0 correctly if using subtraction (msTick-start) rather than comparison; TRACE32: set breakpoint in SysTick_Handler, verify msTick increments at 1kHz via Var.Watch msTick
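The pin configuration, BSRR usage, and wrap-safe tick arithmetic from this lab can be tested on a host by pointing the driver at a mock register struct, as the unit-test step above suggests. The struct mirrors the STM32F4 GPIO layout listed earlier; Demo_GpioRegs and the function names are illustrative, and on the mock BSRR simply stores the last value written instead of driving a pin.

```c
#include <stdint.h>

/* Same field order as the GPIO register block described in this lab. */
typedef struct {
    volatile uint32_t MODER;
    volatile uint32_t OTYPER;
    volatile uint32_t OSPEEDR;
    volatile uint32_t PUPDR;
    volatile uint32_t IDR;
    volatile uint32_t ODR;
    volatile uint32_t BSRR;
    volatile uint32_t LCKR;
    volatile uint32_t AFR[2];
} Demo_GpioRegs;

/* Configure a pin as general-purpose output: clear both mode bits, then set 01. */
static void gpio_set_output(Demo_GpioRegs *port, uint32_t pin)
{
    const uint32_t shift = pin * 2U;
    port->MODER = (port->MODER & ~(3U << shift)) | (1U << shift);
}

/* One write to BSRR: low half sets the pin, high half resets it. */
static void gpio_write_pin(Demo_GpioRegs *port, uint32_t pin, uint8_t level)
{
    port->BSRR = (level != 0U) ? (1U << pin) : (1U << (pin + 16U));
}

/* Elapsed ticks; unsigned subtraction stays correct across counter wrap-around. */
static uint32_t tick_elapsed(uint32_t now, uint32_t start)
{
    return now - start;
}
```

On the target, the same functions would receive the real peripheral base address instead of the address of a mock struct.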
3
MISRA-C & Safe Coding
5 chapters • 3.8 hrs reading
MISRA-C:2012 Rules & Directives 50 min read
▸ MISRA-C:2012 structure: 16 Directives (Dir 1.1–4.13) - implementation-defined behavior, compiler configuration, code style requirements that can't be checked syntactically; 143 Rules (1.1–22.6) - statically checkable source code requirements; Amendment 1 (2016) adds Dir 4.14 and 13 further rules, extending the range to Rule 22.10; category: Mandatory (must comply, no deviation), Required (must comply or get documented deviation with justification), Advisory (best practice, can deviate without formal process); ISO 26262-6 recommends using a language subset such as MISRA C for safety-related code; AUTOSAR C coding guidelines extend MISRA-C:2012 with additional restrictions (no dynamic allocation, no recursion)
▸ Critical rules (most are Required; Rule 17.3 is Mandatory and Rule 15.5 is Advisory): Rule 1.3 - no undefined behavior (UB); Rule 2.1 - no unreachable code; Rule 7.2 - unsigned suffix U/UL on unsigned constants (1U not 1); Rule 8.4 - a compatible declaration (prototype) shall be visible when a function or object with external linkage is defined; Rule 10.1–10.8 - essential type model (no implicit conversions between incompatible types); Rule 11.3 - no cast between pointer to object and pointer to different object type; Rule 14.4 - if condition must be essentially boolean; Rule 15.5 - function must have single exit point (single return); Rule 17.3 - no implicit function declaration; Rule 21.3 - no dynamic memory allocation (malloc/free/realloc/calloc prohibited)
▸ Key required rules for automotive C: Rule 8.7 - functions not called in multiple translation units should have internal linkage (static); Rule 12.1 - operator precedence - always use parentheses to make order explicit; Rule 13.5 - right-hand operand of && or || shall not contain persistent side effects; Rule 16.4 - every switch statement shall have a default clause; Rule 18.1 - pointer arithmetic within array bounds only; Rule 20.4–20.7 - macro hygiene rules; Directive D4.1 - run-time failures shall be minimized (precondition checks, defensive coding); Directive D4.6 - typedefs for numeric types (uint8_t etc.) required; Directive D4.7 - return value of functions shall be checked
▸ Deviation management process: when MISRA rule cannot be followed (e.g., Rule 11.5 void* cast in memcpy-based drivers), create formal deviation record: rule ID, justification (why compliance is not feasible), risk assessment (is deviation safe?), authorization (QA lead sign-off); deviation documented in source code comment: /* MISRA C:2012 Rule 11.5 deviation: void* cast required for MCAL Fls_17 driver interface, reviewed by QA on 2024-01-15, risk: LOW */; tool annotation: Polyspace: /* polyspace MISRA-C3:11.5 [Justified:Low] "void* cast for MCAL interface" */; PRQA QAC: /* PRQA S 0312 */ comment; tracked in deviation register (Excel/JIRA) for ISO 26262 Part 6 §8.4.6 compliance audit
Common MISRA Violations & Fixes 45 min read
▸ Rule 10.3 violation - implicit narrowing conversion: int32_t x=val; uint8_t y=x; → Rule 10.3 requires explicit cast: uint8_t y=(uint8_t)x; fix: range-check the value first (0..UINT8_MAX), then add the explicit cast - a cast alone only silences the tool; Rule 10.4 - mixing signed/unsigned: if(sint8_val > uint8_limit) - both operands must be same essential type category; fix: cast both operands to a signed type wide enough for both ranges, e.g. (sint16)sint8_val > (sint16)uint8_limit - casting the signed operand to unsigned would turn negative values into large positive ones; Rule 10.6 - composite expression assigned to a wider type: uint32_t r = a * b with uint16_t a, b - the product has essential type uint16_t; on a 16-bit int target it wraps, on a 32-bit int target the promoted signed int multiplication can overflow (UB); fix: cast one operand before multiplying: (uint32_t)a * b; always check tool output for R10.x violations as they often represent real overflow/conversion bugs
▸ Rule 14.3 - controlling expression always true/false: if(uint8_val >= 0) always true (unsigned never negative) - compiler warning and MISRA violation; fix: remove condition or use assertion; Rule 15.5 - multiple return paths: function with early return at error check - fix: rework to single return with status variable: Std_ReturnType ret=E_OK; if(!cond1) {ret=E_NOT_OK;} else {...}; return ret; (a goto cleanup pattern also gives a single exit but then needs a Rule 15.1 advisory deviation for the goto); Rule 17.1 - stdarg.h variadic functions prohibited (cannot type-check at compile time); Rule 19.2 (Advisory) - the union keyword should not be used: type punning through a union needs a documented deviation, justified with review record
▸ Rule 11.1 - function pointer conversions: (int(*)(void))ptr - converting between incompatible function pointers prohibited; fix: ensure all function pointers in dispatch table have identical signature; Rule 11.4 (Advisory) - integer-to-pointer cast: uint32_t addr=0xF0000000U; volatile uint32_t *reg=(volatile uint32_t*)addr - flagged by Rule 11.4; documented deviation for MMIO access; annotation pattern in QAC: /* PRQA S 0306 */ /* cast from integer to pointer required for MMIO */; Rule 15.4 (Advisory) - no more than one break or goto to terminate a loop: restructure with a loop-condition flag (MISRA-C:2012 does not ban continue; that was MISRA-C:2004 Rule 14.5); Rule 21.4–21.6 - standard library restrictions: no use of setjmp.h, signal.h, or the stdio.h input/output functions in production code
▸ Systematic fix workflow: run Polyspace Bug Finder or PRQA QAC on module; export violation report (CSV/HTML); categorize: true violations vs false positives vs justified deviations; for true violations: fix code, re-run tool, verify zero violations in fixed functions; for false positives: annotate with /* PRQA S XXXX */ and document why; track violations per module in metrics dashboard (target: zero mandatory violations, <5 required per 1000 LOC); CI gate: Jenkins pipeline runs Polyspace MISRA check on each commit - fails build if new mandatory violations introduced; review AUTOSAR module source as reference for compliant patterns (e.g., Vector MCAL CAN driver as MISRA-compliant C reference)
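The Rule 10.3 and Rule 10.6 fixes described in this chapter can be sketched as two small helpers. This is an illustration under assumptions: Narrow_ToU8 and Mul_U16 are hypothetical names, and the Std_ReturnType definitions are local stand-ins for the AUTOSAR Std_Types.h header.

```c
#include <stdint.h>
#include <stddef.h>

/* Local stand-ins for AUTOSAR Std_Types.h (assumption for this sketch). */
typedef uint8_t Std_ReturnType;
#define E_OK      ((Std_ReturnType)0U)
#define E_NOT_OK  ((Std_ReturnType)1U)

/* Declarations visible before the definitions (Rule 8.4). */
Std_ReturnType Narrow_ToU8(int32_t value, uint8_t *out);
uint32_t Mul_U16(uint16_t a, uint16_t b);

/* Rule 10.3: narrow only after a range check, then cast explicitly. */
Std_ReturnType Narrow_ToU8(int32_t value, uint8_t *out)
{
    Std_ReturnType ret = E_NOT_OK;

    if ((out != NULL) && (value >= 0) && (value <= (int32_t)UINT8_MAX)) {
        *out = (uint8_t)value;
        ret = E_OK;
    }
    return ret;   /* single exit point (Rule 15.5) */
}

/* Rule 10.6: widen an operand before the multiplication, so the product
 * is computed in uint32_t instead of the narrower essential type. */
uint32_t Mul_U16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}
```

The largest possible product, 65535 × 65535 = 4294836225, still fits in uint32_t, which is why widening one step is sufficient here.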
CERT C for Security 40 min read
▸ SEI CERT C Coding Standard overview: published by Carnegie Mellon SEI, aligned with ISO/IEC TS 17961 (C secure coding rules); organized into 18 rule categories (ARR, DCL, EXP, FLP, INT, MEM, MSC, POS, PRE, SIG, STR, ERR, ENV, FIO, etc.); severity × likelihood × remediation cost gives a priority P1–P27, grouped into levels L1 (P12–P27, fix first) to L3 (P1–P4); automotive relevance: ISO/SAE 21434 product development (clause 10) expects coding guidelines for cybersecurity-relevant code and names MISRA C and CERT C as examples; CERT C vs MISRA-C: complementary - MISRA focuses on safety/portability, CERT C focuses on security vulnerabilities (buffer overflows, integer overflows, format string attacks)
▸ Critical CERT C rules for embedded automotive: INT30-C - unsigned integer operations must not wrap: uint8_t counter=255; counter++ wraps to 0; fix: if(counter == UINT8_MAX) handle_overflow else counter++; INT32-C - signed integer overflow is UB: int32_t a=INT32_MAX; a+1 = UB; fix: check before add: if((b > 0) && (a > INT32_MAX - b)) handle_overflow (mirror the check against INT32_MIN for negative b); ARR30-C - no out-of-bounds array access: always validate index against array length before access; ARR38-C - library functions must not form invalid pointers: validate the size argument of memcpy/memset against both buffers; STR31-C - null-terminate strings: always ensure space for '\0' in char arrays used with string functions; ERR33-C - detect and handle standard library errors: check the return value of every call that can fail (memcpy cannot report failure - it always returns the destination pointer, so its arguments must be validated beforehand)
▸ MEM and DCL rules for automotive C: MEM30-C - do not use freed memory: set pointer to NULL after free (where malloc is used in host tools); MEM34-C - only free memory allocated via malloc (not stack pointers); DCL30-C - declare objects with appropriate storage duration (avoid returning pointer to local variable); DCL31-C - declare identifiers before using them (no implicit declarations); EXP33-C - do not read uninitialized variables: always initialize all locals; EXP34-C - do not dereference null pointers: check return values of pointer-returning functions; EXP45-C - no assignment in selection statements (confusing = vs ==); MSC32-C - seed PRNG before use (relevant for SecurityAccess seed generation)
▸ CERT C tools and automotive workflow: Coverity Static Analysis: detects CERT C violations including integer overflow, buffer overflows, null pointer dereferences, use-after-free; configured with CERT C rule set; Polyspace Bug Finder: CERT C checker integrated; Helix QAC: CERT C-2016 compliance checking; in automotive CI: run Coverity on all security-relevant modules (Crypto, UDS handler, OTA update); severity mapping: Coverity HIGH → must-fix before milestone; Coverity MEDIUM → fix before release; tool results exported to JIRA tickets; CERT C compliance evidence required as cybersecurity activity deliverable for ISO/SAE 21434 TARA implementation evidence
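The INT32-C and INT30-C checks listed in this chapter can be written as pre-condition tests. The following is a minimal sketch; add_i32_checked and inc_u8_saturating are hypothetical helper names, and saturation is only one possible overflow policy.

```c
#include <stdint.h>
#include <stdbool.h>

/* INT32-C: detect signed overflow before performing the addition.
 * Returns false and leaves *sum untouched if a + b would overflow. */
bool add_i32_checked(int32_t a, int32_t b, int32_t *sum)
{
    bool ok = true;

    if (((b > 0) && (a > (INT32_MAX - b))) ||
        ((b < 0) && (a < (INT32_MIN - b)))) {
        ok = false;
    } else {
        *sum = a + b;
    }
    return ok;
}

/* INT30-C: saturate at UINT8_MAX instead of silently wrapping 255 -> 0. */
uint8_t inc_u8_saturating(uint8_t counter)
{
    return (counter == UINT8_MAX) ? counter : (uint8_t)(counter + 1U);
}
```

Both checks run before the arithmetic, so the overflowing operation itself is never executed.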
Static Analysis Tool Integration 35 min read
▸ Tool landscape: Polyspace Bug Finder (MathWorks) - data flow analysis, MISRA-C:2012 + CERT C, integrates with Simulink/Embedded Coder pipeline; Polyspace Code Prover - formal verification proving absence of run-time errors (array bounds, overflows, nullptr dereference proven green/orange/red); PRQA QAC (Perforce Helix QAC) - industry-standard MISRA checker with extensive rule set including AUTOSAR C++14 guidelines; Coverity SAST - deep inter-procedural analysis for security defects; PC-lint Plus (Gimpel) - lightweight fast checker; SonarQube with C embedded plugin - CI dashboard with trend visualization; each tool requires a compilation database (compile_commands.json or Makefile) to understand include paths and defines
▸ Polyspace Bug Finder Jenkins integration: polyspace-bug-finder-server -sources src/ -include-path inc/ -compiler gcc -misra3 all -report-path report.html (-misra3 selects the MISRA-C:2012 checker; -misra2 is the MISRA-C:2004 checker); Jenkins pipeline: stage('StaticAnalysis'){sh 'polyspace-bug-finder-server ...'} post{always{publishHTML(reportDir:'polyspace-reports', reportFiles:'polyspace-results.html')}}; configure pass/fail threshold: if new HIGH severity bugs found → fail build; Polyspace Access web dashboard: centralized defect tracking with assignment to developers, fix progress tracking, trend charts per module; rule configuration file (.psopts) specifies active rule set, includes/excludes, baseline comparison
▸ Suppression annotations and justification: inline suppression: /* polyspace +1 MISRA-C3:10.3 "Required cast for MCAL interface" */; file-level: /* polyspace-begin MISRA-C3:11.5 */.../* polyspace-end MISRA-C3:11.5 */; QAC: /* PRQA S 0303 */ comment on line; rules for suppression: (1) only suppress false positives or documented deviations, (2) every suppression requires mandatory justification comment, (3) suppressions reviewed in code review and tracked in deviation register; blanket suppression of entire files/modules prohibited; active suppression count tracked as quality metric - target: <2% of violations suppressed
▸ Incremental analysis workflow: scan only changed files per pull request (diff-based analysis) - reduces analysis time from hours to minutes; baseline comparison: compare current scan results against approved baseline - only new violations block merge; integration with git hooks: pre-commit hook runs fast PC-lint check on modified files (<5 seconds) for immediate feedback; nightly full scan with Polyspace Code Prover (complete project formal verification - 30-60 min for 100K LOC); results correlated with coverage metrics: high static analysis defect density in low-coverage modules flags testing gaps; all analysis results archived per software version for ISO 26262 §8.4.7 static analysis documentation requirement
Hands-On: MISRA-Compliant Module 55 min read
▸ Exercise: rewrite a non-compliant CAN receive handler to be MISRA-C:2012 compliant; starting code has: implicit int return type, missing U suffix on constants, unreachable default: case, multiple return statements, non-boolean if condition (if(canHandle) instead of if(canHandle != NULL)), void* cast without annotation, local variable shadowing a global (Rule 5.3); target: zero MISRA mandatory violations in Polyspace output; file structure: Canif_Receive.c + Canif_Receive.h + Canif_Types.h following AUTOSAR naming conventions
▸ Compliant module header template: #ifndef CANIF_RECEIVE_H #define CANIF_RECEIVE_H /* MISRA-C:2012 compliant header */ #include "Std_Types.h" #include "Canif_Types.h" /* function declarations - Rule 8.4 */ extern Std_ReturnType CanIf_RxIndication(uint8 ControllerId, const PduInfoType * const PduInfoPtr); #endif; implementation: static variables for module state (internal linkage, Rule 8.7), U suffix on all unsigned constants (Rule 7.2), single return path, NULL checks for all pointers before dereference, /* PRQA S XXXX */ only where genuinely required for MCAL interface compliance
▸ Running Polyspace on the fixed module: polyspace-bug-finder -sources Canif_Receive.c -misra3 all -results-dir results/; open HTML report: verify zero MISRA Mandatory violations; expected remaining items: one Rule 11.5 deviation for void* cast in PduInfoPtr handling (annotated and justified); one Advisory Rule 15.1 deviation if using goto-cleanup pattern (the goto is what gets flagged; the pattern itself keeps the single exit asked for by Rule 15.5); verify with polyspace-report -results-dir results/ -output-format html -bug-categories all; compare violation count before/after fix - target: from 12 violations to 1 justified deviation
▸ CI integration of the exercise: add Makefile target: make misra - runs polyspace-bug-finder headless, exits non-zero if mandatory violations found; commit hook: .git/hooks/pre-commit: make misra || exit 1; verify hook blocks commit with violation, passes with clean code; write companion unit tests (Google Test on host) for the CanIf_RxIndication function with mock PduInfoType; run clang-format with automotive style (.clang-format: IndentWidth:4, BreakBeforeBraces: Linux) as pre-commit format check; final deliverable: module with full MISRA compliance, unit test coverage >80%, Polyspace report archived, deviation register entry for Rule 11.5
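One possible shape for the reworked handler is sketched below. It is not the course's reference solution: PduInfoType is simplified to two fields, uint8_t stands in for the AUTOSAR uint8 alias, and CANIF_MAX_DLC, the static receive buffer, and CanIf_GetRxLength are assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Local stand-ins for Std_Types.h / Canif_Types.h (assumptions). */
typedef uint8_t Std_ReturnType;
#define E_OK           ((Std_ReturnType)0U)
#define E_NOT_OK       ((Std_ReturnType)1U)
#define CANIF_MAX_DLC  (8U)

typedef struct {
    const uint8_t *SduDataPtr;   /* payload bytes */
    uint8_t        SduLength;    /* payload length in bytes */
} PduInfoType;

/* Module state with internal linkage. */
static uint8_t CanIf_RxData[CANIF_MAX_DLC];
static uint8_t CanIf_RxLength = 0U;

/* Declarations visible before the definitions (Rule 8.4). */
Std_ReturnType CanIf_RxIndication(uint8_t ControllerId,
                                  const PduInfoType * const PduInfoPtr);
uint8_t CanIf_GetRxLength(void);

Std_ReturnType CanIf_RxIndication(uint8_t ControllerId,
                                  const PduInfoType * const PduInfoPtr)
{
    Std_ReturnType ret = E_NOT_OK;
    uint8_t i;

    /* Explicit comparisons keep the controlling expressions boolean (Rule 14.4). */
    if ((ControllerId == 0U) && (PduInfoPtr != NULL)) {
        if ((PduInfoPtr->SduDataPtr != NULL) &&
            (PduInfoPtr->SduLength <= CANIF_MAX_DLC)) {
            for (i = 0U; i < PduInfoPtr->SduLength; i++) {
                CanIf_RxData[i] = PduInfoPtr->SduDataPtr[i];
            }
            CanIf_RxLength = PduInfoPtr->SduLength;
            ret = E_OK;
        }
    }
    return ret;   /* single exit point (Rule 15.5) */
}

uint8_t CanIf_GetRxLength(void)
{
    return CanIf_RxLength;
}
```

The nested if structure replaces the early returns of the starting code, and the length check guards the copy loop against writing past the 8-byte buffer.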
4
Interrupts & Real-Time
5 chapters • 3.8 hrs reading
Interrupt Handling & Priority Schemes 50 min read
▸ ARM Cortex-M NVIC (Nested Vectored Interrupt Controller): 8-bit priority field - only upper bits implemented (STM32: 4 bits = 16 priority levels, 0=highest, 15=lowest); subpriority within preemption group (NVIC_SetPriorityGrouping); set priority: NVIC_SetPriority(IRQn_Type irq, uint32_t priority); enable: NVIC_EnableIRQ(irq); pending: NVIC_SetPendingIRQ(irq); vector table in flash: __attribute__((section(".isr_vector"))) void (*const VectorTable[])(void) = {reset_handler, nmi_handler, hardfault_handler, ..., TIM2_IRQHandler, USART1_IRQHandler}; VTOR (Vector Table Offset Register): SCB->VTOR = 0x20000000 to relocate to RAM for OTA update
▸ ISR declaration and context: a Cortex-M ISR is a plain C function: void TIM2_IRQHandler(void) - no __attribute__((interrupt)) is needed because the hardware exception entry follows the AAPCS; ISR entry: CPU saves context frame (xPSR, PC, LR, R12, R0-R3) on stack automatically (tail-chaining skips the unstack/restack between back-to-back ISRs); on entry LR holds an EXC_RETURN value (0xFFFFFFF9 = return to Thread mode using MSP) that the ISR return loads into PC; AUTOSAR MCAL ISR declaration: ISR(Can0_Isr_BusOff) { CanIf_ControllerBusOffIndication(0); } - uses OSEK/AUTOSAR Os_Isr macro to register ISR with OS; ISR nesting: higher-priority IRQ preempts lower; Aurix TC3xx Interrupt Router (IR): priorities 1–255 (255=highest; a service request with priority 0 is never serviced), separate SRC (Service Request Control) register per SRN (Service Request Node)
▸ ISR design best practices: keep ISRs minimal - set flag, write to ring buffer, then process in task/main loop; avoid blocking calls (delay loops, UART polling) inside ISR; never call non-reentrant functions (printf, malloc) from ISR; shared variables between ISR and task: declare volatile, protect with __disable_irq()/__enable_irq() or OS-critical section; ISR execution time budget: ISR period = 1/IRQ_rate; max execution time = budget × (1 - response_time_margin); time-bounded ISR: copy buffer, set flag, exit - defer heavy processing to deferred interrupt service task (DIT pattern, used in FreeRTOS/AUTOSAR); measure ISR execution time with DWT cycle counter: DWT->CYCCNT start/stop around ISR body in debug build
▸ Priority assignment strategy: Rate Monotonic Analysis (RMA): assign higher priority to higher-frequency tasks; example hierarchy (Cortex-M, 4-bit priorities): 0 = SysTick 1ms system tick, 1 = CAN ISR (500µs budget), 2 = UART ISR, 3 = SPI ISR, 5 = ADC DMA complete, 8 = background task; BASEPRI register: __set_BASEPRI(3<<4) blocks all IRQs with priority value >= 3, i.e. priority 3 and everything less urgent (used by FreeRTOS for critical sections without disabling high-priority ISRs); FAULTMASK: __set_FAULTMASK(1) blocks all exceptions except NMI, including HardFault (use only inside fault handlers); AUTOSAR OS: OsIsrCategory1 (directly calls Os_Cat1_Isr, no OS API allowed), OsIsrCategory2 (full OS services available, OsIsrMaxAllowedPriority check)
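The shift used with BASEPRI (3<<4) follows from the priority bits sitting in the upper part of an 8-bit field. A host-side sketch of that encoding and of the BASEPRI masking rule is shown below; prio_to_reg and masked_by_basepri are hypothetical helpers, and PRIO_BITS = 4 is an assumption matching the STM32 example.

```c
#include <stdint.h>
#include <stdbool.h>

#define PRIO_BITS  (4U)   /* assumption: 4 implemented priority bits */

/* Shift a logical priority (0..15) into the upper bits of the 8-bit field,
 * which is the layout used by BASEPRI and the NVIC priority registers. */
uint8_t prio_to_reg(uint8_t prio)
{
    return (uint8_t)((uint32_t)prio << (8U - PRIO_BITS));
}

/* BASEPRI == 0 means "no masking". Otherwise every interrupt whose priority
 * value is numerically >= BASEPRI (same or lower urgency) is blocked. */
bool masked_by_basepri(uint8_t irq_prio, uint8_t basepri_prio)
{
    bool masked = false;
    uint8_t basepri = prio_to_reg(basepri_prio);

    if (basepri != 0U) {
        masked = (prio_to_reg(irq_prio) >= basepri);
    }
    return masked;
}
```

With a BASEPRI level of 3, the priority 5 ADC DMA interrupt from the example hierarchy is held off while the priority 1 CAN ISR still runs.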
Critical Sections & Atomic Operations 45 min read
▸ Critical section implementation on Cortex-M: save and disable: uint32_t primask = __get_PRIMASK(); __disable_irq(); /* critical section */ __set_PRIMASK(primask); pattern preserves original interrupt state (supports nested critical sections); alternatively: __disable_irq()/__enable_irq() without save (non-nestable - only for single-level protection); AUTOSAR Os API: Os_EnterCriticalSection(SECTION_ID) / Os_ExitCriticalSection(SECTION_ID) - OS-level suspend/resume interrupts with nesting counter; SuspendAllInterrupts() / ResumeAllInterrupts() for OS global critical section; SuspendOSInterrupts() / ResumeOSInterrupts() - suspends only OS-managed interrupts (not Cat1 ISRs)
▸ ARM exclusive access for lock-free atomic operations: LDREX/STREX pair implements load-link/store-conditional; increment shared counter atomically without disabling interrupts: uint32_t val; do { val = __LDREXW(counter) + 1U; } while(__STREXW(val, counter) != 0U); if STREX returns 1 (exclusive access lost due to interrupt/context switch), retry; CMSIS provides the __LDREXW/__STREXW intrinsics, GCC provides the built-ins __atomic_add_fetch and __atomic_compare_exchange_n (compiled to LDREX/STREX on ARMv7-M); C11 stdatomic.h: _Atomic uint32_t counter; atomic_fetch_add(&counter, 1); compiles to LDREX/STREX sequence; limitation: the exclusive monitor is cleared on exception entry/return, so STREX can fail and the retry loop is required; Cortex-M0/M0+ has no LDREX/STREX
▸ Ring buffer implementation with critical section: struct RingBuf { volatile uint8_t buf[256]; volatile uint8_t head; volatile uint8_t tail; }; push (from ISR): buf.buf[buf.head]=data; buf.head++; (uint8 wraps automatically at 256); pop (from task): requires critical section to read tail+head atomically: __disable_irq(); if(buf.head==buf.tail) empty=true else {data=buf.buf[buf.tail]; buf.tail++;} __enable_irq(); alternatively: single-producer single-consumer lock-free: only head written by ISR, only tail written by consumer - no critical section needed if head/tail are 8-bit (single-byte write is atomic on ARM); size check: (uint8_t)(head-tail) gives used count (modular arithmetic)
▸ Aurix TC3xx multi-core critical sections: Aurix has 6 TriCore cores (TC397) sharing LMURAM and peripheral access; spinlock via iLLD: IfxCpu_setSpinLock(&lock, timeoutCount) - retries an atomic compare-and-swap until the lock is free or the timeout count expires, returns TRUE on success; release: IfxCpu_resetSpinLock(&lock); AUTOSAR Multi-Core OS (Autosar 4.x): GetSpinlock(spinlockId)/ReleaseSpinlock(spinlockId) for inter-core data protection; TryToGetSpinlock for non-blocking attempt; scheduling context matters: spinlocks in ISR are dangerous if task on same core also holds spinlock (deadlock); use inter-core message queues (IOC - Inter-OS-Application Communicator in AUTOSAR) as preferred pattern over spinlocks for non-trivial shared state
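The single-producer single-consumer ring buffer described in this chapter can be modelled on a host as follows. This is a sketch for a single-core Cortex-M style target: the rb_ names are illustrative, a full check is added that the syllabus snippet omits, and one slot is kept free so that a full buffer can be told apart from an empty one.

```c
#include <stdint.h>
#include <stdbool.h>

#define RB_SIZE  (256U)   /* must stay 256 so the uint8_t indices wrap for free */

typedef struct {
    volatile uint8_t buf[RB_SIZE];
    volatile uint8_t head;   /* written only by the producer (ISR) */
    volatile uint8_t tail;   /* written only by the consumer (task) */
} RingBuf;

/* Number of stored bytes; modular uint8_t arithmetic handles the wrap. */
uint8_t rb_count(const RingBuf *rb)
{
    return (uint8_t)(rb->head - rb->tail);
}

/* Producer side: returns false instead of overwriting unread data. */
bool rb_push(RingBuf *rb, uint8_t data)
{
    bool ok = false;

    if (rb_count(rb) != (uint8_t)(RB_SIZE - 1U)) {
        rb->buf[rb->head] = data;
        rb->head = (uint8_t)(rb->head + 1U);   /* publish after the data write */
        ok = true;
    }
    return ok;
}

/* Consumer side: returns false when the buffer is empty. */
bool rb_pop(RingBuf *rb, uint8_t *data)
{
    bool ok = false;

    if (rb->head != rb->tail) {
        *data = rb->buf[rb->tail];
        rb->tail = (uint8_t)(rb->tail + 1U);
        ok = true;
    }
    return ok;
}
```

Because each index has exactly one writer, no critical section is needed on a single core. Sharing the buffer between cores would additionally need memory barriers or C11 atomics, which this sketch does not include.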
Timer/Counter Programming 40 min read
▸ ARM Cortex-M SysTick: 24-bit countdown timer fed from HCLK or external reference; SysTick_Config(SystemCoreClock/1000) sets 1ms period; CTRL register: ENABLE(bit0), TICKINT(bit1 - generate IRQ on underflow), CLKSOURCE(bit2 - 1=HCLK, 0=HCLK/8); LOAD register holds reload value (SystemCoreClock/1000 - 1); VAL register holds current count (write any value to clear); SysTick interrupt priority: SCB->SHP[11] = priority << (8 - NUM_PRIO_BITS); use for OS tick in FreeRTOS/AUTOSAR; limitation: 24-bit only (max period at 168MHz = ~100ms); for longer periods use general-purpose timer (TIM2 32-bit on STM32F4)
▸ STM32 general-purpose timer (TIMx) modes: upcounting: counter increments from 0 to ARR → overflow → UIF flag set → optional update interrupt; TIM2->PSC = psc-1; TIM2->ARR = period-1; TIM2->CR1 |= TIM_CR1_CEN; input capture: measure time between external edge transitions (e.g., PWM period measurement, encoder position); output compare: generate PWM - TIM2->CCMR1 = (6<<4) (PWM mode 1); TIM2->CCR1 = duty_cycle; TIM2->CCER |= TIM_CCER_CC1E; encoder mode: TIM2->SMCR = TIM_SMCR_SMS_0|TIM_SMCR_SMS_1 (encoder mode 3); Aurix TC3xx GTM (Generic Timer Module): hierarchical timer architecture for automotive applications (ATOM/TOM channels, multi-channel PWM synchronization)
▸ AUTOSAR GPT (General Purpose Timer) driver: Gpt_StartTimer(GptChannelId, timeoutTicks); Gpt_GetTimeElapsed(channel) returns elapsed ticks; Gpt_EnableNotification(channel) enables callback on timeout; GPT abstraction over hardware timer - callback function registered in GptChannelCfg.GptNotification; AUTOSAR OS AlarmAction: SetRelAlarm(AlarmId, increment, cycle) triggers OS task/event on timer expiry - preferred over bare-metal timer for schedulable timing; MCAL GPT driver maps to hardware timer (SysTick, TIM2, Aurix GTM-ATOM) based on GptChannelCfg hardware mapping; Gpt_GetTimeElapsed returns ticks in Gpt_ValueType (implementation-defined bit width)
▸ Timer precision and jitter: timer resolution = 1/timer_clock_Hz; at 1MHz timer clock, resolution = 1µs; jitter sources: ISR latency (12 cycles minimum on Cortex-M3/M4 with zero-wait-state memory; flash wait states, multi-cycle instructions, and masked-interrupt sections add more), priority inversion, DMA bus contention; measure jitter with oscilloscope or logic analyzer on GPIO toggle in timer ISR; reduce jitter: increase ISR priority, minimize critical section duration, avoid long ISRs that delay timer ISR; Aurix TC3xx: STM (System Timer Module) provides a 64-bit free-running counter at 100MHz - 10ns resolution for timestamping, read directly so ISR jitter does not affect the timestamp; DWT_CYCCNT on Cortex-M: 32-bit cycle counter at CPU frequency - use for microsecond-accurate profiling without dedicated timer; AUTOSAR EcuM WakeupSource configuration links timer wakeup to power management
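The 24-bit SysTick limit mentioned in this chapter can be checked with a few lines of arithmetic. The helper below is a host-side sketch with a hypothetical name; it only computes the LOAD value and does not touch any register.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define SYSTICK_MAX_RELOAD  (0x00FFFFFFUL)   /* LOAD is a 24-bit register */

/* Compute the SysTick LOAD value for a requested tick rate.
 * Returns false if the inputs are invalid or the period does not fit
 * into 24 bits; *reload is only written on success. */
bool systick_reload(uint32_t core_hz, uint32_t tick_hz, uint32_t *reload)
{
    bool ok = false;

    if ((reload != NULL) && (tick_hz != 0U) && (core_hz > tick_hz)) {
        uint32_t cycles = core_hz / tick_hz;   /* core cycles per tick */

        if ((cycles - 1U) <= SYSTICK_MAX_RELOAD) {
            *reload = cycles - 1U;             /* counter counts LOAD..0 */
            ok = true;
        }
    }
    return ok;
}
```

At 168MHz a 1kHz tick needs LOAD = 167999, which fits, while a 10Hz tick would need 16799999 and exceeds the 24-bit maximum of 16777215.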
DMA Fundamentals 35 min read
▸ DMA architecture on STM32F4: DMA controller has 8 streams × 8 channels each; stream = hardware data path (FIFO + arbitration); channel = peripheral request source mapping; configuration for ADC1 on DMA2 Stream0: DMA2_Stream0->CR = DMA_SxCR_MINC (memory increment) | DMA_SxCR_MSIZE_0 (16-bit memory) | DMA_SxCR_PSIZE_0 (16-bit peripheral) | DMA_SxCR_TCIE (transfer complete interrupt), leaving CHSEL = 000 (channel 0; DMA_SxCR_CHSEL_0 would select channel 1) and DIR = 00 (peripheral-to-memory, as needed for ADC; DMA_SxCR_DIR_0 = M2P, DMA_SxCR_DIR_1 = M2M); set peripheral address: DMA2_Stream0->PAR = (uint32_t)&ADC1->DR; set memory address: DMA2_Stream0->M0AR = (uint32_t)adcBuffer; set count: DMA2_Stream0->NDTR = BUFFER_SIZE; enable: DMA2_Stream0->CR |= DMA_SxCR_EN
▸ DMA transfer modes: Memory-to-Memory (memcpy offload - frees CPU during bulk data move); Peripheral-to-Memory (P2M - ADC, UART/SPI RX → RAM buffer); Memory-to-Peripheral (M2P - RAM buffer → SPI/UART/DAC TX); circular mode: DMA_SxCR_CIRC bit - buffer automatically refilled, TC interrupt at end, HT interrupt at half; double-buffer mode (DBM): DMA_SxCR_DBM - alternates between M0AR and M1AR, CPU processes one while DMA fills other (zero-copy streaming); FIFO mode vs direct mode: FIFO (4-word depth) increases burst efficiency; direct mode triggers single transfer per peripheral request; Aurix TC3xx DMA: 128 channels, multi-channel linked list (LLI) for scatter-gather transfers
▸ Cache coherency with DMA on Cortex-M7 (D-cache): when CPU and DMA share memory, D-cache creates coherency problem - CPU writes to cache (not RAM), DMA reads stale RAM; solution for M2P: SCB_CleanDCache_by_Addr((uint32_t*)txBuffer, bufLen) before enabling DMA TX - flushes cache to RAM; for P2M: SCB_InvalidateDCache_by_Addr((uint32_t*)rxBuffer, bufLen) after DMA complete ISR - invalidates cache so CPU reads DMA-written data from RAM; OR use non-cacheable memory region (MPU attribute noncacheable) for DMA buffers; buffer alignment: __attribute__((aligned(32))) required - cache line is 32 bytes on Cortex-M7; partial cache line invalidation corrupts adjacent data
▸ MCAL DMA driver (Dma module - vendor-specific, AUTOSAR does not standardize a DMA driver, so the API names below vary by MCAL supplier): Dma_SetupChannel(channelId, &config) configures a DMA channel; Dma_StartChannel(channelId) initiates transfer; Dma_GetChannelStatus(channelId) polls completion; Dma_SetupLinkedListMode(channelId, linkedListItems) for scatter-gather; notification callback: DmaCfg.TransferCompleteCallback = App_DmaComplete; AUTOSAR SPI handler uses DMA internally for high-speed SPI master transfers - application just calls Spi_AsyncTransmit(); transparency: SPI handler abstracts DMA complexity; common bug: DMA channel conflicts - two peripherals attempting to use same DMA stream simultaneously → undefined behavior; DMA stream-channel mapping must match chip data sheet (STM32: Table 43 in Reference Manual)
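The 32-byte alignment requirement for cache maintenance can be illustrated with a host-side helper that widens a buffer range to whole cache lines. CacheRange and cache_align_range are hypothetical names; on a real Cortex-M7 target the resulting start and size would be passed to the CMSIS clean or invalidate call.

```c
#include <stdint.h>
#include <stddef.h>

#define DCACHE_LINE  (32U)   /* Cortex-M7 D-cache line size in bytes */

typedef struct {
    uintptr_t start;   /* rounded down to a cache line boundary */
    size_t    size;    /* multiple of the cache line size */
} CacheRange;

/* Widen [addr, addr + len) to the smallest range of whole cache lines. */
CacheRange cache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~((uintptr_t)DCACHE_LINE - 1U);
    CacheRange r;
    uintptr_t end = addr + len;

    r.start = addr & mask;                          /* round start down */
    end = (end + (uintptr_t)DCACHE_LINE - 1U) & mask;   /* round end up */
    r.size = (size_t)(end - r.start);
    return r;
}
```

If the widened range is larger than the buffer itself, other variables share its cache lines, and invalidating those lines discards their cached writes. That is the reason for declaring DMA buffers aligned to 32 bytes and padded to a multiple of 32.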
Hands-On: ISR-Driven Communication 55 min read
▸ Lab objective: implement UART receive driver using ISR + ring buffer on STM32F4 (or QEMU emulation); hardware setup: USART1 PA9/PA10, 115200 baud, 8N1; initialization: RCC->APB2ENR |= RCC_APB2ENR_USART1EN; USART1->BRR = PCLK2/115200 (USART1 is clocked from APB2, e.g. 84MHz when SYSCLK is 168MHz with APB2 prescaler 2; using SystemCoreClock here would halve the actual baud rate); USART1->CR1 = USART_CR1_UE|USART_CR1_RE|USART_CR1_RXNEIE; NVIC_SetPriority(USART1_IRQn, 5); NVIC_EnableIRQ(USART1_IRQn); ring buffer: volatile uint8_t rxBuf[256]; volatile uint8_t rxHead=0, rxTail=0; ISR: void USART1_IRQHandler(void){ if(USART1->SR & USART_SR_RXNE) {rxBuf[rxHead++]=(uint8_t)USART1->DR;} }
▸ Consumer task implementation: uint8_t Uart_ReadByte(uint8_t *data) { if(rxHead==rxTail) return 0; *data=rxBuf[rxTail++]; return 1; } line buffer: collect bytes until '\n'; process command string; error handling: USART1->SR & USART_SR_ORE (overrun error - ISR too slow or ring buffer full) → clear by reading SR then DR; USART1->SR & USART_SR_FE (framing error - baud rate mismatch, noise) → log and clear; USART1->SR & USART_SR_NE (noise error) → increment noise counter; DMA upgrade: configure USART1 DMA Rx in circular mode with half-transfer + full-transfer interrupts for zero-copy receive of large packets
▸ ISR latency measurement exercise: toggle GPIO in ISR entry and exit; measure pulse width on oscilloscope or logic analyzer; expected: <1µs ISR entry latency on Cortex-M4 at 168MHz (12 cycles = 71ns minimum from IRQ trigger to ISR first instruction); measure worst case: ISR triggered while DMA is running (bus contention adds 2-4 cycles); use DWT_CYCCNT: enter ISR: uint32_t start=DWT->CYCCNT; exit ISR: cycleCount=DWT->CYCCNT-start; log to ring buffer, analyze distribution; target: max ISR execution time < 10µs for 115200 baud (one byte period = 86µs)
▸ Common pitfalls and debug checklist: ISR not executing - check: clock enabled? NVIC enabled? RXNEIE bit set in USART CR1? correct IRQ name in vector table? data corruption in ring buffer - check volatile on rxHead/rxTail; check uint8 wrapping (256 modulo) works correctly for buffer size 256; overrun error at high baud rate - increase NVIC priority of USART ISR or switch to DMA; FIFO behavior on STM32F4 USART: no FIFO - each byte must be read before next arrives (86µs window at 115200); echo test: connect UART TX loopback to RX, transmit known pattern, verify ISR receives identical bytes; TRACE32 set breakpoint in USART1_IRQHandler, verify rxHead increments on each received byte
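The baud-rate divisor and the per-byte time budget used in this lab are simple integer calculations. The sketch below assumes 16x oversampling (OVER8 = 0) and uses hypothetical helper names; baud must be non-zero.

```c
#include <stdint.h>

/* BRR value for 16x oversampling (OVER8 = 0): the register holds
 * USARTDIV * 16, which equals f_PCLK / baud, rounded to nearest here. */
uint32_t usart_brr(uint32_t pclk_hz, uint32_t baud)
{
    return (pclk_hz + (baud / 2U)) / baud;
}

/* Time on the wire for one frame in microseconds (truncated).
 * 8N1 framing is 10 bits: start + 8 data + stop. */
uint32_t frame_time_us(uint32_t baud, uint32_t bits_per_frame)
{
    return (bits_per_frame * 1000000U) / baud;
}
```

For an 84MHz APB2 clock and 115200 baud this gives a divisor of 729, and one 8N1 frame takes about 86µs, which is the window the ISR has to read each byte.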
5
RTOS Fundamentals
5 chapters • 4.2 hrs reading
RTOS Concepts - Tasks, Scheduling, Priorities 50 min read
▸ RTOS task model: task = independent thread of execution with own stack; FreeRTOS task creation: xTaskCreate(TaskFnc, "TaskName", STACK_DEPTH, pvParams, PRIORITY, &xHandle); AUTOSAR OS task: TASK(TaskName){/* body */} defined in OIL/ARXML; AUTOSAR task types: basic task (run-to-completion, no blocking), extended task (can wait on events); FreeRTOS task states: Running → Blocked (pend on semaphore/delay) → Ready → Suspended; AUTOSAR OS states: Running → Waiting (WaitEvent) → Ready (SetEvent/ActivateTask) → Suspended (TerminateTask); preemptive scheduling: higher priority Ready task immediately preempts lower priority Running task
▸ Scheduling algorithms: preemptive fixed-priority (FreeRTOS configUSE_PREEMPTION=1, AUTOSAR FULL_PREEMPTIVE): higher priority task runs immediately when ready; cooperative (configUSE_PREEMPTION=0): task voluntarily yields via taskYIELD() or blocking call; time-sliced round-robin (configUSE_TIME_SLICING=1): equal-priority tasks share CPU in tick-period slices; Rate Monotonic Scheduling (RMS): assign priority proportional to frequency - 1ms task highest priority, 100ms task lowest; schedulability test: sum of (Ci/Ti) ≤ n(2^(1/n)-1) where Ci=WCET, Ti=period; for 3 tasks: sum ≤ 3(2^(1/3)-1) = 0.78 (78% CPU utilization bound); AUTOSAR OS: OsTaskPeriod, OsTaskActivation in OIL file define periodic task scheduling
▸ Stack sizing for tasks: FreeRTOS: stack size in words (not bytes) - 256 words = 1KB on 32-bit; minimum stack: 128 words for simple task + 64 words ISR overhead; calculate: sum of all local variable sizes in deepest call chain + 32 words for context save; GCC stack usage files: compile with -fstack-usage → each .c generates .su file with per-function stack usage; TRACE32: Os.task view shows stack pointer per task; fill stack with 0xA5 pattern: after system run, scan task stack from bottom - first non-0xA5 byte = high-water mark; FreeRTOS: uxTaskGetStackHighWaterMark(handle) returns remaining words; AUTOSAR: OsStacksize must be multiple of 8 bytes (stack alignment)
▸ AUTOSAR OS task configuration example (OIL): TASK TaskCtrl_1ms { PRIORITY = 10; ACTIVATION = 1; SCHEDULE = FULL; AUTOSTART = FALSE; STACKSIZE = 1024; EVENT = EvtCtrl; }; ALARM AlarmCtrl_1ms { COUNTER = SystemTimer; ACTION = ACTIVATETASK { TASK = TaskCtrl_1ms; }; AUTOSTART { ALARMTIME = 1; CYCLETIME = 1; APPMODE = OSDEFAULTAPPMODE; }; }; configure OsTaskTimingProtection (AUTOSAR 4.x): ExecutionBudget=500µs (WCET), TimeFrame=1ms (period), ResourceLockBudget per resource; OS ErrorHook reports OSEK error codes (E_OS_LIMIT if task activated while already active, E_OS_ID if invalid task ID, E_OS_MISSINGEND if task body exits without TerminateTask)
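The rate-monotonic schedulability test from this chapter can be sketched in integer arithmetic. The bound n(2^(1/n)-1) is taken from a small precomputed table (scaled by 10000 and rounded down) instead of calling pow(); TaskTiming and rms_schedulable are hypothetical names, and the table only covers up to five tasks.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    uint32_t wcet_us;     /* worst-case execution time Ci */
    uint32_t period_us;   /* period Ti */
} TaskTiming;

/* n(2^(1/n) - 1) scaled by 10000 and rounded down, for n = 1..5. */
static const uint32_t rms_bound_x10000[5] = {
    10000U, 8284U, 7797U, 7568U, 7434U
};

/* True if the summed utilization is within the Liu-Layland bound.
 * Each Ci/Ti term is rounded up, so the check errs on the safe side. */
bool rms_schedulable(const TaskTiming *tasks, size_t n)
{
    bool ok = false;

    if ((tasks != NULL) && (n >= 1U) && (n <= 5U)) {
        uint64_t total = 0U;
        bool valid = true;
        size_t i;

        for (i = 0U; i < n; i++) {
            if (tasks[i].period_us == 0U) {
                valid = false;
            } else {
                uint64_t scaled = (uint64_t)tasks[i].wcet_us * 10000U;
                total += (scaled + tasks[i].period_us - 1U) / tasks[i].period_us;
            }
        }
        ok = valid && (total <= rms_bound_x10000[n - 1U]);
    }
    return ok;
}
```

The bound is sufficient but not necessary: a task set that fails this check may still be schedulable, and needs an exact response-time analysis to decide.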
Semaphores, Mutexes & Message Queues 55 min read
▸ Binary semaphore: FreeRTOS: xSemaphoreCreateBinary(); give from ISR: xSemaphoreGiveFromISR(sem, &higherPriorityTaskWoken); portYIELD_FROM_ISR(higherPriorityTaskWoken); take in task: xSemaphoreTake(sem, portMAX_DELAY) blocks until signaled; AUTOSAR: WaitEvent(EvtSem); ClearEvent(EvtSem); triggered by SetEvent(taskId, EvtSem) from another task or ISR; binary semaphore for task synchronization - ISR signals task when ADC conversion complete; counting semaphore: xSemaphoreCreateCounting(maxCount, initialCount); used for resource pool of N items; AUTOSAR Os: no counting semaphore directly - simulate with EventMask or use counting RES mechanism
▸ Mutex for mutual exclusion: FreeRTOS: xSemaphoreCreateMutex(); take: xSemaphoreTake(mutex, timeout) blocks; give: xSemaphoreGive(mutex) releases; mutex holder priority temporarily elevated when higher priority task waits (priority inheritance - FreeRTOS automatic); AUTOSAR RESOURCE mechanism: GetResource(RES_UART) / ReleaseResource(RES_UART) - ceiling priority protocol (task runs at max(task_priority, resource_ceiling_priority)); OsResource OsResourceCeilingPriority = highest priority that uses this resource; difference from FreeRTOS mutex: AUTOSAR resources cannot be taken in ISR, cannot be taken recursively; mutex vs semaphore: mutex has ownership (only taker can give), semaphore has no ownership (ISR gives, task takes)
▸ Message queues for inter-task communication: FreeRTOS: xQueueCreate(length, itemSize); send: xQueueSend(queue, &item, timeout); receive: xQueueReceive(queue, &item, portMAX_DELAY); send from ISR: xQueueSendFromISR(); queue of structs: typedef struct {uint32_t id; uint8_t data[8];} CanMsg_t; xQueueCreate(10, sizeof(CanMsg_t)); ISR copies CAN frame to queue on RX, CAN processing task reads; queue depth prevents data loss under burst conditions; AUTOSAR IOC (Inter-OS-Application Communicator): IocSend_<IocId>(data) / IocReceive_<IocId>(&data) for queued communication, IocWrite_<IocId>(data) / IocRead_<IocId>(&data) for last-is-best - type-safe, generated by the OS generator and called by the RTE, works across OS-Application boundaries and across cores on multi-core Aurix
▸ Choosing between synchronization primitives: binary semaphore = ISR→task notification (signal without data); counting semaphore = bounded resource pool; mutex = protect shared resource (use with priority inheritance); message queue = pass data + synchronization; event flags (AUTOSAR EventMask) = multiple events in single task wait; common mistakes: giving mutex from ISR (undefined on FreeRTOS - semaphore required); taking mutex in ISR (blocks ISR = system deadlock); setting an event for a task that is still suspended (SetEvent returns E_OS_STATE and the event is lost; events set while the task is ready or running stay pending until ClearEvent); using global variable instead of queue for producer-consumer (not protected against race condition); AUTOSAR RTE uses IOC internally for inter-SWC signal routing across partitions
Deadlock Prevention & Priority Inversion 45 min read
▸ Priority inversion problem: Task_High (prio=10) blocks on mutex held by Task_Low (prio=1); Task_Mid (prio=5) runs instead of Task_Low since Task_Mid is higher priority than Task_Low; Task_High is delayed for as long as Task_Mid keeps running - violation of real-time guarantees; classic example: Mars Pathfinder (1997) - a VxWorks priority inversion caused repeated watchdog resets of the lander until priority inheritance was enabled on the affected mutex; solutions: Priority Inheritance (FreeRTOS mutex): Task_Low temporarily inherits Task_High's priority while holding mutex; Priority Ceiling Protocol (AUTOSAR OsResource): task holding resource runs at resource's ceiling priority; best approach for hard real-time: Priority Ceiling Protocol since worst-case bound is deterministic
▸ Deadlock conditions (Coffman's four): mutual exclusion (resource can't be shared), hold and wait (hold one resource while waiting for another), no preemption (resource not forcibly taken), circular wait (A waits for B, B waits for A); deadlock in RTOS example: Task_A holds MutexX, waits for MutexY; Task_B holds MutexY, waits for MutexX → circular wait = deadlock, both tasks blocked forever; detection: TRACE32 Os.task view shows both tasks in WAITING state permanently; Aurix multi-core: watch the spinlock variable in TRACE32 - it stays locked while no core makes progress
▸ Deadlock prevention strategies: lock ordering - always acquire mutexes in fixed global order (MutexX before MutexY always); prevents circular wait; enforced by code review and static deadlock analysis (e.g., Coverity); timeout on mutex take: xSemaphoreTake(mutex, pdMS_TO_TICKS(100)) - if timeout occurs, release all held mutexes, log error, retry; try-lock pattern: xSemaphoreTake(mutex, 0) - non-blocking attempt; on failure, release other held resources; AUTOSAR OsResource (priority ceiling protocol) is deadlock-free by construction as long as resources are released in LIFO order and a task never waits or terminates while holding one; AUTOSAR multi-core: spinlock acquisition order is defined in the OS configuration (OsSpinlockSuccessor) - GetSpinlock returns E_OS_NESTING_DEADLOCK when the configured order is violated
▸ FreeRTOS watchdog integration for livelock/deadlock detection: software watchdog timer per task: each task must call Watchdog_Kick(taskId) within deadline; if watchdog ISR fires without kick, log deadlock evidence (state and stack high-water mark of all tasks via uxTaskGetSystemState() or vTaskGetInfo()) then trigger controlled reset; AUTOSAR OS Timing Protection: execution budget per task (OsTaskExecutionBudget), time frame (OsTaskTimeFrame, minimum time between activations), resource lock budget (OsTaskResourceLockBudget, max time holding a resource); Timing Protection violation: AUTOSAR OS calls ProtectionHook(StatusType FatalError) with E_OS_PROTECTION_TIME, E_OS_PROTECTION_ARRIVAL or E_OS_PROTECTION_LOCKED → application decides via return value: PRO_TERMINATETASKISR (kill task), PRO_TERMINATEAPPL or PRO_TERMINATEAPPL_RESTART (kill, optionally restart, the OS-Application), PRO_SHUTDOWN (full shutdown), PRO_IGNORE (arrival violations only)
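The lock-ordering and timeout strategies above can be sketched on a host with standard C++ mutexes. This is a minimal illustration, not RTOS code: `mutexX`, `mutexY`, `lockBothOrdered` and `unlockBoth` are hypothetical names, and `std::timed_mutex::try_lock_for` stands in for `xSemaphoreTake` with a finite timeout.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical resources standing in for MutexX and MutexY.
std::timed_mutex mutexX;
std::timed_mutex mutexY;

// Acquire X, then Y (the fixed global order), with a bounded wait on each.
// On timeout the function backs off: it releases whatever it already holds
// and returns false, so the caller never blocks forever holding one lock.
bool lockBothOrdered(std::chrono::milliseconds timeout) {
    if (!mutexX.try_lock_for(timeout)) {
        return false;                 // nothing held yet
    }
    if (!mutexY.try_lock_for(timeout)) {
        mutexX.unlock();              // back off to break hold-and-wait
        return false;
    }
    return true;                      // caller now owns X and Y
}

// Release in the reverse of the acquisition order.
void unlockBoth() {
    mutexY.unlock();
    mutexX.unlock();
}
```

Every task that needs both resources would call the same helper, which is what makes the ordering global; a task that calls it with the mutexes swapped reintroduces the circular-wait condition.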
OSEK/VDX OS for Automotive 40 min read
▸ OSEK/VDX OS standard (ISO 17356): consortium standard defining a deterministic RTOS for automotive ECUs; evolved into AUTOSAR OS (AUTOSAR 4.x compatible); four conformance classes: BCC1 (basic tasks, one activation), BCC2 (multiple activations per task), ECC1 (extended tasks with WaitEvent, one activation), ECC2 (extended tasks + multiple activations); AUTOSAR OS conformance class = ECC2 (superset of all); scheduling policies: FULL_PREEMPTIVE (standard), NON_PREEMPTIVE (task runs without preemption), MIXED (per task OsTaskSchedule flag); system calls: ActivateTask, TerminateTask, ChainTask (terminate + activate atomically), Schedule (cooperatively yield), GetTaskID, GetTaskState
▸ AUTOSAR OS Alarm and Schedule Table: Alarm: SetRelAlarm(alarmId, offset_ticks, period_ticks) → activates task or sets event at regular intervals; Schedule Table: predefined sequence of expiry points each activating different tasks (e.g., StSchedule_1ms: 0ms→ActivateTask(Task1ms), 2ms→ActivateTask(Task2ms), 10ms→ActivateTask(Task10ms), 20ms→stop); StartScheduleTableRel/StartScheduleTableAbs; Schedule Tables offer zero jitter for synchronized multi-rate task activation; used in engine control for crank-synchronized task activation; OsScheduleTableDuration, OsScheduleTableRepeating (cyclic vs one-shot); AUTOSAR 4.x: NextScheduleTable() for seamless switchover between tables
▸ Error handling hooks: StartupHook() - called after OS initialized, before first task runs (initialize BSW, hardware); ShutdownHook(StatusType error) - called on ShutdownOS(); ErrorHook(StatusType error) - called on any OS API error (E_OS_LIMIT, E_OS_ID, etc.); PreTaskHook() / PostTaskHook() - called before/after every task switch; ProtectionHook(StatusType FatalError) - called on timing or memory protection violation; OSErrorGetServiceId() and OSError_<Service>_<Param>() macros (e.g., OSError_ActivateTask_TaskID()) in ErrorHook to identify the failing API call and its arguments; typical ErrorHook: log error code + task ID to a circular RAM buffer that is later persisted via NvM for post-crash analysis; never call blocking OS APIs from hooks (deadlock risk)
▸ Comparison AUTOSAR OS vs FreeRTOS in automotive context: AUTOSAR OS: static configuration (all tasks/resources/alarms defined at compile time - zero runtime overhead), ISO 26262 ASIL-D certified (Vector SC1/SC2/SC3 OS implementations), no dynamic task creation, deterministic worst-case latency proven; FreeRTOS: dynamic creation allowed, smaller footprint for resource-constrained MCUs, TCO lower for non-safety-critical ECUs; AUTOSAR OS Memory Protection: OS-Application partitioning + MPU enforces each application's memory access rights; AUTOSAR Multi-Core OS: one Os instance per core, SpinLock mechanism, IOC for inter-core data - all defined in ARXML and generated by SystemDesk/EB tresos configurator; typical automotive ECU: Aurix TC397 runs AUTOSAR OS on 6 cores simultaneously
Hands-On: Multi-Task RTOS Application 60 min read
▸ Lab: FreeRTOS multi-task application on STM32F4 (or QEMU Cortex-M4 emulation); create 3 tasks: Task_Sensor (prio=3, 10ms period) reads ADC via DMA-complete semaphore, converts raw value to physical unit, puts result in queue; Task_Control (prio=5, 1ms period) reads sensor queue, computes PID output (fixed-point Q8.8), writes control output to DAC; Task_Comms (prio=2, 100ms period) reads sensor+output values, formats ASCII report, sends via UART TX queue; SysTick at 1kHz for FreeRTOS tick; demonstrate: trace analysis confirms preemption order matches priorities, no missed deadlines
▸ FreeRTOS configuration (FreeRTOSConfig.h): configUSE_PREEMPTION=1; configUSE_TIME_SLICING=0; configTICK_RATE_HZ=1000; configMAX_PRIORITIES=8; configMINIMAL_STACK_SIZE=128; configTOTAL_HEAP_SIZE=16384; configUSE_MUTEXES=1; configUSE_COUNTING_SEMAPHORES=1; configUSE_TRACE_FACILITY=1; configGENERATE_RUN_TIME_STATS=1 (for CPU load measurement); configCHECK_FOR_STACK_OVERFLOW=2 (on each context switch, checks that the fill pattern written at the end of the task stack at creation is still intact); configUSE_MALLOC_FAILED_HOOK=1; hook implementations: void vApplicationStackOverflowHook(TaskHandle_t xTask, char *pcTaskName){Log_Error("Stack overflow in task: %s", pcTaskName); while(1);}
▸ Runtime statistics and tracing: FreeRTOS runtime stats: vTaskGetRunTimeStats(buf) prints task name, accumulated run-time counter ticks, percentage; use TIM2 as run-time counter (free-running at 1MHz): portCONFIGURE_TIMER_FOR_RUN_TIME_STATS() sets up timer; portGET_RUN_TIME_COUNTER_VALUE() reads it; Percepio Tracealyzer integration: vTraceEnable(TRC_START) records all task switches, queue operations, semaphore gives/takes; open .psf trace file in Tracealyzer - visualize task Gantt chart, find priority inversions, measure response times; SEGGER SystemView: real-time streaming trace over RTT (Real-Time Transfer) to J-Link - low-overhead profiling (each recorded event still costs a small number of CPU cycles)
▸ Validation and common pitfalls: validate: GPIO toggles at the start of each task, observed on an oscilloscope, show 1ms/10ms/100ms periods; uxTaskGetStackHighWaterMark shows adequate margin (>32 words free); no WDT resets; UART output rate matches 100ms; common bugs: Task_Control blocking on sensor queue with portMAX_DELAY - misses 1ms deadline if sensor takes >1ms; fix: use xQueueReceive with 0 timeout (poll) and use last valid value on empty queue; stack overflow in Task_Comms using sprintf with large buffer - increase stack; interrupt priority misconfiguration with FreeRTOS: configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY must be set (typically 5) - on Cortex-M a lower number means higher urgency, so ISRs with a numeric priority below this value must not call any FreeRTOS API, and ISRs calling FromISR functions must have a numeric priority ≥ this value
6
C++ for Automotive
5 chapters • 3.8 hrs reading
C++ in Embedded - Classes & RAII 45 min read
▸ C++ in embedded constraints: AUTOSAR Adaptive (AP) platform uses C++14/17; AUTOSAR Classic (CP) is specified in C, and C++ there is limited to a C++03/11 subset inside application components; features commonly banned or restricted by project coding standards in safety-critical embedded C++ (the AUTOSAR C++14 guidelines restrict rather than forbid most of them): dynamic memory allocation (new/delete), virtual functions with runtime polymorphism (vtable overhead + determinism), exceptions (unwinding overhead, code size, non-deterministic), RTTI (typeid, dynamic_cast - runtime overhead); allowed: templates (resolved at compile time), constexpr, inline functions, value semantics, std::array, std::span (C++20, or a project-local equivalent on older standards); compile flags -fno-exceptions -fno-rtti -fno-unwind-tables reduce code size; AUTOSAR AP: ARA (Adaptive Runtime) uses constrained C++17
▸ Class design for embedded: constructors must not allocate heap; prefer aggregate initialization; member variables: prefer value types (uint32_t, bool) over pointers; class size matters - fits in cache line (64 bytes Cortex-A, 32 bytes Cortex-M7); example AUTOSAR-style class: class CanFrame { public: uint32_t id; uint8_t dlc; uint8_t data[8]; constexpr CanFrame() : id(0), dlc(0), data{} {} constexpr CanFrame(uint32_t id, uint8_t dlc) : id(id), dlc(dlc), data{} {} }; static_assert(sizeof(CanFrame) == 16, "unexpected size"); constexpr enables compile-time computation; trivially copyable types are safe for memcpy-based IPC
▸ RAII (Resource Acquisition Is Initialization): constructor acquires resource, destructor releases; example - interrupt critical section RAII guard: class CriticalSection { uint32_t savedPrimask; public: CriticalSection() {savedPrimask=__get_PRIMASK(); __disable_irq();} ~CriticalSection() {__set_PRIMASK(savedPrimask);} CriticalSection(const CriticalSection&)=delete; }; usage: {CriticalSection guard; /* protected code */} // destructor restores the saved interrupt state on every scope exit, including early return (and stack unwinding where exceptions are enabled), so nested guards do not re-enable interrupts prematurely; same pattern for mutex: MutexGuard, for NVM transactions: NvmTransaction; eliminates forgotten unlock bugs; with exceptions disabled (-fno-exceptions) the destructor is still called on normal scope exit
▸ C++ overhead analysis for embedded: virtual function: adds one vtable pointer per object (4 bytes on 32-bit MCUs, 8 bytes on 64-bit application processors) + an indirect call that cannot be inlined; avoid in tight control loops; template instantiation: each unique template argument creates separate code - template<typename T> T Max(T a, T b) generates separate functions for uint8_t, uint16_t, uint32_t; use extern template to prevent duplicate instantiation across translation units; constructor/destructor overhead: trivial constructors/destructors generate zero code; constructors with member initializer lists compile to the same code as C struct initialization; the AUTOSAR C++14 guidelines favour compiler-generated special member functions over user-defined ones when possible (rule of zero); GCC: use -fvisibility=hidden to prevent default export of C++ symbols, which reduces the size of shared libraries
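The critical-section guard described above can be exercised on a host by replacing the CMSIS intrinsics with stand-ins. In this sketch `get_primask`, `set_primask`, `disable_irq` and `g_fakePrimask` are hypothetical substitutes for `__get_PRIMASK`, `__set_PRIMASK` and `__disable_irq`; on a target they would come from the CMSIS core header instead.

```cpp
#include <cassert>
#include <cstdint>

// Host-side stand-ins for the CMSIS intrinsics (assumption, see lead-in).
// Bit 0 set means interrupts are masked.
static uint32_t g_fakePrimask = 0;
inline uint32_t get_primask() { return g_fakePrimask; }
inline void set_primask(uint32_t value) { g_fakePrimask = value; }
inline void disable_irq() { g_fakePrimask = 1; }

// RAII guard: masks interrupts on construction and restores the previous
// mask on destruction, so nested guards do not re-enable interrupts early.
class CriticalSection {
public:
    CriticalSection() : saved_(get_primask()) { disable_irq(); }
    ~CriticalSection() { set_primask(saved_); }
    CriticalSection(const CriticalSection&) = delete;
    CriticalSection& operator=(const CriticalSection&) = delete;

private:
    uint32_t saved_;
};

// Example use: read-modify-write of a counter that an ISR also touches.
static volatile uint32_t g_sharedCounter = 0;
inline uint32_t incrementShared() {
    CriticalSection guard;                    // interrupts masked from here
    g_sharedCounter = g_sharedCounter + 1;
    return g_sharedCounter;                   // guard restores mask on return
}
```

Saving and restoring the mask, rather than unconditionally enabling interrupts in the destructor, is what keeps the guard safe when it is used inside code that already runs with interrupts masked.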
Templates & Compile-Time Polymorphism 40 min read
▸ Function templates for type-safe operations: template<typename T> constexpr T Clamp(T val, T lo, T hi) {return val<lo?lo:(val>hi?hi:val);} works for uint8_t, int16_t, float with zero overhead vs runtime type check; hardware-assisted saturation: when the limits are a fixed signed power-of-two range, the ARM SSAT (Signed Saturate) instruction does the job in one cycle via the CMSIS intrinsic: inline int32_t Saturate24(int32_t val) {return __SSAT(val, 24);} clamps to [-2^23, 2^23-1]; it is not a drop-in specialization of Clamp because it ignores lo and hi; non-type template parameters: template<uint32_t BASE_ADDR, uint32_t BIT> inline void SetBit() {*((volatile uint32_t*)BASE_ADDR) |= (1U<<BIT);} SetBit<0x40020000, 5>() - address and bit resolved at compile time, no runtime overhead
▸ CRTP (Curiously Recurring Template Pattern) - static polymorphism: template<typename Derived> class Driver { public: void init() {static_cast<Derived*>(this)->init_impl();} }; class CanDriver : public Driver<CanDriver> { public: void init_impl() {/* CAN hardware init */} }; no vtable, no virtual dispatch overhead - method resolved at compile time; Adaptive AUTOSAR relies on templates in a similar spirit: ara::core::Result<T,E> provides exception-free error handling, and AUTOSAR AP Service Proxy/Skeleton classes are template-based, type-safe service interfaces (ara::com API uses template instantiation per service type)
▸ Policy-based design with templates: separate concerns into policy classes combined via templates: template<class LockPolicy, class StoragePolicy> class RingBuffer : public LockPolicy, public StoragePolicy {...}; LockPolicy variants: NoLock (single-task), DisableIrqLock (ISR-safe), MutexLock (RTOS-safe); StoragePolicy variants: StaticStorage<N> (compile-time size), NvmBackedStorage (persistent); instantiation: RingBuffer<DisableIrqLock, StaticStorage<256>> - zero runtime polymorphism cost; constexpr evaluation: a C++14 constexpr function such as constexpr uint32_t ComputeCrc(uint8_t data) {/* CRC algorithm */} is guaranteed to be evaluated at compile time only when its result is used in a constant expression (e.g., to initialize a constexpr variable or in a static_assert); the result is then embedded as a constant in the binary
▸ Template metaprogramming for compile-time checks: static_assert with type traits: static_assert(std::is_trivially_copyable<CanFrame>::value, "CanFrame must be trivially copyable for IPC"); static_assert(sizeof(CanFrame) == 16, "Size mismatch"); enable_if for conditional compilation: template<typename T, typename = typename std::enable_if<std::is_unsigned<T>::value>::type> T saturating_add(T a, T b); only compiles for unsigned types; type traits in AUTOSAR AP ara::core: ara::core::Optional<T>, ara::core::Result<T,E>; if constexpr (C++17): template<typename T> void serialize(T val) { if constexpr (std::is_same_v<T,uint16_t>) {/* 16-bit path */} else {/* generic path */}} - branch eliminated at compile time
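The three template techniques in this chapter (a constexpr function template, an `enable_if`-constrained template and CRTP) can be combined in one small host-compilable sketch. The bodies are illustrative assumptions: `saturating_add` is given one possible implementation, and `init_impl` returns an `int` status and sets a hypothetical `initialized` flag so the static dispatch is observable.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Type-generic clamp, usable in constant expressions.
template <typename T>
constexpr T Clamp(T val, T lo, T hi) {
    return val < lo ? lo : (val > hi ? hi : val);
}
static_assert(Clamp<int16_t>(300, -100, 100) == 100, "clamps to upper bound");
static_assert(Clamp<uint8_t>(5, 10, 20) == 10, "clamps to lower bound");

// Only participates in overload resolution for unsigned integer types.
template <typename T,
          typename = typename std::enable_if<std::is_unsigned<T>::value>::type>
constexpr T saturating_add(T a, T b) {
    return (a > static_cast<T>(std::numeric_limits<T>::max() - b))
               ? std::numeric_limits<T>::max()
               : static_cast<T>(a + b);
}
static_assert(saturating_add<uint8_t>(200, 100) == 255, "saturates at max");
static_assert(saturating_add<uint8_t>(20, 30) == 50, "normal addition");

// CRTP base: init() dispatches to the derived class without a vtable.
template <typename Derived>
class Driver {
public:
    int init() { return static_cast<Derived*>(this)->init_impl(); }
};

class CanDriver : public Driver<CanDriver> {
public:
    int init_impl() {
        initialized = true;   // placeholder for real CAN hardware setup
        return 0;
    }
    bool initialized = false;
};
```

Calling `saturating_add<int32_t>(1, 2)` fails to compile, which is the intended effect of the constraint.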
Smart Pointers & Memory Management 45 min read
▸ Embedded memory constraints and forbidden heap usage: ISO 26262 and MISRA prohibit dynamic memory allocation (malloc/new) in safety-critical code - non-deterministic allocation time, heap fragmentation risk, allocation failure at runtime; exceptions: AUTOSAR Adaptive platform uses heap in non-safety OS-Applications with bounded allocation (use ara::core::Optional to avoid null pointer dereference); Classic AUTOSAR: NvM, COM, DCM all use statically-allocated pools; replacements: static arrays, memory pools (fixed-size block allocators), stack allocation for temporaries; operator new overload: restrict heap to controlled pool - class MyModule { static uint8_t pool[1024]; void* operator new(size_t sz) {...}; }
▸ std::unique_ptr in embedded (AUTOSAR Adaptive): std::unique_ptr<ara::com::SomeService> service = std::make_unique<ara::com::SomeService>(); automatic cleanup on scope exit without heap fragmentation concern (object created once, lives for application lifetime in AP context); std::unique_ptr overhead: no size or speed overhead vs raw pointer with the default deleter - destructor inlined, deleter resolved at compile time; std::make_unique never returns null: allocation failure throws std::bad_alloc (or terminates when built with -fno-exceptions), so a null unique_ptr only arises from default construction, reset() or a move - check before use in those cases; avoid std::shared_ptr in safety-critical code - atomic reference counting and the separate control block add overhead and make the point of destruction harder to analyse; std::unique_ptr allowed in AUTOSAR AP ARA services and daemon main functions
▸ Static memory pool implementation: template<typename T, uint8_t N> class Pool { T objects[N]; bool inUse[N]{}; public: T* allocate() { for(uint8_t i=0;i<N;i++) if(!inUse[i]){inUse[i]=true;return &objects[i];} return nullptr; } void release(T *p) {inUse[p-objects]=false;} }; usage: Pool<CanFrame, 32> canPool; CanFrame *f = canPool.allocate(); if(f) {/* use */; canPool.release(f);} allocation O(N) but deterministic; for O(1): use free-list with linked nodes; AUTOSAR MemIf: manages NvM block pool using similar pattern; OSEK/AUTOSAR IOC internal buffer pools use fixed-size array of message slots per channel
▸ Memory safety patterns replacing pointers: std::array<uint8_t, 8> instead of raw array + length - at() is always bounds-checked (throws std::out_of_range, or aborts under -fno-exceptions), while operator[] is unchecked and has the same overhead as a raw array; std::span<uint8_t> (C++20) for non-owning buffer views with size: void process(std::span<const uint8_t> data) {for(auto b : data){...}} - replaces (uint8_t *buf, size_t len) pattern; ara::core::Optional<CanFrame> for nullable return without null pointer: Optional<CanFrame> parseFrame(uint8_t *raw); if(auto f=parseFrame(buf); f.has_value()) {use(*f);} - eliminates null dereference bugs; Adaptive AUTOSAR ARA error handling: ara::core::Result<T, ara::core::ErrorCode> for functions that can fail - monadic error propagation without exceptions
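The O(1) free-list variant mentioned for the static pool can be sketched as follows. This is one possible layout, not a reference implementation: `FreeListPool` is a hypothetical name, the free list is kept as an index array beside the objects, and the pool is neither ISR-safe nor protected against double release.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct CanFrame {
    uint32_t id;
    uint8_t dlc;
    uint8_t data[8];
};

// Fixed-capacity pool with O(1) allocate/release via an index free list.
template <typename T, std::size_t N>
class FreeListPool {
    static_assert(N > 0, "pool must hold at least one object");

public:
    FreeListPool() {
        // Chain every slot to its successor; N marks the end of the list.
        for (std::size_t i = 0; i < N; ++i) {
            next_[i] = i + 1;
        }
    }

    // Pops the head of the free list; returns nullptr when exhausted.
    T* allocate() {
        if (head_ == N) {
            return nullptr;
        }
        const std::size_t idx = head_;
        head_ = next_[idx];
        return &objects_[idx];
    }

    // Pushes the slot back onto the free list. The pointer must have come
    // from allocate() on this pool and must not be released twice.
    void release(T* p) {
        assert(p >= objects_ && p < objects_ + N);
        const std::size_t idx = static_cast<std::size_t>(p - objects_);
        next_[idx] = head_;
        head_ = idx;
    }

private:
    T objects_[N] = {};
    std::size_t next_[N];
    std::size_t head_ = 0;
};
```

Typical use mirrors the linear-scan pool: `FreeListPool<CanFrame, 32> pool; CanFrame* f = pool.allocate(); if (f) { /* use */ pool.release(f); }`. Sharing the pool between tasks or ISRs would additionally need a lock or critical section around both operations.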
C++14/17 Features for Adaptive AUTOSAR 40 min read
▸ C++14 features used in AUTOSAR AP: generic lambdas: [](auto x){return x*2;} - used in ARA service callbacks and Future/Promise continuations; variable templates: template<typename T> constexpr T Pi = T(3.14159265358979); make_unique<T>: std::make_unique<SomeIpSerializer>() - safer than new SomeIpSerializer(); std::integer_sequence for compile-time index generation (used in tuple processing for SOME/IP serialization); constexpr functions with multiple statements (relaxed in C++14 vs C++11 single-expression only); auto return type deduction: auto ComputeChecksum(const uint8_t* data, size_t len) { uint32_t sum = 0; /* ... */ return sum; } - return type deduced as uint32_t from the return statement (the form auto f() -> uint32_t is a C++11 trailing return type, not deduction)
▸ C++17 features in AUTOSAR AP: if constexpr: template<typename T> void serialize(T v) {if constexpr(std::is_integral_v<T>) {/* int path */} else {/* float path */}} - compile-time branch elimination; std::variant<ara::core::ErrorCode, ServiceResponse> - type-safe tagged union replacing void*/error code pattern; std::optional<T> - explicit nullable (replaces nullptr/sentinel values); structured bindings: auto [status, value] = ReadData(); fold expressions for variadic templates: (args + ... + 0) sums all arguments and still compiles for an empty pack; std::string_view: non-owning string reference - avoid string copies for diagnostic messages; if-init statements: if(auto result = TryLock(mutex); result == OK) {/* use locked resource */}
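▸ AUTOSAR C++14 coding guidelines (AUTOSAR_RS_CPP14Guidelines.pdf), used for AP development: Rule A0-1-1: no values assigned to non-volatile variables that are never subsequently used; Rule A2-10-5: identifiers of functions with static storage duration and of non-member objects with linkage should not be reused; Rule A5-1-1: literal values not used apart from type initialization (no magic numbers); Rule A7-2-3: enumerations to be scoped (enum class); Rule A9-3-1: member functions shall not return non-const raw pointers or references to private or protected data; Rule A11-0-2: public data members only in structs; Rule A12-1-1: constructors explicitly initialize all base classes and non-static data members; Rule A15-1-1: only instances of types derived from std::exception should be thrown; Rule A18-5-1: malloc, calloc, realloc and free shall not be used (the following A18-5 rules cover new/delete and deterministic memory management); the AUTOSAR C++14 guidelines build on MISRA C++:2008 with additional rules for modern C++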
▸ Practical AUTOSAR AP service implementation with C++17: class RadarService { ara::com::SomeIpServiceInstance instance; std::array<ara::com::ServiceHandle, 4> clients; public: void init() {instance.OfferService(); instance.RegisterCallback([this](auto&& req){handleRequest(std::forward<decltype(req)>(req));});} private: void handleRequest(const DetectedObjects& req) {auto result = processDetection(req); if(result) {instance.Reply(*result);} else {instance.ReplyError(result.error());}} }; ara::core::Result replaces exception: ara::core::Result<DetectedObjects, ara::core::ErrorCode> processDetection(const Request&); client uses: if(auto r=client.Call(req); r.HasValue()) {use(r.Value());} else {log(r.Error());}; compile: arm-linux-gnueabihf-g++ -std=c++17 -fno-exceptions -O2
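Several of the C++17 features listed in this chapter fit into one short, host-compilable sketch (build with `-std=c++17`). All names here (`Sum`, `ReadData`, `ReadOrDefault`, `FirstByte`) are hypothetical helpers for illustration and are not part of any ARA API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>
#include <utility>

// Fold expression with an explicit initial value, so Sum() with an empty
// parameter pack is still well-formed and yields 0.
template <typename... Args>
constexpr auto Sum(Args... args) {
    return (args + ... + 0);
}
static_assert(Sum(1, 2, 3) == 6);
static_assert(Sum() == 0);

// Hypothetical sensor read returning (ok, value).
inline std::pair<bool, uint16_t> ReadData() {
    return {true, 1234};
}

// Structured bindings inside an if-init statement: ok and value are scoped
// to the if statement only.
inline uint16_t ReadOrDefault(uint16_t fallback) {
    if (auto [ok, value] = ReadData(); ok) {
        return value;
    }
    return fallback;
}

// std::optional as an explicit "no value" result; std::string_view avoids
// copying the input text.
inline std::optional<uint8_t> FirstByte(std::string_view text) {
    if (text.empty()) {
        return std::nullopt;
    }
    return static_cast<uint8_t>(text.front());
}
```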
Hands-On: C++ Adaptive Application Module 55 min read
▸ Lab: implement a minimal AUTOSAR AP Adaptive Application (AA) using open-source ARA stub library (OpenAPI or COVESA/vsomeip); project structure: CMakeLists.txt + src/main.cpp + src/SensorFusion.cpp + manifest/application.json; main.cpp: ara::core::Initialize(); auto app = std::make_unique<SensorFusion>(); app->Run(); ara::core::Deinitialize(); SensorFusion class: constructor subscribes to SOME/IP Event (ara::com::FindService + SubscribeEvent), starts processing thread; build: cmake -DCMAKE_CXX_STANDARD=17 -DCMAKE_CXX_FLAGS="-fno-exceptions" ..; make; run on Ubuntu 22.04 host or QNX/Linux-based Adaptive Platform simulator
▸ Implement ara::core::Result-based error handling: define error codes: enum class SensorErrorCode : ara::core::ErrorDomain::CodeType {InvalidData=1, Timeout=2}; define error domain: class SensorErrorDomain : public ara::core::ErrorDomain {...}; function: ara::core::Result<FusedData, SensorErrorCode> FuseData(const RawSensor& s) { if(!s.isValid()) return ara::core::Result<FusedData,SensorErrorCode>::FromError(SensorErrorCode::InvalidData); return FusedData{s.value * SCALE}; }; production code usually keeps the default error type ara::core::ErrorCode and creates it from the enum via a MakeErrorCode overload tied to the error domain; caller: auto result = FuseData(raw); if(!result.HasValue()) {HandleError(result.Error()); return;} ProcessData(result.Value()); compare with exception version - no unwinding tables or throw/catch overhead when compiled with -fno-exceptions, at the cost of an explicit check at each call site
▸ SOME/IP service proxy implementation: generated from ARXML using adaptive_autosar_generator tool; Proxy class: SensorProxy::StartFindService(handler); handler receives the discovered service instances; subscribe per event with a maximum sample count: proxy.ObjectDetectionEvent.Subscribe(5); auto samples = proxy.ObjectDetectionEvent.GetNewSamples([](auto&& sample){process(*sample);}); method call: ara::core::Future<CalibrationResult> fut = proxy.Calibrate(params); fut.then([](auto result){if(result.HasValue()) storeCalib(result.Value());}); SOME/IP transport layer: vsomeip library; service discovery via UDP multicast on port 30490 (vsomeip's default group is 224.244.224.245; the address is configurable per deployment); service instance ID, method ID, event ID all defined in ARXML interface description
▸ Build system and static analysis for C++17 AP module: CMake target_compile_options: -std=c++17, -Wall, -Wextra, -Wpedantic, -fno-exceptions, -fno-rtti, -O2; AddressSanitizer for host testing: -fsanitize=address,undefined; ThreadSanitizer for race detection: -fsanitize=thread (separate build, cannot be combined with AddressSanitizer); clang-tidy integration: cmake -DCMAKE_CXX_CLANG_TIDY="clang-tidy;-checks=cert-*,cppcoreguidelines-*,hicpp-*"; upstream clang-tidy has no AUTOSAR check group, so full coverage of the AUTOSAR C++14 guidelines (A0-1-1 through A27-0-4) needs a dedicated compliance checker; common violations found: A5-1-4 (lambda outliving its reference captures - lifetime issue), A12-1-5 (common initialization should use delegating constructors), A18-5-1 (dynamic memory in safety partition); unit tests: Google Test with gmock for service proxy mocking; coverage: lcov + genhtml → verify >90% line coverage
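The Result-based error handling used in this lab can be tried without an Adaptive Platform installation by writing a small stand-in type. The `Result` class below is a deliberately minimal, hypothetical substitute and is NOT `ara::core::Result`: it only mirrors the `FromValue`/`FromError`/`HasValue`/`Value`/`Error` shape and requires default-constructible `T` and `E`. `RawSensor`, `FusedData` and `kScale` are likewise illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

enum class SensorErrorCode : int32_t { InvalidData = 1, Timeout = 2 };

// Minimal stand-in for a Result type (not the ara::core implementation).
template <typename T, typename E>
class Result {
public:
    static Result FromValue(T value) {
        Result r;
        r.hasValue_ = true;
        r.value_ = std::move(value);
        return r;
    }
    static Result FromError(E error) {
        Result r;
        r.hasValue_ = false;
        r.error_ = error;
        return r;
    }
    bool HasValue() const { return hasValue_; }
    const T& Value() const { assert(hasValue_); return value_; }
    E Error() const { assert(!hasValue_); return error_; }

private:
    Result() = default;
    bool hasValue_ = false;
    T value_{};
    E error_{};
};

struct RawSensor {
    bool valid;
    int32_t value;
};
struct FusedData {
    int32_t scaled;
};
constexpr int32_t kScale = 4;  // hypothetical scaling factor

// Returns either the fused value or an error code; never throws.
inline Result<FusedData, SensorErrorCode> FuseData(const RawSensor& s) {
    using R = Result<FusedData, SensorErrorCode>;
    if (!s.valid) {
        return R::FromError(SensorErrorCode::InvalidData);
    }
    return R::FromValue(FusedData{s.value * kScale});
}
```

The caller pattern stays the same as in the bullet above: check `HasValue()` first, then read `Value()` or `Error()`. The asserts in the accessors catch the mistake of reading the wrong alternative during host testing.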
7
Debugging & Optimization
4 chapters • 3.0 hrs reading
Debugger Usage - Breakpoints, Watchpoints 40 min read
▸ ARM Cortex-M FPB (Flash Patch Breakpoint) unit: provides 6 hardware breakpoints (FPB_COMP0–5 on Cortex-M4); set breakpoint without modifying flash - debugger writes address to FPB comparator; software breakpoint: BKPT 0x00 instruction inserted by debugger into RAM-mapped code; FPB register: FP_CTRL enable bit, FP_COMP[n] = target address | 1 (enable bit); TRACE32: Break.Set Address /Program - uses FPB hardware breakpoint; DWT watchpoints: 4 watchpoints on Cortex-M4 via DWT_COMP/MASK/FUNCTION registers; FUNCTION=7 (data read/write access); TRACE32: Break.Set &variable /Write - monitors variable write; trigger on write to specific struct member using DWT_COMP=address, DWT_MASK=0, DWT_FUNCTION=6 (write access + halt)
▸ Conditional breakpoints and hit count: TRACE32: Break.Set Address /Program /CONDition "R0==0x1234" - breaks only when condition true; Break.Set Address /Count 5 - breaks only on 5th hit; conditional breakpoints use software evaluation (read register/memory after each hit) - slight performance overhead; alternatively: conditional logic in code with BKPT: if(debug_condition) {__BKPT(0);} compile with -O0 to prevent optimization removing the branch; GDB: break function if variable==value; watch *(&array[0]) - watchpoint on array element; rwatch variable - break on read; TRACE32 Trace.Break: add trace logging instead of halting - logs timestamp+register values on each hit without stopping execution
▸ TRACE32 memory inspection and modification: Data.dump A:0x20000000 - hex view of SRAM; Data.dump A:0xF0000000 %Long - 32-bit peripheral registers; Var.View myStruct - structured view of global struct with field names from ELF DWARF info; Data.Set A:address %Long value - write to memory/peripheral register (inject test values); Var.Set globalVar=42 - set variable by name; Per.view SFR window - organized register view from device SVD file; Register.view - CPU register file (R0-R15, xPSR, CONTROL, PRIMASK); Data.compare A:LMA A:VMA size - verify startup copy of .data completed correctly; useful pattern: set watchpoint on stack guard word (0xDEADBEEF at bottom of stack) to detect stack overflow
▸ OpenOCD + GDB for low-cost debugging: openocd -f interface/stlink.cfg -f target/stm32f4x.cfg; arm-none-eabi-gdb app.elf; target extended-remote localhost:3333; monitor reset halt; load; break main; continue; inspect: print/x globalVar, x/4xw &array, info registers; disassemble /m function - show C + assembly interleaved; GDB scripting: attach a commands ... end block to a breakpoint to print values and continue automatically, so the target logs each hit without manual intervention; startup analysis: monitor reset halt, then step through the startup code from the reset handler; GDB with TUI: layout src (source view), layout reg (register view), layout asm; common issue: -O2 optimized code has variables in registers not memory - use -O0 or -Og for debug builds
Memory Leak & Stack Overflow Detection 45 min read
▸ Stack overflow detection techniques: stack paint: fill task stack with 0xA5A5A5A5 pattern during initialization (startup code or OS task creation); periodically scan stack from the stack limit (lowest address on a descending stack) - first word ≠ 0xA5A5A5A5 marks the deepest point the stack has reached; high-water mark calculation: number of still-painted words counted from the stack limit = minimum free stack ever observed; FreeRTOS uxTaskGetStackHighWaterMark(handle) returns that remaining margin in words (requires INCLUDE_uxTaskGetStackHighWaterMark=1); MPU guard page: configure ARM MPU region at stack bottom with no-access permission - stack overflow generates MemManage fault instantly instead of silent corruption; Aurix TC3xx: CSA (Context Save Area) overflow generates CIST trap (CSA trap); TRACE32 Os.stack - shows stack usage per AUTOSAR OS task
▸ Memory leak detection on host: Valgrind Memcheck: valgrind --leak-check=full ./app - reports all heap allocations without corresponding free; AddressSanitizer (ASan): compile with -fsanitize=address - detects heap overflows, use-after-free, use-after-return at runtime with ~2× overhead; LeakSanitizer (LSan): -fsanitize=leak - reports all memory leaks on exit; embedded heap monitoring: wrap malloc/free with leak tracker: __wrap_malloc(size): record (ptr, size, __LINE__, __FILE__) in static table, __wrap_free(ptr): remove entry; scan table at end - remaining entries = leaks; link with -Wl,--wrap=malloc,--wrap=free; Valgrind massif: heap profiler tracking peak usage over time - useful for finding gradual leaks in long-running applications
▸ Buffer overflow detection: ASan for host/QEMU: an out-of-bounds access such as array[size] is caught by the instrumentation, which prints a report (access type, faulting address, stack trace, allocation site) and aborts the program; GCC stack protector: compile with -fstack-protector-all - inserts canary word between locals and return address; function exit checks canary: if corrupted → __stack_chk_fail() handler (log + reset); canary value = __stack_chk_guard (on bare metal the application must define it and should initialize it with a random value at startup); Aurix TC3xx: DTRAP exception on stack pointer crossing PCXI boundary; TRACE32: set DWT watchpoint on canary address - halts on write; AUTOSAR Stack Monitoring: OsStackMonitoring configuration in ARXML generates hook called by OS scheduler to check stack paint before task switch
▸ Systematic resource leak patterns and prevention: ISR not clearing interrupt flag - ISR fires repeatedly in infinite loop consuming CPU; detect: add execution counter in ISR, monitor rate via TRACE32 Var.Watch; DMA not completing - DMA channel stuck in active state blocking peripheral; detect: check DMA_SxCR_EN bit still set after expected timeout; SPI slave select not deasserted - bus hung waiting for NSS release; prevent: RAII guard classes for all hardware resources (SpiTransaction, DmaChannel, GpioPin); NvM block not committed - data appears written in RAM but lost on power cycle; detect: poll NvM_GetErrorStatus for the block until it reports NVM_REQ_OK after the write request, then verify the value after a power cycle; AUTOSAR WdgM watchdog: supervised entities configured per module - if module stalls, WdgM detects missed checkpoint and triggers reset
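The stack-painting technique from this chapter reduces to two small routines: one that fills the stack region and one that counts how many words are still untouched. The sketch below assumes a full-descending stack (growth toward the lowest address, index 0 here) and uses hypothetical names `paintStack` and `freeStackWords`; on a target, the array would be the real task stack region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint32_t kStackPaint = 0xA5A5A5A5u;

// Fill the whole stack region with the paint pattern. Must run before the
// stack is in use (startup code or task creation).
inline void paintStack(uint32_t* stack, std::size_t words) {
    for (std::size_t i = 0; i < words; ++i) {
        stack[i] = kStackPaint;
    }
}

// Count still-painted words starting at the stack limit (lowest address).
// The result is the smallest free margin the stack has ever had; 0 means
// the stack has reached, or run past, its limit.
inline std::size_t freeStackWords(const uint32_t* stack, std::size_t words) {
    std::size_t count = 0;
    while (count < words && stack[count] == kStackPaint) {
        ++count;
    }
    return count;
}
```

A monitoring task could call `freeStackWords` periodically and raise an error when the margin drops below a project-defined threshold. A stack frame that happens to contain the paint value at the boundary makes the estimate slightly optimistic, which is why the check is usually paired with an MPU guard region.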
Code Size & Speed Optimization 40 min read
▸ GCC optimization flags: -O0 no optimization (debug); -O1 basic optimizations (constant folding, dead code elimination); -O2 most optimizations without space-speed tradeoff (loop unrolling disabled); -O3 aggressive (vectorization, loop unrolling - may increase code size); -Os optimize for code size (similar to -O2 but avoids size-increasing transforms); -Og debug-friendly optimizations (good balance for debug + moderate optimization); per-function attribute: __attribute__((optimize("O3"))) for hot path; __attribute__((noinline)) prevents inlining of rarely-called functions; -flto (Link-Time Optimization): inlines across translation unit boundaries at link time - can reduce code size 5-15%
▸ Code size reduction techniques: #pragma GCC optimize("Os") before size-critical section; remove unused code: -ffunction-sections -fdata-sections (place each function/variable in own section) + -Wl,--gc-sections (linker removes unreferenced sections); verify with arm-none-eabi-size before/after - typical savings 10-30%; reduce string literals: use error codes instead of descriptive strings in production build (#ifdef PRODUCTION_BUILD / #endif); avoid printf/sprintf in production (pulls in 20-40KB of formatting code) - use custom itoa + serial write; AUTOSAR: separate debug strings into .debug section excluded from production flash image; Thumb2 vs ARM: Thumb2 instructions 2-4 bytes vs ARM 4 bytes - Cortex-M always uses Thumb2 (no switch needed)
▸ Speed optimization techniques: loop optimization: move invariant computations outside loop (done automatically by -O2); use local variables instead of global (local in register, global requires memory load); avoid division in hot paths (Cortex-M3/M4/M7 have hardware SDIV/UDIV at 2-12 cycles; Cortex-M0/M0+ have no divide instruction and call a library routine): replace divide-by-constant with multiply-by-reciprocal: for uint32_t x, x/5 → ((uint64_t)x * 0xCCCCCCCD) >> 34 (the compiler does this automatically at -O2 when the divisor is a compile-time constant); use __builtin_expect for branch hints: if(__builtin_expect(errorFlag, 0)) - hints branch not taken; Cortex-M4 SIMD: arm_math.h CMSIS DSP library uses the DSP-extension SIMD instructions (e.g., SADD16, SMUAD) for parallel 16-bit operations; DMA for bulk memory operations instead of memcpy loops
▸ Profiling workflow and measurement: cycle-accurate profiling with DWT: uint32_t start=DWT->CYCCNT; function_to_measure(); uint32_t cycles=DWT->CYCCNT-start; log cycles; for Aurix: use STM timestamp; TRACE32 Trace.STATISTIC.FUNC: profile function-level CPU usage from ETM instruction trace (no code modification); Compiler explorer (godbolt.org): paste embedded C/C++ + select arm-none-eabi-gcc with -O2 -mcpu=cortex-m4 - inspect assembly output; identify expensive operations: mul instructions (1 cycle), div instructions (2-12 cycles), cache miss (4-12 cycles flash access); memory placement: put hot loop code in PSPR (Aurix program scratch-pad RAM) or ITCM (Cortex-M7) for zero-wait-state execution; AUTOSAR MemMap attributes separate calibration data from code to minimize cache pollution
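The multiply-by-reciprocal replacement for division by a constant can be checked on a host. The sketch below handles unsigned 32-bit division by 5; the constant 0xCCCCCCCD is ceil(2^34 / 5), and `divideBy5` is a hypothetical name. Compilers generate this form on their own for constant divisors, so the code is mainly useful for understanding the generated assembly.

```cpp
#include <cassert>
#include <cstdint>

// x / 5 for any uint32_t without a divide instruction:
// 0xCCCCCCCD == ceil(2^34 / 5), so the upper bits of the 64-bit product,
// shifted right by 34, equal the truncated quotient.
constexpr uint32_t divideBy5(uint32_t x) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(x) * 0xCCCCCCCDull) >> 34);
}

static_assert(divideBy5(0u) == 0u, "0 / 5");
static_assert(divideBy5(4u) == 0u, "4 / 5 truncates to 0");
static_assert(divideBy5(5u) == 1u, "5 / 5");
static_assert(divideBy5(0xFFFFFFFFu) == 0xFFFFFFFFu / 5u, "max input");
```

On Cortex-M3/M4 the 32×32→64 multiply maps to a single UMULL instruction, which is why this form can beat UDIV for inputs where the divide takes many cycles.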
Hands-On: Performance Profiling Project 55 min read
▸ Lab objective: profile and optimize a CRC32 computation function for Aurix TC3xx or STM32F4; starting implementation: naive bit-by-bit CRC: for each byte, for each bit: 8 iterations per byte; measure baseline: enable DWT: CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk; DWT->CYCCNT=0; DWT->CTRL|=1; start=DWT->CYCCNT; crc32_naive(data, 1024); cycles_naive = DWT->CYCCNT - start; record result; goal: reduce execution time by 10× while maintaining correctness; compile all variants with -O2 -mcpu=cortex-m4 -mthumb
▸ Optimization iteration 1 - lookup table: precompute 256-entry uint32_t table at startup (or as const array in .rodata); reduce loop: for each byte: crc = (crc>>8) ^ table[(crc^*data++) & 0xFF]; measure: cycles_table = DWT->CYCCNT - start; expect roughly 8× speedup vs naive; iteration 2 - hardware CRC peripheral on STM32F4: enable CRC peripheral: RCC->AHB1ENR |= RCC_AHB1ENR_CRCEN; CRC->CR = CRC_CR_RESET; the F4 CRC unit accepts 32-bit words only, so feed the buffer as words: for(uint32_t i=0;i<len/4;i++) CRC->DR = words[i]; result = CRC->DR; each word takes 4 AHB clock cycles (about 1 cycle per byte); the peripheral uses polynomial 0x04C11DB7 without input/output reflection or final XOR, so its result matches the software CRC-32 only after bit-reversing each input word and the result (__RBIT) and inverting the result; iteration 3 - Aurix hardware CRC module: use CRC module with CRC_CLCR configuration, direct register write for zero-overhead
▸ Code size profiling exercise: measure binary size at each optimization level: arm-none-eabi-size crc_O0.elf / crc_O2.elf / crc_Os.elf; enable -ffunction-sections + --gc-sections: before = include unused functions; after = only CRC functions linked; compare .text section size; use arm-none-eabi-nm to find largest functions; annotate hot path with __attribute__((hot)) and cold path with __attribute__((cold)) - directs code layout optimizer; bloaty (https://github.com/google/bloaty): bloaty app.elf -d symbols → size by symbol with template instantiation breakdown; expected findings: -O2 vs -Os size tradeoff: -O2 table version = 1200 bytes (.rodata for table), -Os bit-shift version = 200 bytes (.text)
▸ Final report and benchmark comparison: create results table: implementation vs cycles-per-1KB vs code-size-bytes vs flash-cycles (table version needs flash read if table in .rodata); plot on Compiler Explorer showing assembly for each version; identify bottleneck in naive version: branch + loop overhead dominates; identify cache benefit: table in .rodata (flash) has 4-cycle access penalty first time, then cache hit; optimization lessons: prefer hardware accelerators (10-100× faster), then lookup tables (8× over naive), then compiler intrinsics, then manual optimization; document findings in profiling report; CI performance gate: add benchmark test to Jenkins - fail build if CRC throughput drops below 50 MB/s (regressions in code generation or table placement)
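The two software variants compared in this lab can be written and cross-checked on a host before any cycle measurement on the target. The sketch below assumes the common reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), whose check value for the ASCII string "123456789" is 0xCBF43926; function names follow the lab text where it gives them, and `makeTable` and `crc32_table` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint32_t kPoly = 0xEDB88320u;  // reflected CRC-32 polynomial

// Baseline: one shift/conditional-XOR step per bit, 8 steps per byte.
inline uint32_t crc32_naive(const uint8_t* data, std::size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? ((crc >> 1) ^ kPoly) : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Precompute the 8 bit-steps for every possible byte value (1 KiB table).
inline std::array<uint32_t, 256> makeTable() {
    std::array<uint32_t, 256> table{};
    for (uint32_t n = 0; n < 256u; ++n) {
        uint32_t c = n;
        for (int bit = 0; bit < 8; ++bit) {
            c = (c & 1u) ? ((c >> 1) ^ kPoly) : (c >> 1);
        }
        table[n] = c;
    }
    return table;
}

// Table-driven variant: one lookup, one shift and one XOR per byte.
inline uint32_t crc32_table(const uint8_t* data, std::size_t len) {
    static const std::array<uint32_t, 256> table = makeTable();
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc = (crc >> 8) ^ table[(crc ^ data[i]) & 0xFFu];
    }
    return crc ^ 0xFFFFFFFFu;
}
```

On the target, the table would more likely be a `const` array placed in .rodata rather than built at first use, as the lab text suggests; the runtime-built table here keeps the sketch short. Checking both variants against the known check value before profiling ensures the speed comparison is between functionally identical implementations.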

What You'll Learn

Write MISRA-C:2012 compliant embedded C code
Implement peripheral drivers using memory-mapped I/O
Design interrupt-safe real-time software architectures
Develop multi-task applications on RTOS platforms
Apply modern C++ patterns for Adaptive AUTOSAR
Debug and optimize code for automotive microcontrollers
Use static analysis tools to enforce coding standards

Prerequisites

Basic C programming knowledge
Understanding of binary/hexadecimal numbers
High school level mathematics

This course includes:

36 detailed documentation chapters
Downloadable resources
Searchable text documentation
Code snippets & technical diagrams
Hands-on exercises
Lifetime access
Certificate of completion