Aurix TC3xx Cache Architecture
| Memory | Location | Size | Line Size | Policy |
|---|---|---|---|---|
| PCACHE (Program) | Per TC core | 16 kB (4-way set-assoc) | 256 bits (32 bytes) | Write-protected (code only) |
| DCACHE (Data) | Per TC core | 8 kB (2-way set-assoc) | 256 bits (32 bytes) | Write-back, write-allocate |
| DLMU (LMU) | Shared all cores | Up to 1 MB RAM | N/A (memory-mapped RAM, not a cache) | Cached or uncached depending on the address segment used for the access |
| Scratchpad DSPR | Per core local | 256 kB | N/A (not cached, single-cycle access) | Zero wait state; best performance |
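The size, associativity, and line-size figures in the table determine how many sets each cache has and which addresses compete for the same set. The following is a minimal sketch, assuming a plain modulo index mapping (the real hardware index function should be checked in the user manual); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of a set-associative cache (illustrative type, not a vendor API). */
typedef struct {
    uint32_t size_bytes;  /* total capacity in bytes */
    uint32_t ways;        /* associativity */
    uint32_t line_bytes;  /* cache line size in bytes */
} cache_geom_t;

/* Number of sets = capacity / (ways * line size). */
static uint32_t cache_num_sets(const cache_geom_t *g)
{
    assert(g->ways != 0u && g->line_bytes != 0u);
    return g->size_bytes / (g->ways * g->line_bytes);
}

/* Set index under the assumed mapping: (address / line size) mod sets. */
static uint32_t cache_set_index(const cache_geom_t *g, uint32_t addr)
{
    return (addr / g->line_bytes) % cache_num_sets(g);
}
```

With the table's figures, both the 16 kB 4-way PCACHE and the 8 kB 2-way DCACHE come out at 128 sets, so under this assumed mapping addresses that are a multiple of 4 kB apart land in the same set. Three hot buffers spaced 4 kB apart would then contend for only two ways in the DCACHE.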
Cache Miss Analysis with MCDS
// Cache miss analysis using Aurix MCDS performance counters
// MCDS can count: instruction fetch, data read, cache misses, bus stalls
// Enable MCDS performance counters
MCDS.ON
MCDS.PerfCounter IMISS // instruction cache misses
MCDS.PerfCounter DMISS // data cache misses
MCDS.PerfCounter BSTALL // bus stall cycles
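// Run to the start of the section under test, then reset the counters
// (resetting earlier would also count everything executed before the function)
Go App_5ms_Runnable // run to function entry
WAIT !STATE.RUN() 100ms
MCDS.PerfCounter.RESET
LOCAL &start_stm &imiss &dmiss &bstall &total_cycles
&start_stm = Data.Long(SFR:STM0.TIM0) // timestamp at section start
Break.Set App_5ms_Return /Program
Go
WAIT !STATE.RUN() 100ms
// Read counters
&imiss = MCDS.PerfCounter.VALUE(IMISS)
&dmiss = MCDS.PerfCounter.VALUE(DMISS)
&bstall = MCDS.PerfCounter.VALUE(BSTALL)
// Elapsed STM ticks; these equal CPU cycles only if STM and CPU run at the same clock
&total_cycles = Data.Long(SFR:STM0.TIM0) - &start_stm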
PRINT "I-Cache misses: " &imiss " (" FORMAT.FLOAT(2.,1.,(&imiss*100.)/&total_cycles) " per 100 cycles)"
PRINT "D-Cache misses: " &dmiss " (" FORMAT.FLOAT(2.,1.,(&dmiss*100.)/&total_cycles) " per 100 cycles)"
PRINT "Bus stalls: " &bstall " cycles"
// High D-miss rate → investigate data locality (see placement section)
// High I-miss rate → function too large or scattered across flash pages
Pipeline Stall Detection
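Before breaking stalls down by type, it helps to normalise the raw counter values from the MCDS script into rates that can be compared between runs. The sketch below is one way to do that on the host side; the function names are hypothetical, and it assumes the caller has already converted timer ticks to CPU cycles if the two clocks differ.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks of a free-running 32-bit timer. Unsigned subtraction
 * gives the right result even if the timer wrapped once in between. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Events (e.g. cache misses) per 1000 cycles, rounded down. */
static uint32_t per_kilo_cycles(uint32_t events, uint32_t cycles)
{
    assert(cycles != 0u);
    return (uint32_t)(((uint64_t)events * 1000u) / cycles);
}

/* Share of all cycles spent stalled, in whole percent, rounded down. */
static uint32_t stall_percent(uint32_t stall_cycles, uint32_t cycles)
{
    assert(cycles != 0u);
    return (uint32_t)(((uint64_t)stall_cycles * 100u) / cycles);
}
```

For example, 15000 bus stall cycles in a 100000-cycle window is a 15% stall share. Tracking that figure before and after a placement change shows whether the change paid off.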
| Stall Type | Cause | Detection | Mitigation |
|---|---|---|---|
| Data hazard (RAW) | An instruction reads a register written by the immediately preceding instruction | MCDS data stall counter; high stall count on tight loop | Reorder so independent instructions sit between producer and consumer; the compiler's instruction scheduler at -O2 usually does this (inserting NOPs only replaces the stall, it does not remove it) |
| Load-use hazard | Load result used in the next instruction | Pipeline stall counter; check compiler output | Increase the load-to-use distance by scheduling independent work after the load; check the -O2 disassembly of hot loops |
| Branch misprediction | Conditional branch goes the opposite way to the prediction | MCDS branch mispred counter | Profile hot branches; consider conditional moves instead of branches |
| Bus wait | Flash wait states on PFlash access | Bus stall counter; check wait state config | Copy hot loops to PSPR scratchpad for zero wait states |
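The reordering mitigation for read-after-write stalls can be illustrated at source level. The sketch below, with hypothetical function names, sums an array once with a single accumulator (each add depends on the previous one) and once with two independent accumulators, which gives the scheduler independent work to place between dependent instructions. Whether this actually removes stalls on a given core and compiler is an assumption that should be confirmed with the stall counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single accumulator: every addition waits for the previous result. */
static uint32_t sum_single(const uint16_t *v, size_t n)
{
    uint32_t acc = 0u;
    assert(v != NULL || n == 0u);
    for (size_t i = 0u; i < n; ++i) {
        acc += v[i];
    }
    return acc;
}

/* Two accumulators: the two additions in the loop body are independent,
 * so one can issue while the other's operand is still being loaded. */
static uint32_t sum_dual(const uint16_t *v, size_t n)
{
    uint32_t acc0 = 0u;
    uint32_t acc1 = 0u;
    size_t i = 0u;
    assert(v != NULL || n == 0u);
    for (; i + 1u < n; i += 2u) {
        acc0 += v[i];
        acc1 += v[i + 1u];
    }
    if (i < n) {          /* odd element count: add the last element */
        acc0 += v[i];
    }
    return acc0 + acc1;
}
```

Both functions return the same sum; only the dependency structure differs. An optimising compiler may already perform this transformation, so compare the generated code before rewriting loops by hand.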
Code and Data Placement for Cache Performance
/* Linker section placement for cache/performance optimisation */
/* Time-critical functions → PSPR scratchpad (zero wait state, no cache needed) */
#pragma section ".text.fast" ax
void __attribute__((section(".text.fast")))
Can_RxIsr_Handler(void) {
/* This function runs from PSPR scratchpad: zero wait state instruction fetch */
}
#pragma section
/* Hot lookup tables → DSPR (data scratchpad, zero wait state) */
#pragma section ".data.fast" awc
static const uint16_t CRC_Table[256] __attribute__((section(".data.fast")));
#pragma section
/* Cold init data → LMU (shared RAM; evicted from cache after init) */
#pragma section ".data.init" aw
static CalibrationParams_t g_calParams __attribute__((section(".data.init")));
#pragma section
/* Linker script entries (GNU LD, abbreviated):
.text.fast : > PSPR_CORE0 AT > PFLASH -- code: VMA in program scratchpad, LMA in flash
.data.fast : > DSPR_CORE0 AT > PFLASH -- data: VMA in data scratchpad, LMA in flash
.data.init : > LMU_RAM AT > PFLASH -- cold data in LMU */
Summary
Aurix TC3xx has per-core 16 kB instruction and 8 kB data caches with 32-byte lines; the per-core scratchpads (PSPR for code, 256 kB DSPR for data) offer zero wait state access and are never evicted. MCDS performance counters quantify I-miss, D-miss, and bus stall rates without source code modification. The primary optimisation lever is placement: moving time-critical code from PFlash to PSPR and hot lookup tables to DSPR removes the cache misses and flash wait states on those accesses. Profile first, place second: scratchpad space is limited, and spending it on code that is not hot wastes it.
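Placement with a scratchpad run address (VMA) and a flash load address (LMA) only works if something copies the section contents from flash to the scratchpad before first use. Many toolchains' startup code does this from a linker-generated table; the sketch below shows the idea with a hypothetical table type and function, assuming plain `memcpy` is acceptable for the target memories.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per relocated section (illustrative layout, not a vendor format). */
typedef struct {
    const void *lma;   /* load address: where the image sits in flash */
    void       *vma;   /* run address: where the code or data must execute from */
    size_t      size;  /* section size in bytes */
} copy_entry_t;

/* Copy every section from its load address to its run address.
 * Returns the total number of bytes copied. */
static size_t copy_sections(const copy_entry_t *table, size_t count)
{
    size_t total = 0u;
    assert(table != NULL || count == 0u);
    for (size_t i = 0u; i < count; ++i) {
        memcpy(table[i].vma, table[i].lma, table[i].size);
        total += table[i].size;
    }
    return total;
}
```

On a real target the entries would be built from linker-defined start and end symbols for each section, and the routine must run before any function or table in those sections is touched. The returned byte count is also a quick check on how much of the scratchpad budget the relocated sections consume.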
🔬 Deep Dive — Core Concepts Expanded
This section builds on the foundational concepts covered above with additional technical depth, edge cases, and configuration nuances that separate competent engineers from experts. When working on production ECU projects, the details covered here are the ones most commonly responsible for integration delays and late-phase defects.
Key principles to reinforce:
- Configuration over coding: In AUTOSAR and automotive middleware environments, correctness is largely determined by ARXML configuration, not application code. A correctly implemented algorithm can produce wrong results due to a single misconfigured parameter.
- Traceability as a first-class concern: Every configuration decision should be traceable to a requirement, safety goal, or architecture decision. Undocumented configuration choices are a common source of regression defects when ECUs are updated.
- Cross-module dependencies: In tightly integrated automotive software stacks, changing one module's configuration often requires corresponding updates in dependent modules. Always perform a dependency impact analysis before submitting configuration changes.
🏭 How This Topic Appears in Production Projects
- Project integration phase: The concepts covered in this lesson are most commonly encountered during ECU integration testing — when multiple software components from different teams are combined for the first time. Issues that were invisible in unit tests frequently surface at this stage.
- Supplier/OEM interface: This is a topic that frequently appears in technical discussions between Tier-1 ECU suppliers and OEM system integrators. Engineers who can speak fluently about these details earn credibility and are often brought into critical design review meetings.
- Automotive tool ecosystem: Vector CANoe/CANalyzer, dSPACE tools, and ETAS INCA are the standard tools used to validate and measure the correct behaviour of the systems described in this lesson. Familiarity with these tools alongside the conceptual knowledge dramatically accelerates debugging in real projects.
⚠️ Common Mistakes and How to Avoid Them
- Assuming default configuration is correct: Automotive software tools ship with default configurations that are designed to compile and link, not to meet project-specific requirements. Every configuration parameter needs to be consciously set. 'It compiled' is not the same as 'it is correctly configured'.
- Skipping documentation of configuration rationale: In a 3-year ECU project with team turnover, undocumented configuration choices become tribal knowledge that disappears when engineers leave. Document why a parameter is set to a specific value, not just what it is set to.
- Testing only the happy path: Automotive ECUs must behave correctly under fault conditions, voltage variations, and communication errors. Always test the error handling paths as rigorously as the nominal operation. Many production escapes originate in untested error branches.
- Version mismatches between teams: In a multi-team project, the BSW team, SWC team, and system integration team may use different versions of the same ARXML file. Version management of all ARXML files in a shared repository is mandatory, not optional.
📊 Industry Note
Engineers who master both the theoretical concepts and the practical toolchain skills covered in this course are among the most sought-after professionals in the automotive software industry. The combination of AUTOSAR standards knowledge, safety engineering understanding, and hands-on configuration experience commands premium salaries at OEMs and Tier-1 suppliers globally.