Performance Profiling & Optimization - AUTOSAR Adaptive Platform

Event Delivery Latency Measurement

C++: latency_measure.cpp

#include <cstdint>
#include <time.h>

// Skeleton side: embed timestamp in payload
struct TimestampedIMU {
    float accel_x, accel_y, accel_z;
    uint64_t send_ns; // CLOCK_MONOTONIC nanoseconds
};

TimestampedIMU data = ReadIMU();
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
data.send_ns = uint64_t(ts.tv_sec) * 1'000'000'000ULL + uint64_t(ts.tv_nsec);
skel.IMUData.Send(data);

// Proxy side: measure receive latency.
// Valid only when skeleton and proxy run on the same machine:
// CLOCK_MONOTONIC values are not comparable across hosts.
imuProxy.IMUData.GetNewSamples([&](auto sample) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    uint64_t recv_ns = uint64_t(now.tv_sec) * 1'000'000'000ULL + uint64_t(now.tv_nsec);
    uint64_t latency_us = (recv_ns - sample->send_ns) / 1000;
    histogram.Record(latency_us); // collect P50/P95/P99
}, 10); // process at most 10 samples per call
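
The listing above calls `histogram.Record()` without showing the histogram itself. A minimal sketch of such a recorder follows. The class name `LatencyRecorder`, its methods, and the nearest-rank percentile definition are assumptions made for illustration; they are not part of ara::com or of the original listing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the `histogram` object used in the listing.
// Stores every sample, which suits offline benchmarks; a long-running
// target would more likely use fixed-size buckets.
class LatencyRecorder {
public:
    void Record(std::uint64_t latency_us) { samples_.push_back(latency_us); }

    // Nearest-rank percentile: the smallest recorded value such that at
    // least p percent of all samples are <= it. Requires 0 < p <= 100.
    std::uint64_t Percentile(double p) const {
        assert(!samples_.empty() && p > 0.0 && p <= 100.0);
        std::vector<std::uint64_t> sorted = samples_;
        std::sort(sorted.begin(), sorted.end());
        const auto rank = static_cast<std::size_t>(
            std::ceil(p * static_cast<double>(sorted.size()) / 100.0));
        return sorted[rank - 1];
    }

    std::size_t Count() const { return samples_.size(); }

private:
    std::vector<std::uint64_t> samples_;
};
```

After the measurement run, `Percentile(50)`, `Percentile(95)` and `Percentile(99)` give the P50/P95/P99 values. Sorting a copy on every query keeps the sketch short; sort once if many percentiles are read.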

Zero-Copy Optimization

C++: zero_copy.cpp

// COPY-BASED (baseline):
// CM allocates a new buffer and copies data into it during Send()
LargePayload data = BuildData();
skel.DataEvent.Send(data); // 1 copy

// ZERO-COPY (optimized):
// Application fills the CM-allocated buffer directly.
// Note: on some platform releases Allocate() returns an ara::core::Result
// wrapping the pointer; unwrap it before use.
auto sample = skel.DataEvent.Allocate();
FillDataInPlace(*sample); // write directly into CM buffer
skel.DataEvent.Send(std::move(sample)); // no application-side copy

// For 1 MB payloads at 30 Hz: 1 MB x 30/s = ~30 MB/s of copy traffic avoided

💡 Zero-Copy Applicability

Zero-copy via Allocate() is beneficial only for payloads > ~1 KB. For small events (<64 bytes), the overhead of the Allocate() call exceeds the copy cost. Profile before optimizing.
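
The ~30 MB/s figure quoted for the zero-copy listing is simple arithmetic: payload size times send rate times the number of copies per send. The sketch below reproduces it so the same estimate can be made for other payloads. The function name `CopyTrafficBytesPerSec` is hypothetical, and the model counts only application-side payload copies, not any copies a particular binding may add internally.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second written by payload copies alone.
// copies_per_send: 1 for the copy-based Send(data) path,
// 0 for the Allocate() path (assumption: no further copies in the binding).
inline std::uint64_t CopyTrafficBytesPerSec(std::uint64_t payload_bytes,
                                            std::uint64_t rate_hz,
                                            std::uint64_t copies_per_send) {
    const std::uint64_t per_send = payload_bytes * copies_per_send;
    // Guard both multiplications against 64-bit overflow.
    assert(copies_per_send == 0 || per_send / copies_per_send == payload_bytes);
    assert(rate_hz == 0 || per_send <= UINT64_MAX / rate_hz);
    return per_send * rate_hz;
}
```

For a 1,000,000-byte payload at 30 Hz the copy-based path moves 30,000,000 bytes/s. A 64-byte event at 1 kHz moves only 64,000 bytes/s, which is why the copy is rarely worth removing for small events.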

SOME/IP Serialization Cost

Shell: terminal

# Profile the benchmark process with perf
# (perf stat counters cover the whole run, not a single Send() call)
perf stat -e cache-misses,instructions,cycles \
  ./sensor_app_benchmark --duration 10 --rate 100hz

# Results for 1 KB payload at 100 Hz:
# SOME/IP header encoding:  ~2 µs / call
# UDP sendmsg():            ~8 µs / call (kernel syscall)
# Total Send() overhead:   ~10 µs / call

# For a 1 µs budget: use the local IPC binding instead of SOME/IP
# ara::com local binding bypasses serialization entirely
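
`perf stat` reports aggregate counters for the process, so per-call figures like the ones listed above need in-process timing around the call of interest. A minimal sketch of such a timing helper follows. The name `MeanCallCostNs` is hypothetical, and it measures wall-clock time per call, including any time the thread spends preempted.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Calls fn() `iterations` times and returns the mean wall-clock cost
// per call in nanoseconds, using the monotonic steady_clock.
template <typename Fn>
double MeanCallCostNs(Fn&& fn, std::uint32_t iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    const auto ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    return static_cast<double>(ns) / static_cast<double>(iterations);
}
```

In the benchmark this could wrap the send call, for example `MeanCallCostNs([&] { skel.DataEvent.Send(data); }, 1000)`. A mean hides outliers, so record individual durations as well when tail latency matters.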

🔍 Batch Events

For very high-frequency small events (>1 kHz), consider batching multiple samples into a single SOME/IP message using an array event type. This amortizes the ~10 µs Send() overhead over multiple samples, at the cost of delaying the earliest sample in each batch until the batch is sent.
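
The batching idea can be sketched as a small accumulator that buffers samples and hands a full batch to a send callback. `EventBatcher` and its interface are hypothetical. In a real application the callback would call `Send()` on an event whose data type is an array or vector of samples, which is assumed here and not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: collects samples and flushes them in groups.
template <typename Sample>
class EventBatcher {
public:
    using FlushFn = std::function<void(const std::vector<Sample>&)>;

    EventBatcher(std::size_t batch_size, FlushFn flush)
        : batch_size_(batch_size), flush_(std::move(flush)) {
        assert(batch_size_ > 0 && flush_);
        pending_.reserve(batch_size_);
    }

    // Buffers one sample; sends the whole batch once it is full.
    void Push(const Sample& s) {
        pending_.push_back(s);
        if (pending_.size() == batch_size_) Flush();
    }

    // Sends whatever is buffered, e.g. on shutdown or on a deadline timer.
    void Flush() {
        if (pending_.empty()) return;
        flush_(pending_);
        pending_.clear();
    }

    std::size_t Pending() const { return pending_.size(); }

private:
    std::size_t batch_size_;
    FlushFn flush_;
    std::vector<Sample> pending_;
};
```

With a batch size of 8, a ~10 µs send cost works out to about 1.25 µs per sample. At 1 kHz the oldest sample in each batch then waits roughly 7 ms before it is sent, so the batch size has to fit the latency budget.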

CPU Pinning

C++: cpu_pinning.cpp

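#include <cassert>
#include <pthread.h> // pthread_*affinity_np are GNU extensions
#include <sched.h>

// Pin the real-time processing thread to CPU core 2
// to avoid scheduler migration latency (typically 5–50 µs)
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(2, &cpuset);
int rc = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
assert(rc == 0); // pthread calls return an error number; they do not set errno

// Set SCHED_FIFO priority (needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO)
struct sched_param param{};
param.sched_priority = 80;
rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
assert(rc == 0);

// Verify pinning applied
cpu_set_t result;
CPU_ZERO(&result);
rc = pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), &result);
assert(rc == 0 && CPU_ISSET(2, &result));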

⚠️ IRQ Affinity

CPU pinning is only effective if the Ethernet NIC IRQ is kept off the pinned core. If the NIC IRQ and the application thread share a core, NIC interrupt handling preempts the RT thread, even under SCHED_FIFO. Set NIC IRQ affinity by writing a CPU list to /proc/irq/N/smp_affinity_list, where N is the IRQ number shown in /proc/interrupts.
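
Setting the IRQ affinity from a startup helper amounts to writing a CPU list into the procfs file named above. The sketch below shows one way to do that. The function names and the `proc_root` parameter are hypothetical; `proc_root` exists only so the path logic can be exercised against a scratch directory. On a real target the write needs root, and the kernel may refuse it for IRQs that cannot be moved.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Builds the procfs path for an IRQ's CPU list.
// On a target, proc_root is "/proc".
inline std::string IrqAffinityPath(const std::string& proc_root, int irq) {
    assert(irq >= 0);
    return proc_root + "/irq/" + std::to_string(irq) + "/smp_affinity_list";
}

// Writes a CPU list such as "3" or "0-1" to the given file.
// Returns false if the file cannot be opened or the write is rejected.
inline bool WriteCpuList(const std::string& path, const std::string& cpu_list) {
    std::ofstream out(path);
    if (!out) return false;
    out << cpu_list << '\n';
    out.flush(); // forces the write so a rejection shows up in the stream state
    return static_cast<bool>(out);
}
```

For the pinning example above, which uses core 2 for the RT thread, a call such as `WriteCpuList(IrqAffinityPath("/proc", irq), "0-1")` would keep the NIC interrupt on cores 0 and 1. The value of `irq` has to be looked up for the specific NIC.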

Summary

Profiling Adaptive applications requires combining OS-level tools (perf, clock_gettime) with ara::com instrumentation. Zero-copy Allocate(), CPU pinning, and SCHED_FIFO are the primary optimization levers. Always measure before optimizing — the bottleneck is rarely where you expect.

🔬 Deep Dive — Core Concepts Expanded

This section builds on the foundational concepts covered above with additional technical depth, edge cases, and configuration nuances that separate competent engineers from experts. When working on production ECU projects, the details covered here are the ones most commonly responsible for integration delays and late-phase defects.

Key principles to reinforce:

Configuration over coding: In AUTOSAR and automotive middleware environments, correctness is largely determined by ARXML configuration, not application code. A correctly implemented algorithm can produce wrong results due to a single misconfigured parameter.
Traceability as a first-class concern: Every configuration decision should be traceable to a requirement, safety goal, or architecture decision. Undocumented configuration choices are a common source of regression defects when ECUs are updated.
Cross-module dependencies: In tightly integrated automotive software stacks, changing one module's configuration often requires corresponding updates in dependent modules. Always perform a dependency impact analysis before submitting configuration changes.

🏭 How This Topic Appears in Production Projects

Project integration phase: The concepts covered in this lesson are most commonly encountered during ECU integration testing — when multiple software components from different teams are combined for the first time. Issues that were invisible in unit tests frequently surface at this stage.
Supplier/OEM interface: This is a topic that frequently appears in technical discussions between Tier-1 ECU suppliers and OEM system integrators. Engineers who can speak fluently about these details earn credibility and are often brought into critical design review meetings.
Automotive tool ecosystem: Vector CANoe/CANalyzer, dSPACE tools, and ETAS INCA are the standard tools used to validate and measure the correct behaviour of the systems described in this lesson. Familiarity with these tools alongside the conceptual knowledge dramatically accelerates debugging in real projects.

⚠️ Common Mistakes and How to Avoid Them

Assuming default configuration is correct: Automotive software tools ship with default configurations that are designed to compile and link, not to meet project-specific requirements. Every configuration parameter needs to be consciously set. 'It compiled' is not the same as 'it is correctly configured'.
Skipping documentation of configuration rationale: In a 3-year ECU project with team turnover, undocumented configuration choices become tribal knowledge that disappears when engineers leave. Document why a parameter is set to a specific value, not just what it is set to.
Testing only the happy path: Automotive ECUs must behave correctly under fault conditions, voltage variations, and communication errors. Always test the error handling paths as rigorously as the nominal operation. Many production escapes originate in untested error branches.
Version mismatches between teams: In a multi-team project, the BSW team, SWC team, and system integration team may use different versions of the same ARXML file. Version management of all ARXML files in a shared repository is mandatory, not optional.

📊 Industry Note

Engineers who master both the theoretical concepts and the practical toolchain skills covered in this course are among the most sought-after professionals in the automotive software industry. The combination of AUTOSAR standards knowledge, safety engineering understanding, and hands-on configuration experience commands premium salaries at OEMs and Tier-1 suppliers globally.