1 Why container-optimized .NET now? (Context, goals, and trade-offs)
Containerized .NET applications have matured from “it works in Docker” to “it scales efficiently across thousands of pods.” The conversation has shifted from portability to performance, cold-start latency, and memory economics. In distributed environments like AKS (Azure Kubernetes Service) and Azure Container Apps (ACA), the ability to start fast, consume less RAM, and coexist with other workloads defines both reliability and cost efficiency. Modern .NET—from .NET 7 onward—gives us new tools for this: Native AOT, link-time trimming, and GC tuning designed for containers.
This guide focuses on how to apply these tools in production, not just what they do. You’ll see why startup speed matters more than ever, what each compilation model optimizes, and how to make code truly lean without losing diagnostics or maintainability.
1.1 Problem framing: cold starts, noisy neighbors, and memory bills in K8s/ACA
In a static VM or App Service, your .NET service starts once and lives for weeks. In Kubernetes or ACA, it might start hundreds of times per day—scaled by request rate, KEDA rules, or scheduled jobs. Each cold start burns CPU to JIT-compile IL, initializes GC heaps, and loads assemblies. When scaling from zero, this startup time directly adds to user latency.
A few recurring pain points:
- Cold starts – Even lightweight ASP.NET Core APIs can take 400–800 ms to warm up under JIT. Multiply that by dozens of replicas and you’re burning precious CPU cycles before doing real work. In ACA jobs or event-driven workers, that’s often longer than the job itself.
- Noisy neighbors – Containers share nodes. If one service over-allocates memory, it can trigger GC pressure or OOMs in another. The .NET GC by default sizes heaps to ~75 % of container limits, so multiple processes can unknowingly overcommit.
- Memory bills – Memory is the silent cost driver in AKS and ACA. A 512 MiB service scaled to 50 replicas across three environments adds up quickly. Smaller resident sets (RSS) from trimming and AOT compilation reduce both cloud cost and bin-packing friction.
Optimizing .NET for containers isn’t about chasing micro-benchmarks—it’s about controlling startup latency, memory footprint, and CPU efficiency under elastic scaling.
1.2 Startup vs. throughput vs. memory: how serverless-ish scaling changes priorities
Traditional tuning balanced throughput vs. latency for steady-state workloads. In container platforms, we add a third axis: startup cost.
| Concern | What matters | Why it matters in containers |
|---|---|---|
| Startup | Build size, IL→native overhead, JIT warm-up | Impacts scale-out speed, cold-start latency |
| Throughput | GC tuning, thread pool, async efficiency | Sustained request handling under load |
| Memory | Trimmed assemblies, GC heap size, native footprint | Determines pod density and cost efficiency |
ACA, KEDA, and Kubernetes Horizontal Pod Autoscalers all assume new replicas appear quickly. A 1-second startup vs. 300 ms can mean missing your P95 latency SLOs under burst load. Similarly, when scaling up from zero, large images lengthen registry pulls and delay the first response.
Modern .NET gives you compilation strategies (JIT/R2R/AOT) and linker optimizations (trimming) to balance these axes. The key is understanding what each mode optimizes—and its costs.
1.3 What’s new in modern .NET relevant to containers
Five evolutions since .NET 6 have fundamentally changed container efficiency:
- Native AOT (Ahead-of-Time compilation) – Shipped for console apps in .NET 7 and extended to ASP.NET Core in .NET 8, it compiles IL directly to native code, removing the JIT entirely. The result: faster startup and smaller memory footprints, ideal for microservices and background jobs.
- Trimming and link-time analysis – The IL linker now aggressively removes unused code, even across assemblies, when PublishTrimmed=true. Combined with AOT, this can cut output sizes by 50–80 %.
- Container diagnostics – dotnet-monitor, dotnet-counters, and OpenTelemetry integrations are now container-friendly. You can capture runtime metrics without volume mounts or privileged access.
- ASP.NET Core minimal APIs and source generators – Reduced reflection, leaner DI, and compile-time endpoint generation make ASP.NET Core more compatible with AOT and trimming.
- Smarter GC for containers – The runtime now detects cgroup memory limits (including under v2) and sizes heaps proportionally, respecting DOTNET_GCHeapHardLimit.
Together, these features shift .NET from “fast after warm-up” to “fast from the first request,” enabling parity with Go and Rust in container environments while keeping the productivity of C#.
1.4 Baseline mental model: JIT vs. ReadyToRun (R2R) vs. Native AOT
Think of these as points on a spectrum balancing flexibility, startup speed, and build complexity.
| Mode | How it runs | Benefits | Trade-offs |
|---|---|---|---|
| JIT (default) | IL compiled at runtime | Portable, simple builds, full reflection support | Slow startup, more memory overhead |
| ReadyToRun (R2R) | IL pre-compiled to native ahead of time, still uses JIT for some code | Faster startup, retains full .NET feature set | Larger binaries, architecture-specific |
| Native AOT | Fully compiled to native, no JIT present | Smallest, fastest startup, low RSS | Limited reflection, static linking, reduced flexibility |
1.4.1 JIT: dynamic and flexible
JIT (Just-In-Time) compilation turns IL into native machine code as methods execute. It enables dynamic features like reflection, runtime code generation, and cross-platform portability. The downside: startup tax—each process must JIT its own methods—and increased memory use for JIT caches.
1.4.2 ReadyToRun: hybrid
ReadyToRun pre-compiles most IL at publish time using crossgen2. The resulting assemblies embed native code sections, reducing JIT time. However, they remain partly managed—methods that rely on runtime generics or dynamic code may still be JIT-compiled. You gain startup speed but pay with larger images (often 20–50 % bigger).
1.4.3 Native AOT: static and lean
Native AOT eliminates the JIT entirely. You get single-file executables with no dependency on the .NET runtime (other than a few native libraries). The trade-off is stricter feature support: limited dynamic code, reflection via source generation, and constrained libraries. For microservices or background jobs that don’t use dynamic loading, it’s a clear win.
1.5 Scope of this guide
This series focuses on three workload types running on AKS and Azure Container Apps:
- Microservices – REST or gRPC APIs with predictable workloads, where startup and memory dominate cost.
- Background jobs – Short-lived or bursty workloads triggered by queues, events, or schedules.
- Sidecars – Lightweight agents (e.g., Dapr, telemetry exporters) co-deployed with main apps, where small memory footprints prevent OOMs.
Each section builds from fundamentals (compilation and trimming) toward container-specific tuning and deployment strategies. You’ll see side-by-side builds—JIT, R2R, and Native AOT—deployed to AKS and ACA with performance measurements.
2 Compilation strategies for containers: JIT, R2R, and Native AOT
In containers, compilation strategy is not an academic choice—it directly impacts cold-start latency, image size, and resource usage. Let’s dissect how each works, when to use it, and how to integrate it into a CI/CD pipeline targeting AKS and ACA.
2.1 JIT inside containers: warm-up cost, profile data, and image size implications
A default dotnet publish creates IL assemblies that depend on JIT compilation at runtime. On first execution, methods are compiled on demand.
2.1.1 Warm-up overhead
When a pod starts, JIT compilation contributes hundreds of milliseconds before handling the first request. Dynamic profile-guided optimization (PGO), opt-in since .NET 6 and enabled by default in .NET 8, helps the JIT optimize hot paths once the process is running.
Example build with dynamic PGO explicitly enabled:
dotnet publish -c Release -p:TieredPGO=true
Dynamic PGO records which methods are frequently executed and re-optimizes them within the same process; the data is not carried over between runs. To reuse profile data across deployments, collect a static profile (.mibc) from a staging workload and supply it when compiling ReadyToRun images. Confirm the property name below against your SDK documentation before relying on it:
dotnet publish -c Release -p:PublishReadyToRun=true -p:ReadyToRunProfilePath=profiledata.mibc
2.1.2 Image size
JIT builds are smallest—only IL assemblies and the runtime—but every container carries the full .NET runtime image (~200 MB). Smaller base images like mcr.microsoft.com/dotnet/aspnet:8.0-alpine reduce pull time but not JIT cost.
2.1.3 When JIT makes sense
Use JIT when:
- You need dynamic features (reflection, plugins, dynamic loading)
- Build times must be minimal
- Startup latency isn’t critical (long-running services)
For everything else, consider R2R or AOT.
2.2 ReadyToRun (R2R)
ReadyToRun is a middle ground: pre-compile IL into native code at publish time using crossgen2. It speeds up cold starts while keeping the full runtime feature set.
2.2.1 How it works
During publishing:
dotnet publish -c Release -p:PublishReadyToRun=true
The compiler produces PE files that embed both IL and native sections. The runtime uses the native version directly, skipping JIT for most methods.
2.2.2 Show warnings and verify
Crossgen2 can emit diagnostics about unverifiable methods:
dotnet publish -c Release -p:PublishReadyToRun=true -p:PublishReadyToRunShowWarnings=true
Warnings typically indicate dynamic code that couldn’t be pre-compiled—these methods still JIT at runtime.
2.2.3 Performance and image impact
Expect:
- Startup 20–40 % faster than JIT
- Image 20–60 % larger, due to native sections
- Memory usage slightly lower once warmed up (less JIT cache)
Example Dockerfile for R2R:
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -p:PublishReadyToRun=true -o /app/publish
FROM mcr.microsoft.com/dotnet/aspnet:8.0
WORKDIR /app
COPY --from=build /app/publish .
ENTRYPOINT ["dotnet", "MyService.dll"]
2.2.4 Known caveats
- Platform-specific binaries: need per-architecture builds (linux-x64, linux-arm64)
- Larger deployment packages
- Slightly longer build times
- Still requires the .NET runtime in the container
R2R fits well for long-running APIs or gRPC services where startup latency matters but full runtime features are needed.
2.3 Native AOT
Native AOT produces standalone executables with no JIT and no runtime dependencies. It’s the most transformative change for containerized .NET.
2.3.1 Publishing a Native AOT binary
dotnet publish -r linux-x64 -c Release -p:PublishAot=true
Output: a single native ELF binary.
2.3.2 Benefits
- Startup <100 ms even for ASP.NET Core apps
- Memory use 30–60 % lower (no JIT or metadata tables)
- Smaller images when paired with distroless or alpine bases
- No runtime dependency: just a native binary and libc
2.3.3 Limitations
- Limited reflection and dynamic code
- No runtime code generation (e.g., System.Reflection.Emit)
- Some libraries not yet compatible (dynamic proxies, some serializers)
- Longer build times and platform-specific output
2.3.4 Compatibility checks
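Run trimming analysis before attempting AOT:
dotnet publish -c Release -p:PublishTrimmed=true -p:TrimMode=full
A build without trimming warnings is a good first signal, but Native AOT adds its own checks (for example IL3050 for code that requires dynamic code generation). Publishing with PublishAot=true, or setting IsAotCompatible=true in a library project, surfaces those warnings as well.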
2.4 ASP.NET Core with Native AOT
ASP.NET Core 8+ officially supports AOT for minimal APIs and lightweight web services. This subset avoids heavy DI or runtime code generation.
2.4.1 Suitable patterns
- Minimal APIs
- gRPC services with static contracts
- Background workers (Queue or Event processing)
- CLI tools and sidecars
2.4.2 Example minimal API build
Program.cs:
var app = WebApplication.CreateSlimBuilder(args).Build();
app.MapGet("/healthz", () => "OK");
app.Run();
csproj:
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<PublishAot>true</PublishAot>
<InvariantGlobalization>true</InvariantGlobalization>
</PropertyGroup>
Dockerfile:
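FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
# Native AOT needs a native toolchain that the SDK image does not include
RUN apt-get update && apt-get install -y --no-install-recommends clang zlib1g-dev
WORKDIR /src
COPY . .
RUN dotnet publish -r linux-x64 -c Release -p:PublishAot=true -o /out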
FROM mcr.microsoft.com/dotnet/runtime-deps:8.0
WORKDIR /app
COPY --from=build /out .
ENTRYPOINT ["./MyService"]
2.4.3 Current gaps and workarounds
- Reflection in DI frameworks – Prefer explicit registrations; the built-in Microsoft.Extensions.DependencyInjection container works under Native AOT, while containers that scan assemblies or emit code at runtime generally do not
- JSON serialization – Use System.Text.Json source generation
- OpenTelemetry exporters – Work with full trimming (TrimMode=full) and explicit attributes
- Middleware discovery – Avoid dynamic assembly scans; register explicitly
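As a minimal sketch of the JSON workaround, the following uses System.Text.Json source generation so that serialization does not reflect over the payload type at runtime. The Order record, AppJsonContext, and OrderJson names are hypothetical and exist only for illustration.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical payload type, used only for illustration.
public record Order(int Id, string Sku);

// The source generator emits serialization metadata for every type listed
// here at compile time, so no runtime reflection over Order is required.
[JsonSerializable(typeof(Order))]
public partial class AppJsonContext : JsonSerializerContext
{
}

public static class OrderJson
{
    // Passing the generated JsonTypeInfo selects the reflection-free path.
    public static string Serialize(Order order) =>
        JsonSerializer.Serialize(order, AppJsonContext.Default.Order);

    public static Order? Deserialize(string json) =>
        JsonSerializer.Deserialize(json, AppJsonContext.Default.Order);
}
```

In a minimal API, the same context can be registered once through ConfigureHttpJsonOptions by inserting AppJsonContext.Default into SerializerOptions.TypeInfoResolverChain, so endpoint results use the generated metadata too.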
2.4.4 Trimming and AOT synergy
AOT implicitly trims unused code. Still, specify:
dotnet publish -c Release -r linux-x64 -p:PublishAot=true -p:TrimMode=full
This produces minimal native executables under 20 MB for small APIs.
2.5 Decision tree: choosing JIT + R2R vs. Native AOT
| Criterion | Prefer JIT | Prefer R2R | Prefer Native AOT |
|---|---|---|---|
| Needs heavy reflection / dynamic loading | ✅ | ✅ | ❌ |
| Startup latency critical (<300 ms) | ⚠️ | ✅ | ✅ |
| Memory constrained (<512 MiB per pod) | ⚠️ | ✅ | ✅ |
| Build simplicity | ✅ | ⚠️ | ❌ |
| CI/CD build time tolerance | ✅ | ⚠️ | ❌ |
| Portability across OS/arch | ✅ | ⚠️ | ❌ (arch-specific) |
| Diagnostics & observability | ✅ | ✅ | ⚠️ (limited runtime hooks) |
Rule of thumb:
- For APIs and gRPC services, start with R2R and dynamic PGO; consider AOT when warm-up dominates latency.
- For background jobs and event handlers, default to Native AOT—they benefit most from short startup and low memory.
- For sidecars or agents, use AOT for minimal footprint and static linking.
3 Trimming and link-time dead code removal (for JIT/R2R/AOT)
Trimming eliminates unused IL and metadata at build time, reducing image size and memory footprint. It’s essential whether you target JIT, R2R, or AOT—though AOT benefits the most.
3.1 What trimming is and why it matters in containers
Trimming analyzes code and dependencies to remove unreferenced members, similar to link-time optimization in C/C++. In containers, trimming helps by:
- Reducing image size – Less code copied into images → faster pulls and deployments
- Lowering RSS – Fewer assemblies and metadata tables loaded
- Improving cold starts – Less code to load at startup
Typical size reductions:
- Minimal API: from 90 MB → 35 MB trimmed
- Native AOT: down to 15–20 MB total binary
3.2 Trimming modes and pitfalls
Set trimming with:
dotnet publish -c Release -p:PublishTrimmed=true
Key options:
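- TrimMode=partial – Conservative: trims only assemblies that have opted in to trimming (the older copyused value played a similar role)
- TrimMode=full – Aggressive: trims every assembly (called link in older SDKs)
- SuppressTrimAnalysisWarnings=false – Show potential breakages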
3.2.1 Common pitfalls
- Reflection – If code uses Type.GetType("Foo"), the linker can’t detect it. Annotate with [DynamicallyAccessedMembers].
- Serializers – Libraries like Newtonsoft.Json rely on reflection; prefer System.Text.Json with source generation.
- DI frameworks – Reflection-based injection may lose constructors the trimmer could not see. Prefer explicit registrations or compile-time generation.
- Plug-ins or MEF – Dynamic loading via Assembly.Load can’t be analyzed safely; root such assemblies with a TrimmerRootAssembly item in the project file, or keep specific members with [DynamicDependency] (the successor to the older PreserveDependency attribute).
Example attribute usage:
[DynamicDependency(DynamicallyAccessedMemberTypes.PublicConstructors, typeof(MyType))]
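For comparison, here is a small sketch of [DynamicallyAccessedMembers] on a parameter, which is the more common fix when a Type value flows into reflection. Factory and Greeter are hypothetical names chosen for this example.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

public static class Factory
{
    // The annotation tells the trimmer to keep the public parameterless
    // constructor of any Type that flows into this parameter, which is
    // exactly what Activator.CreateInstance(Type) needs.
    public static object Create(
        [DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicParameterlessConstructor)]
        Type type)
        => Activator.CreateInstance(type)
           ?? throw new InvalidOperationException($"Could not create {type}.");
}

// Hypothetical type that is only ever constructed through Factory.Create.
public sealed class Greeter
{
    public string Greet() => "hello";
}
```

Calling Factory.Create(typeof(Greeter)) is then analyzable: the trimmer sees the typeof at the call site and preserves the constructor without a warning.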
3.3 Making libraries trim-friendly
3.3.1 Patterns
- Avoid string-based reflection (Activator.CreateInstance("TypeName"))
- Replace typeof(T).Assembly.GetTypes() with explicit registrations
- Use source generators for DI, JSON, and gRPC code
- Apply [RequiresUnreferencedCode] to methods that use reflection internally
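The explicit-registration pattern can be sketched as follows. The handler types and the HandlerRegistry class are hypothetical; the point is only that every concrete type is referenced statically, so nothing depends on scanning an assembly.

```csharp
using System;
using System.Collections.Generic;

public interface IJobHandler
{
    string Handle(string payload);
}

public sealed class EchoHandler : IJobHandler
{
    public string Handle(string payload) => payload;
}

public sealed class UpperHandler : IJobHandler
{
    public string Handle(string payload) => payload.ToUpperInvariant();
}

public static class HandlerRegistry
{
    // Each entry names its concrete type directly, so the trimmer can see
    // the dependency; no Assembly.GetTypes() scan is involved.
    private static readonly Dictionary<string, Func<IJobHandler>> Factories = new()
    {
        ["echo"] = static () => new EchoHandler(),
        ["upper"] = static () => new UpperHandler(),
    };

    public static IJobHandler Resolve(string name) =>
        Factories.TryGetValue(name, out var factory)
            ? factory()
            : throw new KeyNotFoundException($"No handler registered for '{name}'.");
}
```

The cost is one line per handler; the benefit is that a trimmed or AOT build fails at compile time, not at runtime, when a handler goes missing.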
3.3.2 Choosing NuGet packages
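Check whether a library declares trimming compatibility: the IsTrimmable MSBuild property in its project file is recorded in the built assembly as assembly metadata, and .NET 8 adds IsAotCompatible for Native AOT. A trimming declaration alone does not guarantee AOT safety, so still publish with PublishAot=true and review the warnings.
Examples:
- ✅ System.Text.Json (with source gen)
- ✅ prometheus-net
- ⚠️ Newtonsoft.Json
- ⚠️ Autofac (reflection-heavy)
- ✅ Microsoft.Extensions.DependencyInjection (built-in container)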
3.4 Verification: enforcing trimming correctness in CI
Add this step in your pipeline:
dotnet publish -c Release -p:PublishTrimmed=true -warnaserror:IL2026,IL3050
Fail builds if trimming warnings appear. ILLink analyzer warnings (IL2026, IL3050) indicate missing attributes or unsafe reflection.
You can integrate this into GitHub Actions:
- name: Build trimmed
run: dotnet publish -c Release -p:PublishTrimmed=true -warnaserror
3.5 Example: trimming a minimal API and a background job
3.5.1 Minimal API
Untrimmed:
dotnet publish -c Release
# Output: 92 MB
# Startup: ~450 ms
Trimmed:
dotnet publish -c Release -p:PublishTrimmed=true -p:TrimMode=full
# Output: 38 MB
# Startup: ~260 ms
3.5.2 Background job (AOT)
Untrimmed AOT:
dotnet publish -r linux-x64 -p:PublishAot=true
# 28 MB binary
Trimmed AOT:
dotnet publish -r linux-x64 -p:PublishAot=true -p:TrimMode=full
# 16 MB binary
Cold start dropped from 300 ms → 90 ms; RSS reduced by ~35 %.
3.6 Open-source libraries that play nicely with trimming/AOT
When building AOT-ready containerized services, library choice matters.
| Library | Compatible | Notes |
|---|---|---|
| System.Text.Json | ✅ | Use source generators for reflection-free serialization |
| YARP | ⚠️ | Works with trimming if middleware registration explicit |
| prometheus-net | ✅ | No reflection usage |
| OpenTelemetry SDK | ✅ | Use 1.7+; exporter trimming safe |
| MassTransit / NServiceBus | ⚠️ | Reflection-heavy; test thoroughly |
| Dapr SDK | ⚠️ | Some reflection; isolate in sidecar for AOT safety |
If your dependency stack includes reflection-heavy libraries, prefer R2R builds and partial trimming to maintain reliability.
4 GC & memory tuning in containers (with cgroup v2 awareness)
The garbage collector (GC) is one of the most important moving parts of .NET performance inside containers. Unlike VMs, containers run under explicit memory constraints—set by resources.limits.memory in Kubernetes or by the environment in Azure Container Apps (ACA). The .NET runtime automatically adapts heap sizing based on those limits, but the defaults can surprise you. Understanding how the GC reacts to container limits and how to override that behavior is essential for keeping services stable and cost-efficient.
4.1 How .NET GC sizes heaps in containers by default (75% of limit or 20 MB min) and why that matters for pod OOMs and bin-packing
When running inside a container, .NET uses cgroup data to estimate available memory. The GC sets its heap budget to roughly 75% of the container’s memory limit or a minimum of 20 MB—whichever is higher. That 75% is a heuristic meant to prevent aggressive collection but still leave headroom for native allocations, thread stacks, and runtime overhead.
In Kubernetes, this means a pod with:
resources:
  limits:
    memory: 512Mi
will let the .NET GC allocate up to about 384 MiB for managed heaps before it starts collecting aggressively.
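The arithmetic behind that figure can be written down as a short sketch. HeapBudget is a hypothetical helper, not a runtime API; it only mirrors the rule described above (75% of the limit, with a 20 MB floor).

```csharp
using System;

public static class HeapBudget
{
    private const long MiB = 1024 * 1024;

    // Mirrors the documented default: the larger of 20 MB and 75% of the
    // container memory limit. This models the rule; it does not query the GC.
    public static long DefaultHardLimitBytes(long containerLimitBytes) =>
        Math.Max(20 * MiB, containerLimitBytes * 3 / 4);
}
```

For a 512 MiB limit this yields 384 MiB, which leaves 128 MiB for native allocations, thread stacks, and anything else running in the same container.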
Why it matters
This works well for a single .NET process per container. Kubernetes enforces memory limits per container, so trouble starts when several processes share one container (each assumes it owns 75% of the limit) or when a sidecar container has no limit of its own. The combined footprint then exceeds what the pod was budgeted, leading to OOMKills even though each process individually looks fine. Similarly, bin-packing on AKS relies on pods respecting their limits, so any overuse disrupts scheduling efficiency.
If you’ve ever seen random OOMs despite “plenty” of memory, or inconsistent behavior between local Docker runs and AKS, this default is usually why.
4.2 cgroup v2 on AKS and Azure Linux nodes: what changed in accounting, how it can shift observed usage
Recent AKS and Azure Linux node pools now use cgroup v2 by default. The change isn’t visible in YAML, but it directly affects how memory usage is measured.
cgroup v2 exposes unified, hierarchical memory accounting. Current .NET versions read container limits from either hierarchy, but a runtime that predates cgroup v2 support sees node-level memory on a v2 host and sizes its heap far too large.
Implications
- Tighter enforcement – GC now correctly stops before crossing memory limits, meaning apps that previously worked near the edge may now hit OOMs sooner.
- Different RSS numbers – Because v2 counts page cache and shared memory differently, metrics from tools like kubectl top pod or dotnet-counters may shift by 5–15%.
- Behavioral change after cluster upgrades – Moving node pools to a cgroup v2 image (the default on recent AKS versions and Azure Linux) can subtly increase GC frequency, as the runtime perceives less available memory.
Practical check
You can verify cgroup mode from inside the container:
cat /sys/fs/cgroup/cgroup.controllers
If the file exists and lists controllers (for example cpu io memory pids), you’re on cgroup v2; on cgroup v1 the file is absent.
When moving to v2 clusters, retest your GC tuning and review DOTNET_GCHeapHardLimit settings. What used to fit comfortably under v1 may now need explicit limits or reduced concurrency.
4.3 Key knobs: Server GC vs. Workstation, GCLatencyMode, heap limits, and NUMA considerations
4.3.1 GC modes
ASP.NET Core apps default to Server GC when the container sees more than one CPU; with a single CPU the runtime falls back to Workstation GC. Server GC is optimized for throughput on multi-core environments: it creates one heap and one dedicated GC thread per core and uses larger heaps. That’s ideal for gRPC backends or high-QPS APIs but can be heavy for small microservices.
You can switch to Workstation GC by setting:
DOTNET_gcServer=0
or in runtimeconfig.json:
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": false
    }
  }
}
Server GC scales well above 2 cores. Below that, Workstation GC often gives lower latency and smaller footprints.
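Because the effective mode depends on both configuration and the CPU count the container sees, it can help to log what the runtime actually chose at startup. The sketch below uses only GCSettings, GC.GetGCMemoryInfo, and Environment; the GcDiagnostics class name is hypothetical.

```csharp
using System;
using System.Runtime;

public static class GcDiagnostics
{
    // Returns a one-line summary suitable for a startup log entry.
    public static string Describe()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        string mode = GCSettings.IsServerGC ? "Server" : "Workstation";

        // TotalAvailableMemoryBytes reflects the memory the GC believes it
        // may use, which inside a container follows the cgroup limit.
        long availableMiB = info.TotalAvailableMemoryBytes / (1024 * 1024);

        return $"GC mode={mode}; latency={GCSettings.LatencyMode}; " +
               $"cpus={Environment.ProcessorCount}; availableToGC={availableMiB} MiB";
    }
}
```

Writing Describe() to the log once at startup makes it easy to confirm, per environment, that a DOTNET_gcServer override or a CPU limit had the intended effect.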
4.3.2 GCLatencyMode
GCLatencyMode controls how aggressively GC blocks for collections. For bursty jobs or background tasks, SustainedLowLatency avoids long pauses:
System.Runtime.GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
For microservices that serve short requests, this can reduce p95 spikes. For CPU-bound backends, keep the default latency mode or select Batch for maximum throughput.
4.3.3 Heap hard limits
You can enforce a hard cap on GC heap size with environment variables:
DOTNET_GCHeapHardLimit=0x11E1A300   # 300,000,000 bytes; environment values are read as hexadecimal
DOTNET_GCHeapHardLimitPercent=0x46  # 70% of total container memory
This overrides the 75% heuristic and ensures GC never exceeds that value. It’s especially useful when multiple containers share a pod.
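One detail that is easy to miss: the runtime documentation describes the values of these GC environment variables as hexadecimal, while the equivalent runtimeconfig.json settings take decimal. A small sketch for producing the environment form (GcEnv is a hypothetical helper):

```csharp
using System;

public static class GcEnv
{
    // Formats a byte count for DOTNET_GCHeapHardLimit (hexadecimal).
    public static string HeapHardLimit(long bytes)
    {
        if (bytes <= 0) throw new ArgumentOutOfRangeException(nameof(bytes));
        return "0x" + bytes.ToString("X");
    }

    // Formats a percentage for DOTNET_GCHeapHardLimitPercent (hexadecimal).
    public static string HeapHardLimitPercent(int percent)
    {
        if (percent < 1 || percent > 100) throw new ArgumentOutOfRangeException(nameof(percent));
        return "0x" + percent.ToString("X");
    }
}
```

For example, a 70% cap is written as 0x46 and a 300,000,000-byte cap as 0x11E1A300. A plain 70 in the environment variable would be read as 0x70, that is 112.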
4.3.4 NUMA awareness
Inside containers, NUMA awareness rarely helps unless the container spans cores from multiple NUMA nodes (uncommon in small nodes). If you see unpredictable GC performance in large-node clusters, you can stop the GC from affinitizing its threads to specific processors:
DOTNET_GCNoAffinitize=1
This lets the OS scheduler place GC threads freely. DOTNET_GCCpuGroup, by contrast, applies to Windows CPU groups and is not a NUMA switch for Linux containers.
4.4 Multiple processes per pod (app + sidecars): coordinating memory targets across processes
Modern .NET pods often include sidecars: Dapr, Envoy, an OpenTelemetry collector, or metrics agents. Kubernetes applies memory limits per container, so each runtime sizes itself against its own container’s limit, and a container without a limit sizes itself against the node. If limits are missing or set without regard to each other, the pod’s combined footprint can reach 150–200% of what you budgeted, triggering OOM kills.
For example:
containers:
  - name: api
    image: myservice
    resources:
      limits:
        memory: 512Mi
  - name: dapr
    image: daprio/daprd
Here the api container may grow its heap to ~384 MiB while the dapr container has no limit at all. The fix is to budget limits explicitly:
resources:
  limits:
    memory: 384Mi # app
---
resources:
  limits:
    memory: 128Mi # dapr
Or set DOTNET_GCHeapHardLimit manually in the .NET container to coordinate budgets. Always validate with kubectl top pod—if RSS for all containers exceeds the total limit, reduce heap caps.
When sidecars are essential, using Native AOT apps can reclaim enough headroom to stay within the combined limit.
4.5 Practical recipes
4.5.1 Microservice with tight 256–512 MiB limits (latency-biased)
For APIs that prioritize responsiveness and quick scaling:
DOTNET_gcServer=1
DOTNET_GCHeapHardLimitPercent=0x41   # 65%, written in hexadecimal
DOTNET_ReadyToRun=1
DOTNET_TieredPGO=1
Keep GC small enough to avoid background compaction pauses. In code, you can tune latency mode:
GCSettings.LatencyMode = GCLatencyMode.Interactive;
Publish trimmed or AOT builds to reduce JIT and metadata memory. Measure with dotnet-counters under load:
dotnet-counters monitor --process-id <pid> System.Runtime
4.5.2 gRPC service with higher throughput (server GC tuning)
gRPC workloads benefit from large object heap stability and throughput optimization. Configure:
DOTNET_gcServer=1
DOTNET_GCHeapHardLimitPercent=0x4B   # 75%, written in hexadecimal
DOTNET_gcConcurrent=0                # disables background GC, so Batch becomes the latency mode
Ensure requests and limits match CPU expectations:
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "768Mi"
Benchmark under load with ghz to confirm sustained throughput:
ghz --insecure --proto ./proto/service.proto --call MyService.Echo -d '{"msg":"hi"}' -n 10000 0.0.0.0:5000
Server GC with two or more cores typically yields smoother p95 latencies.
4.5.3 Short-lived job (SustainedLowLatency, no LOH thrash)
Event-driven jobs or ACA Jobs that start frequently should avoid GC stalls:
DOTNET_gcServer=0
DOTNET_GCHeapHardLimitPercent=0x3C   # 60%, written in hexadecimal
In code:
GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
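Use Native AOT where possible to eliminate JIT and minimize footprint. To prevent large-object heap churn, rent buffers at or above the 85,000-byte LOH threshold from a pool and return them when done:
byte[] buffer = ArrayPool<byte>.Shared.Rent(128 * 1024);
try { /* fill and consume buffer */ } finally { ArrayPool<byte>.Shared.Return(buffer); }
Reusing buffers keeps LOH allocation under control.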
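A fuller sketch of the rent-and-return pattern, assuming a job that streams a payload from one place to another. PooledCopy is a hypothetical helper; the 128 KiB size is chosen only because it sits above the 85,000-byte large-object threshold.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class PooledCopy
{
    // Large enough that a fresh allocation would land on the LOH.
    private const int BufferSize = 128 * 1024;

    // Copies source to destination through a pooled buffer and returns
    // the number of bytes copied.
    public static long CopyTo(Stream source, Stream destination)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(BufferSize);
        try
        {
            long total = 0;
            int read;
            // Rent may hand back a larger array than requested, so use
            // buffer.Length and not BufferSize as the read size.
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, read);
                total += read;
            }
            return total;
        }
        finally
        {
            // Always return the buffer, even when the copy throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

The try/finally matters more than the buffer size: a rented array that is never returned is simply a slower allocation.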
4.6 Measuring impact: RSS vs. GC heap vs. native allocations
When tuning GC, watch three distinct metrics:
- RSS (Resident Set Size) – Total physical memory, includes managed + native.
- GC heap size – Managed memory only; view via dotnet-counters (gc-heap-size-bytes).
- Native allocations – From thread stacks, runtime, buffers, JIT code (less in AOT).
Example command:
dotnet-counters collect --process-id <pid> --counters System.Runtime
Native AOT changes the shape of this curve: RSS is dominated by managed heap and native runtime segments, not JIT caches. Typically you’ll see 25–40% less total memory compared to equivalent R2R builds.
The key is correlating heap limits with RSS plateaus—when GC heap stabilizes but RSS keeps growing, you likely have unmanaged allocations (e.g., gRPC buffers or pinned arrays). Tune heap limits and buffer pools accordingly.
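The same comparison can be made from inside the process. The sketch below pairs Environment.WorkingSet with GC.GetTotalMemory to estimate how much resident memory is not managed heap; MemorySnapshot is a hypothetical type, and the subtraction is an approximation, not an exact accounting.

```csharp
using System;

public readonly record struct MemorySnapshot(long WorkingSetBytes, long GcHeapBytes)
{
    // Rough estimate of resident memory that is not managed heap:
    // runtime structures, thread stacks, native buffers, JIT code.
    public long NonGcBytes => Math.Max(0, WorkingSetBytes - GcHeapBytes);

    public static MemorySnapshot Capture() =>
        new(Environment.WorkingSet, GC.GetTotalMemory(forceFullCollection: false));
}
```

Logging a snapshot periodically under load shows whether growth comes from GcHeapBytes (tune heap limits) or from NonGcBytes (look at buffers, pinning, and native dependencies).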
5 Real-world implementation on AKS and Azure Container Apps
Now that we’ve tuned build and runtime settings, let’s look at real-world deployment. The goal is to pair the right base image, pod spec, and scaling configuration to make container-optimized .NET deliver consistent performance across AKS and ACA.
5.1 Container base choices: Azure Linux/Mariner vs. Debian/Ubuntu; glibc vs. musl considerations for AOT; scanning and SBOM in CI
5.1.1 Azure Linux (Mariner)
Azure Linux (CBL-Mariner) is Microsoft’s container-optimized base. It uses glibc, so it runs the standard linux-x64 and linux-arm64 builds (musl-targeted binaries belong on Alpine), and offers smaller footprints than Debian-based images. Ideal for AKS workloads needing compliance scanning and long-term support.
5.1.2 Alpine (musl)
mcr.microsoft.com/dotnet/runtime-deps:8.0-alpine uses musl libc, so AOT binaries for it must be published with a musl runtime identifier (for example -r linux-musl-x64). The resulting images are smaller and memory usage is tighter. However, debugging tools (like dotnet-trace) can be limited, and some libraries assume glibc. For AOT microservices and ACA jobs, Alpine or Distroless is preferred.
5.1.3 Security and SBOM
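Use dotnet publish --os linux --arch x64 --self-contained to avoid unnecessary dependencies. Then generate SBOMs in CI, for example with Microsoft’s sbom-tool (the package name, version, supplier, and namespace below are placeholders):
sbom-tool generate -b ./publish -bc . -pn MyService -pv 1.0.0 -ps MyCompany -nsb https://example.com/sbom
Integrate with container scanners (e.g., Trivy, Microsoft Defender for Containers) to confirm the image carries no known unpatched CVEs before pushing to ACR.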
5.2 AKS: Pod specs that matter for .NET
5.2.1 Requests/limits for CPU/memory
Set CPU/memory requests based on queue math rather than guesswork. For example, if your service handles 100 req/s with 25 ms CPU time per request:
(100 * 0.025) = 2.5 CPU cores
With roughly 20% headroom, round up to 3 cores:
resources:
  requests:
    cpu: "3"
    memory: "512Mi"
  limits:
    cpu: "3"
    memory: "768Mi"
Minimum replicas are derived from expected p95 latency; always allocate one replica per vCPU for latency-critical APIs.
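The queue math above is simple enough to keep as a helper next to your deployment tooling. The sketch below works in millicores, the unit Kubernetes uses, since requests per second multiplied by CPU milliseconds per request gives millicores directly. CpuSizing is a hypothetical name.

```csharp
using System;

public static class CpuSizing
{
    // requestsPerSecond * cpuMillisPerRequest = millicores of steady-state
    // demand; headroomPercent is added on top and the result is rounded up.
    public static long RequiredMillicores(long requestsPerSecond, long cpuMillisPerRequest, int headroomPercent)
    {
        if (requestsPerSecond < 0 || cpuMillisPerRequest < 0 || headroomPercent < 0)
            throw new ArgumentOutOfRangeException();

        long baseline = requestsPerSecond * cpuMillisPerRequest;
        return (baseline * (100 + headroomPercent) + 99) / 100; // ceiling division
    }
}
```

With 100 req/s at 25 ms each, 20% headroom gives 3000m (3 cores), while 50% headroom would give 3750m, so the headroom figure is worth stating explicitly in the manifest comments.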
5.2.2 Startup probes vs. liveness/readiness for R2R/AOT apps
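AOT apps start fast, often under 100 ms. A startup probe still helps: it holds off liveness and readiness checks until the app first responds, so slow networking or dependency initialization does not trigger premature restarts. The port below assumes the .NET 8 container default of 8080:
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 3
  failureThreshold: 20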
Once startup is complete, readiness probes gate traffic routing:
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 2
This distinction avoids false restarts when AOT images start before dependencies (e.g., databases) are ready.
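On the application side, the two probe paths can be backed by separate endpoints. This is a sketch under assumptions: ReadinessGate and ProbeEndpoints are hypothetical names, and the app is expected to call MarkReady once its dependencies have been verified.

```csharp
using System.Threading;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

// Thread-safe flag flipped once dependencies (database, cache) are reachable.
public sealed class ReadinessGate
{
    private int _ready; // 0 = not ready, 1 = ready

    public bool IsReady => Volatile.Read(ref _ready) == 1;

    public void MarkReady() => Interlocked.Exchange(ref _ready, 1);
}

public static class ProbeEndpoints
{
    public static void Map(WebApplication app, ReadinessGate gate)
    {
        // Startup and liveness: succeeds as soon as the process serves HTTP.
        app.MapGet("/healthz", () => Results.Ok());

        // Readiness: 503 until the gate opens, so the pod receives no
        // traffic while dependencies are still being verified.
        app.MapGet("/ready", () => gate.IsReady
            ? Results.Ok()
            : Results.StatusCode(StatusCodes.Status503ServiceUnavailable));
    }
}
```

Keeping /healthz free of dependency checks means a slow database delays traffic but never causes the kubelet to restart a healthy process.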
5.2.3 Topology examples
- Single-container pod – simplest, best for APIs or workers.
- Sidecar (Dapr, Envoy) – add inter-container communication; budget memory explicitly.
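- Init container warm-up – prepare shared state before the main container runs. Init containers finish before the app container starts, so they cannot call the app itself; target an external dependency or a shared volume instead (the URL below is a placeholder):
initContainers:
  - name: warmup
    image: curlimages/curl
    command: ["sh", "-c", "curl -fsS http://cache-service/prime-cache"]
This pattern is valuable when R2R apps depend on external caches that should be warm before traffic hits.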
5.3 Azure Container Apps (ACA)
5.3.1 Cold-start reducers
ACA automatically scales from zero, so startup time directly affects latency. You can mitigate this with:
- Set minReplicas to keep warm instances:
scale:
  minReplicas: 1
  maxReplicas: 10
- Use regional ACR and enable pre-pull for images.
- Keep AOT or trimmed images under 100 MB to minimize cold-start pull time.
- Open the port early in the app to signal readiness:
app.Urls.Add("http://*:8080");
- Define shorter custom probes for readiness.
5.3.2 Scaling rules (KEDA under the hood)
ACA uses KEDA for event-driven scaling. For queue-based jobs:
scale:
  rules:
    - name: queue-rule
      custom:
        type: azure-queue
        metadata:
          queueName: myqueue
          queueLength: "5"
For HTTP-based scaling, ACA watches concurrent requests per replica. Native AOT binaries help scale faster because they initialize in milliseconds.
Scheduled jobs use the cron trigger—ideal for lightweight AOT background tasks that run briefly and exit cleanly.
5.3.3 ACA vs. AKS decision points
Use ACA when:
- You need automatic scale-to-zero
- You want managed KEDA triggers
- You prioritize simplicity over customization
Use AKS when:
- You run service meshes or sidecars
- You require custom networking (private clusters, VNETs)
- You need advanced GC tuning and multi-container coordination
For most event-driven .NET services, ACA is simpler and cheaper; for tightly coupled microservices or mixed workloads, AKS remains the right tool.
5.4 Concrete walk-throughs (code + YAML snippets)
5.4.1 Minimal API service in three builds
Baseline JIT
dotnet publish -c Release
Image size: ~180 MB; Startup: ~600 ms; Memory: ~150 MB RSS
ReadyToRun
dotnet publish -c Release -p:PublishReadyToRun=true
Image size: ~220 MB; Startup: ~380 ms; Memory: ~130 MB RSS
Native AOT
dotnet publish -r linux-x64 -p:PublishAot=true
Image size: ~70 MB; Startup: ~80 ms; Memory: ~85 MB RSS
Deploy YAML excerpt:
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: api
          image: myapi:aot
          ports:
            - containerPort: 8080
5.4.2 Background job (queue triggered) built as Native AOT
Program.cs:
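using Azure.Storage.Queues;
using Azure.Storage.Queues.Models;
var queue = new QueueClient("<conn>", "jobs");
// ReceiveMessagesAsync returns one batch (up to 32 messages), not an async stream
QueueMessage[] messages = (await queue.ReceiveMessagesAsync(maxMessages: 16)).Value;
foreach (QueueMessage msg in messages)
{
    await ProcessAsync(msg);
    await queue.DeleteMessageAsync(msg.MessageId, msg.PopReceipt); // remove once handled
}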
Publish:
dotnet publish -r linux-x64 -p:PublishAot=true -p:TrimMode=full
For ACA job configuration:
scale:
  rules:
    - name: queue-rule
      type: azure-queue
      metadata:
        queueName: jobs
template:
  containers:
    - image: myjob:aot
      env:
        - name: DOTNET_GCHeapHardLimitPercent
          value: "0x3C"   # 60%, written in hexadecimal
In AKS, equivalent as a CronJob:
schedule: "*/10 * * * *"
jobTemplate:
  spec:
    template:
      spec:
        containers:
          - name: job
            image: myjob:aot
5.4.3 Sidecar pattern: API + Dapr sidecar
For APIs using Dapr:
containers:
  - name: api
    image: myapi:aot
    resources:
      limits:
        memory: 384Mi
  - name: dapr
    image: daprio/daprd:latest
    resources:
      limits:
        memory: 128Mi
If Dapr tracing is redundant with your own telemetry, turn it off in the Dapr Configuration resource (for example by setting spec.tracing.samplingRate to "0") instead of in application code.
Monitor resource usage:
kubectl top pod myapi-pod
Expect combined RSS under 480 MiB with trimmed AOT builds, compared to 700+ MiB with baseline JIT. This difference can double your pod density on the same node pool.
6 Sidecars & service mesh: performance and cost
Sidecars and meshes bring powerful cross-cutting features to microservices—security, observability, and resilience—but they come at a measurable cost in startup time, memory, and CPU. For container-optimized .NET workloads, the goal is to decide when the benefits outweigh the overhead and how to integrate these components without undoing the gains from Native AOT or trimming.
6.1 When a sidecar is worth it (observability, retries, mTLS, state) vs. when embedded libraries are leaner
The sidecar pattern offloads networking and platform responsibilities to a separate process. Common use cases include mTLS enforcement, automatic retries, service discovery, and distributed tracing. In practice, this means adding a container such as Dapr, Envoy, or an OpenTelemetry collector alongside your .NET app.
Sidecars are worth it when:
- Security or compliance requires mTLS between services, and you can’t embed certificate rotation logic into each service.
- Polyglot systems need consistent retry/backoff or tracing without duplicating code in every language.
- Stateful or event-driven integration is required (e.g., Dapr bindings or pub/sub).
However, each sidecar consumes 50–200 MiB of memory and adds inter-container latency (often 1–2 ms per hop). For lightweight APIs or short-lived jobs, embedded libraries are leaner and more predictable.
In-process options that replace sidecars effectively:
- Resilience – Polly for retries, circuit breakers, and fallback.
- Observability – OpenTelemetry SDK exporting directly via OTLP.
- Configuration/Secrets – Azure SDKs instead of external injectors.
For example, replacing a Dapr pub/sub call with direct Azure Service Bus SDK access:
await using var client = new ServiceBusClient(conn);
var sender = client.CreateSender("orders");
await sender.SendMessageAsync(new ServiceBusMessage(JsonSerializer.Serialize(order)));
This direct in-process SDK call skips the extra localhost hop and the second round of serialization through the sidecar, making it ideal for latency-sensitive endpoints.
6.2 Dapr today: capabilities, performance considerations, and production configuration
Dapr has matured considerably, with focus shifting from developer convenience to production-grade performance and component predictability. Modern versions (v1.13+) support direct HTTP/gRPC integration, actor runtime optimization, and configurable connection pooling. Still, each Dapr sidecar adds measurable cost.
Typical footprint on AKS:
- Memory: 80–150 MiB RSS per Dapr sidecar
- CPU: ~50–100 millicores idle, more under load
- Startup delay: 200–400 ms for component initialization
6.2.1 Configuration to reduce impact
Limit Dapr’s scope to only what you use:
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: pubsub
spec:
  type: pubsub.azure.servicebus
  version: v1
  metadata:
    - name: connectionString
      secretKeyRef:
        name: servicebus-secret
        key: connectionString  # assumed key name inside the secret
scopes:
  - orderservice
Avoid loading all default components—each one adds initialization cost. For high-QPS APIs, configure Dapr’s HTTP connection pool:
dapr.io/http-max-conns-per-host: "20"
dapr.io/http-max-idle-conns: "10"
Use dapr run --app-protocol grpc in development to match production wiring. For production builds, always pin to specific versions to avoid upgrade drift:
image: "daprio/daprd:1.13.2"
6.2.2 Performance testing
You can benchmark the Dapr sidecar’s added latency:
bombardier -c 50 -n 5000 http://localhost:3500/v1.0/invoke/orderapi/method/order
Expect roughly 1–1.5 ms per hop overhead compared to direct service invocation. With Native AOT apps, that extra hop may represent 10–15% of total request time, so confirm the trade-off aligns with business goals.
6.3 Envoy/Istio sidecars vs. gateway-only patterns for low-latency .NET APIs
Full meshes like Istio insert Envoy sidecars into every pod for traffic routing, telemetry, and mTLS. The result is powerful observability—but at the cost of per-pod overhead. Each Envoy typically consumes:
- 100–150 MiB memory
- 100–300 millicores CPU baseline
- Additional 0.5–2 ms per hop latency
For APIs optimized via AOT and trimming, where a request may complete in only a few milliseconds, that overhead can double end-to-end latency.
A practical alternative is the gateway-only pattern. In this model, only edge or shared ingress pods run Envoy/Istio, while backend services communicate directly over standard Kubernetes DNS. You still get centralized ingress routing and TLS termination at the edge, but no per-pod sidecar. The trade-off is that traffic between backend services is no longer covered by mesh mTLS.
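Example configuration using Istio’s Gateway resource (a VirtualService bound to this gateway then routes requests to backend services):
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: api-example-com-tls  # hypothetical Secret holding the certificate; SIMPLE mode needs one
      hosts:
        - "api.example.com"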
This pattern scales better for small pods or ACA-style ephemeral instances. If you use a gateway-only mesh, make sure internal traffic still benefits from retry/backoff logic, handled in-process using Polly or the built-in IHttpClientFactory.
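As an illustration of what in-process retry/backoff means here, the following is a minimal hand-rolled sketch. It is a stand-in for what a library such as Polly provides, not Polly's API; the helper name `RetryAsync`, the attempt count, and the delays are all assumptions chosen for the example.

```csharp
using System;
using System.Threading.Tasks;

// Retries an async operation with exponential backoff: baseDelay, 2x, 4x, ...
// Rethrows the last exception once maxAttempts is exhausted.
static async Task<T> RetryAsync<T>(Func<Task<T>> action, int maxAttempts, TimeSpan baseDelay)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await action();
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            var delay = TimeSpan.FromMilliseconds(
                baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1));
            await Task.Delay(delay);
        }
    }
}

// Simulated downstream call that fails twice before succeeding.
int calls = 0;
string result = await RetryAsync(async () =>
{
    calls++;
    if (calls < 3) throw new InvalidOperationException("transient failure");
    await Task.Yield();
    return "ok";
}, maxAttempts: 5, baseDelay: TimeSpan.FromMilliseconds(10));

Console.WriteLine($"{result} after {calls} calls"); // prints "ok after 3 calls"
```

This sketch retries on every exception type. A production policy would normally restrict retries to transient failures and add jitter, which is the kind of detail a resilience library handles for you.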
6.4 Cost modeling: per-pod overhead multiplied by replicas
Sidecars introduce a fixed per-pod cost. For large clusters, that cost scales linearly with replica count. Suppose you deploy 50 microservices with 5 replicas each (250 pods) and each sidecar uses 100 MiB. That’s 25,000 MiB, about 24.4 GiB of RAM, consumed just for sidecars, not business logic.
If memory costs $0.03 per GiB-hour (an illustrative rate, not a quoted Azure price), that’s roughly $18/day in idle cost, or over $500/month of pure overhead. Trimming or AOT builds that reduce app RSS by 30–40% can reclaim that headroom, letting you run both app and sidecar under the same budget.
You can visualize this trade-off with a quick model in C#:
double sidecarMemMiB = 100;
int replicas = 250;
double costPerGiBHour = 0.03;                       // illustrative rate, not a quoted Azure price
double totalGiB = sidecarMemMiB * replicas / 1024;  // about 24.4 GiB
var total = totalGiB * costPerGiBHour * 24 * 30;    // hours in a 30-day month
Console.WriteLine($"Sidecar monthly cost: ${total:F2}"); // about $527
If your workloads depend on Dapr or Envoy, factor this baseline into pod sizing and choose node pools with higher pod density to amortize per-node idle cost.
7 Measuring what matters: counters, traces, and on-call dashboards
Optimization is only real if you can measure it. In containerized .NET, the right metrics are those that connect runtime behavior with container limits and user experience. You want numbers that help you answer, “Is this pod healthy under load?” and “Why did latency spike?” rather than hundreds of unrelated charts.
7.1 The short list of production counters to watch for .NET services in containers
7.1.1 GC counters
Key metrics:
- % Time in GC
- Gen0/1/2 Collection Count
- GC Heap Size (Bytes)
- LOH Size (Bytes)
These tell you if the heap is balanced. If % Time in GC exceeds 10–15% during normal load, you’re over-allocating or under-tuning heap size.
7.1.2 Thread pool metrics
- ThreadPool Completed Work Items/sec
- ThreadPool Queue Length
Sudden queue buildup usually precedes latency spikes. Track how these behave under burst traffic to decide when to increase replicas.
7.1.3 Exception metrics
- Exception Count
- First-Chance Exceptions/sec
Frequent first-chance exceptions are costly—even if caught. They often show up as minor latency drift before logs flag errors.
7.1.4 HTTP/gRPC counters
For ASP.NET Core:
- requests-per-second
- current-requests
- request-duration
- active-connections
Expose these via Prometheus or OpenTelemetry. P95 and P99 latencies under load should guide your autoscaling thresholds.
7.1.5 Memory counters
- working-set
- private-bytes
- gc-heap-size
- container memory usage (via cgroup)
- OOM kill count
These tie directly to AKS and ACA cost and stability. A rising working set with a flat GC heap usually means native allocations are leaking (e.g., unmanaged buffers or native library handles).
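Most of the counters above have in-process equivalents, which is handy for a quick sanity check or a debug endpoint. The snippet below is a minimal sketch that prints a one-off snapshot using runtime APIs; it does not replace a proper metrics pipeline, and the GC figures describe the most recent collection, so they may read as zero in a freshly started process.

```csharp
using System;
using System.Threading;

// GC data from the most recent collection (zeros are possible before the first GC).
GCMemoryInfo gcInfo = GC.GetGCMemoryInfo();

Console.WriteLine($"Gen0/1/2 collections: {GC.CollectionCount(0)}/{GC.CollectionCount(1)}/{GC.CollectionCount(2)}");
Console.WriteLine($"GC pause time %: {gcInfo.PauseTimePercentage:F2}");
Console.WriteLine($"GC heap size (bytes): {gcInfo.HeapSizeBytes}");
// Memory the GC believes it may use; inside a container this reflects the cgroup limit.
Console.WriteLine($"Memory available to GC (bytes): {gcInfo.TotalAvailableMemoryBytes}");
Console.WriteLine($"Working set (bytes): {Environment.WorkingSet}");
Console.WriteLine($"ThreadPool queue length: {ThreadPool.PendingWorkItemCount}");
Console.WriteLine($"ThreadPool completed items: {ThreadPool.CompletedWorkItemCount}");
```

Comparing the working set against the GC heap size over time is the in-process version of the "rising working set, flat heap" check described above.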
7.2 Tooling in containers: dotnet-counters, dotnet-trace, dotnet-gcdump, dotnet-dump
All standard .NET diagnostics tools run inside containers now, even in restricted clusters.
- dotnet-counters – real-time performance counter stream
  dotnet-counters monitor --refresh-interval 1 --process-id 1 --counters System.Runtime
- dotnet-trace – lightweight event tracing for performance analysis (the duration uses the dd:hh:mm:ss format)
  dotnet-trace collect --process-id 1 --duration 00:00:00:30
- dotnet-gcdump – GC heap snapshots
  dotnet-gcdump collect --process-id 1
- dotnet-dump – full process dumps for postmortem analysis
  dotnet-dump collect --process-id 1
All these tools can be side-loaded into running pods using kubectl exec. For production safety, restrict tracing to short durations and redirect output to Azure Blob or persistent volumes.
7.3 OpenTelemetry for .NET: tracing and metrics pipelines
OpenTelemetry has become the default observability stack for containerized .NET. For minimal overhead, use OTLP exporters over gRPC.
Example setup
builder.Services.AddOpenTelemetry()
.WithMetrics(m => m.AddAspNetCoreInstrumentation()
.AddRuntimeInstrumentation()
.AddOtlpExporter())
.WithTracing(t => t.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddOtlpExporter());
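For Azure Monitor, use the Azure Monitor OpenTelemetry distro (the Azure.Monitor.OpenTelemetry.AspNetCore package), which reads its connection string from APPLICATIONINSIGHTS_CONNECTION_STRING, rather than pointing a generic OTLP exporter at a fixed URL:
builder.Services.AddOpenTelemetry().UseAzureMonitor();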
In AOT builds, ensure you reference explicit providers instead of using reflection-based discovery. The OpenTelemetry 1.7+ SDK is fully trimming- and AOT-compatible.
7.4 Example dashboards (Prometheus/Grafana)
Effective dashboards answer operational questions, not just show data. For containerized .NET, the core panels should include:
| Panel | Metric | What to look for |
|---|---|---|
| GC Heap | dotnet_gc_heap_size_bytes | Growth over time → memory pressure |
| GC Time % | dotnet_gc_time_ratio | >0.1 indicates GC contention |
| ThreadPool Queue | dotnet_threadpool_queue_length | Sustained >0 means CPU saturation |
| HTTP P95 latency | http_request_duration_seconds | SLO violations |
| Container Memory | container_memory_working_set_bytes | Rising → leaks or heap limits |
| CPU usage | container_cpu_usage_seconds_total | Correlate with thread pool saturation |
Example PromQL for GC time (the ratio is a gauge, so average it over the window instead of applying rate()):
avg by (pod) (avg_over_time(dotnet_gc_time_ratio[1m]))
When alerts are too sensitive, alert on trends rather than single spikes—e.g., GC heap growth over 10 minutes.
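To make "alert on trends" concrete, the sketch below fits a least-squares line through heap-size samples and compares the slope to a threshold. The sample values, the 10-minute window, and the 1 MiB/min threshold are assumptions for illustration only; in practice this logic usually lives in the alerting system rather than in application code.

```csharp
using System;

// Least-squares slope of heap size over time, in MiB per minute.
static double SlopePerMinute((double Minute, double HeapMiB)[] samples)
{
    double meanT = 0, meanY = 0;
    foreach (var s in samples) { meanT += s.Minute; meanY += s.HeapMiB; }
    meanT /= samples.Length;
    meanY /= samples.Length;

    double num = 0, den = 0;
    foreach (var s in samples)
    {
        num += (s.Minute - meanT) * (s.HeapMiB - meanY);
        den += (s.Minute - meanT) * (s.Minute - meanT);
    }
    return den == 0 ? 0 : num / den; // flat trend when all samples share one timestamp
}

// Hypothetical samples: heap grows from 100 to 120 MiB over 10 minutes.
var window = new[] { (0.0, 100.0), (5.0, 110.0), (10.0, 120.0) };
double slope = SlopePerMinute(window);

double thresholdMiBPerMinute = 1.0; // assumed alert threshold
bool shouldAlert = slope > thresholdMiBPerMinute;
Console.WriteLine($"Heap trend: {slope:F1} MiB/min, alert: {shouldAlert}");
```

A single spike barely moves the fitted slope, whereas sustained growth does, which is why a trend-based rule is less noisy than a fixed threshold on the latest sample.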
7.5 Load testing harnesses (Bombardier, k6) and experiment templates
To evaluate JIT vs. R2R vs. AOT builds, run controlled load tests that measure cold-start and steady-state throughput. Tools:
- Bombardier for quick HTTP benchmarks
  bombardier -c 100 -n 10000 http://api/load
- k6 for scripted scenarios with thresholds
  import http from 'k6/http';
  import { check } from 'k6';
  export default function () {
    const res = http.get('http://api/load');
    check(res, { 'status was 200': (r) => r.status === 200 });
  }
- ghz for gRPC load testing
Use these in CI/CD to confirm startup and p95 latency improvements translate to production behavior.
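Steady-state tools do not capture cold start, so a small harness that times "process launched" to "first successful probe" is a useful complement. The sketch below shows only the timing loop. The helper name `TimeUntilReadyAsync` is hypothetical, and the probe is a stand-in that succeeds on its third call; a real harness would issue an HTTP GET against the service's health endpoint instead.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Polls a readiness probe until it succeeds; returns the elapsed time, or null on timeout.
static async Task<TimeSpan?> TimeUntilReadyAsync(
    Func<Task<bool>> probe, TimeSpan pollInterval, TimeSpan timeout)
{
    var sw = Stopwatch.StartNew();
    while (sw.Elapsed < timeout)
    {
        if (await probe()) return sw.Elapsed;
        await Task.Delay(pollInterval);
    }
    return null;
}

// Stand-in probe: reports "ready" on the third attempt.
int attempts = 0;
Func<Task<bool>> fakeProbe = () => Task.FromResult(++attempts >= 3);

TimeSpan? coldStart = await TimeUntilReadyAsync(
    fakeProbe, TimeSpan.FromMilliseconds(5), TimeSpan.FromSeconds(2));

Console.WriteLine(coldStart is null
    ? "timed out"
    : $"ready after {coldStart.Value.TotalMilliseconds:F0} ms ({attempts} probes)");
```

Keep the poll interval small relative to the startup time you expect to measure; a 50 ms interval cannot resolve an 80 ms AOT cold start with any precision.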
8 Putting it together: reference architectures, rollout, and a decision playbook
The last step is combining everything—build modes, trimming, GC tuning, observability—into coherent architectures that you can deploy and evolve safely.
8.1 Reference architecture A: low-latency API on ACA
Target: sub-200 ms cold start, 300 ms P95 under load.
- Build: Native AOT, trimmed, InvariantGlobalization=true
- Runtime: Server GC, 65% heap limit
- Deployment: ACA with minReplicas: 1
Program.cs:
var app = WebApplication.CreateSlimBuilder(args).Build();
app.MapGet("/health", () => "OK");
app.Run();
containerapp.yaml:
scale:
  minReplicas: 1
  maxReplicas: 10
template:
  containers:
    - image: myapi:aot
      env:
        - name: DOTNET_GCHeapHardLimitPercent
          value: "0x41"  # 65%; the runtime parses this environment variable as hex
8.2 Reference architecture B: high-throughput gRPC on AKS
Target: sustained 10k RPS, multi-core scaling.
- Build: R2R with dynamic PGO
- Runtime: Server GC, heap 75%
- Mesh: Gateway-only Envoy ingress
publish command:
dotnet publish -c Release -p:PublishReadyToRun=true -p:TieredPGO=true
Kubernetes deployment uses pinned CPU and memory:
resources:
  requests:
    cpu: "2"
    memory: "1Gi"
  limits:
    cpu: "2"
    memory: "1Gi"
Gateway handles TLS termination, avoiding per-pod sidecars.
8.3 Reference architecture C: event-driven job workers as ACA Jobs
- Build: Native AOT
- Runtime: Workstation GC, SustainedLowLatency
- Trigger: Azure Queue or Cron
job.yaml:
scale:
  triggers:
    - type: cron
      metadata:
        schedule: "*/15 * * * *"
template:
  containers:
    - image: jobworker:aot
      env:
        - name: DOTNET_GCHeapHardLimitPercent
          value: "0x3C"  # 60%; the runtime parses this environment variable as hex
Each job instance starts in under 100 ms and exits quickly after completion, minimizing compute cost.
8.4 Rollout steps: canaries and blue/green
- ACA – use revisions: deploy new image as a new revision, direct 10% traffic for 30 minutes, then promote.
- AKS – apply blue/green deployments with a temporary service routing.
- Observability bake-off – collect GC time %, startup latency, and memory for both versions.
- Fallback – pin old revision or rollback deployment if metrics regress.
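The bake-off and fallback steps can be reduced to a simple gate: promote only if the canary does not regress on any tracked metric. The sketch below is one way to express that, under stated assumptions. The 10% tolerance and the GC time values are hypothetical, while the startup and RSS figures reuse this guide's JIT and AOT numbers.

```csharp
using System;

// True when the canary stays within tolerance of the baseline on every metric
// (lower is better for all three).
static bool ShouldPromote(
    (double StartupMs, double RssMiB, double GcTimePct) baseline,
    (double StartupMs, double RssMiB, double GcTimePct) canary,
    double tolerance)
{
    bool Within(double b, double c) => c <= b * (1 + tolerance);
    return Within(baseline.StartupMs, canary.StartupMs)
        && Within(baseline.RssMiB, canary.RssMiB)
        && Within(baseline.GcTimePct, canary.GcTimePct);
}

var jitBaseline = (StartupMs: 600.0, RssMiB: 150.0, GcTimePct: 4.0);
var aotCanary = (StartupMs: 80.0, RssMiB: 85.0, GcTimePct: 3.0);
var regressedCanary = (StartupMs: 80.0, RssMiB: 85.0, GcTimePct: 9.0); // GC time regressed

bool promoteAot = ShouldPromote(jitBaseline, aotCanary, tolerance: 0.10);
bool promoteRegressed = ShouldPromote(jitBaseline, regressedCanary, tolerance: 0.10);

Console.WriteLine($"Promote AOT canary: {promoteAot}");
Console.WriteLine($"Promote regressed canary: {promoteRegressed}");
```

The second canary is rejected even though it starts faster and uses less memory, because its GC time more than doubles. Requiring every metric to hold prevents one headline improvement from masking a regression elsewhere.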
Example ACA rollout command:
az containerapp revision set-mode --name myapi --resource-group <resource-group> --mode multiple
8.5 Cost & performance worksheet
Estimate savings by comparing build modes:
| Metric | JIT | R2R | AOT |
|---|---|---|---|
| Image size | 180 MB | 220 MB | 70 MB |
| Startup | 600 ms | 380 ms | 80 ms |
| RSS | 150 MB | 130 MB | 85 MB |
| Cold start cost (ACA) | High | Medium | Low |
A 100-service cluster converting 50% of workloads to AOT can save dozens of cores and tens of GiB of memory monthly. Always validate with real metrics under load.
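As a worked version of that estimate, the sketch below applies the RSS figures from the table to a 100-service cluster. The replica count per service is an assumption, since the worksheet does not specify one.

```csharp
using System;

double jitRssMiB = 150;      // JIT RSS from the table
double aotRssMiB = 85;       // AOT RSS from the table
int convertedServices = 50;  // 50% of a 100-service cluster
int replicasPerService = 5;  // assumption: not stated in the worksheet

double savedMiB = (jitRssMiB - aotRssMiB) * convertedServices * replicasPerService;
double savedGiB = savedMiB / 1024;

Console.WriteLine($"Estimated memory saved: {savedMiB:F0} MiB ({savedGiB:F1} GiB)");
```

With these inputs the saving is 16,250 MiB, just under 16 GiB. The result scales linearly with replica count, so a cluster with more replicas per service moves well into the tens of GiB.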
8.6 The decision checklist
8.6.1 Feature compatibility
Do you depend on reflection, dynamic proxies, or runtime codegen?
- Yes → R2R
- No → Native AOT
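If you are unsure whether a code path relies on runtime code generation, the runtime exposes a flag you can check at startup or in a test. This is a small sketch using `RuntimeFeature`; it reports what the current runtime supports, not whether your dependencies actually use dynamic code, so treat it as a complement to the trim and AOT analyzer warnings rather than a replacement.

```csharp
using System;
using System.Runtime.CompilerServices;

// False under Native AOT: Reflection.Emit and other runtime codegen are unavailable there.
bool dynamicCodeSupported = RuntimeFeature.IsDynamicCodeSupported;
// True only when dynamically generated code is compiled rather than interpreted.
bool dynamicCodeCompiled = RuntimeFeature.IsDynamicCodeCompiled;

Console.WriteLine(dynamicCodeSupported
    ? $"Dynamic code is supported (compiled: {dynamicCodeCompiled}); JIT or R2R deployment."
    : "Dynamic code is not supported; running as Native AOT or a similarly restricted runtime.");
```

Libraries can branch on the same flag to choose a reflection-free fallback, which is how a single package can serve both JIT and AOT deployments.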
8.6.2 Latency sensitivity
If cold starts or P95 latency drive user experience, start with AOT + trimming, and use minReplicas for safety.
8.6.3 Sidecar requirements
If you require mTLS or centralized policies, use gateway-only or selective sidecars. Budget memory before GC tuning.
8.6.4 cgroup version awareness
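On cgroup v2 clusters, revalidate heap limits. Set DOTNET_GCHeapHardLimitPercent explicitly to avoid unexpected OOMs, and remember that the runtime parses the environment variable as hexadecimal: 60% is 0x3C, while a plain "60" is read as 96%.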
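Because the GC environment variables are read as hexadecimal, it is easy to set a limit that differs from the one intended. The helper below is a minimal sketch for generating the value to put in a manifest; the function name `ToGcEnvValue` is hypothetical.

```csharp
using System;
using System.Globalization;

// Formats a heap limit percentage as the hex string expected in the environment variable.
static string ToGcEnvValue(int percent)
{
    if (percent is < 1 or > 100)
        throw new ArgumentOutOfRangeException(nameof(percent));
    return "0x" + percent.ToString("X", CultureInfo.InvariantCulture);
}

string sixty = ToGcEnvValue(60);     // "0x3C"
string sixtyFive = ToGcEnvValue(65); // "0x41"

// What the runtime sees if the decimal digits are written as-is.
int misread = Convert.ToInt32("60", 16); // 96

Console.WriteLine($"DOTNET_GCHeapHardLimitPercent={sixty}");
Console.WriteLine($"DOTNET_GCHeapHardLimitPercent={sixtyFive}");
Console.WriteLine($"A plain 60 is interpreted as {misread}%");
```

The same setting in runtimeconfig.json (System.GC.HeapHardLimitPercent) takes a decimal value, so the hex conversion applies only to the environment variable form.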
8.6.5 Diagnostics readiness
Ensure metrics and tracing pipelines are configured before rollout. Test with dotnet-counters in staging.