1 Practical Caching for .NET Microservices on Azure: Redis, Cosmos DB, and Cache-Aside in the Real World
Caching in .NET microservices is not just about making a slow API faster. It is about controlling latency, reducing database load, protecting shared dependencies, and keeping cloud cost predictable. In Azure-based systems, the usual stack includes ASP.NET Core, Azure Cache for Redis or Azure Managed Redis, Cosmos DB, and sometimes an L1 in-memory layer inside each service instance.
This article covers the modern caching landscape in .NET 9 and 10, implementing cache-aside at scale, and resilience patterns for thundering herd and cache stampede scenarios. It then looks at tiered L1/L2 caching with HybridCache and compares Redis with the Cosmos DB integrated cache.
1.1 Why This Matters Now
For years, most .NET applications used one of two approaches:
IMemoryCache cache;
IDistributedCache distributedCache;
IMemoryCache was fast but local to one process. IDistributedCache worked across multiple app instances, but it was intentionally minimal: byte arrays in, byte arrays out. Developers had to handle serialization, stampede protection, key naming, error handling, and refresh logic themselves.
That gap matters more in microservices. A product catalog service may run in ten pods behind Azure Kubernetes Service. Each pod may call Cosmos DB, Redis, downstream pricing APIs, identity services, and feature flag providers. A simple “cache miss, then query DB” implementation can work in development and fail badly under production traffic.
Microsoft introduced HybridCache in .NET 9 as a unified caching abstraction that combines in-memory and distributed caching with built-in features such as stampede protection and configurable serialization. It ships in the Microsoft.Extensions.Caching.Hybrid package and is documented for ASP.NET Core on .NET 9 and .NET 10.
1.2 Common Mistakes (and Better Approaches)
The most common mistake is treating caching as a transparent performance switch. It is not. Caching changes system behavior.
Incorrect:
var product = await cache.GetAsync(id);
if (product == null)
{
    product = await db.GetProductAsync(id);
    await cache.SetAsync(id, product);
}
return product;
This looks fine, but it hides several problems:
The key is too vague. The cached shape is not versioned. There is no cancellation token. Multiple callers can stampede the database on expiry. The cache may return stale data after a write. And failures in Redis may break the request path unless handled carefully.
Better:
public static string ProductKey(
    string tenantId,
    string region,
    string productId,
    int schemaVersion)
{
    return $"catalog:v{schemaVersion}:tenant:{tenantId}:region:{region}:product:{productId}";
}
Recommended:
Use explicit key design, clear expiration rules, stampede protection, observability, and a documented consistency model. A cache should be treated as part of the architecture, not as a helper utility.
1.3 What We’ll Cover
We will focus on practical implementation patterns:
API service
-> L1 cache: in-process memory / HybridCache
-> L2 cache: Azure Cache for Redis or Azure Managed Redis
-> Source of truth: Cosmos DB
-> Events: Azure Service Bus / Cosmos DB change feed
-> Observability: OpenTelemetry, Redis metrics, Cosmos DB RU metrics
ASP.NET Core supports in-memory caching, distributed caching, response caching, output caching, and HybridCache. In-memory caching is suitable for single-server scenarios or sticky-session scenarios, while distributed caching is useful when multiple app servers need shared cached data.
2 Foundations and Mental Models
2.1 Core Concepts and Terminology
The cache-aside pattern means the application owns the cache interaction. The application first checks cache. On miss, it reads from the database, stores the result in cache, and returns it.
That sounds simple, but the production version needs more decisions:
What is cached?
How long is it valid?
Who invalidates it?
Can stale data be served?
What happens if Redis is down?
How is cache behavior measured?
What happens during concurrent writes?
For .NET microservices on Azure, the main options are:
IMemoryCache
Fastest, local to one process, lost on restart.
IDistributedCache
Shared external cache abstraction, often backed by Redis.
HybridCache
Unified abstraction introduced in .NET 9 for multi-tier caching patterns.
Azure Cache for Redis / Azure Managed Redis
Shared distributed cache, useful for cross-instance reuse, hot data, and coordination.
Cosmos DB integrated cache
Gateway-based cache for Cosmos DB NoSQL accounts using the dedicated gateway.
Cosmos DB integrated cache is an in-memory cache built into Cosmos DB through the dedicated gateway. Applications must connect through the dedicated gateway endpoint to use it.
2.2 Dependencies, Inputs, and Constraints in This Domain
A good caching design starts with the workload.
For a high-read, low-write product catalog, caching product summary models makes sense. For a financial balance, security entitlement, or workflow approval state, aggressive caching can create correctness issues.
Use this simple rule:
Cache data that is expensive to read and acceptable to serve slightly stale.
Do not cache data where freshness is more important than latency unless invalidation is reliable.
Cosmos DB adds another constraint: request units. A cache hit can reduce RU consumption, but only if the cached object matches the API’s access pattern. Caching raw database documents often causes over-fetching. Caching view-ready projections usually gives better results.
Example projection:
public sealed record ProductCard(
    string ProductId,
    string Name,
    string Brand,
    decimal Price,
    string Currency,
    bool InStock,
    string Region,
    DateTimeOffset LastUpdatedUtc);
This is often better than caching the full product aggregate because the API response is already shaped for the caller.
2.3 Feedback Loops, Observability, and Measurement
Caching without measurement is guesswork. At minimum, capture:
cache.hit
cache.miss
cache.set
cache.error
cache.latency_ms
cache.keyspace
cache.eviction
source.read.latency_ms
source.read.ru_charge
A useful cache should improve at least one of these:
P95/P99 latency
Cosmos DB RU consumption
Downstream dependency calls
API throughput
Error rate during dependency degradation
OpenTelemetry is a good fit because it lets you add metrics and traces around cache access without hard-coding yourself to one monitoring backend.
using System.Diagnostics.Metrics;

public sealed class CacheMetrics
{
    private readonly Counter<long> _hits;
    private readonly Counter<long> _misses;

    public CacheMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("Catalog.Cache");
        _hits = meter.CreateCounter<long>("cache.hit");
        _misses = meter.CreateCounter<long>("cache.miss");
    }

    public void Hit(string cacheName) =>
        _hits.Add(1, new KeyValuePair<string, object?>("cache", cacheName));

    public void Miss(string cacheName) =>
        _misses.Add(1, new KeyValuePair<string, object?>("cache", cacheName));
}
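The counters above cover hits and misses. The cache.latency_ms measurement from the list can be captured in a similar way. The following is a minimal sketch, not a definitive implementation: the class name CacheLatencyRecorder and the meter name Catalog.Cache.Latency are assumptions made for illustration, and it creates its own Meter instead of using IMeterFactory so that it stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

// Hypothetical helper: times a cache lookup and records it in a histogram.
public sealed class CacheLatencyRecorder : IDisposable
{
    private readonly Meter _meter;
    private readonly Histogram<double> _latencyMs;

    public CacheLatencyRecorder(string meterName = "Catalog.Cache.Latency")
    {
        _meter = new Meter(meterName);
        _latencyMs = _meter.CreateHistogram<double>("cache.latency_ms", unit: "ms");
    }

    public async Task<T> MeasureAsync<T>(string cacheName, Func<Task<T>> lookup)
    {
        var start = Stopwatch.GetTimestamp();
        try
        {
            return await lookup().ConfigureAwait(false);
        }
        finally
        {
            // Recorded on success and on failure, so slow errors stay visible.
            var elapsedMs = Stopwatch.GetElapsedTime(start).TotalMilliseconds;
            _latencyMs.Record(elapsedMs, new KeyValuePair<string, object?>("cache", cacheName));
        }
    }

    public void Dispose() => _meter.Dispose();
}
```

Wrapping the L2 lookup in MeasureAsync("redis", ...) and the source read in a second call gives the two latency series needed to compare cache and database percentiles.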
Takeaways:
Caching is a data consistency decision first and a performance decision second. Measure it as a dependency, not as an invisible optimization.
3 Modern Caching Landscape in .NET 9 and 10
3.1 The Evolution of Caching: From IMemoryCache to HybridCache
IMemoryCache is still useful. It is extremely fast because it stays inside the process. But in a scaled-out microservice, each instance has its own copy. That means five pods may have five different cached values.
IDistributedCache solved the shared-cache problem, but the API is low-level. It does not know your object type, does not handle stampede protection, and does not automatically coordinate L1 and L2 behavior.
HybridCache addresses that middle ground. It gives .NET developers a higher-level API for cache retrieval and population, while supporting a multi-tier model. Microsoft’s documentation describes it as combining in-memory and distributed caching and addressing challenges in existing caching APIs.
A typical setup looks like this:
dotnet add package Microsoft.Extensions.Caching.Hybrid
dotnet add package Microsoft.Extensions.Caching.StackExchangeRedis
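var builder = WebApplication.CreateBuilder(args);

builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration.GetConnectionString("Redis");
    // InstanceName is prepended to every Redis key written through this cache.
    options.InstanceName = "catalog:";
});

// HybridCache uses the registered IDistributedCache (Redis here) as its L2 tier.
builder.Services.AddHybridCache();

var app = builder.Build();
app.Run();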
Then use it in a service:
using Microsoft.Extensions.Caching.Hybrid;

public sealed class ProductQueryService
{
    private readonly HybridCache _cache;
    private readonly ProductRepository _repository;

    public ProductQueryService(HybridCache cache, ProductRepository repository)
    {
        _cache = cache;
        _repository = repository;
    }

    public async Task<ProductCard?> GetProductCardAsync(
        string tenantId,
        string region,
        string productId,
        CancellationToken cancellationToken)
    {
        var key = $"catalog:v1:tenant:{tenantId}:region:{region}:product-card:{productId}";

        return await _cache.GetOrCreateAsync(
            key,
            async token => await _repository.GetProductCardAsync(
                tenantId,
                region,
                productId,
                token),
            cancellationToken: cancellationToken);
    }
}
The important part is not just fewer lines of code. The important part is that cache population becomes a controlled operation instead of every request independently deciding to query the database.
3.2 Why Caching Fails in Microservices: Consistency, Latency, and Cost
Caching fails in microservices for three main reasons.
First, consistency. One service updates data, but another service still serves the old version from cache. TTL alone does not fix this. It only limits how long the wrong value may live.
Second, latency. Redis is fast, but it is still a network call. A local memory lookup may take microseconds; a Redis call involves serialization, network, TLS, server processing, and deserialization. For very hot small objects, L1 caching can matter.
Third, cost. Cosmos DB bills in request units (RUs), whether throughput is provisioned or serverless, so inefficient reads become expensive quickly. Redis adds its own cost in the form of operations, memory planning, and availability design. The right question is not “Should we cache?” but “Which read paths are expensive enough to justify cache complexity?”
3.3 State of the Art: Introduction to the .NET 9 HybridCache Abstraction
HybridCache is most useful when you want a standard .NET abstraction for cache-aside with reduced boilerplate. It is not a replacement for architectural thinking. You still need key design, expiration policy, invalidation strategy, and metrics.
A practical implementation should configure default expiration and serialization deliberately:
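builder.Services.AddHybridCache(options =>
{
    options.DefaultEntryOptions = new HybridCacheEntryOptions
    {
        // Overall entry lifetime, used for the distributed (L2) tier.
        Expiration = TimeSpan.FromMinutes(10),
        // Lifetime of the in-process (L1) copy.
        LocalCacheExpiration = TimeSpan.FromMinutes(2)
    };
});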
The L1/L2 distinction matters:
L1: local memory cache
Very fast.
Per process.
Can become stale across pods.
L2: Redis distributed cache
Shared across instances.
Slower than memory.
Better for consistency across replicas.
Use short L1 expiration for frequently changing data and longer L2 expiration for stable projections. For example, a product description might tolerate ten minutes of staleness, while price and availability may need a shorter TTL or event-driven invalidation.
3.4 Azure’s Role: Why Combine Azure Cache for Redis with Cosmos DB?
Cosmos DB is the source of truth. Redis is the read acceleration and coordination layer. They solve different problems.
Use Cosmos DB for durable data, partitioning, global distribution, change feed, indexing, and queryable JSON documents. Use Redis for hot read models, short-lived projections, distributed locks, rate counters, and cross-instance cache sharing.
Cosmos DB integrated cache is also an option, especially when the workload is strongly tied to Cosmos DB reads and you want caching without writing custom invalidation code. Microsoft describes the integrated cache as an in-memory cache that can help manage costs and keep latency low as request volume grows; it is configured through the dedicated gateway.
Redis is a better fit when:
Multiple services need the same cached projection.
You need explicit cache keys.
You need pub/sub invalidation.
You need distributed coordination.
You cache data from more than one backend.
Cosmos DB integrated cache is attractive when:
The cached reads are mostly Cosmos DB point reads or queries.
You want to reduce RU usage.
You prefer less application-level cache code.
You can route through the dedicated gateway.
Security also changed. Azure Redis offerings increasingly support Microsoft Entra ID authentication. Azure Managed Redis uses Microsoft Entra ID by default, and Microsoft’s Redis documentation recommends passwordless authentication patterns to reduce secret management risk.
4 Implementing the Cache-Aside Pattern at Scale
4.1 The Mechanics of Cache-Aside: More Than Just “If Null, Get”
A production cache-aside flow should look closer to this:
1. Build a deterministic cache key.
2. Try L1 cache.
3. Try L2 cache.
4. On miss, protect the database with single-flight/stampede protection.
5. Read from Cosmos DB.
6. Store a versioned projection in cache.
7. Emit metrics.
8. Return the result.
Here is a practical decorator around a repository:
using Microsoft.Extensions.Caching.Hybrid;

public interface IProductReader
{
    Task<ProductCard?> GetProductCardAsync(
        string tenantId,
        string region,
        string productId,
        CancellationToken cancellationToken);
}

public sealed class CachedProductReader : IProductReader
{
    private const int SchemaVersion = 1;
    private readonly HybridCache _cache;
    private readonly IProductReader _inner;

    public CachedProductReader(HybridCache cache, IProductReader inner)
    {
        _cache = cache;
        _inner = inner;
    }

    // GetOrCreateAsync returns ValueTask<T>, so the method awaits it
    // instead of returning it directly as a Task.
    public async Task<ProductCard?> GetProductCardAsync(
        string tenantId,
        string region,
        string productId,
        CancellationToken cancellationToken)
    {
        var key =
            $"catalog:v{SchemaVersion}:tenant:{tenantId}:region:{region}:product-card:{productId}";

        return await _cache.GetOrCreateAsync(
            key,
            async token => await _inner.GetProductCardAsync(
                tenantId,
                region,
                productId,
                token),
            new HybridCacheEntryOptions
            {
                Expiration = TimeSpan.FromMinutes(15),
                LocalCacheExpiration = TimeSpan.FromMinutes(2)
            },
            cancellationToken: cancellationToken);
    }
}
This keeps caching out of the domain repository and makes it easier to test.
4.2 Distributed Serialization: Optimizing System.Text.Json vs. Protobuf for Redis
Serialization becomes visible at scale. With Redis, the cached value is usually stored as bytes. JSON is readable and operationally friendly. Protobuf is compact and fast, but less transparent during troubleshooting.
Use JSON when:
You need easy debugging.
The payload size is moderate.
Schema changes are frequent.
Operational readability matters.
Use Protobuf when:
Payloads are large.
Network cost matters.
Throughput is very high.
You can manage schema contracts carefully.
A JSON serializer example:
using System.Text.Json;

public static class CacheJson
{
    // Web defaults give camelCase names; matching is kept case-sensitive on purpose.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web)
    {
        PropertyNameCaseInsensitive = false,
        WriteIndented = false
    };

    public static byte[] Serialize<T>(T value) =>
        JsonSerializer.SerializeToUtf8Bytes(value, Options);

    public static T? Deserialize<T>(byte[] bytes) =>
        JsonSerializer.Deserialize<T>(bytes, Options);
}
The trade-off is simple: JSON is easier to operate; Protobuf is often better for high-throughput internal paths. Do not start with Protobuf unless you have measured serialization overhead or payload size as a real bottleneck.
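Measuring payload size is cheap and can be done before any decision about Protobuf. The sketch below is one way to do it with System.Text.Json only. SampleCard and PayloadProbe are hypothetical names introduced for illustration, and SampleCard is a reduced stand-in for the ProductCard projection.

```csharp
using System;
using System.Text.Json;

// Hypothetical, reduced stand-in for the ProductCard projection.
public sealed record SampleCard(
    string ProductId,
    string Name,
    decimal Price,
    string Currency,
    bool InStock,
    DateTimeOffset LastUpdatedUtc);

public static class PayloadProbe
{
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    // Size of the UTF-8 JSON payload that would be written to the cache.
    public static int JsonSizeInBytes<T>(T value) =>
        JsonSerializer.SerializeToUtf8Bytes(value, Options).Length;

    // True when the value survives a serialize/deserialize round trip unchanged.
    public static bool RoundTrips<T>(T value) where T : IEquatable<T>
    {
        var bytes = JsonSerializer.SerializeToUtf8Bytes(value, Options);
        var copy = JsonSerializer.Deserialize<T>(bytes, Options);
        return copy is not null && copy.Equals(value);
    }
}
```

Logging JsonSizeInBytes for representative objects, next to Redis latency percentiles, shows whether serialization or payload size is a real bottleneck.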
4.3 Key Design Strategies
4.3.1 Hierarchical Key Naming for Microservices
Cache keys should explain ownership and shape.
Recommended:
catalog:v1:tenant:contoso:region:us:product-card:ABC123
pricing:v3:tenant:contoso:region:us:price:ABC123
inventory:v2:tenant:contoso:warehouse:NY01:sku:ABC123
Avoid:
ABC123
product_ABC123
cache:product
A good key includes:
service or bounded context
schema version
tenant
region or partition
object type
identifier
This helps with troubleshooting, selective invalidation, and safe deployments.
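A small builder keeps the segment order consistent across a service. The following is a sketch under stated assumptions: CacheKeyBuilder is a hypothetical helper, and rejecting the ':' separator inside segments is a design choice made here, not a Redis requirement.

```csharp
using System;

// Hypothetical helper that enforces one key layout for a service.
public static class CacheKeyBuilder
{
    // Builds "context:vN:tenant:{tenant}:region:{region}:{objectType}:{id}".
    public static string Build(
        string context,
        int schemaVersion,
        string tenantId,
        string region,
        string objectType,
        string id)
    {
        return string.Join(':',
            Segment(context, nameof(context)),
            $"v{schemaVersion}",
            "tenant", Segment(tenantId, nameof(tenantId)),
            "region", Segment(region, nameof(region)),
            Segment(objectType, nameof(objectType)),
            Segment(id, nameof(id)));
    }

    // Rejects empty segments and the ':' separator so that two different
    // inputs cannot collapse into the same key.
    private static string Segment(string value, string name)
    {
        if (string.IsNullOrWhiteSpace(value))
            throw new ArgumentException("Key segment must not be empty.", name);
        if (value.Contains(':'))
            throw new ArgumentException("Key segment must not contain ':'.", name);
        return value;
    }
}
```

Calling CacheKeyBuilder.Build("catalog", 1, "contoso", "us", "product-card", "ABC123") yields the first recommended key shown above.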
4.3.2 Versioning Cache Schemas to Avoid Deployment Collisions
Cache schema versioning prevents old and new application versions from fighting over the same key.
Example:
public static class CacheKeys
{
    public static string ProductCard(
        string tenantId,
        string region,
        string productId)
    {
        return $"catalog:v2:tenant:{tenantId}:region:{region}:product-card:{productId}";
    }
}
When you change the cached model shape, increment the version. Old keys expire naturally. New code writes to new keys. This avoids deserialization failures during rolling deployments.
4.4 Handling “Ghost Reads” and Race Conditions During Concurrent Updates
A ghost read happens when stale data reappears after an update. One request reads old data from the database, another request updates the database and invalidates the cache, then the first request writes the old value back into cache.
Problem timeline:
T1: Request A reads product from Cosmos DB.
T2: Request B updates product.
T3: Request B removes Redis key.
T4: Request A writes old product into Redis.
T5: Users see stale data again.
Better approaches:
Use version numbers or ETags.
Invalidate the cache only after the database write commits.
Include LastUpdatedUtc in cached values.
Avoid caching results from reads that started before a known update.
Example guard:
public sealed record CachedEnvelope<T>(
    T Value,
    DateTimeOffset SourceLastUpdatedUtc,
    DateTimeOffset CachedAtUtc);
Before writing to cache, compare source version metadata where possible. Cosmos DB items have ETags, and application-level UpdatedAtUtc fields are also useful for cache correctness.
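The comparison can be sketched with an in-memory stand-in. The key point is that a plain delete leaves nothing to compare against, so this sketch keeps a versioned marker on invalidation and rejects any later write that carries an older version. VersionGuardedCache is a hypothetical class, and timestamps are used as the version only because the envelope above carries SourceLastUpdatedUtc. Against Redis, the same check would have to run atomically on the server (for example in a script), which is assumed here and not shown.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory stand-in for a version-guarded cache write.
public sealed class VersionGuardedCache<T> where T : class
{
    // Value is null when the slot only remembers the latest committed version.
    private sealed record Slot(T? Value, DateTimeOffset Version);

    private readonly ConcurrentDictionary<string, Slot> _slots = new();

    // Write path, after the database commit: drop the value, keep the version.
    public void Invalidate(string key, DateTimeOffset committedVersion) =>
        _slots.AddOrUpdate(
            key,
            _ => new Slot(null, committedVersion),
            (_, current) => committedVersion >= current.Version
                ? new Slot(null, committedVersion)
                : current);

    // Read path: populate the cache unless a newer version is already known.
    public bool TrySet(string key, T value, DateTimeOffset sourceVersion)
    {
        var candidate = new Slot(value, sourceVersion);
        var stored = _slots.AddOrUpdate(
            key,
            candidate,
            (_, current) => sourceVersion >= current.Version ? candidate : current);
        return ReferenceEquals(stored, candidate);
    }

    public T? Get(string key) =>
        _slots.TryGetValue(key, out var slot) ? slot.Value : null;
}
```

Replaying the timeline: Request B calls Invalidate with the new version at T3, so Request A's TrySet at T4 returns false and the stale product never re-enters the cache.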
5 Advanced Resilience: Solving the Thundering Herd and Stampedes
5.1 Anatomy of a Cache Stampede: Why Your DB Dies When the Cache Expires
A cache stampede happens when many requests miss the same key at the same time.
Example:
10,000 users request the same product page.
The Redis key expires.
All API pods miss the cache.
All pods query Cosmos DB.
RU usage spikes.
Latency increases.
Retries begin.
The system gets slower exactly when it needs to recover.
This is why TTL alone is dangerous for hot keys. A hot key needs protection.
5.2 Probabilistic Early Recomputation: Refreshing Data Before It Expires
Probabilistic Early Recomputation refreshes a cache entry before hard expiry, but not on every request. The closer the item gets to expiry, the more likely one request refreshes it.
A simplified implementation:
public static bool ShouldRefreshEarly(
    DateTimeOffset now,
    DateTimeOffset expiresAt,
    TimeSpan refreshWindow)
{
    var remaining = expiresAt - now;
    if (remaining <= TimeSpan.Zero)
        return true;
    if (remaining > refreshWindow)
        return false;

    var elapsedRatio =
        1.0 - remaining.TotalMilliseconds / refreshWindow.TotalMilliseconds;
    var probability = Math.Clamp(elapsedRatio, 0.05, 0.80);
    return Random.Shared.NextDouble() < probability;
}
This reduces synchronized expiry. It is especially useful for frequently read keys where serving stale data for a short time is acceptable.
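To see what the curve in ShouldRefreshEarly does, it helps to expose the probability without the random draw. The helper below is a sketch with a hypothetical name (EarlyRefreshMath). It mirrors the same linear ramp and clamp, which is a simplification and not the exponential form used in some published variants of this technique.

```csharp
using System;

// Hypothetical helper: same curve as ShouldRefreshEarly, minus the dice roll.
public static class EarlyRefreshMath
{
    public static double RefreshProbability(TimeSpan remaining, TimeSpan refreshWindow)
    {
        if (remaining <= TimeSpan.Zero) return 1.0;    // already expired
        if (remaining > refreshWindow) return 0.0;     // too early to refresh

        var elapsedRatio =
            1.0 - remaining.TotalMilliseconds / refreshWindow.TotalMilliseconds;
        return Math.Clamp(elapsedRatio, 0.05, 0.80);
    }
}
```

With a 60-second refresh window, an entry with 90 seconds left is never refreshed, one with 59 seconds left is refreshed by about 1 request in 20 (the 0.05 floor), one with 30 seconds left by about half of the requests, and one with 6 seconds left by 80 percent of them (the 0.80 cap).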
5.3 Locking Strategies
5.3.1 Local Semaphores for Per-Instance Protection
A local semaphore prevents multiple requests inside the same process from recomputing the same value.
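using System.Collections.Concurrent;

public sealed class LocalSingleFlight
{
    // One gate per cache key. Gates are never removed, so this assumes a bounded keyspace.
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks = new();

    public async Task<T> RunAsync<T>(
        string key,
        Func<Task<T>> factory,
        CancellationToken cancellationToken = default)
    {
        var gate = _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync(cancellationToken);
        try
        {
            // The gate only serializes callers. The factory should re-check the cache
            // first so that waiting callers reuse the value the first caller stored.
            return await factory();
        }
        finally
        {
            gate.Release();
        }
    }
}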
This protects one pod, not the whole cluster. In AKS with ten replicas, ten pods may still recompute the same key.
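A semaphore makes callers wait in line, but each caller still runs its own factory. An alternative within one process is to let concurrent callers share a single in-flight task. The sketch below shows that idea with a hypothetical class name (CoalescingSingleFlight). It is a minimal illustration under the same per-pod limitation, not a replacement for the stampede protection built into HybridCache.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: concurrent callers for the same key share one task.
public sealed class CoalescingSingleFlight<T>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<T>>> _inFlight = new();

    public async Task<T> RunAsync(string key, Func<Task<T>> factory)
    {
        // Lazy ensures the factory of the stored entry starts at most once,
        // even if GetOrAdd creates a spare Lazy under contention.
        var lazy = _inFlight.GetOrAdd(key, _ => new Lazy<Task<T>>(factory));
        try
        {
            return await lazy.Value.ConfigureAwait(false);
        }
        finally
        {
            // Remove only this exact entry so a newer in-flight call is not evicted.
            _inFlight.TryRemove(new KeyValuePair<string, Lazy<Task<T>>>(key, lazy));
        }
    }
}
```

Because the entry is removed once the task completes, a failed load is not cached: the next caller starts a fresh attempt.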
5.3.2 Distributed Locking with RedLock.net for Cross-Service Synchronization
A distributed lock can reduce cross-instance recomputation. Libraries such as RedLock.net are commonly used with Redis for this pattern, but distributed locking should be used carefully. Locks add latency, failure modes, and operational complexity.
Use distributed locks when:
The recomputation is expensive.
The key is very hot.
Duplicate recomputation is harmful.
The lock timeout is short and safe.
Avoid distributed locks when:
The work is cheap.
The data changes frequently.
The lock can become a bottleneck.
You cannot tolerate lock acquisition failures.
A simplified shape:
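// _lockFactory is a placeholder for a distributed lock abstraction (for example, a thin
// wrapper over RedLock.net). TryAcquireAsync and Acquired are illustrative names,
// not the API of a specific library.
public async Task<ProductCard?> GetWithDistributedLockAsync(
    string key,
    Func<CancellationToken, Task<ProductCard?>> loadFromSource,
    CancellationToken cancellationToken)
{
    var cached = await TryGetFromCacheAsync(key, cancellationToken);
    if (cached is not null)
        return cached;

    await using var distributedLock =
        await _lockFactory.TryAcquireAsync(
            resource: key,
            expiry: TimeSpan.FromSeconds(10),
            cancellationToken);

    if (!distributedLock.Acquired)
    {
        // Another instance is probably refreshing the key: wait briefly and re-read.
        await Task.Delay(50, cancellationToken);
        cached = await TryGetFromCacheAsync(key, cancellationToken);

        // Fall back to the source instead of reporting a false "not found".
        return cached ?? await loadFromSource(cancellationToken);
    }

    // Second check: the value may have been stored while this caller waited for the lock.
    cached = await TryGetFromCacheAsync(key, cancellationToken);
    if (cached is not null)
        return cached;

    var fresh = await loadFromSource(cancellationToken);
    await SetCacheAsync(key, fresh, cancellationToken);
    return fresh;
}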
Notice the second cache check after acquiring the lock. That avoids recomputing data that another request already refreshed.
5.4 Soft Expiration vs. Hard Expiration: Serving Stale-While-Revalidate Data
Hard expiration means the item is unusable after expiry. Soft expiration means the item is stale but still usable for a short grace period while refresh happens in the background.
A useful cache envelope:
public sealed record StaleAwareCacheEntry<T>(
    T Value,
    DateTimeOffset FreshUntilUtc,
    DateTimeOffset ServeUntilUtc);
Read behavior:
Before FreshUntilUtc:
Serve from cache.
Between FreshUntilUtc and ServeUntilUtc:
Serve stale value.
Trigger background refresh.
After ServeUntilUtc:
Treat as cache miss.
Example:
public static CacheState GetState<T>(
    StaleAwareCacheEntry<T> entry,
    DateTimeOffset now)
{
    if (now <= entry.FreshUntilUtc)
        return CacheState.Fresh;
    if (now <= entry.ServeUntilUtc)
        return CacheState.StaleButUsable;
    return CacheState.Expired;
}

public enum CacheState
{
    Fresh,
    StaleButUsable,
    Expired
}
This pattern is valuable when availability matters more than perfect freshness. Product descriptions, category pages, public reference data, and feature metadata are good candidates. Payment status, access permissions, and approval workflow state are usually not.
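The read behavior described above can be sketched end to end with an in-memory store. This is an illustration under stated assumptions: StaleWhileRevalidateCache is a hypothetical class, it keeps entries in a dictionary and not in Redis, and the injectable clock exists only to make the behavior easy to exercise.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical in-memory sketch of stale-while-revalidate.
public sealed class StaleWhileRevalidateCache<T>
{
    private sealed record Entry(T Value, DateTimeOffset FreshUntilUtc, DateTimeOffset ServeUntilUtc);

    private readonly ConcurrentDictionary<string, Entry> _entries = new();
    private readonly ConcurrentDictionary<string, byte> _refreshing = new();
    private readonly TimeSpan _freshFor;
    private readonly TimeSpan _staleFor;
    private readonly Func<DateTimeOffset> _clock;

    public StaleWhileRevalidateCache(
        TimeSpan freshFor, TimeSpan staleFor, Func<DateTimeOffset>? clock = null)
    {
        _freshFor = freshFor;
        _staleFor = staleFor;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    public async Task<T> GetAsync(string key, Func<Task<T>> load)
    {
        var now = _clock();
        if (_entries.TryGetValue(key, out var entry))
        {
            if (now <= entry.FreshUntilUtc)
                return entry.Value; // fresh

            if (now <= entry.ServeUntilUtc)
            {
                // Stale but usable: start at most one refresh per key, do not wait for it.
                if (_refreshing.TryAdd(key, 0))
                    _ = RefreshAsync(key, load);
                return entry.Value;
            }
        }

        // Missing or past ServeUntilUtc: behave like a normal cache miss.
        return await LoadAndStoreAsync(key, load).ConfigureAwait(false);
    }

    private async Task RefreshAsync(string key, Func<Task<T>> load)
    {
        try { await LoadAndStoreAsync(key, load).ConfigureAwait(false); }
        catch (Exception) { /* keep serving the stale value; real code would log */ }
        finally { _refreshing.TryRemove(key, out _); }
    }

    private async Task<T> LoadAndStoreAsync(string key, Func<Task<T>> load)
    {
        var value = await load().ConfigureAwait(false);
        var now = _clock();
        _entries[key] = new Entry(value, now + _freshFor, now + _freshFor + _staleFor);
        return value;
    }
}
```

The sketch does not coalesce hard misses. Combining it with per-key single-flight protection would cover the case where many callers arrive after ServeUntilUtc.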
6 The Hybrid Cache Revolution: L1 and L2 Integration
6.1 Why L2 Redis Is Sometimes Too Slow: The Cost of Network Hops and Deserialization
Redis is fast, but it is still outside the process. Every Redis lookup adds a network round trip, connection multiplexing behavior, TLS overhead if enabled, server-side processing, and deserialization back into a .NET object. For many APIs this is perfectly acceptable, especially when the alternative is a Cosmos DB query or a downstream service call. But for very hot read paths, Redis can become the second bottleneck after the database is protected.
The practical issue is not average latency. It is tail latency. A product detail API may usually read from Redis in a few milliseconds, but under pod pressure, network congestion, or large payload deserialization, the P95 and P99 can drift. If the API calls Redis five times in one request, small delays add up quickly.
A better model is to treat Redis as the shared L2 cache and use process-local memory as the L1 cache. The L1 cache absorbs repeated reads inside the same pod. The L2 cache protects the database and keeps data broadly consistent across pods.
Request
-> L1 memory cache inside API pod
-> L2 Redis shared cache
-> Cosmos DB source of truth
This is the core reason HybridCache matters. Microsoft’s .NET caching guidance describes HybridCache as a .NET 9 abstraction that combines in-memory and distributed caching while adding features such as stampede protection and configurable serialization.
6.2 Tiered Caching Architecture
A tiered cache should be explicit. Do not let each developer choose their own TTL, key format, and fallback behavior inside controller methods. Put the rules into a service layer or caching decorator.
public sealed record CachePolicy(
    TimeSpan LocalTtl,
    TimeSpan DistributedTtl,
    bool AllowStaleFallback);
Then apply the policy by use case:
public static class CatalogCachePolicies
{
    public static readonly CachePolicy ProductCard = new(
        LocalTtl: TimeSpan.FromMinutes(2),
        DistributedTtl: TimeSpan.FromMinutes(15),
        AllowStaleFallback: true);

    public static readonly CachePolicy InventoryAvailability = new(
        LocalTtl: TimeSpan.FromSeconds(15),
        DistributedTtl: TimeSpan.FromMinutes(1),
        AllowStaleFallback: false);
}
The policy tells future maintainers what the system values. Product cards can tolerate short staleness. Inventory availability usually cannot.
6.2.1 L1: In-Process Memory (Fastest, Smallest)
L1 is the fastest cache because it avoids the network entirely. It is best for frequently repeated reads within the same service instance: reference data, product cards, feature metadata, tenant configuration, and small view models.
The constraint is that L1 is not shared. In AKS, each pod has a separate memory cache. If one pod updates a value, the other pods do not automatically know. That is why L1 TTLs should usually be shorter than L2 TTLs.
builder.Services.AddMemoryCache(options =>
{
    options.SizeLimit = 10_000;
});
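When using raw IMemoryCache with a SizeLimit, every entry must declare a size; adding an entry without one throws InvalidOperationException. The size unit is whatever the application decides. Here each entry counts as 1: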
public void SetProductCard(ProductCard product)
{
    var options = new MemoryCacheEntryOptions()
        .SetAbsoluteExpiration(TimeSpan.FromMinutes(2))
        .SetSize(1);

    _memoryCache.Set(
        $"catalog:l1:product-card:{product.ProductId}",
        product,
        options);
}
L1 should hold compact objects. If a cached object is large enough to land on the large object heap (roughly 85 KB and up) or to cause Gen 2 garbage collection pressure, it does not belong in local memory.
6.2.2 L2: Azure Cache for Redis (Shared, Consistent)
L2 is the shared cache. It is slower than L1 but much faster than recomputing the object or querying a heavily indexed data store. Redis is also useful because it supports patterns beyond simple get/set: pub/sub invalidation, counters, locks, and short-lived coordination data.
A common setup uses Redis through ASP.NET Core distributed caching:
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration["Redis:Configuration"];
    options.InstanceName = "catalog-api:";
});
For new projects, prefer wrapping this behind HybridCache or another high-level cache abstraction instead of letting application services directly manipulate Redis values. The lower-level API is useful, but it pushes too many operational details into business code.
6.3 Leveraging FusionCache or .NET HybridCache for Automatic Synchronization
HybridCache is a strong default for new .NET 9 and .NET 10 services because it gives teams a Microsoft-supported abstraction with a clean GetOrCreateAsync flow. It also reduces boilerplate around cache population and coordinated misses. The package is installed through Microsoft.Extensions.Caching.Hybrid, as shown in Microsoft’s ASP.NET Core documentation.
builder.Services.AddHybridCache(options =>
{
    options.DefaultEntryOptions = new HybridCacheEntryOptions
    {
        LocalCacheExpiration = TimeSpan.FromMinutes(2),
        Expiration = TimeSpan.FromMinutes(15)
    };
});
Usage stays clean:
public async Task<ProductCard?> GetAsync(
    string tenantId,
    string region,
    string productId,
    CancellationToken cancellationToken)
{
    var key = CacheKeys.ProductCard(tenantId, region, productId);

    return await _hybridCache.GetOrCreateAsync(
        key,
        async token => await _repository.LoadProductCardAsync(
            tenantId,
            region,
            productId,
            token),
        cancellationToken: cancellationToken);
}
FusionCache is also popular in production .NET systems because it provides high-level features such as fail-safe caching, soft timeouts, background refresh, and distributed cache integration. Use it when you need more control over stale fallback, backplane behavior, and advanced resilience behavior than the default abstraction provides.
The architectural rule is simple: application code should ask for data, not manage cache plumbing. The caching layer should own TTL, serialization, failure handling, and refresh behavior.
6.4 Backplane Communication: Using Redis Pub/Sub to Invalidate L1 Caches Across Pods
The weakness of L1 is stale data across pods. A Redis backplane solves this by publishing invalidation messages when data changes. Each pod subscribes to the invalidation channel and removes its local copy.
Redis keyspace notifications allow clients to subscribe to Pub/Sub channels for events that affect keys, and Azure Managed Redis currently documents keyspace notifications as a preview capability. For application-level cache invalidation, many teams still prefer explicit pub/sub messages because the payload can include domain context.
public sealed record CacheInvalidationMessage(
    string CacheKey,
    string Reason,
    DateTimeOffset OccurredUtc);
Publishing after a successful write:
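public async Task PublishInvalidationAsync(string cacheKey)
{
    var subscriber = _redis.GetSubscriber();
    var message = JsonSerializer.Serialize(new CacheInvalidationMessage(
        cacheKey,
        Reason: "product-updated",
        OccurredUtc: DateTimeOffset.UtcNow));

    // RedisChannel.Literal states that the name is an exact channel, not a pattern
    // (available in recent StackExchange.Redis versions).
    await subscriber.PublishAsync(RedisChannel.Literal("cache:invalidation"), message);
}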
Subscribing inside each API instance:
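using System.Text.Json;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Hosting;
using StackExchange.Redis;

public sealed class CacheInvalidationSubscriber : BackgroundService
{
    private readonly IConnectionMultiplexer _redis;
    private readonly IMemoryCache _memoryCache;

    public CacheInvalidationSubscriber(
        IConnectionMultiplexer redis,
        IMemoryCache memoryCache)
    {
        _redis = redis;
        _memoryCache = memoryCache;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var subscriber = _redis.GetSubscriber();
        await subscriber.SubscribeAsync(RedisChannel.Literal("cache:invalidation"), (_, value) =>
        {
            string? json = value;
            if (string.IsNullOrEmpty(json))
                return;

            try
            {
                var message = JsonSerializer.Deserialize<CacheInvalidationMessage>(json);
                if (message is not null)
                {
                    // CacheKey must match the key used when the entry was stored in L1.
                    _memoryCache.Remove(message.CacheKey);
                }
            }
            catch (JsonException)
            {
                // Ignore malformed messages; L1 entries still expire through their TTL.
            }
        });
    }
}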
The write path must publish only after the database commit succeeds. Otherwise, services may invalidate valid cache entries for a write that never completed.
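The ordering rule can be made explicit in a small wrapper. This is a sketch only: CommitThenInvalidate is a hypothetical class, and both the database write and the publish call are passed in as delegates so that the example does not assume a particular repository or Redis client.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper that enforces "commit first, then invalidate".
public sealed class CommitThenInvalidate
{
    private readonly Func<string, Task> _publishInvalidation;

    public CommitThenInvalidate(Func<string, Task> publishInvalidation) =>
        _publishInvalidation = publishInvalidation;

    public async Task ExecuteAsync(
        string cacheKey,
        Func<CancellationToken, Task> commitToDatabase,
        CancellationToken cancellationToken = default)
    {
        // 1. Commit first. If this throws, nothing is published.
        await commitToDatabase(cancellationToken).ConfigureAwait(false);

        // 2. Publish only after the commit has succeeded.
        await _publishInvalidation(cacheKey).ConfigureAwait(false);
    }
}
```

A method such as PublishInvalidationAsync can be supplied as the publish delegate. How to react when the publish itself fails after a successful commit (retry, log, or rely on the L1 TTL) is a separate design decision that this sketch leaves open.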
7 Cosmos DB Integrated Cache vs. Sidecar Caching
7.1 Understanding the Cosmos DB Integrated Cache (Gateway-Based)
Cosmos DB integrated cache is different from Redis. It is not a general-purpose cache that your application fills with arbitrary keys. It sits in front of Cosmos DB through the dedicated gateway and caches point reads and query results. Microsoft’s documentation states that point reads and queries served from the integrated cache have zero RU charge, which makes it especially useful for read-heavy Cosmos DB workloads.
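The application still uses the Cosmos DB SDK, but the connection must go through the dedicated gateway endpoint in gateway mode. That makes it attractive when the main goal is reducing RU cost without introducing Redis-specific cache-aside logic. The integrated cache serves only reads made with session or eventual consistency; reads at stronger consistency levels bypass it.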
var client = new CosmosClient(
    accountEndpoint: builder.Configuration["Cosmos:DedicatedGatewayEndpoint"],
    tokenCredential: new DefaultAzureCredential(),
    clientOptions: new CosmosClientOptions
    {
        ConnectionMode = ConnectionMode.Gateway
    });
The main design trade-off is control. Redis gives explicit keys and application-managed projections. Cosmos DB integrated cache gives simpler read caching for Cosmos DB reads, but it does not replace Redis for cross-service coordination, pub/sub invalidation, distributed locks, or cached data built from multiple sources.
7.2 Comparing Redis as a Cache vs. Cosmos DB as a Persistent Cache
Redis works best when the cached value is a deliberate application projection. Cosmos DB integrated cache works best when the expensive operation is already a Cosmos DB point read or query.
Use Redis when the cached result combines multiple sources:
Product document from Cosmos DB
+ pricing from pricing service
+ inventory from inventory service
+ region rules from configuration
= ProductCard response model
Use Cosmos DB integrated cache when the repeated read is naturally a Cosmos DB read:
Read product by id and partition key
Run same query for active categories
Read tenant configuration document
This distinction matters because cache-aside with Redis changes the data model. Integrated cache does not. It accelerates reads behind the Cosmos DB API, while Redis usually stores an application-owned representation.
7.3 Implementation of Read-Through Style Wrappers Using Cosmos DB SDK
Even with integrated cache, it is useful to wrap Cosmos DB reads behind a query service. That keeps staleness settings, partition keys, and diagnostics in one place.
public sealed class CosmosProductReader
{
private readonly Container _container;
public CosmosProductReader(CosmosClient client)
{
_container = client.GetContainer("catalog-db", "products");
}
public async Task<ProductDocument?> ReadProductAsync(
string tenantId,
string productId,
CancellationToken cancellationToken)
{
try
{
var response = await _container.ReadItemAsync<ProductDocument>(
id: productId,
partitionKey: new PartitionKey(tenantId),
requestOptions: new ItemRequestOptions
{
DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
{
MaxIntegratedCacheStaleness = TimeSpan.FromMinutes(5)
}
},
cancellationToken: cancellationToken);
return response.Resource;
}
catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
{
return null;
}
}
}
The wrapper makes the freshness decision visible. A five-minute staleness window may be fine for catalog metadata. It may be wrong for pricing, entitlements, or case status.
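One way to keep that decision explicit is a small lookup from data category to the staleness value that the wrapper passes into DedicatedGatewayRequestOptions. The sketch below is illustrative only: the CachedDataKind enum, the StalenessPolicy class, and the durations are assumptions, not values taken from Microsoft guidance.

```csharp
using System;

// Hypothetical data categories; replace with the ones your service reads.
public enum CachedDataKind
{
    CatalogMetadata,
    Pricing,
    Entitlements
}

public static class StalenessPolicy
{
    // Returns the maximum staleness to request for a data kind, or null when
    // the read should not be served from the integrated cache at all.
    public static TimeSpan? MaxStalenessFor(CachedDataKind kind) => kind switch
    {
        CachedDataKind.CatalogMetadata => TimeSpan.FromMinutes(5),
        CachedDataKind.Pricing => TimeSpan.FromSeconds(30),
        CachedDataKind.Entitlements => null,
        _ => throw new ArgumentOutOfRangeException(nameof(kind))
    };
}
```

A reader could call MaxStalenessFor before building ItemRequestOptions and route null results through a client that targets the standard endpoint, so that freshness-critical reads never depend on a cached copy.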
7.4 Cost-Benefit Analysis: When to Pay for Redis vs. When to Increase Cosmos DB RUs
Start with measurements, not assumptions. Measure Cosmos DB RU consumption by endpoint, Redis hit ratio, cache payload size, and latency percentiles. If Cosmos DB reads are the main cost and the read pattern is repetitive, integrated cache can reduce RU consumption. Microsoft’s Cosmos DB guidance recommends comparing consumed RUs before and after using integrated cache, then evaluating whether reduced throughput cost is greater than the dedicated gateway cost.
Redis is worth paying for when it reduces more than Cosmos DB cost. It may reduce API latency, protect multiple downstream systems, coordinate distributed workers, and support event-driven invalidation. Increasing Cosmos DB RUs is simpler when the workload is write-heavy, cache hit ratio is low, or freshness requirements make caching ineffective.
A practical decision table (scenario, then recommendation):
Mostly repeated Cosmos DB reads: consider integrated cache.
Aggregated API responses from many systems: use Redis.
Low cache hit ratio: increase database capacity or improve query design first.
Hot keys with expensive recomputation: use Redis with L1/L2 and stampede protection.
Strict freshness requirement: avoid long TTL caching; use direct reads or event-driven invalidation.
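The before-and-after RU comparison described in this section reduces to simple arithmetic. The sketch below assumes provisioned throughput billed per RU/s per month; the IntegratedCacheBreakEven class and every input are hypothetical, and the prices must come from your own Azure pricing, not from this example.

```csharp
using System;

public static class IntegratedCacheBreakEven
{
    // Net monthly saving from lowering provisioned throughput after enabling
    // the integrated cache. A negative result means the dedicated gateway
    // costs more than the throughput it saves.
    public static decimal MonthlyNetSaving(
        decimal ruPerSecondBefore,
        decimal ruPerSecondAfter,
        decimal monthlyCostPerRuPerSecond,
        decimal monthlyDedicatedGatewayCost)
    {
        var throughputSaving =
            (ruPerSecondBefore - ruPerSecondAfter) * monthlyCostPerRuPerSecond;

        return throughputSaving - monthlyDedicatedGatewayCost;
    }
}
```

With made-up inputs of 20,000 RU/s before, 12,000 RU/s after, a price of 0.06 per RU/s per month, and a gateway cost of 300 per month, the throughput saving is 480 and the net saving is 180. The same function returns a negative number when the hit ratio is too low to justify the gateway.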
8 Data Modeling: Derived Views vs. Raw Data Projections
8.1 The Raw Data Trap: Why Caching DB Entities Leads to Over-Fetching
Caching database entities feels natural because the entity already exists. But it often creates large payloads, unnecessary fields, and accidental coupling between storage design and API design.
Incorrect:
public sealed class ProductDocument
{
public string Id { get; set; } = default!;
public string TenantId { get; set; } = default!;
public string Name { get; set; } = default!;
public string Description { get; set; } = default!;
public List<ProductImage> Images { get; set; } = [];
public List<ProductAuditEntry> AuditHistory { get; set; } = [];
public Dictionary<string, object> Attributes { get; set; } = [];
}
If the API only needs the product name, brand, price, and stock indicator, caching the full document wastes memory and bandwidth. It also increases deserialization time on every cache hit.
Better:
public sealed record ProductCardCacheModel(
string ProductId,
string Name,
string Brand,
decimal DisplayPrice,
string Currency,
bool InStock);
Cache the model the API actually serves.
8.2 Caching View-Ready Models: Materializing Expensive Joins into the Cache
Microservices often build responses by combining several sources. That composition is expensive and should not be repeated for every request if the result is stable enough to cache.
public async Task<ProductCardCacheModel?> BuildProductCardAsync(
    string tenantId,
    string region,
    string productId,
    CancellationToken cancellationToken)
{
    var product = await _productReader.ReadProductAsync(
        tenantId,
        productId,
        cancellationToken);

    // ReadProductAsync returns null for a missing product, so stop here
    // instead of calling pricing and inventory for nothing.
    if (product is null)
    {
        return null;
    }

    var price = await _pricingClient.GetPriceAsync(
        tenantId,
        region,
        productId,
        cancellationToken);
    var inventory = await _inventoryClient.GetAvailabilityAsync(
        tenantId,
        region,
        productId,
        cancellationToken);

    // Keep only the fields the product card endpoint actually serves.
    return new ProductCardCacheModel(
        product.Id,
        product.Name,
        product.Brand,
        price.Amount,
        price.Currency,
        inventory.AvailableQuantity > 0);
}
This is the kind of model Redis should store. It is compact, response-oriented, and expensive enough to justify caching.
8.3 Domain-Driven Design in Caching: Caching Aggregates vs. Projections
In DDD terms, aggregates protect consistency. Projections optimize reads. Caching aggregates can be risky because callers may start depending on stale business state. Caching projections is safer because projections are already read models.
A product aggregate may include rules, supplier relationships, audit fields, and lifecycle state. A product card projection only answers one query: what should be shown on the product listing page?
Recommended boundary:
Command side
Load aggregate from source of truth.
Validate invariants.
Save changes.
Publish event.
Query side
Build projection.
Cache projection.
Invalidate or refresh projection after events.
This keeps caching away from write-side business rules.
8.4 The Change Feed Pattern: Automatically Updating Redis When Cosmos DB Data Changes
Cosmos DB change feed is a good fit for keeping Redis projections warm. Instead of waiting for users to hit a cold cache after a write, a background worker can observe changes and update or remove related Redis keys.
public sealed class ProductChangeFeedHandler
{
    private readonly IProductProjectionBuilder _builder;
    private readonly IDatabase _redis;

    public ProductChangeFeedHandler(
        IProductProjectionBuilder builder,
        IDatabase redis)
    {
        _builder = builder;
        _redis = redis;
    }

    public async Task HandleChangesAsync(
        IReadOnlyCollection<ProductDocument> changes,
        CancellationToken cancellationToken)
    {
        foreach (var product in changes)
        {
            // Rebuild the read model from the changed document.
            var projection = await _builder.BuildAsync(
                product.TenantId,
                product.Region,
                product.Id,
                cancellationToken);
            var key = CacheKeys.ProductCard(
                product.TenantId,
                product.Region,
                product.Id);
            var json = JsonSerializer.Serialize(projection);

            // The TTL stays as a safety net in case a later change is missed.
            await _redis.StringSetAsync(
                key,
                json,
                expiry: TimeSpan.FromMinutes(15));
        }
    }
}
This pattern moves cache refresh out of the request path. Users get lower latency, and the system avoids large cache-miss bursts after deployments or batch updates.
The failure mode is important: the change feed worker can fall behind. Track lag, processing errors, Redis write failures, and poison messages. If the worker is delayed, the API should still be able to rebuild the projection through cache-aside. That fallback keeps the system correct even when the warm-cache pipeline is temporarily unhealthy.
9 Operations, Invalidation, and Eviction Strategies
9.1 Beyond TTL: Event-Driven Invalidation Using Azure Service Bus
TTL is a safety net, not a complete invalidation strategy. In a high-scale API, waiting fifteen minutes for stale data to expire is often too slow after a product price, availability flag, or compliance status changes. A better approach is to publish a domain event after the source data changes and let cache-aware services invalidate or refresh their own keys.
Azure Service Bus fits this pattern well because it gives durable messaging, retries, dead-letter queues, and topic subscriptions. The write service does not need to know which API services cache product data. It only publishes ProductUpdated, and each subscriber decides what to do.
public sealed record ProductUpdatedEvent(
string TenantId,
string Region,
string ProductId,
DateTimeOffset UpdatedUtc);
Publishing after the database write succeeds:
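// _sender is created once, for example in the constructor with
// _serviceBusClient.CreateSender("catalog-events"), and reused for every
// publish. Creating a new sender per call and never disposing it leaks links.
public async Task PublishProductUpdatedAsync(
    ProductUpdatedEvent evt,
    CancellationToken cancellationToken = default)
{
    var message = new ServiceBusMessage(JsonSerializer.Serialize(evt))
    {
        Subject = "ProductUpdated",
        ContentType = "application/json"
    };
    // Application properties let subscriptions filter without parsing the body.
    message.ApplicationProperties["tenantId"] = evt.TenantId;
    message.ApplicationProperties["region"] = evt.Region;
    await _sender.SendMessageAsync(message, cancellationToken);
}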
A cache subscriber can remove or refresh the affected projection:
public async Task HandleProductUpdatedAsync(ProductUpdatedEvent evt)
{
var key = CacheKeys.ProductCard(
evt.TenantId,
evt.Region,
evt.ProductId);
await _redis.KeyDeleteAsync(key);
_memoryCache.Remove(key);
}
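The important detail is ordering. Publish only after the source transaction is committed. Otherwise, the cache may be cleared for a write that never actually happened. Note also that _memoryCache.Remove only clears the L1 entry on the instance that handled the message. If several instances share one subscription as competing consumers, the other instances keep their local copy until its L1 expiry, so either give each instance its own subscription or keep L1 lifetimes short.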
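When a subscriber refreshes a projection instead of deleting it, a redelivered or out-of-order message could overwrite newer data with older data. The sketch below is one possible guard, assuming the UpdatedUtc value from ProductUpdatedEvent is a reliable ordering signal. The StaleEventGuard class is hypothetical, keeps its state in process memory only, and never trims that state, so treat it as an illustration and not a production component.

```csharp
using System;
using System.Collections.Generic;

public sealed class StaleEventGuard
{
    private readonly object _gate = new();
    private readonly Dictionary<string, DateTimeOffset> _lastApplied = new();

    // Returns true when the event is newer than anything already applied
    // for this cache key. Duplicates and older events return false.
    public bool TryAccept(string cacheKey, DateTimeOffset updatedUtc)
    {
        lock (_gate)
        {
            if (_lastApplied.TryGetValue(cacheKey, out var current)
                && updatedUtc <= current)
            {
                return false;
            }

            _lastApplied[cacheKey] = updatedUtc;
            return true;
        }
    }
}
```

A handler would call TryAccept with the cache key and evt.UpdatedUtc before rebuilding the projection, and skip the refresh when it returns false. Plain deletes do not need the guard, because removing a key twice only costs an extra cache miss.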
9.2 Active vs. Passive Eviction: Managing Memory Pressure in Redis Enterprise Tiers
Eviction is what Redis does when memory is full. In production, eviction should never be a surprise. If Redis starts evicting hot keys during peak traffic, the database suddenly receives extra load, and the system behaves as if the cache disappeared.
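Redis separates two mechanisms that are easy to conflate. Expiration is TTL-driven: an expired key is removed either passively, when a client next touches it, or actively, by a background cycle that samples keys with a TTL. Eviction is memory-driven: once the configured memory limit is reached, Redis removes keys according to the eviction policy. For cache workloads, common policies remove the least recently used or least frequently used keys, either among keys that have an expiry (volatile-lru, volatile-lfu) or across all keys (allkeys-lru, allkeys-lfu).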
A practical cache entry should always have an expiration:
await _redis.StringSetAsync(
key,
value,
expiry: TimeSpan.FromMinutes(15));
Avoid writing cache entries without TTL unless they are truly managed reference data. Persistent cache keys slowly turn Redis into an unmanaged database. That creates memory pressure, operational confusion, and painful cleanup work later.
For Redis Enterprise or large Azure Redis deployments, track memory fragmentation, used memory, evicted keys, expired keys, and command latency. Memory pressure is not only about total cache size. A small number of large objects can create instability just as easily as millions of tiny keys.
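A cheap defence against large objects is to check the serialized size before writing to Redis. The sketch below assumes JSON serialization with System.Text.Json; the CachePayloadGuard class and the 64 KB budget are hypothetical, and the right limit should come from your own payload measurements.

```csharp
using System.Text.Json;

public static class CachePayloadGuard
{
    // Hypothetical per-entry budget in bytes.
    public const int MaxPayloadBytes = 64 * 1024;

    // Serializes the value and reports whether it fits the budget.
    // The payload is returned either way so the caller can log its size.
    public static bool TrySerializeWithinBudget<T>(T value, out byte[] payload)
    {
        payload = JsonSerializer.SerializeToUtf8Bytes(value);
        return payload.Length <= MaxPayloadBytes;
    }
}
```

When the check fails, the caller can skip the cache write and record the key family and size, which turns a silent memory problem into a visible signal.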
9.3 Observability and Monitoring
Caching should be visible in the same way database calls and HTTP dependencies are visible. If an endpoint becomes slow, engineers should be able to tell whether the cause is cache miss rate, Redis latency, serialization cost, Cosmos DB RU throttling, or downstream recomputation.
At minimum, dashboards should show:
Cache hit ratio by endpoint
Cache miss count by key family
Redis latency and timeout count
Evicted keys
Large keys
Hot keys
Cosmos DB RU charge after cache miss
Rebuild duration for cached projections
The most useful metric is not a global hit ratio. A 95 percent overall hit ratio can still hide a critical endpoint with a 20 percent hit ratio. Track cache behavior by bounded context and key family.
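The effect is easy to reproduce with a few counters. The sketch below uses hypothetical types (CacheCounts, HitRatioReport) and made-up traffic numbers to show how a healthy global ratio can hide a weak key family.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical aggregate of hit and miss counters for one key family.
public sealed record CacheCounts(string KeyFamily, long Hits, long Misses);

public static class HitRatioReport
{
    // Hit ratio for one set of counters; 0 when there was no traffic.
    public static double Ratio(long hits, long misses)
    {
        var total = hits + misses;
        return total == 0 ? 0d : (double)hits / total;
    }

    // Single global ratio across all key families.
    public static double Overall(IReadOnlyCollection<CacheCounts> counts) =>
        Ratio(counts.Sum(c => c.Hits), counts.Sum(c => c.Misses));

    // One ratio per key family, which is the view worth alerting on.
    public static IReadOnlyDictionary<string, double> ByFamily(
        IReadOnlyCollection<CacheCounts> counts) =>
        counts.ToDictionary(c => c.KeyFamily, c => Ratio(c.Hits, c.Misses));
}
```

With 9,875 hits and 125 misses for product-card, and 100 hits and 400 misses for inventory-summary, Overall returns 0.95 while ByFamily reports 0.20 for inventory-summary.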
9.3.1 Tracking Cache Hit/Miss Ratios with OpenTelemetry
OpenTelemetry makes cache behavior measurable without tying the application to one monitoring vendor. Use counters for hits and misses, and histograms for cache access latency.
public sealed class CatalogCacheTelemetry
{
private readonly Counter<long> _hits;
private readonly Counter<long> _misses;
private readonly Histogram<double> _latency;
public CatalogCacheTelemetry(IMeterFactory meterFactory)
{
var meter = meterFactory.Create("Catalog.Api.Cache");
_hits = meter.CreateCounter<long>("catalog.cache.hit");
_misses = meter.CreateCounter<long>("catalog.cache.miss");
_latency = meter.CreateHistogram<double>(
"catalog.cache.latency.ms");
}
public void RecordHit(string keyFamily, double latencyMs)
{
var tags = new KeyValuePair<string, object?>[]
{
new("key_family", keyFamily)
};
_hits.Add(1, tags);
_latency.Record(latencyMs, tags);
}
public void RecordMiss(string keyFamily, double latencyMs)
{
var tags = new KeyValuePair<string, object?>[]
{
new("key_family", keyFamily)
};
_misses.Add(1, tags);
_latency.Record(latencyMs, tags);
}
}
Avoid putting full cache keys into telemetry labels. Product IDs, tenant IDs, and user IDs create high-cardinality metrics and can increase monitoring cost. Use key families such as product-card, tenant-config, or inventory-summary.
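If keys follow a fixed layout, the key family can be derived from the key itself. The sketch below assumes colon-separated keys in which the projection type is the second-to-last segment and the final segment is an ID that contains no colon. The KeyFamily helper is hypothetical.

```csharp
using System;

public static class KeyFamily
{
    // Extracts the low-cardinality family segment from a full cache key.
    // Falls back to "unknown" when the key does not match the assumed layout.
    public static string FromKey(string cacheKey)
    {
        if (string.IsNullOrEmpty(cacheKey))
        {
            return "unknown";
        }

        var parts = cacheKey.Split(':');
        return parts.Length >= 2 ? parts[^2] : "unknown";
    }
}
```

The returned value is what should go into the key_family tag; the tenant, region, and product segments stay out of the metric dimensions.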
9.3.2 Using Redis Insight to Identify Hot Keys and Big Keys
Redis Insight is useful during tuning because it helps engineers inspect memory usage, command patterns, and key distribution. Hot keys indicate uneven traffic or a missing L1 strategy. Big keys usually indicate that the cache is storing too much data per object.
A good operational review asks:
Which keys consume the most memory?
Which key families are requested most often?
Are any keys missing TTL?
Are large JSON payloads being cached repeatedly?
Are cache misses clustered after deployment?
A big key is often a data modeling issue. For example, caching an entire category with 5,000 products may look efficient until every request pulls a huge payload and deserializes it. Splitting the cache into smaller product-card projections is often safer.
9.4 Security: Managed Identities (Entra ID) for Redis and Cosmos DB, Removing Connection Strings
Connection strings are convenient, but they become liabilities in production. They get copied into app settings, build pipelines, local machines, and support tickets. For Azure-hosted .NET services, prefer Microsoft Entra ID authentication with managed identities where supported.
For Cosmos DB, the application can authenticate using DefaultAzureCredential:
builder.Services.AddSingleton(_ =>
{
return new CosmosClient(
accountEndpoint: builder.Configuration["Cosmos:Endpoint"],
tokenCredential: new DefaultAzureCredential());
});
The managed identity must be granted the correct Cosmos DB data-plane role. Do not give broad access if the service only needs read access to one database or container.
For Redis, use Entra ID authentication where available in the selected Azure Redis offering. The goal is the same: remove passwords from configuration and let Azure manage identity, rotation, and access assignment. This makes deployments safer and reduces the blast radius of leaked configuration.
10 Real-World Implementation: Building a High-Scale API Service
10.1 Scenario: A Global Product Catalog with High-Read/Low-Write Traffic
Assume a global product catalog API serving web, mobile, and partner channels. Reads are heavy. Writes are controlled through internal admin workflows and batch imports. Product metadata changes a few times per day, while pricing and availability change more frequently.
The API must support regional responses because catalog visibility, currency, and inventory differ by market. That means the cache key cannot be only productId. It must include tenant, region, and projection type.
catalog:v2:tenant:contoso:region:us:product-card:ABC123
catalog:v2:tenant:contoso:region:uk:product-card:ABC123
The goal is not to cache everything. The goal is to cache the expensive, stable read paths and leave volatile paths with shorter TTLs or direct reads.
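A key builder that produces the layout shown above might look like the following sketch. The CatalogKeys class is hypothetical and separate from any CacheKeys helper used elsewhere in this article; lower-casing the tenant and region is an assumption made here so that differently cased inputs map to one entry.

```csharp
using System;

public static class CatalogKeys
{
    // Bump the version when the cached shape changes so old entries age out.
    private const string Version = "v2";

    public static string ProductCard(
        string tenantId,
        string region,
        string productId)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(productId);

        return string.Join(':',
            "catalog",
            Version,
            "tenant", Normalize(tenantId),
            "region", Normalize(region),
            "product-card", productId);
    }

    // Trims and lower-cases a segment so "US" and "us" share one key.
    private static string Normalize(string value)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(value);
        return value.Trim().ToLowerInvariant();
    }
}
```

Keeping the builder in one place means the version prefix and segment order cannot drift between the API, the change feed worker, and the invalidation subscriber.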
10.2 Step-by-Step Architecture
The architecture uses Cosmos DB as the durable source, Redis as the shared cache, and local memory through HybridCache or a cache library as the L1 layer. Azure Service Bus carries invalidation events. A change feed worker can warm selected projections after bulk updates.
Client
-> API Gateway
-> Catalog API
-> HybridCache L1
-> Redis L2
-> Cosmos DB
-> Service Bus invalidation events
-> Change feed projection worker
This layout keeps the request path fast but still gives the system a reliable fallback. If Redis is unavailable, the API can read from Cosmos DB with appropriate throttling and circuit-breaker protection.
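The circuit-breaker part of that fallback can be sketched in a few lines. This is a deliberately small illustration with a hypothetical SimpleCircuitBreaker class, an assumed consecutive-failure threshold, and an injectable clock for testing; a production service would more likely use an established resilience library such as Polly.

```csharp
using System;

public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTimeOffset> _clock;
    private readonly object _gate = new();
    private int _consecutiveFailures;
    private DateTimeOffset _openUntil = DateTimeOffset.MinValue;

    public SimpleCircuitBreaker(
        int failureThreshold,
        TimeSpan openDuration,
        Func<DateTimeOffset>? clock = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    // True while calls to the protected dependency should be skipped.
    public bool IsOpen
    {
        get { lock (_gate) { return _clock() < _openUntil; } }
    }

    public void RecordSuccess()
    {
        lock (_gate) { _consecutiveFailures = 0; }
    }

    // Opens the circuit after the configured number of consecutive failures.
    public void RecordFailure()
    {
        lock (_gate)
        {
            _consecutiveFailures++;
            if (_consecutiveFailures >= _failureThreshold)
            {
                _openUntil = _clock() + _openDuration;
                _consecutiveFailures = 0;
            }
        }
    }
}
```

In the read path, the API would check IsOpen before calling Redis, go straight to Cosmos DB while the circuit is open, and call RecordFailure or RecordSuccess after each Redis attempt. The Cosmos DB fallback still needs its own concurrency limit so that a Redis outage does not become an RU throttling incident.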
10.2.1 Defining the Repository Pattern with Caching Decorators
Keep the base repository focused on data access. Put cache behavior in a decorator.
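public sealed class ProductReader : IProductReader
{
    private readonly Container _container;

    public ProductReader(Container container) => _container = container;

    public async Task<ProductCard?> GetProductCardAsync(
        string tenantId,
        string region,
        string productId,
        CancellationToken cancellationToken)
    {
        try
        {
            var response = await _container.ReadItemAsync<ProductDocument>(
                productId,
                new PartitionKey(tenantId),
                cancellationToken: cancellationToken);
            return ProductMappings.ToProductCard(
                response.Resource,
                region);
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
        {
            // ReadItemAsync throws for a missing item; map that to null
            // so the nullable return type is honoured.
            return null;
        }
    }
}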
The cached decorator controls keys and policy:
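public sealed class CachedProductReader : IProductReader
{
    private readonly HybridCache _cache;
    private readonly IProductReader _inner;

    public CachedProductReader(HybridCache cache, IProductReader inner)
    {
        _cache = cache;
        _inner = inner;
    }

    public async Task<ProductCard?> GetProductCardAsync(
        string tenantId,
        string region,
        string productId,
        CancellationToken cancellationToken)
    {
        var key = CacheKeys.ProductCard(
            tenantId,
            region,
            productId);

        // HybridCache takes a factory that returns ValueTask<T> and itself
        // returns ValueTask<T>, so the Task-based inner call is awaited in
        // an async lambda and the result is awaited before returning.
        return await _cache.GetOrCreateAsync(
            key,
            async token => await _inner.GetProductCardAsync(
                tenantId,
                region,
                productId,
                token),
            cancellationToken: cancellationToken);
    }
}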
This makes caching replaceable and testable.
10.2.2 Configuring Policy-Based Caching (By User, By Region)
Not every request should share the same cache entry. Some responses vary by region, customer segment, role, language, or entitlement. The cache policy must reflect that variation.
public sealed record ProductCacheContext(
string TenantId,
string Region,
string Segment,
string Language);
public static string ProductDetail(
ProductCacheContext context,
string productId)
{
return string.Join(':',
"catalog",
"v2",
$"tenant:{context.TenantId}",
$"region:{context.Region}",
$"segment:{context.Segment}",
$"lang:{context.Language}",
$"product-detail:{productId}");
}
Do not include user ID unless the response is truly user-specific. User-level cache keys create poor reuse and high memory consumption. Prefer segment-level or role-level variation where the business rules allow it.
10.3 Benchmarking Results: Performance Gains and Infrastructure Cost Savings
A useful benchmark compares four paths:
Direct Cosmos DB read
Redis L2 cache hit
L1 memory cache hit
Cache miss with projection rebuild
The exact numbers depend on payload size, region, networking, SDK configuration, and indexing. The benchmark should run in an Azure environment close to production, not only on a developer laptop.
A simple BenchmarkDotNet test can isolate serialization cost:
[MemoryDiagnoser]
public class ProductSerializationBenchmarks
{
private readonly ProductCard _product = ProductSamples.Create();
private byte[] _json = [];
[GlobalSetup]
public void Setup()
{
_json = JsonSerializer.SerializeToUtf8Bytes(_product);
}
[Benchmark]
public byte[] SerializeJson()
{
return JsonSerializer.SerializeToUtf8Bytes(_product);
}
[Benchmark]
public ProductCard? DeserializeJson()
{
return JsonSerializer.Deserialize<ProductCard>(_json);
}
}
Track cost by endpoint before and after caching. A successful implementation should show lower Cosmos DB RU consumption, reduced P95 latency, and fewer dependency calls during peak traffic.
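Comparing P95 latency before and after caching needs a consistent percentile definition. The sketch below uses the nearest-rank method on raw samples; the LatencyStats class is hypothetical, and monitoring backends may use interpolation or histogram buckets, so their numbers can differ slightly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatencyStats
{
    // Nearest-rank percentile: the smallest sample such that at least
    // the requested percentage of samples are less than or equal to it.
    public static double Percentile(
        IReadOnlyCollection<double> samples,
        int percentile)
    {
        if (samples.Count == 0)
        {
            throw new ArgumentException("At least one sample is required.", nameof(samples));
        }
        if (percentile < 1 || percentile > 100)
        {
            throw new ArgumentOutOfRangeException(nameof(percentile));
        }

        var sorted = samples.OrderBy(x => x).ToArray();
        var rank = (int)Math.Ceiling(percentile * sorted.Length / 100.0);
        return sorted[rank - 1];
    }
}
```

For twenty samples with values 1 through 20 milliseconds, Percentile(samples, 95) returns 19 and Percentile(samples, 50) returns 10. Run the same calculation on samples captured before and after the change so that the comparison is like for like.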
10.4 Conclusion: A Checklist for Architects Starting a New .NET Azure Project
Use this checklist before adding caching to a new service:
Identify the expensive read paths first.
Cache projections, not raw database documents.
Design versioned keys from the start.
Separate L1 and L2 expiration policies.
Avoid user-specific keys unless required.
Add event-driven invalidation for important updates.
Measure hit ratio by key family, not only globally.
Track Redis latency, evictions, hot keys, and big keys.
Use managed identity where supported.
Test cache outage behavior before production.
Document the staleness model for each cached response.
The strongest caching designs are boring in production. They use explicit keys, measured policies, safe fallbacks, and clear ownership. For .NET microservices on Azure, that usually means combining local memory, Redis, Cosmos DB, Service Bus events, and OpenTelemetry into one deliberate read strategy rather than adding cache calls wherever latency looks high.