1 Introduction: The High Cost of a Simple Retry
Distributed systems are powerful but unforgiving. The smallest oversight in request handling can ripple into large-scale business incidents—lost money, corrupted data, and a loss of customer trust. One of the most deceptively simple but destructive culprits is retries. A client retries a request because of a timeout, a proxy retries on connection drop, or a server replays a message after a crash. If your API is not designed for idempotency, every retry can become a duplicate side effect.
This article begins with a real-world inspired nightmare scenario and then gradually builds the case for designing idempotent APIs in ASP.NET Core. We’ll explore how idempotency keys, the Outbox pattern, and exactly-once semantics come together to protect critical workflows. Along the way, you’ll see practical .NET implementations, database schemas, and trade-offs you’ll need to weigh as a senior developer, architect, or tech lead.
1.1 The “Double-Charge” Nightmare
Imagine you’re running a payment API for an e-commerce platform. A customer submits a request to pay $120 for an order. The client application sends an HTTP POST /payments call to your API. The request reaches your service, and the database transaction completes successfully—the payment is recorded, and the customer’s credit card is charged. Perfect.
Except the client never hears back.
Somewhere between your server and the client, a network timeout occurs. The client sees no response. From its perspective, it has no idea whether the payment was processed or dropped. To be safe, it retries the exact same request.
Your API—built without idempotency protection—processes the second request as a brand-new payment. The result? The customer is charged twice. Support tickets pile up. Refund processes kick in. Your team scrambles to find out how such a simple retry turned into a critical failure.
This “double-charge” nightmare isn’t unique to payments. Any workflow involving state mutations—creating orders, updating inventory, booking seats, registering subscriptions—is vulnerable to duplication when retries are possible. And in distributed systems, retries are not just possible—they’re inevitable.
1.2 What is Idempotency?
The mathematical definition of idempotency is simple: an operation is idempotent if executing it multiple times yields the same result as executing it once. In API design, this translates to:
Multiple identical requests should have the same effect as a single request.
For example:
- GET /orders/123 is idempotent. Calling it once or ten times returns the same order details.
- PUT /orders/123 { "status": "shipped" } is also idempotent. Reapplying the same status update multiple times doesn’t create new orders or duplicate effects.
- DELETE /orders/123 is idempotent. Whether called once or repeatedly, the order is deleted (or remains deleted).
- POST /payments { "amount": 120 } is not idempotent. Each call may create a new payment record and trigger multiple charges.
HTTP standards reflect this distinction. Methods like GET, PUT, and DELETE are defined as idempotent. Methods like POST and PATCH are not. But here’s the catch: just because the HTTP method is “supposed” to be idempotent doesn’t mean your implementation automatically is. The semantics depend on how you design your business logic.
Moving from this academic definition to practice, idempotency in APIs ensures:
- No duplicate charges or payments.
- No duplicated orders or inventory reductions.
- No inconsistent states across services.
It’s not about performance—it’s about correctness and trust. Customers may tolerate a slow checkout. They will not forgive being charged twice.
1.3 The Goal: From At-Least-Once to Exactly-Once Processing
When designing distributed systems, you face the unavoidable reality of delivery guarantees. Messages and requests don’t always arrive exactly once. Failures at different layers force you into trade-offs.
The three most common guarantees are:
- At-Most-Once: A request is sent, and at most, it is processed once. If failures occur, the request may be lost forever. This is fast and simple but unacceptable for critical systems.
- At-Least-Once: The request is guaranteed to be delivered, but possibly multiple times. This is the default mode for most message brokers and API retries. Reliability is higher, but duplication is inevitable.
- Exactly-Once: The holy grail. Each request is processed once and only once. Achieving this in practice is not trivial—it usually means combining at-least-once delivery with idempotent processing logic to simulate exactly-once behavior.
In this guide, we aim to achieve exactly-once semantics for APIs in ASP.NET Core by layering:
- Idempotency Keys: Ensuring retries of the same client operation do not cause duplicate side effects.
- Outbox Pattern: Preventing inconsistencies when writing to a database and publishing to a message broker.
- Inbox Pattern: Ensuring consumers handle duplicates gracefully when receiving messages.
By the end, you’ll have the tools to build APIs that withstand retries and failures without corrupting your system or your customer’s trust.
2 The Root of the Problem: Why Duplicates Happen in Distributed Systems
Now that we’ve set the stage, let’s dive into the mechanics of why duplicates are unavoidable in distributed environments. It’s not enough to say “network issues happen.” We need to understand where and how retries sneak into the system so we can design defenses at the right layers.
2.1 The Three Classes of Failure
Duplicate requests typically arise from one of three sources: client, intermediary, or server-side.
2.1.1 Client-Side Failures
Consider a mobile app calling your API. The app sends a payment request, but just as the response is on its way back, the user’s network connection drops momentarily. The app receives no response. From its perspective, it has no idea whether the payment went through. Its only safe option is to retry.
Without idempotency, this retry risks duplicating the payment. With idempotency, the retry would return the original result, and the customer remains safe.
2.1.2 Network Intermediary Failures
Between client and server sit many intermediaries—load balancers, API gateways, reverse proxies, and CDNs. These layers often implement automatic retry logic. For example, if a gateway detects a TCP reset when forwarding a request to your backend, it may silently retry the request against another instance. This behavior is invisible to the client, yet your service suddenly receives the same request twice.
If your API doesn’t account for this, you’re already exposed to duplicates without even knowing it.
2.1.3 Server-Side Failures (and Message Queues)
Finally, consider server-side failures. Suppose your API is consuming messages from a queue like RabbitMQ or Azure Service Bus. The consumer processes the message and updates the database, but then crashes before sending the acknowledgment back to the broker. From the broker’s perspective, the consumer failed. It will redeliver the same message later, ensuring at-least-once delivery.
If your consumer isn’t idempotent, the same message may apply its business effect multiple times—leading again to double charges, duplicated orders, or corrupted state.
In short, clients retry, intermediaries retry, and servers retry. If you’re not building idempotency into your design, duplication is a certainty, not a possibility.
2.2 A Quick Look at Delivery Guarantees
Understanding delivery guarantees is central to designing resilient APIs. Let’s break them down more carefully.
2.2.1 At-Most-Once Delivery
At-most-once delivery means the system attempts to process each request once, but if anything fails during delivery, the request may be lost forever. No retries are made. This mode is simple and avoids duplicates, but it risks data loss.
For example, a stock-trading API using at-most-once semantics could drop an order during network issues. That’s catastrophic. Very few modern systems accept this guarantee outside of non-critical telemetry or fire-and-forget notifications.
2.2.2 At-Least-Once Delivery
At-least-once delivery is the default in reliable systems. The system retries until it gets confirmation that the request was processed. This eliminates data loss but introduces duplicates. Payment systems, inventory management, and most message brokers operate in this mode.
The safety net is there—your message won’t vanish—but without idempotent processing, duplicates cause their own form of chaos.
2.2.3 Exactly-Once Delivery
Exactly-once delivery means each request is processed once and only once. Achieving this is complex because networks, clients, intermediaries, and servers all introduce retries. Instead of relying on the network to guarantee exactly-once, we usually simulate it by:
- Delivering messages at-least-once.
- Designing processing logic to be idempotent.
- Using deduplication strategies like idempotency keys, outbox, and inbox patterns.
This layered approach allows us to give our customers the perception of exactly-once behavior, even though under the hood, retries and duplicates may be occurring.
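The combination of at-least-once delivery and idempotent processing can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the `Message` record and `DedupingConsumer` class are hypothetical names, and a real system would keep the set of processed IDs in durable storage, updated in the same transaction as the side effect.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message shape: a unique ID plus the amount to charge.
public record Message(Guid Id, decimal Amount);

public class DedupingConsumer
{
    // IDs of messages whose effect has already been applied.
    // Assumption: in production this lives in durable storage, not in memory.
    private readonly HashSet<Guid> _processed = new();

    public decimal TotalCharged { get; private set; }

    // Returns true if the message was applied, false if it was a duplicate delivery.
    public bool Handle(Message message)
    {
        // HashSet.Add returns false when the ID is already present.
        if (!_processed.Add(message.Id))
            return false;

        TotalCharged += message.Amount;
        return true;
    }
}
```

If the broker redelivers the same `Message`, the second `Handle` call returns false and `TotalCharged` is unchanged, which is the behavior callers perceive as exactly-once.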
3 The Idempotency Key: Your API’s First Line of Defense
The first building block toward reliable exactly-once semantics in APIs is the idempotency key. This simple mechanism—an opaque token carried with each request—enables the server to distinguish between a genuine new operation and a retry of an already submitted one. Implemented properly, it transforms potentially dangerous retries into safe replays of previous responses. In practice, this often means adding just one HTTP header and some careful server-side persistence logic, but the devil is in the details.
3.1 The Idempotency-Key Header Pattern
The Idempotency-Key pattern is increasingly standardized across the industry. Stripe, PayPal, and other payment providers have popularized it, and the IETF has a draft proposal for it (Idempotency-Key HTTP header). Its beauty lies in its simplicity: the client generates a unique key per logical operation, and the server uses this key to enforce idempotent behavior.
3.1.1 Client Responsibility: Generating the Key
On the client side, the rule is straightforward:
- One logical operation → One idempotency key.
- Retries of the same operation must use the same key.
- Distinct operations must use different keys.
For example, when a client initiates a payment, it generates a UUID to represent that payment attempt. Whether the request is sent once or retried ten times, the Idempotency-Key remains identical.
A client in C# might generate such a key like this:
var request = new HttpRequestMessage(HttpMethod.Post, "/payments")
{
Content = new StringContent(JsonSerializer.Serialize(new { amount = 120 }), Encoding.UTF8, "application/json")
};
request.Headers.Add("Idempotency-Key", Guid.NewGuid().ToString());
By making the client responsible for generating the key, the server can unambiguously associate retries with the same logical intent.
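The rule that retries must reuse the same key is easy to break if the key is generated inside the retry loop. The sketch below generates the key once and builds a fresh request per attempt. The `PaymentClient` class, the attempt count, and the backoff delays are illustrative assumptions, and the `HttpClient` is assumed to have its `BaseAddress` configured.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class PaymentClient
{
    // Sends POST /payments, retrying transport failures and timeouts with the SAME key.
    public static async Task<HttpResponseMessage> PostPaymentAsync(
        HttpClient client, decimal amount, int maxAttempts = 3)
    {
        // One key per logical operation, created once, outside the retry loop.
        var idempotencyKey = Guid.NewGuid().ToString();
        var json = JsonSerializer.Serialize(new { amount });

        for (var attempt = 1; ; attempt++)
        {
            // An HttpRequestMessage cannot be sent twice, so build a new one per attempt.
            using var request = new HttpRequestMessage(HttpMethod.Post, "/payments")
            {
                Content = new StringContent(json, Encoding.UTF8, "application/json")
            };
            request.Headers.Add("Idempotency-Key", idempotencyKey);

            try
            {
                return await client.SendAsync(request);
            }
            catch (Exception ex) when (attempt < maxAttempts &&
                (ex is HttpRequestException || ex is TaskCanceledException))
            {
                // Simple linear backoff before the next attempt (illustrative values).
                await Task.Delay(TimeSpan.FromMilliseconds(200 * attempt));
            }
        }
    }
}
```

On the final attempt the exception filter no longer matches, so the failure propagates to the caller instead of being retried forever.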
3.1.2 Server Responsibility: The Lifecycle of an Idempotent Request
The server’s role is more involved. It must handle the lifecycle of requests associated with an idempotency key, including caching results and managing concurrency.
The standard flow looks like this:
- Check for Key: The server extracts the Idempotency-Key header from the request. If it is missing for an endpoint that requires it, return 400 Bad Request.
var key = httpContext.Request.Headers["Idempotency-Key"].FirstOrDefault();
if (string.IsNullOrEmpty(key))
{
    context.Result = new BadRequestObjectResult("Missing Idempotency-Key header.");
    return;
}
- Lookup: Query persistent storage (SQL, NoSQL, or cache) to see if the key already exists. This determines the handling path.
- Three Possible Paths:
  - Key Found, Request Completed: The request has already been processed. Return the exact same response that was generated earlier. This includes both status code and response body, ensuring perfect replay.
if (record.State == "Completed")
{
    context.Result = new ContentResult
    {
        StatusCode = record.StatusCode,
        Content = record.ResponseJson,
        ContentType = "application/json"
    };
    return;
}
  - Key Found, Request In-Progress: Another request with the same key is currently being processed. Returning a 409 (Conflict) or 429 (Too Many Requests) signals to the client that it should back off and retry later. This prevents concurrent duplicate processing of the same operation.
if (record.State == "InProgress")
{
    context.Result = new StatusCodeResult(StatusCodes.Status409Conflict);
    return;
}
  - Key Not Found: This is a brand-new request. The server should insert a new record marking the key as “InProgress,” process the operation, then store the response as “Completed” once finished. This ensures subsequent retries reuse the cached result.
await _idempotencyService.CreateRequestAsync(key, requestHash);
var result = await next(); // process the action
await _idempotencyService.UpdateRequestAsync(key, result);
3.1.3 Handling Concurrency and Locking
A subtle but crucial point is concurrency control. Imagine two retries of the same request arrive almost simultaneously. Without protection, both might see “Key Not Found” and process independently, leading to duplication.
To avoid this:
- Use database transactions with SELECT ... FOR UPDATE or optimistic concurrency tokens.
- Or, in Redis, use atomic commands like SETNX to claim a lock.
This ensures only one request per key proceeds; others either wait or fail fast with a 409.
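The essential property is that claiming a key is a single atomic step rather than a lookup followed by an insert. The sketch below uses an in-memory `ConcurrentDictionary` as a stand-in for the database unique constraint or the Redis command; `InMemoryKeyClaims` is a hypothetical name, and a single-process dictionary is of course not sufficient once the API runs on more than one instance.

```csharp
using System.Collections.Concurrent;

public class InMemoryKeyClaims
{
    private readonly ConcurrentDictionary<string, string> _states = new();

    // Atomically claims the key. For any given key, exactly one caller gets true;
    // every other caller gets false and should wait or fail fast with a 409.
    public bool TryClaim(string key) => _states.TryAdd(key, "InProgress");
}
```

Because `TryAdd` checks and inserts in one operation, there is no window in which two concurrent retries can both observe “Key Not Found.”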
3.1.4 Returning the Exact Same Response
The server must not only avoid duplicate side effects but also provide the exact same HTTP response as the original. This includes:
- Status Code (e.g., 201 Created)
- Headers (e.g., Location for newly created resources)
- Body (the JSON or payload returned)
Persisting these in the idempotency store is non-negotiable. Otherwise, a retry might yield a different response than the original, confusing clients and breaking consistency guarantees.
A schema for storage might include:
CREATE TABLE IdempotencyRecords (
    [Key] UNIQUEIDENTIFIER PRIMARY KEY,   -- KEY is a reserved word in T-SQL, so it must be bracketed
    RequestHash NVARCHAR(256) NOT NULL,
    ResponseJson NVARCHAR(MAX) NULL,      -- NULL while the request is still 'InProgress'
    StatusCode INT NOT NULL,
    State NVARCHAR(20) NOT NULL,          -- 'InProgress', 'Completed'
    CreatedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
With this table, the API can reliably cache and replay responses.
3.2 Choosing a Good Idempotency Key
The reliability of the entire pattern depends on the quality of the key. Poorly designed keys can undermine idempotency by either colliding across distinct operations or being too brittle to reuse across retries.
3.2.1 UUIDs/GUIDs: The Standard Choice
Universally unique identifiers (UUIDs) are the safest and most common approach:
- Low collision probability across distributed clients.
- Easy to generate in virtually every programming language.
- Opaque—they carry no meaning, so clients can’t misuse them by encoding business data.
For example, in .NET:
var idempotencyKey = Guid.NewGuid().ToString("N");
This yields a 32-character string without dashes, ideal for headers.
UUID-based keys are especially valuable in payment systems, order processing APIs, and any workflow where duplication must be impossible.
3.2.2 Hash-Based Keys from Request Payloads
An alternative is deriving the key from the request payload itself—for example, computing a SHA-256 hash of the serialized JSON body. The idea is that identical requests will map to the same key, while distinct payloads map to different keys.
This approach has caveats:
- Brittleness: Even small changes in JSON formatting (e.g., whitespace or property order) change the hash.
- Ambiguity: Two requests may look semantically identical but differ slightly in serialization, producing different hashes.
- Security Risk: If the payload contains sensitive but low-entropy data (for example, an amount and an account number), an unsalted hash can be brute-forced, so stored hashes may inadvertently expose information.
For example, hashing a payment request:
using var sha = SHA256.Create();
var json = JsonSerializer.Serialize(requestBody);
var hash = Convert.ToHexString(sha.ComputeHash(Encoding.UTF8.GetBytes(json)));
Hash-based keys may be useful in systems where the client cannot reliably generate UUIDs, but in most cases, they introduce more problems than they solve.
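The brittleness described above can be reduced, though not eliminated, by canonicalizing the JSON before hashing. The following is a sketch under stated assumptions: `RequestHasher` is a hypothetical helper, it relies on `System.Text.Json.Nodes` (.NET 6 or later), and it only normalizes whitespace and property order. Differences such as `120` versus `120.0` still produce different hashes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

public static class RequestHasher
{
    // SHA-256 over the raw text: sensitive to whitespace and property order.
    public static string HashRaw(string json) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(json)));

    // Hash after canonicalizing: parse, sort object properties by name, re-serialize compactly.
    public static string HashCanonical(string json) =>
        HashRaw(Canonicalize(JsonNode.Parse(json))!.ToJsonString());

    private static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        // Rebuild objects with properties in ordinal name order.
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        // Arrays keep their element order; only their contents are canonicalized.
        JsonArray arr => new JsonArray(arr.Select(n => Canonicalize(n)).ToArray()),
        null => null,
        // Primitive values: re-parse to get a copy that has no parent node.
        _ => JsonNode.Parse(node.ToJsonString())
    };
}
```

With this helper, two payloads that differ only in formatting or property order hash to the same value under `HashCanonical` while still differing under `HashRaw`.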
3.2.3 Why Opaque UUIDs Are Better
Opaque UUIDs sidestep many of the pitfalls:
- They don’t require canonicalizing payloads.
- They’re independent of request content, so retries remain consistent.
- They allow clients to explicitly control the lifecycle of operations.
For this reason, UUIDs should be the default choice. Payload hashes should be reserved for specialized cases where business rules demand deduplication based on content, not explicit operation identifiers.
3.2.4 Expiration and Reuse
Finally, consider how long an idempotency key should remain valid. Keys cannot live forever, or your persistence store will bloat indefinitely. Common practices:
- Expire after 24–72 hours, long enough for retries but not forever.
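- Decide explicitly how expired keys are handled. Once a record has been purged, the server can no longer recognize the key and will treat a late retry as a new request, so the retention window must be longer than the longest period over which clients may retry.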
- Use TTL policies in databases (e.g., SQL scheduled jobs, Cosmos DB TTL, Redis expiration).
By managing lifecycle correctly, you strike a balance between reliability and storage efficiency.
4 Practical Implementation in ASP.NET Core
Having explored the conceptual side of idempotency keys, let’s now bring this down to earth with a working implementation in ASP.NET Core. This is where the theory meets the practical engineering decisions you’ll face when building real-world APIs. We’ll cover architectural choices, step through building a reusable [Idempotent] action filter, and design a clean persistence abstraction that keeps the solution extensible. Along the way, we’ll highlight common pitfalls and demonstrate idiomatic .NET code you can adapt to your own projects.
4.1 Architecture Choice: Middleware vs. Action Filters
When integrating idempotency checks into an ASP.NET Core application, the first decision is architectural: should you implement it as middleware or as an action filter?
4.1.1 Middleware Approach
Middleware is powerful because it sits in the request pipeline before MVC even kicks in. This means:
- It can apply globally across all endpoints.
- It can short-circuit requests early, improving performance.
- It’s well-suited for services where almost every endpoint needs idempotency (e.g., a dedicated payment gateway).
A middleware-based implementation might look like this:
public class IdempotencyMiddleware
{
private readonly RequestDelegate _next;
public IdempotencyMiddleware(RequestDelegate next)
{
_next = next;
}
public async Task InvokeAsync(HttpContext context, IIdempotencyService service)
{
if (!context.Request.Headers.TryGetValue("Idempotency-Key", out var key))
{
await _next(context);
return;
}
var record = await service.GetResponseAsync(key);
if (record != null)
{
context.Response.StatusCode = record.StatusCode;
await context.Response.WriteAsync(record.ResponseJson);
return;
}
await service.CreateRequestAsync(key, "InProgress");
await _next(context);
// After response is written, hook into pipeline to save it
}
}
While this gives global coverage, it has limitations:
- It’s harder to tie into controller-specific logic or return different error semantics (e.g., 409 Conflict for in-progress requests).
- Serializing responses after middleware has already written them to the response stream requires workarounds.
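One common workaround for capturing a response in middleware is to swap the response body stream for a buffer while the rest of the pipeline runs. The helper below is a sketch of that idea, not part of the original implementation: `ResponseCapture` and `RunAndCaptureAsync` are hypothetical names, and buffering entire responses in memory is only reasonable for small JSON payloads.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public static class ResponseCapture
{
    // Runs the rest of the pipeline while buffering the response body,
    // then copies the buffered bytes to the real response stream.
    public static async Task<(int StatusCode, string Body)> RunAndCaptureAsync(
        HttpContext context, RequestDelegate next)
    {
        var originalBody = context.Response.Body;
        await using var buffer = new MemoryStream();
        context.Response.Body = buffer;

        try
        {
            await next(context);

            // Read what downstream components wrote.
            buffer.Position = 0;
            using var reader = new StreamReader(buffer, leaveOpen: true);
            var body = await reader.ReadToEndAsync();

            // Forward the same bytes to the client.
            buffer.Position = 0;
            await buffer.CopyToAsync(originalBody);
            return (context.Response.StatusCode, body);
        }
        finally
        {
            // Always restore the original stream, even if the pipeline throws.
            context.Response.Body = originalBody;
        }
    }
}
```

A middleware could call this in place of `await _next(context)` and pass the returned status code and body to the idempotency store.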
4.1.2 Action Filter Approach
Action filters integrate tightly with MVC:
- They run before and after controller actions.
- They can capture action results before they are written to the response.
- They’re declarative: you can apply them only where needed.
For example:
[HttpPost("payments")]
[Idempotent]
public async Task<IActionResult> CreatePayment(CreatePaymentRequest request)
{
var result = await _paymentService.ProcessAsync(request);
return CreatedAtAction(nameof(GetPayment), new { id = result.Id }, result);
}
This makes intent explicit. Developers can see at a glance which endpoints are protected, without wondering if middleware silently applies.
Because of this flexibility, the filter approach is often preferred in APIs where only some endpoints require idempotency, such as financial transactions, order placement, or resource creation.
We will focus on the action filter attribute approach in the sections that follow.
4.2 Building an Idempotent Action Filter
An action filter in ASP.NET Core allows us to intercept execution around a controller action. The OnActionExecutionAsync method gives us a pre/post hook: we can check idempotency before executing the action, and we can cache the result afterward.
4.2.1 The Attribute Definition
Let’s start by defining a custom attribute that developers can apply to actions:
[AttributeUsage(AttributeTargets.Method)]
public class IdempotentAttribute : Attribute, IAsyncActionFilter
{
public async Task OnActionExecutionAsync(ActionExecutingContext context, ActionExecutionDelegate next)
{
var httpContext = context.HttpContext;
var service = httpContext.RequestServices.GetRequiredService<IIdempotencyService>();
var key = httpContext.Request.Headers["Idempotency-Key"].FirstOrDefault();
if (string.IsNullOrEmpty(key))
{
context.Result = new BadRequestObjectResult("Missing Idempotency-Key header.");
return;
}
var record = await service.GetResponseAsync(key);
if (record != null)
{
if (record.State == "Completed")
{
context.Result = new ContentResult
{
Content = record.ResponseJson,
StatusCode = record.StatusCode,
ContentType = "application/json"
};
return;
}
if (record.State == "InProgress")
{
context.Result = new StatusCodeResult(StatusCodes.Status409Conflict);
return;
}
}
await service.CreateRequestAsync(key, "InProgress");
var executedContext = await next();
if (executedContext.Result is ObjectResult objectResult)
{
var responseJson = JsonSerializer.Serialize(objectResult.Value);
await service.UpdateRequestAsync(key, responseJson, objectResult.StatusCode ?? 200);
}
}
}
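This attribute:
- Reads the Idempotency-Key header.
- Checks if the key already exists in persistence.
- Short-circuits if completed or in-progress.
- If new, marks the request as “InProgress.”
- Executes the action and then stores the serialized response.
Two caveats apply to this simplified version. The lookup and the insert are separate steps, so two simultaneous retries can both pass the check; the filter is only safe if CreateRequestAsync fails on a duplicate key, as discussed in 3.1.3. Also, if the action throws or returns something other than an ObjectResult, the record is never finalized and stays “InProgress” until it expires.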
4.2.2 Dependency Injection of the Persistence Service
Notice how the filter resolves IIdempotencyService from the HttpContext.RequestServices. This makes it easy to swap different storage implementations (SQL, Cosmos, Redis) without changing the filter.
The key design principle here: separate orchestration from persistence. The filter orchestrates the lifecycle, while the service handles the storage details.
4.2.3 Managing Request and Response Bodies
Serializing responses is straightforward when they’re represented as ObjectResult. However, you may encounter:
- Custom results like FileResult or EmptyResult.
- Streams that cannot be re-serialized once consumed.
In such cases:
- Limit idempotency protection to JSON-based endpoints where caching is practical.
- Add fallbacks for unsupported result types (e.g., return 501 Not Implemented if an unsupported type is used under an idempotency filter).
A robust implementation should be clear about its scope. Trying to make every possible action result idempotent is rarely worth the complexity.
4.2.4 Incorrect vs. Correct Handling Example
Incorrect: Saving only a flag that the key was processed, but not the response.
await service.UpdateRequestAsync(key, null, 200); // Missing response body
On retry, the client would get a 200 OK but no body—breaking API contracts.
Correct: Saving both status and full response.
var json = JsonSerializer.Serialize(objectResult.Value);
await service.UpdateRequestAsync(key, json, objectResult.StatusCode ?? 200);
The client now sees identical results across retries.
4.3 Persistence Layer Abstraction
To decouple the filter from storage specifics, we define an abstraction. This also makes it easier to unit test the filter by mocking the service.
4.3.1 The IIdempotencyService Interface
public interface IIdempotencyService
{
Task<IdempotencyRecord?> GetResponseAsync(string key);
Task CreateRequestAsync(string key, string state);
Task UpdateRequestAsync(string key, string responseJson, int statusCode);
}
And the corresponding model:
public class IdempotencyRecord
{
public string Key { get; set; } = default!;
public string State { get; set; } = default!;
public string? ResponseJson { get; set; }
public int StatusCode { get; set; }
}
This abstraction covers the full lifecycle:
- GetResponseAsync: Look up by key.
- CreateRequestAsync: Insert an in-progress record.
- UpdateRequestAsync: Finalize the record with the response.
4.3.2 Example SQL Implementation
Here’s a simplified Entity Framework Core implementation:
public class EfCoreIdempotencyService : IIdempotencyService
{
private readonly AppDbContext _db;
public EfCoreIdempotencyService(AppDbContext db)
{
_db = db;
}
public async Task<IdempotencyRecord?> GetResponseAsync(string key)
{
var record = await _db.IdempotencyRecords.FindAsync(key);
return record == null ? null : new IdempotencyRecord
{
Key = record.Key,
State = record.State,
ResponseJson = record.ResponseJson,
StatusCode = record.StatusCode
};
}
public async Task CreateRequestAsync(string key, string state)
{
_db.IdempotencyRecords.Add(new IdempotencyEntity
{
Key = key,
State = state,
CreatedAt = DateTime.UtcNow
});
await _db.SaveChangesAsync();
}
public async Task UpdateRequestAsync(string key, string responseJson, int statusCode)
{
var record = await _db.IdempotencyRecords.FindAsync(key);
if (record != null)
{
record.State = "Completed";
record.ResponseJson = responseJson;
record.StatusCode = statusCode;
await _db.SaveChangesAsync();
}
}
}
This design is deliberately straightforward:
- It assumes a simple table IdempotencyRecords.
- It handles updates with SaveChangesAsync, relying on EF Core’s change tracking.
4.3.3 Benefits of Abstraction
By abstracting persistence:
- We can swap to Redis for high-speed caching.
- We can switch to Cosmos DB for global-scale services.
- Unit tests can stub the service with an in-memory dictionary.
This aligns with the Open/Closed principle: the filter logic remains closed to modification but open to extension via new service implementations.
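As an illustration of the unit-testing point, here is a minimal in-memory stub. The interface and record are repeated from section 4.3.1 only so the sketch compiles on its own; `InMemoryIdempotencyService` is a hypothetical name, and throwing on a duplicate key is an assumption about how conflicts should surface.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Repeated from section 4.3.1 so the sketch is self-contained.
public class IdempotencyRecord
{
    public string Key { get; set; } = default!;
    public string State { get; set; } = default!;
    public string? ResponseJson { get; set; }
    public int StatusCode { get; set; }
}

public interface IIdempotencyService
{
    Task<IdempotencyRecord?> GetResponseAsync(string key);
    Task CreateRequestAsync(string key, string state);
    Task UpdateRequestAsync(string key, string responseJson, int statusCode);
}

// Test double backed by a dictionary; not durable and not shared across processes.
public class InMemoryIdempotencyService : IIdempotencyService
{
    private readonly ConcurrentDictionary<string, IdempotencyRecord> _records = new();

    public Task<IdempotencyRecord?> GetResponseAsync(string key) =>
        Task.FromResult<IdempotencyRecord?>(
            _records.TryGetValue(key, out var record) ? record : null);

    public Task CreateRequestAsync(string key, string state)
    {
        // TryAdd is atomic; surface a duplicate instead of silently overwriting it.
        if (!_records.TryAdd(key, new IdempotencyRecord { Key = key, State = state }))
            throw new InvalidOperationException($"Key '{key}' already exists.");
        return Task.CompletedTask;
    }

    public Task UpdateRequestAsync(string key, string responseJson, int statusCode)
    {
        if (_records.TryGetValue(key, out var record))
        {
            record.State = "Completed";
            record.ResponseJson = responseJson;
            record.StatusCode = statusCode;
        }
        return Task.CompletedTask;
    }
}
```

A filter test can register this stub in place of the EF Core service and assert that a second request with the same key receives the stored response.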
5 Persistence Strategies and Schemas
An idempotency filter is only as reliable as its persistence layer. Without durable and consistent storage, retries could slip through undetected or responses could be lost, breaking the guarantees we’ve worked so hard to build. The choice of persistence strategy determines not only correctness but also performance, scalability, and operational cost. In this section, we’ll explore three major approaches—SQL, NoSQL (Cosmos DB), and distributed caches like Redis—each with their own schemas, concurrency controls, and trade-offs.
5.1 SQL Database (Entity Framework Core)
Relational databases remain the default persistence choice for many ASP.NET Core applications. They offer strong transactional guarantees, mature tooling, and deep integration with Entity Framework Core. Let’s design an idempotency schema for SQL and see how to handle concurrency safely.
5.1.1 Schema Design
A straightforward schema for idempotency records might look like this:
CREATE TABLE IdempotencyKeys (
    [Key] UNIQUEIDENTIFIER PRIMARY KEY,   -- KEY is a reserved word in T-SQL, so it must be bracketed
    RequestHash NVARCHAR(256) NOT NULL,   -- To ensure retries are identical in content
    Response NVARCHAR(MAX) NULL,          -- NULL while the request is still 'InProgress'
    StatusCode INT NOT NULL,
    CreatedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    State NVARCHAR(20) NOT NULL,          -- 'InProgress', 'Completed'
    RowVersion ROWVERSION NOT NULL
);
Key aspects:
- Key: A UUID provided by the client in the Idempotency-Key header.
- RequestHash: Optional but recommended. This ensures that if a client accidentally reuses a key with a different payload, the server can detect a mismatch and reject it.
- Response: Serialized JSON response cached for replay.
- RowVersion: An optimistic concurrency token managed by SQL Server. It changes automatically on updates, enabling safe detection of conflicting writes.
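The role of RequestHash can be made concrete with a small decision function that combines the stored state with a payload comparison. The names `IdempotencyDecision` and `IdempotencyRules` are hypothetical, and the mapping of each outcome to an HTTP status code is left to the caller; some APIs answer a payload mismatch with 422, but that is a design choice rather than something this schema dictates.

```csharp
using System;

public enum IdempotencyDecision
{
    Process,          // No record for this key: claim it and run the operation
    Replay,           // Completed with the same payload: return the stored response
    InProgress,       // Same payload, still running: ask the client to retry later
    PayloadMismatch   // Key reused with a different payload: reject the request
}

public static class IdempotencyRules
{
    // storedState and storedHash are null when no record exists for the key.
    public static IdempotencyDecision Decide(
        string? storedState, string? storedHash, string incomingHash)
    {
        if (storedState is null)
            return IdempotencyDecision.Process;

        // Check the payload first, so a mismatched request never sees another request's response.
        if (!string.Equals(storedHash, incomingHash, StringComparison.Ordinal))
            return IdempotencyDecision.PayloadMismatch;

        return storedState == "Completed"
            ? IdempotencyDecision.Replay
            : IdempotencyDecision.InProgress;
    }
}
```

Checking the hash before the state means a client that accidentally reuses a key never receives a cached response that belongs to a different request.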
5.1.2 Entity Framework Core Model
Here’s how this table maps into EF Core:
public class IdempotencyEntity
{
public Guid Key { get; set; }
public string RequestHash { get; set; } = default!;
public string? Response { get; set; } // Null until the request completes
public int StatusCode { get; set; }
public DateTime CreatedAt { get; set; }
public string State { get; set; } = default!; // "InProgress", "Completed"
[Timestamp]
public byte[] RowVersion { get; set; } = default!;
}
And configuration in OnModelCreating:
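modelBuilder.Entity<IdempotencyEntity>()
.HasKey(e => e.Key); // "Key" does not match EF Core's Id / <Type>Id primary key convention
modelBuilder.Entity<IdempotencyEntity>()
.Property(e => e.RowVersion)
.IsRowVersion();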
5.1.3 Handling Concurrency
The most subtle challenge is concurrency. Consider two requests with the same key arriving almost simultaneously. Without concurrency control, both might attempt to create a new record, leading to duplicates.
There are two main strategies:
Optimistic Concurrency
EF Core throws a DbUpdateConcurrencyException if an update conflicts with another write that modified the same row in the meantime. This works well for updates (e.g., moving from “InProgress” to “Completed”). For inserts, a duplicate primary key surfaces as a DbUpdateException whose InnerException is a SqlException with error number 2627, so that is what you should catch:
try
{
_db.IdempotencyRecords.Add(entity);
await _db.SaveChangesAsync();
}
catch (DbUpdateException ex) when (ex.InnerException is SqlException sql && sql.Number == 2627)
{
// Duplicate key violation: another request inserted first
}
Pessimistic Concurrency
Alternatively, you can lock rows while processing:
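-- Run inside a transaction; the lock is held until COMMIT or ROLLBACK.
-- HOLDLOCK also blocks concurrent inserts of a key that does not exist yet.
SELECT * FROM IdempotencyKeys WITH (UPDLOCK, HOLDLOCK, ROWLOCK)
WHERE [Key] = @Key;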
This ensures that if one request is handling a key, others must wait until the lock is released. It provides stronger guarantees but can reduce throughput under contention.
5.1.4 Example Service Logic
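// Returns true if this request claimed the key, false if another request got there first.
// (A variation on CreateRequestAsync that reports the conflict instead of swallowing it.)
public async Task<bool> TryCreateRequestAsync(string key, string requestHash)
{
    var entity = new IdempotencyEntity
    {
        Key = Guid.Parse(key),
        RequestHash = requestHash,
        State = "InProgress",
        CreatedAt = DateTime.UtcNow
    };
    try
    {
        _db.IdempotencyRecords.Add(entity);
        await _db.SaveChangesAsync();
        return true;
    }
    catch (DbUpdateException ex) when (ex.InnerException is SqlException sql && sql.Number == 2627)
    {
        // Primary key violation: another request already claimed this key.
        // Detach the failed entity so later SaveChanges calls do not retry the insert.
        _db.Entry(entity).State = EntityState.Detached;
        return false; // Caller should replay the stored response or return 409
    }
}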
SQL’s strength lies in its durability and strict transactional semantics. It’s a great fit for systems where correctness is paramount, though latency may be higher compared to in-memory stores.
5.2 NoSQL (Azure Cosmos DB)
When building APIs at global scale, relational databases may struggle with latency and geo-replication. Azure Cosmos DB provides low-latency reads and writes worldwide, automatic scaling, and built-in TTL (time-to-live) support. Let’s design an idempotency schema for Cosmos DB.
5.2.1 Document Design
A Cosmos DB document for an idempotency record might look like this:
{
"id": "uuid-goes-here",
"requestHash": "sha256-hash-of-request",
"response": {
"paymentId": "abc123",
"status": "created"
},
"statusCode": 201,
"state": "Completed",
"ttl": 86400
}
Key details:
- id: The Idempotency-Key itself, allowing O(1) point reads.
- ttl: Ensures automatic cleanup after a specified period (e.g., 24 hours).
5.2.2 Handling Concurrency with ETags
Cosmos DB provides optimistic concurrency via ETags. Each document has an _etag field. When you perform a conditional update, you supply the ETag you last read. If the document has changed in the meantime, the operation fails.
Example in C#:
var requestOptions = new ItemRequestOptions
{
IfMatchEtag = existingRecord.ETag
};
await container.ReplaceItemAsync(record, record.Id, new PartitionKey(record.Id), requestOptions);
If another request modified the document (e.g., updating from “InProgress” to “Completed”), this update will fail with a PreconditionFailed error, which you can catch and retry safely.
5.2.3 Example Repository
using System.Net;
using Microsoft.Azure.Cosmos;

public class CosmosIdempotencyService : IIdempotencyService
{
    private readonly Container _container;

    public CosmosIdempotencyService(Container container)
    {
        _container = container;
    }

    public async Task<IdempotencyRecord?> GetResponseAsync(string key)
    {
        try
        {
            var response = await _container.ReadItemAsync<IdempotencyRecord>(key, new PartitionKey(key));
            return response.Resource;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
        {
            return null;
        }
    }

    public async Task CreateRequestAsync(string key, string requestHash)
    {
        var record = new IdempotencyRecord
        {
            Key = key,
            RequestHash = requestHash,
            State = "InProgress",
            CreatedAt = DateTime.UtcNow
        };
        // Throws CosmosException (409 Conflict) if another request already created this key
        await _container.CreateItemAsync(record, new PartitionKey(record.Key));
    }

    public async Task UpdateRequestAsync(string key, string responseJson, int statusCode)
    {
        ItemResponse<IdempotencyRecord> read;
        try
        {
            read = await _container.ReadItemAsync<IdempotencyRecord>(key, new PartitionKey(key));
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
        {
            return;
        }

        var record = read.Resource;
        record.State = "Completed";
        record.ResponseJson = responseJson;
        record.StatusCode = statusCode;

        // Conditional replace: fails with 412 if the document changed since it was read
        var options = new ItemRequestOptions { IfMatchEtag = read.ETag };
        await _container.ReplaceItemAsync(record, key, new PartitionKey(key), options);
    }
}
Cosmos DB’s ability to automatically expire documents makes it especially appealing for idempotency, since you don’t need to run cleanup jobs manually.
5.3 Distributed Cache (Redis)
For APIs where performance is critical, Redis offers sub-millisecond lookups. It’s often used as a first-level check to offload the database. However, because Redis is an in-memory store, it cannot serve as the sole source of truth—persistence is weaker, and data can be lost during restarts.
5.3.1 Use Case
Redis is best for:
- Quickly rejecting duplicate requests.
- Storing in-progress states temporarily.
- Acting as a cache in front of SQL or Cosmos DB.
It’s not recommended as the only persistence for financial transactions or mission-critical workflows.
5.3.2 Atomic Operations with SETNX
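Redis can atomically insert a key only if it doesn’t already exist. Historically this was the SETNX command (“set if not exists”); current Redis documentation treats SETNX as deprecated in favor of SET with the NX option, which also lets you attach an expiry in the same call. Either way, the check and the write happen as one atomic step, which prevents race conditions between concurrent requests.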
Example in C# using StackExchange.Redis:
public async Task<bool> TryCreateRequestAsync(string key)
{
var db = _redis.GetDatabase();
return await db.StringSetAsync(
key,
"InProgress",
expiry: TimeSpan.FromHours(24),
when: When.NotExists);
}
If TryCreateRequestAsync returns false, another request already claimed the key.
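To see why an atomic set-if-absent matters, the claim can be simulated in memory. This sketch uses ConcurrentDictionary.TryAdd as a stand-in for the Redis call, and ClaimDemo is a hypothetical name. However many callers race for the same key, exactly one should win.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class ClaimDemo
{
    // Returns how many of the concurrent callers won the claim for one key.
    public static int RaceForKey(string key, int callers)
    {
        var store = new ConcurrentDictionary<string, string>();
        var results = new bool[callers];

        // TryAdd is atomic: it adds the key and returns true for one caller only.
        Parallel.For(0, callers, i => results[i] = store.TryAdd(key, "InProgress"));

        return results.Count(won => won);
    }
}
```

A separate "check, then set" sequence offers no such guarantee, because two callers can both pass the check before either one writes.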
5.3.3 Lua Scripts for Atomic Check-and-Set
For more complex logic (e.g., update state from “InProgress” to “Completed” only if it hasn’t changed), Lua scripting ensures atomic multi-step operations:
-- Move to a new state only if the current state is still InProgress
if redis.call("GET", KEYS[1]) == "InProgress" then
    -- KEEPTTL (Redis 6.0+) preserves the expiry set at creation; a plain SET would clear it
    redis.call("SET", KEYS[1], ARGV[1], "KEEPTTL")
    return 1
else
    return 0
end
Using Lua avoids race conditions where a GET followed by a SET could be interleaved by other operations.
5.3.4 Hybrid Pattern: Redis + SQL
A common production setup is:
- Check Redis for a completed response (fast path).
- If found, return immediately.
- If not found, proceed with SQL (slow path).
- After processing, update both SQL and Redis.
This gives the speed of Redis with the durability of SQL.
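The fast-path and slow-path flow can be sketched with two dictionaries standing in for Redis and SQL. HybridIdempotencyStore and its members are hypothetical names; a real implementation would call the cache and the database asynchronously.

```csharp
using System.Collections.Generic;

public sealed class HybridIdempotencyStore
{
    private readonly Dictionary<string, string> _cache = new();   // stand-in for Redis
    private readonly Dictionary<string, string> _durable = new(); // stand-in for SQL

    // Counts slow-path lookups so the effect of the cache is visible.
    public int DurableReads { get; private set; }

    public string? TryGetCompleted(string key)
    {
        if (_cache.TryGetValue(key, out var cached)) return cached; // fast path

        DurableReads++;                                             // slow path
        if (_durable.TryGetValue(key, out var stored))
        {
            _cache[key] = stored; // repopulate after eviction or restart
            return stored;
        }
        return null;
    }

    public void SaveCompleted(string key, string responseJson)
    {
        _durable[key] = responseJson; // durable store first: it is the source of truth
        _cache[key] = responseJson;
    }

    public void SimulateCacheLoss() => _cache.Clear();
}
```

Writing to the durable store before the cache means a crash between the two writes can only leave the cache behind, and the slow path recovers from that.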
5.4 Trade-Offs: A Comparison Table
Choosing the right persistence depends on your priorities. Here’s a high-level comparison:
| Strategy | Durability | Latency | Consistency | Complexity |
|---|---|---|---|---|
| SQL (EF Core) | Strong ACID guarantees | Higher (ms) | Strong, transactional | Moderate (migrations, locks) |
| Cosmos DB | Durable, replicated writes | Low globally | Configurable (strong to eventual); optimistic concurrency w/ ETags | Higher (partitioning, costs) |
| Redis | Volatile (memory, optional AOF/RDB) | Sub-ms | Eventual if hybrid | Moderate (scripts, TTL) |
- SQL is the safest choice for financial-grade correctness.
- Cosmos DB is excellent for geo-distributed APIs requiring low latency at scale.
- Redis shines as a performance accelerator but must usually be combined with a durable store.
6 The Dual-Write Problem: When Your API Does More Than One Thing
So far, we’ve focused on idempotency in the narrow sense: ensuring that retries of the same client request don’t cause duplicate side effects. But modern APIs rarely just write to one database table and call it a day. In practice, most APIs are part of an event-driven architecture where actions must ripple outward—publishing messages to a queue, notifying other services, or invalidating caches. And this is where we hit the notorious dual-write problem: performing two critical operations, only one of which succeeds.
6.1 The Hidden Failure Mode
Let’s walk through a common scenario: creating an order.
- Begin API request: The client posts to POST /orders with order details.
- Save to the database: Your controller or service layer writes the order into the database.
var order = new Order { Id = Guid.NewGuid(), Status = "Created" };
_dbContext.Orders.Add(order);
await _dbContext.SaveChangesAsync(); // ✅ Database write succeeds
- Publish an event: After persisting, you want to notify downstream services; for example, the inventory service should reserve stock. You publish an event via a message bus.
await _messageBus.PublishAsync(new OrderCreatedEvent(order.Id)); // ❌ Fails due to network issue
Now we have a disaster: the order exists in the database, but no event was published. The inventory service doesn’t know stock should be reserved, and the shipping service doesn’t know an order exists. Whether the client sees an error (the exception propagates) or a success (the failure is swallowed), the system’s state is inconsistent.
6.1.1 Why Retries Don’t Fix It
Suppose the client retries. The second attempt will find that the order already exists in the database, so it doesn’t create a duplicate. But unless the retry also republishes the event, the downstream systems are still blind. If you try to republish automatically, you risk duplicating events. Either way, correctness is broken.
6.1.2 Ghost Data
This failure mode is often called a ghost record or phantom state:
- The database contains a valid record (e.g., order, payment).
- Downstream systems have no trace of it.
- Eventual consistency fails because no mechanism ensures reconciliation.
In production, this manifests as missing shipments, unreserved stock, or payments not linked to orders. Debugging is nightmarish: logs show the order exists, but other systems act as if it doesn’t. Ghost data erodes trust both internally (operations teams) and externally (customers).
6.1.3 Why Transactions Alone Don’t Help
A natural question: why not wrap both SaveChanges() and _messageBus.Publish() in a database transaction? The issue is that message brokers are separate systems. SQL Server transactions cannot span into RabbitMQ, Kafka, or Azure Service Bus. Even if you use distributed transactions (via MSDTC), these don’t scale well across cloud-native, polyglot architectures. You need a pattern that respects the boundaries of distributed systems.
6.2 Introducing the Transactional Outbox Pattern
The Transactional Outbox pattern elegantly solves the dual-write problem by flipping our perspective. Instead of treating the database and message broker as two independent systems to write to, we consolidate writes into one durable system—the database—and let an asynchronous process bridge the gap.
6.2.1 The Concept
The key idea:
- During the API request, write both the business entity (e.g., Order) and the outgoing message into the same database transaction.
- If the transaction commits, both the order and the message exist.
- If the transaction rolls back, neither does.
- A separate background process (relay) later reads the message from the database and publishes it to the broker.
The database becomes the temporary source of truth for outgoing messages. No message leaves until it is first safely persisted.
6.2.2 Schema Example: Outbox Table
A simple schema might look like this:
CREATE TABLE OutboxMessages (
Id UNIQUEIDENTIFIER PRIMARY KEY,
EventType NVARCHAR(255) NOT NULL,
Payload NVARCHAR(MAX) NOT NULL,
CreatedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
ProcessedAt DATETIME2 NULL
);
6.2.3 Writing to the Outbox
Instead of calling the message bus directly, the API writes an OutboxMessage entity:
var order = new Order { Id = Guid.NewGuid(), Status = "Created" };
_dbContext.Orders.Add(order);
var outbox = new OutboxMessage
{
Id = Guid.NewGuid(),
EventType = nameof(OrderCreatedEvent),
Payload = JsonSerializer.Serialize(new OrderCreatedEvent(order.Id)),
CreatedAt = DateTime.UtcNow
};
_dbContext.OutboxMessages.Add(outbox);
await _dbContext.SaveChangesAsync(); // ✅ Both entities saved atomically
At this point:
- The order is saved.
- The event is recorded in the outbox.
- If a crash occurs before commit, neither is persisted.
6.2.4 Relaying Messages
A separate background service, running in the same app or as a standalone worker, polls the OutboxMessages table:
- Fetch unprocessed messages (ProcessedAt IS NULL).
- Publish them to the broker.
- Mark them as processed (ProcessedAt = now()).
This decouples the API request from broker availability. If RabbitMQ or Kafka is down, the outbox simply accumulates messages. When the broker comes back online, the relay resumes publishing. No events are lost, and no ghosts are created.
6.2.5 Atomicity Guaranteed
By relying on the database transaction:
- You guarantee that either both the order and its event exist, or neither does.
- You avoid the split-brain scenario where one succeeds and the other fails.
- You gain durability: the outbox acts as a persistent queue until the broker is available.
This is why the Outbox pattern is considered a cornerstone of reliable event-driven APIs. It’s not about performance—it’s about correctness. And correctness is the bedrock on which everything else is built.
7 Implementing the Outbox Pattern in .NET
Now that we’ve introduced the Outbox pattern conceptually, let’s roll up our sleeves and implement it in ASP.NET Core with Entity Framework Core. This section will walk step by step through schema design, entity modeling, integration into service logic, and building a robust relay service to publish messages from the database to your broker. By the end, you’ll see how to turn theoretical guarantees into working code you can trust in production.
7.1 Schema and EF Core Model
The Outbox table is central to the pattern. It acts as a reliable staging area for messages before they’re sent to the broker. Each row corresponds to one outgoing event.
7.1.1 SQL Schema
CREATE TABLE OutboxMessages (
Id UNIQUEIDENTIFIER PRIMARY KEY,
EventType NVARCHAR(255) NOT NULL,
Payload NVARCHAR(MAX) NOT NULL,
CreatedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
ProcessedAt DATETIME2 NULL
);
This schema includes:
- Id: Unique identifier for the message.
- EventType: Helps consumers (or relays) deserialize payloads properly.
- Payload: Serialized JSON of the event.
- CreatedAt: Timestamp when the event was created.
- ProcessedAt: Null until the message has been published; then set to the processing timestamp.
7.1.2 EF Core Entity
public class OutboxMessage
{
public Guid Id { get; set; }
public string EventType { get; set; } = default!;
public string Payload { get; set; } = default!;
public DateTime CreatedAt { get; set; }
public DateTime? ProcessedAt { get; set; }
}
7.1.3 Adding to DbContext
public class AppDbContext : DbContext
{
public DbSet<OutboxMessage> OutboxMessages { get; set; } = default!;
public AppDbContext(DbContextOptions<AppDbContext> options)
: base(options) { }
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<OutboxMessage>().HasKey(m => m.Id);
modelBuilder.Entity<OutboxMessage>().Property(m => m.CreatedAt).HasDefaultValueSql("SYSUTCDATETIME()");
}
}
With this in place, our application can persist outbox messages alongside business entities inside a single transaction.
7.2 Modifying the Service Logic
The next step is to modify service logic. Instead of publishing directly to a message bus, we enqueue a message into the Outbox table. This ensures atomicity with the business data.
7.2.1 Old (Problematic) Code
public async Task<Order> CreateOrderAsync(OrderRequest request)
{
var order = new Order { Id = Guid.NewGuid(), Status = "Created" };
_dbContext.Orders.Add(order);
await _dbContext.SaveChangesAsync();
// ❌ Risky: if this fails, the order exists but no event is published
await _messageBus.PublishAsync(new OrderCreatedEvent(order.Id));
return order;
}
7.2.2 Corrected Code with Outbox
public async Task<Order> CreateOrderAsync(OrderRequest request)
{
var order = new Order { Id = Guid.NewGuid(), Status = "Created" };
_dbContext.Orders.Add(order);
var outboxMessage = new OutboxMessage
{
Id = Guid.NewGuid(),
EventType = nameof(OrderCreatedEvent),
Payload = JsonSerializer.Serialize(new OrderCreatedEvent(order.Id)),
CreatedAt = DateTime.UtcNow
};
_dbContext.OutboxMessages.Add(outboxMessage);
// ✅ Atomic: order and outbox message saved in one transaction
await _dbContext.SaveChangesAsync();
return order;
}
Here, the database transaction ensures that either both the order and outbox message are persisted or neither is. The actual publishing will happen later, outside the request.
7.3 The Message Relay: The Background Processor
Once messages are in the Outbox, we need a reliable way to relay them to the broker. This is done by a background service that polls for unprocessed messages.
7.3.1 Architecture
Two deployment models are common:
- In-process hosted service: A BackgroundService running inside the same ASP.NET Core application. Easier to deploy but shares resources with the API.
- Separate worker service: A dedicated .NET Worker Service or container, consuming the same database. Offers better isolation and scalability.
For critical systems, the second approach is recommended.
7.3.2 Processing Loop
Here’s a minimal hosted service:
public class OutboxRelayService : BackgroundService
{
    private readonly IServiceProvider _services;
    private readonly ILogger<OutboxRelayService> _logger;

    public OutboxRelayService(IServiceProvider services, ILogger<OutboxRelayService> logger)
    {
        _services = services;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                using var scope = _services.CreateScope();
                var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
                var bus = scope.ServiceProvider.GetRequiredService<IMessageBus>();

                var messages = await db.OutboxMessages
                    .Where(m => m.ProcessedAt == null)
                    .OrderBy(m => m.CreatedAt)
                    .Take(50)
                    .ToListAsync(stoppingToken);

                foreach (var message in messages)
                {
                    // Deserialize<object> yields a JsonElement; a real relay would
                    // map message.EventType to a concrete event type first.
                    var evt = JsonSerializer.Deserialize<object>(message.Payload);
                    await bus.PublishAsync(evt!);
                    message.ProcessedAt = DateTime.UtcNow;

                    // Save per message so a later publish failure in this batch
                    // does not discard the marks for messages already sent.
                    await db.SaveChangesAsync(stoppingToken);
                }
            }
            catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
            {
                break; // Graceful shutdown, not an error
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Error processing outbox messages");
            }

            await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
        }
    }
}
This loop:
- Queries unprocessed messages in batches.
- Publishes them to the broker.
- Marks them as processed.
7.3.3 Concurrency at Scale
With multiple relay instances, we must ensure messages aren’t processed twice. Strategies include:
- Database locks: In SQL Server, use the READPAST hint (typically combined with UPDLOCK and ROWLOCK) so a worker skips rows that another worker has already locked.
- PostgreSQL: Use SELECT ... FOR UPDATE SKIP LOCKED to fetch rows without blocking others.
- Leasing mechanism: Mark rows with a temporary lock field (e.g., LockedUntil) before processing.
Example PostgreSQL query:
SELECT * FROM OutboxMessages
WHERE ProcessedAt IS NULL
ORDER BY CreatedAt
LIMIT 50
FOR UPDATE SKIP LOCKED;
Run inside a transaction, this ensures each row is claimed by only one relay worker until that transaction commits or rolls back.
7.3.4 Error Handling and Retries
Publishing can fail (e.g., broker unavailable). A robust relay:
- Logs failures.
- Leaves messages unprocessed, so they’re retried in the next cycle.
- Optionally implements exponential backoff or dead-lettering for poison messages.
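The backoff and dead-letter decisions can be kept in a small pure function so they are easy to test. The sketch below assumes the relay tracks an attempt count per message, which the schema above does not yet include. RelayRetryPolicy, the 8-attempt limit, and the 5-minute cap are illustrative choices.

```csharp
using System;

public static class RelayRetryPolicy
{
    // Hypothetical limit: after this many failed attempts the message is
    // treated as poison and moved aside instead of being retried again.
    public const int MaxAttempts = 8;

    // Exponential backoff: 2s, 4s, 8s, ... capped at 5 minutes.
    public static TimeSpan NextDelay(int attempt)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        double seconds = Math.Min(Math.Pow(2, attempt), 300);
        return TimeSpan.FromSeconds(seconds);
    }

    public static bool ShouldDeadLetter(int attempt) => attempt >= MaxAttempts;
}
```

A relay could store the attempt count and a next-attempt timestamp on the outbox row and skip rows whose timestamp is still in the future.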
7.4 Leveraging Existing Libraries
While implementing Outbox manually provides learning and control, production systems often benefit from mature libraries that abstract much of the boilerplate:
- MassTransit: Provides an outbox middleware integrated with EF Core. Handles transactional enqueuing and background dispatch.
- NServiceBus: Offers built-in outbox support, guaranteeing atomicity across database and message bus interactions.
- Wolverine: A modern .NET messaging framework that has first-class outbox capabilities, including durable retries and deduplication.
Choosing a library depends on your broader architectural stack. If you’re already using MassTransit or NServiceBus, leveraging their outbox support avoids reinventing wheels.
8 Closing the Loop: The Idempotent Consumer and the Inbox Pattern
With the Outbox pattern, we’ve ensured that messages leave our service reliably. But the story doesn’t end there. Message brokers generally provide at-least-once delivery—meaning a consumer may receive the same message more than once. To make our systems fully reliable, consumers must also be idempotent. This is where the Inbox pattern comes in.
8.1 The Problem on the Other Side
Imagine our relay successfully publishes an OrderCreatedEvent to RabbitMQ. It then crashes before updating ProcessedAt in the database. On restart, it republishes the same event. From the consumer’s perspective, the event arrives twice.
If the consumer naively processes the event each time—say, decrementing inventory stock—it will apply the business effect twice, corrupting data. Even if the producer (Outbox) is reliable, the consumer must protect itself against duplicates.
8.2 The Inbox Pattern
The Inbox pattern ensures that consumers deduplicate messages before applying business logic. The idea is simple: record each message ID when processing, and ignore duplicates.
8.2.1 Schema
CREATE TABLE InboxMessages (
MessageId UNIQUEIDENTIFIER PRIMARY KEY,
ProcessedAt DATETIME2 NOT NULL
);
Each message has a unique ID (provided by the producer). Once processed, it’s inserted into the InboxMessages table. Future deliveries of the same message find it already recorded and are discarded.
8.2.2 Processing Flow
The standard consumer flow looks like this:
- Receive message from the queue.
- Begin a database transaction.
- Check if MessageId exists in InboxMessages.
  - If yes: commit and discard message.
  - If no: insert MessageId, then execute business logic.
- Commit transaction.
This ensures atomicity: the message is either fully processed and recorded, or rolled back without partial effects.
8.2.3 Example Consumer Logic
public async Task Handle(OrderCreatedEvent evt)
{
    await using var transaction = await _dbContext.Database.BeginTransactionAsync();

    var alreadyProcessed = await _dbContext.InboxMessages
        .AnyAsync(m => m.MessageId == evt.Id);
    if (alreadyProcessed)
    {
        await transaction.CommitAsync();
        return; // Duplicate - ignore
    }

    // Insert into Inbox. If a concurrent consumer also passed the check above,
    // the primary key on MessageId rejects the second insert and that
    // transaction rolls back, so the business effect is still applied once.
    _dbContext.InboxMessages.Add(new InboxMessage
    {
        MessageId = evt.Id,
        ProcessedAt = DateTime.UtcNow
    });

    // Apply business logic (e.g., reserve inventory)
    var stock = await _dbContext.Stocks.FirstAsync(s => s.ProductId == evt.ProductId);
    stock.Quantity -= evt.Quantity;

    await _dbContext.SaveChangesAsync();
    await transaction.CommitAsync();
}
8.2.4 Why Transactions Matter
Without wrapping the inbox insert and business logic in a single transaction, you risk partial failures. Suppose you insert the inbox record but crash before updating stock: future retries will think the message is processed, leaving stock unchanged. By committing both together, you guarantee that the inbox record and the business effect either both exist or both don’t.
8.2.5 Combining Outbox and Inbox
When combined:
- Outbox ensures producers never lose an event.
- Inbox ensures consumers never apply a duplicate.
- Together, they deliver effectively exactly-once processing end to end: the broker still delivers at least once, but each message’s business effect is applied only once.
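The consumer half of that guarantee is small enough to demonstrate in memory. In this sketch a HashSet stands in for the InboxMessages table. InventoryConsumer is a hypothetical name, and a real consumer would do both steps in one database transaction as described above.

```csharp
using System;
using System.Collections.Generic;

public sealed class InventoryConsumer
{
    private readonly HashSet<Guid> _inbox = new(); // stand-in for InboxMessages

    public int Stock { get; private set; }

    public InventoryConsumer(int initialStock) => Stock = initialStock;

    // Returns true when the effect was applied, false when the message
    // was recognized as a duplicate and skipped.
    public bool Handle(Guid messageId, int quantity)
    {
        // HashSet.Add returns false if the ID is already recorded.
        if (!_inbox.Add(messageId)) return false;

        Stock -= quantity;
        return true;
    }
}
```

Delivering the same message ID twice reduces stock once; a different message ID is treated as new work.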
9 Production Readiness: Telemetry, Testing, and Advanced Concerns
Designing an idempotent API and implementing Outbox/Inbox patterns are only the beginning. In production, correctness is not enough—you also need visibility, maintainability, and safeguards against abuse. This section explores how to prepare your system for real-world operations: adding observability, implementing garbage collection, testing resilience, and addressing security concerns. Each step helps ensure that your design remains robust not just in theory but under live traffic and failure conditions.
9.1 Observability: Don’t Fly Blind
A system that silently “just works” in development often becomes a nightmare in production when things inevitably go wrong. Idempotency adds complexity—stateful persistence, background workers, deduplication logic—and without proper observability, diagnosing problems will be guesswork. Three pillars matter most: logging, metrics, and alerting.
9.1.1 Structured Logging
Logging should capture meaningful events, not just raw exceptions. Use structured logs (JSON or key-value fields) so you can query them in systems like Elasticsearch, Seq, or Azure Monitor. At minimum, log these events:
- IdempotencyKeyReceived: log the key and endpoint whenever an incoming request includes it.
- CachedResponseReturned: include the key and response status code when serving a cached result.
- NewRequestProcessing: when a request transitions from “InProgress” to “Completed.”
- OutboxMessageCreated: record event type and aggregate ID whenever an outbox entry is written.
- MessageRelayed: record broker confirmation, message ID, and destination queue/topic.
Example in C# using ILogger message templates (Serilog or any other structured provider can sit behind it):
_logger.LogInformation("CachedResponseReturned {Key} {StatusCode}", key, record.StatusCode);
_logger.LogInformation("OutboxMessageCreated {EventType} {AggregateId}", message.EventType, order.Id);
9.1.2 Metrics
Logs help with forensics; metrics help with real-time visibility. OpenTelemetry and System.Diagnostics.Metrics provide modern, vendor-neutral ways to collect metrics.
Define counters and gauges such as:
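var meter = new Meter("MyApp.Idempotency", "1.0");
var cacheHits = meter.CreateCounter<long>("idempotency.cache.hits");
var cacheMisses = meter.CreateCounter<long>("idempotency.cache.misses");
var outboxProcessed = meter.CreateCounter<long>("outbox.messages.processed");
// The callback runs whenever metrics are collected, outside any request scope,
// so it resolves a fresh DbContext (assumes an injected IServiceProvider named _services).
var outboxQueueDepth = meter.CreateObservableGauge<long>("outbox.queue.depth", () =>
{
    using var scope = _services.CreateScope();
    var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
    return new Measurement<long>(db.OutboxMessages.LongCount(m => m.ProcessedAt == null));
});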
These metrics allow dashboards to show cache efficiency, relay throughput, and whether the outbox backlog is growing abnormally.
9.1.3 Alerting
Metrics without alerts are like a smoke alarm with no siren. Define thresholds that indicate trouble:
- Outbox queue depth exceeds N messages for M minutes → possible broker outage.
- Idempotency cache miss rate unexpectedly spikes → potential client misbehavior or bug.
- Consumer lag grows steadily → consumer instance might be stuck.
Alerts should notify humans quickly but avoid false positives. Consider escalation paths: first Slack/Teams, then pager duty for sustained issues.
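The "exceeds N for M minutes" rule is what keeps a brief spike from paging anyone. Most monitoring systems provide this natively; the sketch below only illustrates the logic. SustainedThresholdAlert is a hypothetical name, and timestamps are passed in so the behavior is deterministic.

```csharp
using System;

public sealed class SustainedThresholdAlert
{
    private readonly long _threshold;
    private readonly TimeSpan _window;
    private DateTime? _breachStartedUtc;

    public SustainedThresholdAlert(long threshold, TimeSpan window)
    {
        _threshold = threshold;
        _window = window;
    }

    // Returns true only when the value has stayed above the threshold
    // for at least the configured window.
    public bool Observe(long value, DateTime nowUtc)
    {
        if (value <= _threshold)
        {
            _breachStartedUtc = null; // recovered: reset the clock
            return false;
        }

        _breachStartedUtc ??= nowUtc; // first sample above the threshold
        return nowUtc - _breachStartedUtc.Value >= _window;
    }
}
```

Feeding it the outbox queue depth on each collection cycle would fire only when the backlog stays high, which is the pattern a broker outage produces.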
9.2 Garbage Collection
Persistence comes with a cost: storage grows over time. Idempotency keys, outbox messages, and inbox records cannot live forever. Left unchecked, they will bloat databases and slow queries. You need a garbage collection (GC) strategy.
9.2.1 Idempotency Keys
Most APIs only need to remember responses for 24–72 hours—long enough for client retries but not indefinite. Implement cleanup jobs:
- SQL Server: DELETE FROM IdempotencyKeys WHERE CreatedAt < DATEADD(HOUR, -72, SYSUTCDATETIME());
- Cosmos DB: use the ttl field for automatic expiration.
- Redis: set expiry when inserting (EX parameter with SET).
9.2.2 Outbox Messages
Processed messages can also be trimmed after a grace period (e.g., 7 days). This ensures auditability while controlling size. For critical financial systems, you may archive them into long-term storage instead of outright deletion.
9.2.3 Inbox Messages
Inbox tables can grow fastest in high-throughput consumers. Since they only exist to prevent duplicates, most teams safely delete entries older than 7–30 days. Again, archive if regulatory requirements demand full retention.
9.2.4 Background Job Implementation
In ASP.NET Core, implement GC as a scheduled background worker:
public class CleanupService : BackgroundService
{
private readonly IServiceProvider _services;
public CleanupService(IServiceProvider services) => _services = services;
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
while (!stoppingToken.IsCancellationRequested)
{
using var scope = _services.CreateScope();
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
await db.Database.ExecuteSqlRawAsync(
"DELETE FROM IdempotencyKeys WHERE CreatedAt < DATEADD(HOUR, -72, SYSUTCDATETIME())",
cancellationToken: stoppingToken);
await Task.Delay(TimeSpan.FromHours(6), stoppingToken);
}
}
}
This strikes a balance between correctness and database health.
9.3 Testing Your Idempotent System
Idempotency cannot be an afterthought in testing. Traditional unit tests are not enough—you must simulate failure scenarios and retries.
9.3.1 Unit Tests
Write unit tests for:
- Idempotency filter: given the same key twice, verify the second returns cached response.
- Outbox service: saving entity and outbox message together.
- Inbox service: rejecting duplicate message IDs.
Example in xUnit:
[Fact]
public async Task SameKey_ShouldReturnCachedResponse()
{
var service = new InMemoryIdempotencyService();
await service.CreateRequestAsync("key1", "{}");
await service.UpdateRequestAsync("key1", "{\"ok\":true}", 200);
var record = await service.GetResponseAsync("key1");
Assert.Equal("{\"ok\":true}", record!.ResponseJson);
Assert.Equal(200, record.StatusCode);
}
9.3.2 Integration Tests
The most valuable tests simulate real-world failures:
- Duplicate Request Test: Send the same POST request twice with the same Idempotency-Key. Assert the second response is identical in status and body, and that only one side effect (e.g., one payment row) was recorded.
- Worker Crash Test:
  - Insert an outbox message.
  - Run relay once to publish but crash before marking processed.
  - Restart relay.
  - Verify consumer processes message only once (thanks to Inbox).
- Broker Outage Simulation: Block access to the broker, generate orders, then restore. Confirm relay resumes and no orders are lost.
Integration tests benefit from test containers (e.g., Testcontainers for .NET) to spin up dependencies like SQL Server, RabbitMQ, or Redis in CI/CD pipelines.
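Before wiring up containers, the Worker Crash Test can be rehearsed as a plain in-memory simulation. Everything in this sketch is a stand-in: OutboxRow, DedupConsumer, and CrashTest are hypothetical names, the "publish" is a direct method call, and the crash is simulated by returning before the row is marked.

```csharp
using System;
using System.Collections.Generic;

public sealed class OutboxRow
{
    public Guid Id { get; } = Guid.NewGuid();
    public DateTime? ProcessedAt { get; set; }
}

public sealed class DedupConsumer
{
    private readonly HashSet<Guid> _inbox = new();

    public int Deliveries { get; private set; } // how often the broker delivered
    public int Applied { get; private set; }    // how often the effect was applied

    public void Receive(Guid messageId)
    {
        Deliveries++;
        if (_inbox.Add(messageId)) Applied++;
    }
}

public static class CrashTest
{
    // One relay pass. With crashBeforeMark the message is published
    // but the row is never marked as processed.
    public static void RunRelay(List<OutboxRow> outbox, DedupConsumer consumer, bool crashBeforeMark)
    {
        foreach (var row in outbox)
        {
            if (row.ProcessedAt != null) continue;

            consumer.Receive(row.Id);      // "publish" to the broker
            if (crashBeforeMark) return;   // simulated crash

            row.ProcessedAt = DateTime.UtcNow;
        }
    }
}
```

Running the relay once with the simulated crash and once normally delivers the message twice, while the consumer applies it once. The containerized integration test should reproduce the same outcome.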
9.3.3 Load and Stress Testing
Idempotency logic introduces extra writes and locks. Run load tests with tools like k6 or Locust to verify system behavior under sustained retries and bursts. Monitor outbox queue depth and database contention.
9.4 Security Considerations for Idempotency Keys
Security is often overlooked when discussing idempotency. Keys may seem like harmless metadata, but poor handling can open doors to abuse.
9.4.1 Unpredictability
Keys must not be guessable. If clients use sequential integers, attackers could replay or hijack operations by guessing keys. Always require random (version 4) UUIDs or cryptographically secure random strings.
9.4.2 Replay Attacks
If keys are long-lived, an attacker could reuse them maliciously. Short TTLs reduce risk. For highly sensitive APIs, tie keys to client authentication context—e.g., {clientId}:{guid}.
9.4.3 Abuse via Flooding
A malicious client could generate millions of unique keys to exhaust storage. Mitigation strategies:
- Rate limit requests per client.
- Enforce maximum outstanding keys per client.
- Reject overly long or malformed headers.
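Header validation and the {clientId}:{guid} scoping from 9.4.2 can live in one small helper at the API edge. This is a minimal sketch under stated assumptions: IdempotencyKeyValidator is a hypothetical name, the 64-character limit is arbitrary, and clientId is assumed to come from the authenticated principal rather than from the request body.

```csharp
using System;

public static class IdempotencyKeyValidator
{
    // Arbitrary illustrative limit; a GUID in any standard format fits well within it.
    public const int MaxLength = 64;

    // Validates the raw header value and, if acceptable, builds the storage key
    // scoped to the authenticated client.
    public static bool TryBuildScopedKey(string clientId, string? headerValue, out string scopedKey)
    {
        scopedKey = string.Empty;

        if (string.IsNullOrWhiteSpace(headerValue)) return false;  // missing header
        if (headerValue.Length > MaxLength) return false;          // overly long
        if (!Guid.TryParse(headerValue, out var guid)) return false; // malformed

        // Normalize to the lowercase hyphenated form so equivalent spellings
        // of the same GUID map to one stored key.
        scopedKey = $"{clientId}:{guid:D}";
        return true;
    }
}
```

Because the client ID is part of the stored key, one client cannot replay or collide with a key issued by another, even if it learns the GUID.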
9.4.4 Logging Hygiene
Be cautious when logging keys. If keys contain embedded information (in cases where they are not opaque), logs may leak sensitive data. Treat keys as opaque identifiers and redact if unsure.
10 Conclusion: The Trade-Off Between Complexity and Correctness
We’ve traveled from a simple retry problem to a fully-fledged architecture for exactly-once semantics. Along the way, we layered multiple defenses, each addressing a different failure mode in distributed systems.
10.1 Summary of Patterns
- Idempotency Key: Ensures safe retries at the API edge by caching responses and preventing duplicate effects.
- Outbox Pattern: Eliminates the dual-write problem by atomically persisting events alongside business data.
- Inbox Pattern: Protects consumers against duplicate deliveries from message brokers.
Together, they simulate exactly-once processing in inherently at-least-once systems.
10.2 The Core Trade-Off
None of these patterns are free. They introduce:
- Extra database writes (inbox/outbox tables).
- More complex schemas and background services.
- Cleanup jobs and operational monitoring.
But for mission-critical domains like payments, logistics, or inventory, the cost of inconsistency is far greater. Customers will forgive latency but not incorrect balances or missing orders.
10.3 When Not to Use These Patterns
Not every endpoint deserves this level of protection:
- Pure read operations (GET /products) are naturally idempotent.
- Low-impact writes (e.g., updating a user preference) may not justify outbox complexity.
- Internal telemetry or analytics pipelines can tolerate at-most-once semantics.
Apply these patterns selectively, guided by business risk.
10.4 Final Thoughts
Designing for idempotency is about trust. Users trust that pressing “Buy” won’t double-charge them, that stock levels are accurate, and that distributed services agree on reality. Idempotency keys, Outbox, and Inbox are proven patterns to uphold that trust. Start simple—enforce idempotency keys at the edge—and layer Outbox/Inbox as your system scales and reliability demands grow. Complexity is the price of correctness, but it’s a price worth paying when correctness is the foundation of your business.