Hotel Booking Systems That Scale: Inventory Management, Double-Booking Prevention, and Dynamic Pricing with .NET

1 Architectural Blueprint: Designing for 50k Daily Bookings

A hotel booking system at scale is not “just a database with some CRUD screens.” Once you support tens of thousands of bookings a day, the system behaves more like a distributed control system: it has to coordinate inventory, pricing, and reservations across your own website, mobile app, and multiple OTAs (online travel agencies), without ever selling the same unit of room-type inventory twice for the same night.

To do that reliably, you need three things: very strict consistency for inventory, predictable booking workflows, and fast search. This section lays out the architectural foundations for that kind of system in .NET, using clear domain boundaries, asynchronous messaging, and contracts that are easy to reason about and evolve.

1.1 Domain Analysis and Bounded Contexts

Before picking technologies, you need to be clear about what the system is doing. Bounded contexts help you split the hotel domain into parts that can be built, scaled, and deployed independently. In a hotel booking platform, four domains almost always emerge: Inventory, Reservation, Pricing, and Channel Management. Each has different performance, data, and consistency requirements.

1.1.1 Identifying Core Domains

A scalable architecture starts with an explicit domain map. Here’s how these core domains typically break down and why they should not live in the same codebase and database.

Inventory (Availability Management)

Inventory answers a simple but critical question: “How many units of this room type are still available on this date?” It must:

  • Reflect up-to-date availability across all channels.
  • Handle increments and decrements from bookings, cancellations, and no-shows.
  • Publish events that other domains (pricing, channel manager, reporting) can react to.

This is a high-consistency domain. Even a brief inconsistency can cause overbooking, compensation costs, and unhappy guests. That’s why this domain gets the strongest protection in terms of locking, concurrency control, and validation.

Reservation (Booking Lifecycle)

Reservation owns the booking itself: who is coming, when, and under what conditions. It is responsible for:

  • Creating bookings and tracking their lifecycle.
  • Storing guest information and preferences.
  • Managing payment status (authorized, captured, failed, refunded).
  • Handling confirmation, modification, and cancellation flows.

This is a workflow-driven domain. It often uses Sagas to coordinate across external systems like payment gateways, email/SMS providers, and the inventory service. The main concern is doing each step in the right order and making sure failures are handled cleanly.

Pricing (Rate Management)

Pricing decides “what should we charge right now for this room type and date?” It covers:

  • Base rates per room type and season.
  • Dynamic adjustments based on occupancy, events, or competitor prices.
  • Promotions and discounts.

This is a high-read domain. Pricing data and calculations are read constantly (search results, quotes, OTA requests) but don’t need the same level of transactional consistency as inventory. Event-driven updates, caching, and read-optimized stores fit well here.

Channel Management (OTA Connectivity)

Channel Management keeps you in sync with platforms such as Booking.com, Expedia, Airbnb, and MakeMyTrip. It:

  • Pushes rates and availability out to OTAs.
  • Receives bookings and modifications coming back from OTAs.
  • Normalizes different OTA payloads into a common internal format.
  • Applies rate limits, retries, and batching to stay within OTA constraints.

This is a latency-sensitive, integration-heavy domain. Network failures and API quirks are the norm, so the design must include retry policies, circuit breakers, buffering, and reconciliation logic.

These domains should not share databases. Instead, they should expose clear APIs and communicate with each other using events and commands. That separation is what lets you scale and evolve each part at its own pace.

1.1.2 Defining the “Room” vs. “Room Type” Abstraction

One of the easiest ways to accidentally overcomplicate your model is to treat rooms and room types as the same thing. In the real world, guests book room types, not specific room numbers, and you assign the actual room later (often at check-in).

Room Type

Examples:

  • Deluxe King
  • Superior Twin
  • Suite with Balcony

Room types represent bookable units in your system:

  • A hotel might have 20 Deluxe King rooms.
  • You track availability per room type per date.
  • Pricing usually applies at the room-type level.

Room types are what you show in search results and send to OTAs. They are also what your pricing engine reasons about.

Room

A room is a specific physical unit, such as Room 501. It has:

  • A physical location (floor, building, wing).
  • A maintenance status (available, out of service).
  • A housekeeping status (clean, dirty, in progress).

Rooms are mainly important for operations: housekeeping, maintenance, and honoring guest preferences like “high floor” or “near elevator.”

Why you book a type but occupy a room

Keeping booking logic at the room-type level gives the hotel flexibility:

  • You can optimize occupancy by assigning physical rooms later based on arrivals, departures, and preferences.
  • You can manage overbooking at the room-type level, because you’re not pre-committing to specific rooms.
  • You can run pricing at a higher level of abstraction, without worrying about individual room numbers.

A scalable design stores confirmed reservations as room-type bookings and performs room assignment as a separate step closer to check-in, often handled by a different system or UI.
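As a rough sketch of that later assignment step, the function below picks a physical room for a room-type booking at check-in. The Room record, the Housekeeping enum, and PickRoom are hypothetical names, and the selection rule (clean, in service, unoccupied, optionally highest floor first) is an assumption rather than a prescribed policy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Housekeeping { Clean, Dirty, InProgress }

// A physical unit; RoomType links it back to the bookable category.
public record Room(string Number, string RoomType, int Floor, bool OutOfService, Housekeeping Status);

public static class RoomAssignment
{
    // Returns a ready room of the requested type, or null if none is ready yet.
    public static Room? PickRoom(
        IEnumerable<Room> rooms,
        string roomType,
        ISet<string> occupiedRoomNumbers,
        bool preferHighFloor)
    {
        var candidates = rooms.Where(r =>
            r.RoomType == roomType &&
            !r.OutOfService &&
            r.Status == Housekeeping.Clean &&
            !occupiedRoomNumbers.Contains(r.Number));

        // Honor a simple guest preference by ordering on floor.
        var ordered = preferHighFloor
            ? candidates.OrderByDescending(r => r.Floor)
            : candidates.OrderBy(r => r.Floor);

        return ordered.FirstOrDefault();
    }
}
```

Because the booking only ever referenced the room type, this step can run at any point before check-in without touching availability counts.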

1.1.3 The Shared Kernel: Value Objects

Some concepts—like dates, currency, and guest details—show up in most domains. If each service models them differently, integrations become messy and error-prone. A small shared kernel of well-designed value objects helps keep things consistent without creating a big shared library that everything depends on.

Currency
public readonly record struct Currency(string Code)
{
    public static readonly Currency INR = new("INR");
    public static readonly Currency USD = new("USD");
}
DateRange
public record DateRange(DateOnly From, DateOnly To)
{
    public bool Overlaps(DateRange other) =>
        From < other.To && other.From < To;

    public int Nights => To.DayNumber - From.DayNumber;
}
GuestDetails
public record GuestDetails(string FirstName, string LastName, string Email);

These types are immutable and self-contained. That makes them safe to share between services, easy to test, and safe to use in concurrent code.
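A short usage sketch may help show why the comparison in Overlaps is strict. It restates the DateRange record from above and treats the range as half-open, so the checkout day is not an occupied night; DateRangeDemo and the sample dates are illustrative only.

```csharp
using System;

public record DateRange(DateOnly From, DateOnly To)
{
    // Half-open interval [From, To): a stay ending on the 5th does not
    // collide with a stay starting on the 5th.
    public bool Overlaps(DateRange other) =>
        From < other.To && other.From < To;

    public int Nights => To.DayNumber - From.DayNumber;
}

public static class DateRangeDemo
{
    public static (bool BackToBack, bool Overlapping, int Nights) Run()
    {
        var first = new DateRange(new DateOnly(2025, 3, 1), new DateOnly(2025, 3, 5));
        var backToBack = new DateRange(new DateOnly(2025, 3, 5), new DateOnly(2025, 3, 8));
        var overlapping = new DateRange(new DateOnly(2025, 3, 4), new DateOnly(2025, 3, 6));

        return (first.Overlaps(backToBack), first.Overlaps(overlapping), first.Nights);
    }
}
```

Run returns (false, true, 4): back-to-back stays do not overlap, a stay that shares the night of March 4 does, and March 1 to March 5 is four nights.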

1.2 High-Level Architecture Design

With the domain boundaries in place, the technical architecture becomes easier to reason about. The system is composed of .NET 8/9 microservices, each focused on a specific domain, connected through messaging, and supported by the right data stores and caches (e.g., SQL, Redis, Elasticsearch, EventStoreDB).

Each service should be independent in deployment and scaling, but coordinated through events and APIs.

1.2.1 Microservices Decomposition Using .NET 8/9

A practical decomposition for a hotel booking platform looks like this.

Inventory Service
  • Maintains availability calendars per room type and date.
  • Uses SQL with optimistic concurrency and/or Redis for fast updates.
  • Publishes InventoryChanged events when counts change.
Reservation Service
  • Owns the booking lifecycle (requested → held → paid → confirmed/cancelled).
  • Runs Sagas (e.g., with MassTransit) and persists state with EF Core.
  • Calls payment gateways and coordinates with the inventory service.
  • Subscribes to inventory and channel events when needed.
Pricing Service
  • Applies pricing strategies and rules to compute rates.
  • Exposes a GetRateAsync API (often gRPC) for internal callers.
  • Offers an admin UI or API for revenue managers to manage rate plans and rules.
Channel Management Service
  • Sends availability and rates to OTAs.
  • Receives OTA bookings and maps them into internal booking requests.
  • Uses message queues to handle retries and decouple from OTA latency.
Search Service
  • Stores denormalized hotel and room-type data in Elasticsearch/OpenSearch.
  • Serves high-throughput search queries without touching transactional databases.

Each service should have:

  • Its own database (SQL or specialized store).
  • Clear contracts (HTTP/gRPC APIs and message schemas).
  • Messaging endpoints on RabbitMQ or Azure Service Bus for events and commands.

1.2.2 Communication Patterns

The system needs both request/response calls for immediate answers and asynchronous messaging for workflows and updates. Picking the right pattern per use case keeps the design maintainable.

Synchronous (gRPC)

Use gRPC when you need a fast, direct answer:

  • Getting current pricing for a room type and date range.
  • Fetching room-type configuration or hotel metadata.
  • Validating that a booking request matches current policies.

gRPC is a good fit because it:

  • Uses a compact binary protocol with low latency.
  • Enforces strong typing and contract-first design with .proto files.
  • Supports streaming if you later need real-time feeds.

Example contract:

service Pricing {
  rpc GetRate (GetRateRequest) returns (GetRateResponse);
}
Asynchronous Messaging (RabbitMQ/Azure Service Bus)

Use messaging when operations are longer-running, involve multiple services, or must be resilient to transient failures:

  • Creating reservations and coordinating the booking Saga.
  • Propagating inventory changes to pricing or channel management.
  • Handling payment callbacks or OTA reservation ingestion.

Benefits include:

  • Loose coupling between services.
  • Automatic retries and dead-letter queues.
  • Natural support for event-driven workflows and horizontal scaling.

Example message:

{
  "BookingId": "BKG-12345",
  "RoomType": "DeluxeKing",
  "DateRange": { "From": "2025-03-01", "To": "2025-03-05" },
  "Source": "DirectWeb"
}

The key is to be intentional: use synchronous calls for queries and small commands where the caller needs an immediate answer, and use messages for workflows and updates that can be processed asynchronously.
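As a sketch, the example message could be modeled in C# as an immutable record and serialized with System.Text.Json. The names BookingRequestedMessage and BookingMessageJson are hypothetical, the DateRangeDto shape is assumed from the JSON, and DateOnly serialization assumes .NET 7 or later.

```csharp
using System;
using System.Text.Json;

public record DateRangeDto(DateOnly From, DateOnly To);

// Property names match the JSON example, so default serializer options suffice.
public record BookingRequestedMessage(
    string BookingId,
    string RoomType,
    DateRangeDto DateRange,
    string Source);

public static class BookingMessageJson
{
    public static string Serialize(BookingRequestedMessage message) =>
        JsonSerializer.Serialize(message);

    public static BookingRequestedMessage Deserialize(string json) =>
        JsonSerializer.Deserialize<BookingRequestedMessage>(json)
        ?? throw new JsonException("Message body was empty.");
}
```

Keeping the contract as a plain record with no behavior makes it easy to version and share between the publisher and its consumers.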

1.2.3 Role of .NET Aspire

Once you have several services plus Redis, RabbitMQ, and other infrastructure, local development and deployment can get messy. .NET Aspire helps by treating your distributed app as one logical unit and handling orchestration for you.

Distributed Application Orchestration

With Aspire, you can:

  • Start multiple projects (Reservation, Inventory, Pricing, Channel, Search) and infrastructure containers (Redis, RabbitMQ) with a single dotnet run.
  • Share a common configuration for connection strings, secrets, and environment variables.
  • View logs and traces across services in one place while debugging.

This reduces the “works on my machine” problem and makes it much easier for new developers to run the full system locally.

Cloud App Model

Aspire describes the distributed app in code, in a dedicated AppHost project, rather than in a hand-written configuration file. For example (resource names and the references between them are illustrative):

var builder = DistributedApplication.CreateBuilder(args);

// Infrastructure containers started alongside the services.
var redis = builder.AddRedis("redis");
var rabbitMq = builder.AddRabbitMQ("rabbitmq");

// Service projects; WithReference passes connection details as configuration.
builder.AddProject<Projects.InventoryService>("inventory").WithReference(redis).WithReference(rabbitMq);
builder.AddProject<Projects.ReservationService>("reservation").WithReference(rabbitMq);
builder.AddProject<Projects.PricingService>("pricing").WithReference(redis);

builder.Build().Run();

This app model describes what needs to run together, not how each piece is implemented internally. For deployment, Aspire can generate a manifest from the same model, which deployment tooling can map to containers, secrets, and networking in Kubernetes or Azure. The result is that every engineer and every environment (dev, QA, staging) runs a consistent version of the full system, which is exactly what you want in a complex .NET-based hotel booking platform.


2 Solving the “Two Guests, One Room” Problem: Advanced Concurrency

When your system handles thousands of simultaneous searches and bookings—especially from OTAs—you quickly run into the classic “two guests, one room” problem. It happens when multiple users see the same availability and try to book it at nearly the same time. In a monolithic system with a single database and simple row locks, this is manageable. In a distributed .NET microservices architecture, with Inventory, Reservation, Pricing, and Channel services all working independently, it becomes much more complex.

The goal is simple: never let two bookings claim the same unit of a room type for the same night. But achieving this across services, caches, and message buses requires deliberate concurrency control.

2.1 The Race Condition Reality

2.1.1 The Gap Between “Check Availability” and “Confirm Booking”

A typical real-world flow looks like this:

  1. A guest searches for a Deluxe King room.
  2. The Search service returns “1 room left” based on inventory at that moment.
  3. The guest clicks “Book now.”
  4. At the same time, another guest (or an OTA) also tries to reserve that last room.
  5. Both booking requests reach the Reservation service within milliseconds of each other.

The search result was technically correct at the time, but search data is always slightly behind real-time. What matters is what the Reservation service sees during booking confirmation. To avoid double booking, the system must treat availability checks and actual booking as two different operations with stronger guarantees during booking.

Rather than locking everything during search (which kills performance), we make the booking step atomic at the room-type/date-range level.

2.1.2 Why ACID Transactions Alone Can’t Protect You

It’s tempting to think: “We’ll just wrap booking operations in a SQL transaction.” But in a microservices environment, ACID guarantees stop at the database boundary.

Consider the architecture:

  • Inventory runs its own database (SQL Server or PostgreSQL).
  • Reservation runs separately and communicates via messaging.
  • Channel Management can push OTA bookings at any time.
  • All pieces are deployed independently.

Problems with relying only on database locking:

  • Locks can’t span multiple microservices or data stores.
  • Network delays might hold a lock longer than expected.
  • Multiple OTAs might send reservations in bursts, creating contention.
  • Long transactions reduce throughput and increase deadlocks.

In a distributed system, you must assume multiple processes will try to reserve the same inventory at the same time. That’s why we combine distributed locking, optimistic concurrency, and reservation holds.

2.2 Implementing Distributed Locking with Redis Redlock

To coordinate booking attempts across services, we use a distributed mutex. Redlock is a widely used Redis-based algorithm for short-lived locks: a client sets the same key, with an expiry, on a majority of independent Redis nodes. That helps ensure that only one booking attempt at a time operates on the same room-type/date-range.

You don’t hold the lock for the whole booking workflow—just long enough to validate availability and create a temporary hold.

2.2.1 Implementing Redlock in .NET Using StackExchange.Redis

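Acquiring the lock on a single Redis node is one SET with NX (set only if absent) and an expiry. The snippet below shows that single-node building block; full Redlock repeats it against several independent nodes and requires a majority:

public async Task<(bool acquired, string lockId)> TryAcquireLockAsync(string resource, TimeSpan ttl)
{
    // A unique token identifies this holder so that only it can release the lock.
    var lockId = Guid.NewGuid().ToString();

    // Equivalent to SET resource lockId NX PX ttl: succeeds only if the key is absent.
    bool success = await _redis.StringSetAsync(
        key: resource,
        value: lockId,
        expiry: ttl,
        when: When.NotExists
    );

    return (success, lockId);
}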

Releasing the lock uses a Lua script to ensure only the owner can release it:

public async Task ReleaseLockAsync(string resource, string lockId)
{
    var script = @"
        if redis.call('get', KEYS[1]) == ARGV[1]
        then 
            return redis.call('del', KEYS[1])
        else 
            return 0
        end";

    await _redis.ScriptEvaluateAsync(script,
        new RedisKey[] { resource },
        new RedisValue[] { lockId });
}

This ensures:

  • A service never releases someone else’s lock.
  • The lock expires automatically if the service crashes.
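  • You can scale Reservation and Inventory services horizontally, because the lock lives in Redis rather than in any single process.
  • If the work outlives the TTL, the lock can expire while its owner is still running, which is why the database-level check in Section 2.3.1 is still required.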

2.2.2 Defining Lock Keys: RoomType + DateRange

The lock key should represent the smallest atomic unit of availability competition. In our domain, that is:

lock:inventory:{RoomType}:{FromDate}:{ToDate}

Example:

lock:inventory:DeluxeKing:2025-03-01:2025-03-05

This ensures:

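  • Only one booking attempt for the same room type and exact date range runs at a time. Stays that overlap but have different endpoints map to different keys, so they are protected by the database-level check in Section 2.3.1 rather than by this lock.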
  • Channel Manager updates don’t conflict with Reservation updates.
  • The same key format is used across all services, avoiding fragmentation.
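One way to make overlapping stays contend on the same lock, offered here as an assumption rather than part of the design above, is to take one key per occupied night and acquire the keys in ascending date order. InventoryLockKeys and ForStay are hypothetical names, and the key format drops the end date from the format shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class InventoryLockKeys
{
    // One key per occupied night; the checkout date itself is excluded.
    public static IReadOnlyList<string> ForStay(string roomType, DateOnly from, DateOnly to)
    {
        if (to <= from)
            throw new ArgumentException("Checkout must be after check-in.", nameof(to));

        var keys = new List<string>();
        for (var night = from; night < to; night = night.AddDays(1))
        {
            // Invariant culture keeps the key identical on every server.
            var date = night.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
            keys.Add($"lock:inventory:{roomType}:{date}");
        }

        // Keys are generated in ascending date order, which gives every
        // caller the same acquisition order.
        return keys;
    }
}
```

A stay from 2025-03-01 to 2025-03-05 and one from 2025-03-03 to 2025-03-07 then share the keys for March 3 and March 4, so they serialize on those nights. The trade-off is more Redis round trips per booking, and a caller that fails partway must release the keys it already holds.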

2.2.3 Choosing a Safe TTL

The lock should be held only long enough to:

  1. Validate inventory.
  2. Create a hold record.

This should be fast (ideally < 1 second). But you also want a buffer for network jitter.

Practical guidelines:

  • TTL: 3 seconds for most cases.
  • Retry: Up to 3 attempts with exponential backoff.
  • OTA requests: Often use tighter windows (1–2 seconds) due to high volume.

The TTL ensures that, even if a service crashes mid-operation, the lock is eventually released and availability isn’t stuck.

2.3 Fail-Safe Inventory Decrement Strategies

A distributed lock helps prevent collisions, but real-world systems need additional layers. Network failures, partial outages, or slow external APIs can still cause edge cases. To make the system resilient, we add fail-safe mechanisms that guarantee correct inventory counts even if the lock is lost or never acquired.

2.3.1 Optimistic Concurrency Control (OCC)

The Inventory database enforces its own safety net using row versions. Even if two processes bypass the distributed lock by mistake, the second update matches zero rows and is rejected.

Table schema:

CREATE TABLE Inventory (
    RoomType NVARCHAR(50),
    Date DATE,
    Available INT,
    RowVersion ROWVERSION,
    PRIMARY KEY(RoomType, Date)
);

Update query with version check:

var sql = @"
UPDATE Inventory
SET Available = Available - @qty
WHERE RoomType = @roomType
  AND Date = @date
  AND Available >= @qty
  AND RowVersion = @expectedVersion;
";

If affectedRows == 0, then:

  • The inventory changed during the operation or
  • Availability is not enough anymore

Either way, the system safely rejects the booking.

OCC turns accidental race conditions into clean, predictable failures instead of silent double decrements.
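The same version check can be sketched in memory to show the behavior without a database. InventoryRow and its members are hypothetical, and the lock statement stands in for the atomicity that the single UPDATE statement gets from the database.

```csharp
using System;

public sealed class InventoryRow
{
    private readonly object _gate = new();
    private int _available;
    private long _version;

    public InventoryRow(int available) => _available = available;

    // Snapshot of the row as a reader would see it before attempting an update.
    public (int Available, long Version) Read()
    {
        lock (_gate) return (_available, _version);
    }

    // Mirrors: UPDATE ... WHERE Available >= @qty AND RowVersion = @expectedVersion.
    // Returns false when the row changed since it was read or stock is insufficient.
    public bool TryDecrement(int qty, long expectedVersion)
    {
        lock (_gate)
        {
            if (_version != expectedVersion || _available < qty)
                return false;

            _available -= qty;
            _version++; // every successful write produces a new version
            return true;
        }
    }
}
```

If two callers read the same snapshot of a row with one unit left, the first TryDecrement succeeds and bumps the version, and the second fails because its expected version is stale.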

2.3.2 The Reservation Hold Pattern

During the booking flow, especially when payment is involved, inventory shouldn’t be permanently committed until payment succeeds. That’s why we use temporary holds.

Flow:

  1. Acquire Redlock.
  2. Check inventory.
  3. Decrement inventory and create a hold record.
  4. Release Redlock.
  5. Continue the booking Saga (payment, confirmation, notifications).
  6. If payment fails or times out, release the hold and restore inventory.

Hold table example:

CREATE TABLE ReservationHold (
    HoldId UNIQUEIDENTIFIER PRIMARY KEY,
    RoomType NVARCHAR(50) NOT NULL,
    DateRangeFrom DATE NOT NULL,
    DateRangeTo DATE NOT NULL,
    ExpiresAt DATETIME2 NOT NULL, -- stored in UTC
    Status NVARCHAR(20) NOT NULL -- Active, Confirmed, Expired
);

Holds allow:

  • Safe payment authorization windows (5–15 minutes).
  • Avoiding stale availability checks later in the flow.
  • Consistency across retries, network delays, and OTAs.

Think of it as a “soft booking” that must either become a real booking or be automatically released.
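The hold lifecycle can be sketched as a small class with two guarded transitions. ReservationHold, HoldStatus, TryConfirm, and TryExpire are hypothetical names; the caller passes the current UTC time in so the rules stay deterministic and easy to test.

```csharp
using System;

public enum HoldStatus { Active, Confirmed, Expired }

public sealed class ReservationHold
{
    public Guid HoldId { get; } = Guid.NewGuid();
    public DateTime ExpiresAtUtc { get; }
    public HoldStatus Status { get; private set; } = HoldStatus.Active;

    public ReservationHold(DateTime nowUtc, TimeSpan window) =>
        ExpiresAtUtc = nowUtc + window;

    // Payment succeeded: only an active, unexpired hold may become a booking.
    public bool TryConfirm(DateTime nowUtc)
    {
        if (Status != HoldStatus.Active || nowUtc >= ExpiresAtUtc)
            return false;

        Status = HoldStatus.Confirmed;
        return true;
    }

    // Cleanup sweep: returns true exactly once, when the caller should
    // add the held units back to inventory.
    public bool TryExpire(DateTime nowUtc)
    {
        if (Status != HoldStatus.Active || nowUtc < ExpiresAtUtc)
            return false;

        Status = HoldStatus.Expired;
        return true;
    }
}
```

Because both transitions require the Active status, a late payment callback cannot confirm a hold that the sweep has already expired, and inventory is restored at most once.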

2.3.3 Handling Lock Contentions with Polly Retry Policies

High-demand periods—New Year’s Eve, long weekends, festivals—cause many users to compete for the same room type. When a lock cannot be acquired immediately, it’s not an error; it’s a natural sign of demand.

Polly provides a clean way to retry operations with backoff:

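// LockNotAcquiredException is an application-defined exception thrown when the lock is busy.
var retryPolicy = Policy
    .Handle<LockNotAcquiredException>()
    .WaitAndRetryAsync(
        retryCount: 3,
        // Exponential backoff: 200 ms, 400 ms, 800 ms (attempt starts at 1).
        sleepDurationProvider: attempt => TimeSpan.FromMilliseconds(200 * Math.Pow(2, attempt - 1))
    );

await retryPolicy.ExecuteAsync(() => TryProcessBookingAsync());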

Benefits:

  • Smooth handling of high contention
  • Reduced user-visible errors
  • Predictable retry behavior without overwhelming Redis

3 The Transactional Heart: Implementing Distributed Sagas with MassTransit

In a hotel booking platform, a reservation isn’t a single step—it’s a chain of dependent actions. You validate the request, create a temporary inventory hold, initiate payment, confirm the reservation, notify the guest, and sync updates with OTAs. Each step depends on the previous one completing successfully. And because these steps involve different microservices, failures are inevitable. A scalable system must handle those failures without leaving inventory stuck or creating incomplete bookings.

Distributed Sagas provide exactly this: a predictable, observable workflow that coordinates multiple services and ensures that either the entire booking completes successfully—or the system compensates cleanly.

3.1 Orchestration over Choreography

Earlier sections explained why inventory and bookings must remain strongly consistent across services. When workflows span multiple domains—Inventory, Payment, Reservation, Channel Management—simply broadcasting events and letting each service “figure things out” is risky. Booking workflows require strict sequencing, reliable compensation, and clear visibility into what happened.

3.1.1 Why a Central Orchestrator Is Safer for Financial Transactions

Payments and inventory changes are not independent events. When a guest books the last Deluxe King room:

  • The inventory hold must be created first.
  • Only then can payment authorization start.
  • If payment succeeds, the inventory hold becomes a confirmed booking.
  • If payment fails, the hold must be released.

Choreography makes these dependencies fragile. Services might react to events at different speeds, retry incorrectly, or miss events entirely during network disruptions. You could end up with:

  • Inventory stuck on “reserved”
  • Payments captured without a reservation
  • Two services making conflicting decisions

A Saga orchestrator avoids this by controlling the sequence:

  1. Send a command to the Inventory service.
  2. Wait for a success or failure response.
  3. Move to the next step only when the previous one completes.

The flow is deterministic, making failures easier to diagnose and fix.

3.1.2 When Orchestration Simplifies the System

Using an orchestrator reduces complexity when:

  • Every step depends on the success of the previous step.
  • Failures require compensating actions, not just additional events.
  • External systems like payment gateways or OTAs participate in the workflow.
  • The business requires full traceability for audits or disputes.

Hotel reservations fit all of these points. A Saga ensures that inventory is always returned if payment fails, and no downstream service needs to guess what state the booking is in. This is cleaner and safer than embedding business rules across services.

3.2 Building the Booking State Machine

A booking lifecycle naturally maps to a state machine: each step is a transition, and each transition represents forward progress (or recovery) in the workflow. MassTransit provides a straightforward way to define this state machine and handle messages that drive it.

3.2.1 Defining the Booking States

For hotel reservations, a practical set of states is:

  • Initial — Request received, nothing processed yet.
  • InventoryReserved — A temporary inventory hold exists. The booking is not confirmed yet.
  • PaymentPending — Payment has started; the system is waiting for the gateway response.
  • Confirmed — Payment succeeded, reservation finalized, notifications trigger.
  • Failed — Any failure leads here. Compensations are triggered to clean up holds or pending actions.

These states mirror the real operational flow hotels follow. Timeouts and failures transition the Saga into Failed, ensuring the inventory is never lost.

3.2.2 Implementing the Saga Using MassTransit and RabbitMQ

Below is a simplified example showing how a Saga flows through inventory and payment stages using commands and events. This mirrors the “reservation hold” logic described in Section 2.

public class BookingState : SagaStateMachineInstance
{
    public Guid CorrelationId { get; set; }
    public string? RoomType { get; set; }
    public DateRangeDto? DateRange { get; set; }
    public string CurrentState { get; set; } = default!;
    public Guid? PaymentId { get; set; }
    public DateTime? ReservationExpiresAt { get; set; }
}

State machine:

public class BookingStateMachine : MassTransitStateMachine<BookingState>
{
    public State InventoryReserved { get; private set; }
    public State PaymentPending { get; private set; }
    public State Confirmed { get; private set; }
    public State Failed { get; private set; }

    public Event<BookingRequested> BookingRequested { get; private set; }
    public Event<InventoryReservedEvent> InventoryReservedEvent { get; private set; }
    public Event<PaymentCompletedEvent> PaymentCompletedEvent { get; private set; }
    public Event<PaymentFailedEvent> PaymentFailedEvent { get; private set; }

    public BookingStateMachine()
    {
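        InstanceState(x => x.CurrentState);

        // Tie each event to its Saga instance. This assumes every message
        // carries the booking's Guid in a BookingId property.
        Event(() => BookingRequested, x => x.CorrelateById(m => m.Message.BookingId));
        Event(() => InventoryReservedEvent, x => x.CorrelateById(m => m.Message.BookingId));
        Event(() => PaymentCompletedEvent, x => x.CorrelateById(m => m.Message.BookingId));
        Event(() => PaymentFailedEvent, x => x.CorrelateById(m => m.Message.BookingId));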

        Initially(
            When(BookingRequested)
                .Then(ctx =>
                {
                    ctx.Instance.RoomType = ctx.Data.RoomType;
                    ctx.Instance.DateRange = ctx.Data.DateRange;
                })
                .Send(new Uri("queue:inventory-reservation"), ctx => new ReserveInventoryCommand
                {
                    BookingId = ctx.Instance.CorrelationId,
                    RoomType = ctx.Instance.RoomType,
                    DateRange = ctx.Instance.DateRange
                })
                .TransitionTo(InventoryReserved)
        );

        During(InventoryReserved,
            When(InventoryReservedEvent)
                .Then(ctx =>
                {
                    ctx.Instance.ReservationExpiresAt = DateTime.UtcNow.AddMinutes(15);
                })
                .Send(new Uri("queue:payment"), ctx => new InitiatePaymentCommand
                {
                    BookingId = ctx.Instance.CorrelationId
                })
                .TransitionTo(PaymentPending)
        );

        During(PaymentPending,
            When(PaymentCompletedEvent)
                .TransitionTo(Confirmed),

            When(PaymentFailedEvent)
                .TransitionTo(Failed)
        );

        During(Failed,
            Ignore(BookingRequested));
    }
}

A few key points that match earlier sections:

  • Inventory holds happen before payment.
  • The Saga sets an expiration time to prevent “zombie” holds.
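  • Transitions depend on domain events, not database calls.
  • The sample enters InventoryReserved as soon as the reserve command is sent; the hold itself exists only once InventoryReservedEvent arrives, and a rejection from the Inventory service would need its own event and a transition to Failed.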

This is how you maintain consistency while keeping services independent.

3.2.3 Persisting Saga Data with EF Core

Sagas must persist state so they can continue even if a service restarts. MassTransit provides EF Core persistence out of the box:

services.AddMassTransit(cfg =>
{
    cfg.AddSagaStateMachine<BookingStateMachine, BookingState>()
        .EntityFrameworkRepository(r =>
        {
            r.ConcurrencyMode = ConcurrencyMode.Pessimistic;
            r.AddDbContext<DbContext, BookingDbContext>((provider, builder) =>
            {
                builder.UseSqlServer(configuration.GetConnectionString("SagaDb"));
            });
        });

    cfg.UsingRabbitMq((context, busCfg) =>
    {
        busCfg.ConfigureEndpoints(context);
    });
});

Using pessimistic concurrency here prevents two messages from modifying the same Saga instance at the same time, which reinforces the distributed concurrency rules defined in Section 2.

3.3 Handling Failures and Compensating Transactions

Failures are normal in distributed environments—payment gateways time out, inventory updates may lag, and network outages happen. What matters is that the booking system recovers gracefully, and inventory is never left in an inconsistent state.

3.3.1 Compensating by Releasing Inventory Holds

When the Saga transitions to Failed, it must release the temporary hold created earlier. This matches the “reservation hold” concept established in Section 2.

public class ReleaseInventoryCommand
{
    public Guid BookingId { get; set; }
    public string RoomType { get; set; } = default!;
    public DateRangeDto DateRange { get; set; } = default!;
}

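Triggering compensation whenever the Saga enters Failed:

// WhenEnter runs on entry to the state, whichever event caused the transition.
WhenEnter(Failed, binder => binder
    .ThenAsync(ctx => ctx.Publish(new ReleaseInventoryCommand
    {
        BookingId = ctx.Instance.CorrelationId,
        RoomType = ctx.Instance.RoomType!,
        DateRange = ctx.Instance.DateRange!
    })));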

This ensures:

  • Expired or failed bookings don’t block inventory.
  • The Inventory service remains the single source of truth.

3.3.2 Avoiding “Zombie Bookings” Through Cleanup Jobs

A Saga might get stuck if an external service never responds (e.g., the payment gateway is down). To handle this, scheduled jobs periodically scan Saga state:

  • If InventoryReserved is older than 15 minutes → expire the hold.
  • If PaymentPending is older than 5 minutes → cancel payment attempts and release inventory.

Example query:

// ReservationExpiresAt is the hold deadline the Saga set when inventory was reserved.
var expired = await dbContext.BookingStates
    .Where(x => x.CurrentState == "PaymentPending" &&
                x.ReservationExpiresAt < DateTime.UtcNow)
    .ToListAsync();

This keeps the system healthy without embedding timeout logic into every microservice.

3.3.3 Ensuring Idempotency for Message Consumers

In distributed systems, messages may be delivered more than once due to retries or network glitches. Consumers must handle this gracefully:

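public async Task Consume(ConsumeContext<ReserveInventoryCommand> context)
{
    // MessageId is nullable; fall back to the booking id as the deduplication key.
    var key = context.MessageId ?? context.Message.BookingId;

    // Skip work that an earlier delivery already completed.
    if (await _processed.ExistsAsync(key))
        return;

    // Reserve first, then record the key. Ideally both writes share one database
    // transaction, so a crash cannot leave one without the other.
    await _inventoryService.ReserveAsync(context.Message);
    await _processed.AddAsync(key);
}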

This prevents:

  • Duplicate inventory holds
  • Duplicate payments
  • Confusing booking states

Idempotency is crucial, especially when OTAs retry requests aggressively during high-traffic periods.
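The check and the record are safest when they are one atomic step. The in-memory sketch below uses ConcurrentDictionary.TryAdd for that; IdempotentHandler is a hypothetical name, and a production system would more likely use a durable store with a unique constraint on the message id.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class IdempotentHandler
{
    private readonly ConcurrentDictionary<Guid, bool> _seen = new();

    // Runs the work at most once per message id.
    // Returns true if the work ran, false if the message was a duplicate.
    public bool Handle(Guid messageId, Action work)
    {
        // TryAdd is atomic: of two concurrent deliveries, only one wins.
        if (!_seen.TryAdd(messageId, true))
            return false;

        try
        {
            work();
            return true;
        }
        catch
        {
            // Forget the id so a later redelivery can retry the failed work.
            _seen.TryRemove(messageId, out _);
            throw;
        }
    }
}
```

Delivering the same message id twice runs the reservation once; the second call returns false without touching inventory.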


4 Immutable History: Event Sourcing with EventStoreDB

Once a booking system reaches meaningful scale—tens of thousands of bookings per day, constant pricing adjustments, OTA traffic, and staff operations—teams eventually ask the same questions:

  • “Who changed the price for Deluxe King last night?”
  • “Why did this booking get cancelled?”
  • “What was the occupancy level when this rate was calculated?”
  • “When did this guest request an upgrade?”

A traditional CRUD system only stores the latest state. That means the past disappears unless someone manually logs changes. Event sourcing solves this by storing a full timeline of everything that happened in the system. Instead of replacing data, the application appends new events. The current state is simply the sum of all past changes.

For a hotel domain—where disputes, audits, reconciliations, and OTA integrations are routine—event sourcing provides clarity that is otherwise hard to achieve.

4.1 Why Event Sourcing for Hotels?

4.1.1 The Need for Auditability

Inventory and reservations are sensitive data. Small mistakes can lead to revenue loss, angry guests, OTA penalties, or compliance issues. Event sourcing gives you a complete picture of what happened:

  • When a room-type price was updated and who updated it
  • The exact sequence of steps in a booking lifecycle
  • Whether a cancellation came from a guest, hotel staff, or an OTA
  • How often holds or overbooking strategies activated

Because events are immutable, they become a reliable audit trail. This is especially useful when dealing with disputes or cross-checking records with OTAs, where precise timelines matter.

4.1.2 When Event Sourcing Improves Operational Clarity

Event sourcing shines when:

  • Aggregates live for a long time (bookings can last months).
  • Data evolves frequently (date modifications, payment retries, price adjustments).
  • You need to replay scenarios, such as overbooking patterns or demand surges.
  • Regulators or OTAs require transparency about how availability and pricing were computed.

In hotel operations, it’s common to answer questions like, “What was the sequence of events leading to this overbooking?” With event sourcing, you don’t reconstruct history—you read it.

4.2 Implementing EventStoreDB in .NET

EventStoreDB is purpose-built for event sourcing, offering append-only streams, optimistic concurrency, server-side projections, and strong guarantees around consistency. It fits naturally into the microservices architecture described in earlier sections.

4.2.1 Modeling Events

Events should reflect domain actions that already occurred. For example, a room-type booking lifecycle might emit:

public record BookingCreated(
    Guid BookingId,
    string RoomType,
    DateRangeDto DateRange,
    decimal Price);

public record RoomAssigned(Guid BookingId, string RoomNumber);

public record CheckInDateChanged(Guid BookingId, DateOnly NewDate);

public record AmenitiesAdded(Guid BookingId, IReadOnlyList<string> Amenities);

Characteristics of good events:

  • Immutable: once written, never changed.
  • Past tense: describes something that definitely happened (not an intent).
  • Small and specific: easier to replay and reason about.

These events become the source of truth for the Reservation domain.
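To make "source of truth" concrete, here is a minimal sketch of rebuilding a booking's current state by folding its events in order. The `BookingState` type and its `Apply` and `Rehydrate` methods are illustrative assumptions rather than part of the code above, and the event records are trimmed copies kept local so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed copies of the events above, kept local so the sketch is self-contained.
public record BookingCreated(Guid BookingId, string RoomType, decimal Price);
public record RoomAssigned(Guid BookingId, string RoomNumber);
public record CheckInDateChanged(Guid BookingId, DateOnly NewDate);

// Hypothetical aggregate state; every field is derived from events only.
public record BookingState(
    Guid BookingId = default,
    string RoomType = "",
    decimal Price = 0m,
    string? RoomNumber = null,
    DateOnly? CheckIn = null)
{
    // Returns a new state with one event applied; unknown events are ignored.
    public BookingState Apply(object evt) => evt switch
    {
        BookingCreated e => this with { BookingId = e.BookingId, RoomType = e.RoomType, Price = e.Price },
        RoomAssigned e => this with { RoomNumber = e.RoomNumber },
        CheckInDateChanged e => this with { CheckIn = e.NewDate },
        _ => this
    };

    // Current state = left fold of Apply over the ordered event history.
    public static BookingState Rehydrate(IEnumerable<object> history) =>
        history.Aggregate(new BookingState(), (state, evt) => state.Apply(evt));
}
```

Because `Apply` never mutates anything, the same history always produces the same state, which is what makes replay and auditing dependable.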

4.2.2 Writing to the Event Stream

When a booking is created, the system appends a BookingCreated event to its stream. EventStoreDB ensures append-only writes with optional optimistic concurrency.

var eventData = new EventData(
    Uuid.NewUuid(),
    "BookingCreated",
    JsonSerializer.SerializeToUtf8Bytes(bookingCreated));

await _eventStore.AppendToStreamAsync(
    streamName,
    StreamState.NoStream,
    new[] { eventData });

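Passing StreamState.NoStream tells EventStoreDB that the stream must not exist yet, so a second attempt to create the same booking fails. For later changes, the writer passes the stream revision it last read as the expected revision. If two services modify the same booking simultaneously, the slower append fails with a WrongExpectedVersionException; the application reloads events, rebuilds the model, and retries the operation. This matches the optimistic concurrency control (OCC) used in Inventory and keeps each booking's event history consistent.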
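The reload-and-retry loop can be sketched without the real client. In the block below, `InMemoryStream`, `ConcurrencyException`, and `BookingWriter.AppendWithRetry` are hypothetical stand-ins used only to show the control flow; they are not EventStoreDB APIs.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the store's concurrency failure (hypothetical type).
public class ConcurrencyException : Exception { }

// In-memory stand-in for one event stream; NOT the EventStoreDB client.
public class InMemoryStream
{
    private readonly List<object> _events = new();
    private readonly object _gate = new();

    public int Version { get { lock (_gate) return _events.Count; } }

    // Appends only if the caller's expected version still matches the stream.
    public void Append(int expectedVersion, object evt)
    {
        lock (_gate)
        {
            if (_events.Count != expectedVersion) throw new ConcurrencyException();
            _events.Add(evt);
        }
    }
}

public static class BookingWriter
{
    // Read the version, decide, append; on conflict, re-read and try again.
    // Returns the number of attempts that were needed.
    public static int AppendWithRetry(InMemoryStream stream, Func<int, object> decide, int maxAttempts = 3)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var version = stream.Version;  // the state this decision is based on
            var evt = decide(version);     // business logic is re-run on every attempt
            try
            {
                stream.Append(version, evt);
                return attempt;
            }
            catch (ConcurrencyException) when (attempt < maxAttempts)
            {
                // Another writer won the race; loop to reload and retry.
            }
        }
        throw new InvalidOperationException("Unreachable: the last attempt either returns or throws.");
    }
}
```

The important design point is that the decision function runs again after every conflict, so the event that finally gets appended is based on the latest history rather than on stale state.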

4.2.3 Snapshotting Long-Lived Aggregates

Some bookings collect many events:

  • Guest modifies dates
  • Hotel adds amenities
  • Payment retries occur
  • Room assignment changes
  • OTA sends amendments

Loading all events every time can slow things down. Snapshotting helps by periodically storing the aggregate’s current state.

public record BookingSnapshot(
    Guid BookingId,
    string RoomType,
    DateRangeDto DateRange,
    string Status,
    int Version);

Snapshot-based loading:

  1. Load the latest snapshot.
  2. Load only events after the snapshot version.
  3. Apply those events to rebuild the final state.

This keeps read operations fast without losing historical accuracy.
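The three steps can be sketched as follows. This is a simplified illustration: `BookingView`, `VersionedEvent`, and the `BookingConfirmed` and `BookingCancelled` event names are assumptions introduced for the example, and versions are assumed to be 1-based stream positions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal state: a status plus the version of the last applied event.
public record BookingView(string Status, int Version);

// An event tagged with its position in the stream (1-based here, by assumption).
public record VersionedEvent(int Version, string Type);

public static class SnapshotLoader
{
    public static BookingView Load(BookingView? snapshot, IEnumerable<VersionedEvent> stream)
    {
        // 1. Start from the snapshot, or from an empty state if none exists.
        var state = snapshot ?? new BookingView("None", 0);
        var fromVersion = state.Version;

        // 2. Keep only events newer than the snapshot; 3. apply them in order.
        foreach (var evt in stream.Where(e => e.Version > fromVersion).OrderBy(e => e.Version))
            state = Apply(state, evt);

        return state;
    }

    private static BookingView Apply(BookingView state, VersionedEvent evt) => evt.Type switch
    {
        "BookingCreated" => new BookingView("Created", evt.Version),
        "BookingConfirmed" => new BookingView("Confirmed", evt.Version),
        "BookingCancelled" => new BookingView("Cancelled", evt.Version),
        _ => state with { Version = evt.Version } // unrelated event: only advance the version
    };
}
```

A useful property to test for is that loading from a snapshot and loading from the full history give the same result; if they ever differ, the snapshot is stale or the apply logic has changed since it was taken.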

4.3 Projections and Read Models (CQRS)

Event sourcing is excellent for writes and domain correctness, but not ideal for queries. Most read scenarios—searching bookings, displaying history, generating staff dashboards—require flattened, query-optimized data. That’s where projections and CQRS come in.

4.3.1 Separating the Write Model from the Read Model

The write side (EventStoreDB) stores append-only event streams. The read side (SQL Server, PostgreSQL, MongoDB, Elasticsearch) stores precomputed projections. These are continuously updated as events occur.

Example projection handler:

public async Task Handle(BookingCreated evt)
{
    var booking = new BookingReadModel
    {
        BookingId = evt.BookingId,
        RoomType = evt.RoomType,
        From = evt.DateRange.From,
        To = evt.DateRange.To,
        Status = "Created"
    };

    await _db.Bookings.AddAsync(booking);
    await _db.SaveChangesAsync();
}

This creates a “query-friendly” version of the booking that powers:

  • Customer “My Bookings” pages
  • Staff dashboards
  • Housekeeping/operations screens
  • Reporting and analytics

4.3.2 Building a “My Bookings” Projection

Guests expect instant access to their booking history. Instead of calculating it on the fly, you maintain a projection that updates as new events are processed:

public async Task Handle(RoomAssigned evt)
{
    var booking = await _db.Bookings.FindAsync(evt.BookingId);
    if (booking is null)
        return; // BookingCreated not projected yet; log or retry instead of throwing

    booking.RoomNumber = evt.RoomNumber;
    await _db.SaveChangesAsync();
}

This creates a clean separation:

  • EventStoreDB = source of truth
  • SQL read models = optimized for queries

This mirrors how the Search Service uses Elasticsearch: read models are shaped for performance, not correctness logic.

4.3.3 Replaying Events to Fix Bugs or Build New Features

One of the biggest advantages of event sourcing is the ability to rebuild projections. If you change business rules or add fields, you don’t need to migrate data—you simply replay historical events.

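// Recent client versions take FromAll.Start here (older ones took Position.Start).
await _eventStore.SubscribeToAllAsync(FromAll.Start,
    async (s, e, ct) =>
{
    // $all also carries system events (types prefixed with "$"); skip them.
    if (e.Event.EventType.StartsWith("$")) return;

    await _projectionService.ApplyAsync(e);
});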

Common use cases:

  • Recompute all nightly rates after changing pricing logic
  • Generate new analytics dashboards
  • Fix projection bugs without touching original data
  • Create new audit reports for OTAs or regulators

This kind of flexibility is extremely valuable in hotel environments where pricing, policies, and operational processes evolve frequently.
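Replays deliver events the read model may already have seen, so projection handlers need to tolerate repeats. The sketch below shows one way to do that with upsert semantics; `BookingProjection` and `BookingRow` are hypothetical, and the dictionary stands in for the SQL read-model table.

```csharp
using System;
using System.Collections.Generic;

public record BookingCreated(Guid BookingId, string RoomType);
public record RoomAssigned(Guid BookingId, string RoomNumber);

public class BookingRow
{
    public string RoomType { get; set; } = "";
    public string? RoomNumber { get; set; }
}

// Hypothetical projection; the dictionary stands in for the read-model table.
public class BookingProjection
{
    public Dictionary<Guid, BookingRow> Rows { get; } = new();

    public void Apply(object evt)
    {
        switch (evt)
        {
            case BookingCreated e:
                // Upsert: replaying the same event overwrites instead of duplicating.
                Rows[e.BookingId] = new BookingRow { RoomType = e.RoomType };
                break;
            case RoomAssigned e when Rows.TryGetValue(e.BookingId, out var row):
                row.RoomNumber = e.RoomNumber;
                break;
            // Unknown events, or events for unknown bookings, are skipped.
        }
    }

    // Rebuild = clear the read model, then apply the full history in order.
    public void Rebuild(IEnumerable<object> history)
    {
        Rows.Clear();
        foreach (var evt in history) Apply(evt);
    }
}
```

With a blind insert instead of an upsert, the second replay would fail on a duplicate key, which is usually the first problem teams hit when they rebuild a projection.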


5 The Global Sync: Building a Real-Time Channel Manager

A hotel booking system only works if every channel—your website, mobile app, call center, and OTAs like Booking.com or Expedia—sees the same inventory and rates. When even one channel displays outdated availability, the hotel risks overbooking, unhappy guests, and penalties from partners. For a system handling thousands of daily updates, synchronizing availability and pricing across all platforms becomes a continuous, real-time task.

A Channel Manager is responsible for this synchronization. Its job is to ensure that every change made internally (new reservation, booking cancellation, pricing update) is communicated to external OTAs quickly and reliably—without overwhelming their APIs or breaking rate limits.

5.1 The Connectivity Problem

5.1.1 Push vs. Pull Models in OTA Integration

Most OTAs provide two ways to retrieve and update data:

Push model (preferred by OTAs for availability and pricing): Your system sends updates directly to the OTA whenever something changes.

Examples:

  • A room-type inventory decreases from 3 → 2
  • A pricing rule adjusts the Deluxe King rate
  • A cancellation frees up availability

Pull model: The OTA periodically calls your API to fetch current availability, pricing, or restrictions.

Examples:

  • “Give me your availability for the next 365 days.”
  • “Give me updated room-type metadata.”

Modern OTAs rely heavily on push for time-sensitive data like availability because it reduces their polling load and keeps systems tightly synchronized. However, hotels still need to support pull for fallback scenarios and metadata refreshes.

The channel manager must support both patterns, depending on OTA capabilities and agreements.

5.1.2 Navigating Rate Limits and Latency

OTAs strictly throttle API requests. A hotel that sends too many updates too quickly risks temporary suspension. But internal changes don’t slow down—Inventory and Pricing domains may generate thousands of updates during peak hours or promotions.

A scalable system must include:

  • Outbound queues: Store updates temporarily instead of sending them immediately.
  • Rate-limiters: Make sure updates respect OTA quotas.
  • Retry strategies with jitter: Avoid sending retries at the same time, which can create traffic spikes.
  • Async processing: Never block reservation workflows while OTA updates are in progress.

Because OTA APIs sometimes respond slowly, updates must be asynchronous. Your system acknowledges work internally and then processes OTA notifications in the background, preventing delays in reservation flows.
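As an illustration of "retry strategies with jitter", here is a small sketch of exponential backoff with full jitter. The `RetryDelay` helper and its parameters are assumptions made for this example; in production you would more likely configure an existing resilience library than hand-roll this.

```csharp
using System;

public static class RetryDelay
{
    // "Full jitter": pick a random delay in [0, min(cap, base * 2^attempt)).
    public static TimeSpan Next(int attempt, TimeSpan baseDelay, TimeSpan cap, Random rng)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));

        // Clamp the exponent so very high attempt counts cannot blow up the ceiling.
        var exponent = Math.Min(attempt, 30);
        var ceilingMs = Math.Min(
            cap.TotalMilliseconds,
            baseDelay.TotalMilliseconds * Math.Pow(2, exponent));

        return TimeSpan.FromMilliseconds(rng.NextDouble() * ceilingMs);
    }
}
```

Because every worker draws its own random delay, retries after an OTA outage spread out over the window instead of arriving together when the partner comes back online.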

5.2 Change Data Capture (CDC) Implementation

In earlier sections, we discussed the Inventory domain and how it emits events when room-type availability changes. The Channel Manager listens to these changes. But instead of querying the entire database repeatedly, we use Change Data Capture (CDC) to capture only what changed.

5.2.1 Using Debezium or SQL Server CDC

CDC tracks row-level updates, such as:

  • Availability changes from reservations or cancellations
  • Price changes triggered by dynamic pricing rules
  • Rate plan updates made by revenue managers

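SQL Server CDC records these changes in change tables inside the database. Debezium's SQL Server connector reads those change tables and streams each change into Kafka (or a Kafka-compatible broker such as Redpanda), so CDC must be enabled on the database and on every captured table.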

Example Debezium config:

{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "sqlserver",
    "database.port": "1433",
    "database.user": "cdcuser",
    "database.password": "password",
    "database.names": "HotelInventory",
    "table.include.list": "dbo.Inventory",
    "topic.prefix": "hotel",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.inventory"
  }
}

Each inventory update produces a change event on a Kafka topic. The Channel Manager consumes these events and decides how and when to notify OTAs.

5.2.2 Decoupling Internal Updates from External Pushes

If the Channel Manager pushed every update immediately, OTAs would quickly throttle or block requests. To avoid this, we introduce a decoupled pipeline:

CDC → Message Queue → Channel Manager → OTA API

This gives the system room to:

  • Batch updates for efficiency
  • Debounce frequent adjustments
  • Apply rate limits automatically
  • Retry without blocking internal operations

This architecture follows the same event-driven principles used in Inventory, Reservation, and Pricing services.
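Debouncing can be as simple as keeping only the newest update per room type and date before anything is sent. The sketch below assumes each update carries a monotonically increasing `Sequence` (for example, a queue offset); `AvailabilityUpdate` and `UpdateCoalescer` are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record AvailabilityUpdate(string RoomType, DateOnly Date, int Available, long Sequence);

public static class UpdateCoalescer
{
    // Keeps only the newest update per (room type, date); older ones are superseded.
    public static IReadOnlyList<AvailabilityUpdate> Coalesce(IEnumerable<AvailabilityUpdate> updates) =>
        updates
            .GroupBy(u => (u.RoomType, u.Date))
            .Select(g => g.MaxBy(u => u.Sequence)!)
            .OrderBy(u => u.Date)
            .ThenBy(u => u.RoomType, StringComparer.Ordinal)
            .ToList();
}
```

If a room type drops from 3 to 2 to 1 within one buffering window, the OTA only needs to hear "1"; the intermediate values would spend quota without changing the outcome.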

5.2.3 Prioritizing High-Impact Updates

Not all updates deserve equal urgency.

Examples:

  • Availability dropping to zero for a room type → critical update
  • Availability dropping from 10 → 9 → normal update

The Channel Manager can classify updates using simple rules:

int priority = updated.AvailableRooms switch
{
    0 => 1,   // Critical: OTA should reflect sold-out status immediately
    1 => 2,   // High: Close to sold-out
    _ => 3    // Normal: No immediate rush
};

This ensures that meaningful updates reach OTAs quickly while lower-impact changes wait for the next batch.
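One way to act on these priorities is a priority queue that hands out critical updates first and preserves arrival order within each level. This is a single-threaded sketch built on .NET's `PriorityQueue<TElement, TPriority>`; `OtaUpdate` and `PrioritizedOutbox` are hypothetical names, and a real outbox would also need thread safety and persistence.

```csharp
using System;
using System.Collections.Generic;

public record OtaUpdate(string RoomType, int AvailableRooms);

public class PrioritizedOutbox
{
    // Lower priority number = dequeued first; the sequence breaks ties in arrival order.
    private readonly PriorityQueue<OtaUpdate, (int Priority, long Seq)> _queue = new();
    private long _seq;

    public void Enqueue(OtaUpdate update)
    {
        // Same classification rule as above: sold out, nearly sold out, everything else.
        var priority = update.AvailableRooms switch { 0 => 1, 1 => 2, _ => 3 };
        _queue.Enqueue(update, (priority, _seq++));
    }

    public bool TryDequeue(out OtaUpdate? update) => _queue.TryDequeue(out update, out _);
}
```

The sequence number matters because `PriorityQueue` does not guarantee first-in-first-out order among elements with equal priority.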

5.3 Buffering and Throttling Updates

OTAs do not expect one update per internal event. They expect clean, aggregated updates delivered at a reasonable pace. The Channel Manager is the buffer between internal volatility and external stability.

5.3.1 Using Azure Functions or Dedicated Consumers

A practical implementation is a scheduled Azure Function that processes queued updates every few seconds:

[Function("SyncAvailability")]
public async Task Run([TimerTrigger("*/5 * * * * *")] TimerInfo timer)
{
    var batch = await _queue.FetchPendingUpdatesAsync();
    await _otaClient.SendBatchAsync(batch);
}

Why this works well:

  • The Channel Manager isolates OTA communication from internal booking flow.
  • OTA outages don’t block internal services.
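  • Consumers can scale out during high-demand periods (a timer-triggered function runs on one instance at a time, so use queue-triggered or dedicated consumers when a single worker cannot keep up).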

This matches the architecture where the Reservation service and Inventory service publish domain events but never directly call OTA APIs.

5.3.2 Batching Updates to Prevent Throttling

OTA rate limits are typically counted per request rather than per row, so packing more data into each request stretches the same quota further. Sending one aggregated update per room type per date range is better than sending dozens of small requests.

Example batch:

{
  "hotelId": "H123",
  "updates": [
    { "roomType": "DeluxeKing", "date": "2025-03-01", "availability": 3 },
    { "roomType": "DeluxeKing", "date": "2025-03-02", "availability": 2 }
  ]
}

This approach:

  • Minimizes the number of outbound calls
  • Keeps your system within OTA rate-limit rules
  • Produces clean, predictable updates

It also integrates nicely with Redis-backed buffers or message queues used elsewhere in the architecture.
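Building such payloads is mostly a matter of chunking the pending updates. The sketch below assumes the OTA caps the number of items per request; `AvailabilityItem`, `OtaBatch`, `BatchBuilder`, and the `maxPerRequest` limit are hypothetical, so check each partner's documentation for the real limits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record AvailabilityItem(string RoomType, string Date, int Availability);
public record OtaBatch(string HotelId, IReadOnlyList<AvailabilityItem> Updates);

public static class BatchBuilder
{
    // Splits pending updates into payloads of at most maxPerRequest items each.
    public static List<OtaBatch> Build(
        string hotelId, IEnumerable<AvailabilityItem> pending, int maxPerRequest)
    {
        if (maxPerRequest <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerRequest));

        return pending
            .Chunk(maxPerRequest)                          // available since .NET 6
            .Select(chunk => new OtaBatch(hotelId, chunk))
            .ToList();
    }
}
```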

5.3.3 Handling “Out of Sync” Scenarios

Even with a perfect design, OTAs sometimes drift out of sync due to:

  • Their own internal delays
  • High booking volume
  • Parallel updates from multiple partners
  • Network outages

To correct this, the Channel Manager runs reconciliation jobs.

A typical flow:

  1. Pull the OTA’s latest availability snapshot.
  2. Compare to internal inventory.
  3. Identify differences.
  4. Push corrected availability.

Example logic:

if (otaAvailability != internalAvailability)
{
    await _otaClient.CorrectAvailabilityAsync(roomType, date, internalAvailability);
}

Reconciliation ensures that temporary mismatches don’t become long-term inconsistencies. This keeps OTAs aligned with your system, protecting both revenue and guest experience.
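Steps 2 and 3 of the flow amount to a dictionary diff. In the sketch below, internal inventory is treated as the source of truth; `Correction` and `Reconciler` are hypothetical names, and reporting a missing OTA entry as -1 is an assumption made for this example.

```csharp
using System;
using System.Collections.Generic;

public record Correction(string RoomType, DateOnly Date, int OtaValue, int InternalValue);

public static class Reconciler
{
    // Compares the OTA snapshot against internal inventory and lists every mismatch.
    public static List<Correction> Diff(
        IReadOnlyDictionary<(string RoomType, DateOnly Date), int> internalInventory,
        IReadOnlyDictionary<(string RoomType, DateOnly Date), int> otaSnapshot)
    {
        var corrections = new List<Correction>();

        foreach (var (key, internalValue) in internalInventory)
        {
            // A missing OTA entry is reported as -1 so it is always corrected.
            var otaValue = otaSnapshot.TryGetValue(key, out var value) ? value : -1;

            if (otaValue != internalValue)
                corrections.Add(new Correction(key.RoomType, key.Date, otaValue, internalValue));
        }

        return corrections;
    }
}
```

Returning the list of corrections, instead of pushing them immediately, lets the job log the drift and send the fixes through the same batching and rate-limiting path as regular updates.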


6 Algorithmic Revenue: Dynamic Pricing and Overbooking Engines

Once booking, inventory, and channel synchronization workflows are stable, the next logical step is helping the hotel maximize revenue. A scalable booking system should not only prevent double bookings—it should help the hotel sell rooms at the right price based on demand, seasonality, and competition.

Dynamic pricing and controlled overbooking form the “intelligence layer” of the platform. These components run alongside the core services, reading projections (occupancy, competitor rates, historical data) and writing updated rates or availability rules back into the Pricing and Inventory domains. With .NET, you can structure this as a clean set of strategies and services rather than scattering pricing logic across controllers or APIs.

6.1 Building the Pricing Engine

A hotel’s pricing logic changes constantly. Some days occupancy drives prices. On holidays, seasonal rules dominate. During major events, demand spikes and a completely different set of rules apply. A flexible pricing engine lets revenue managers adjust strategy without forcing developers to rewrite code every week.

A good design uses three components:

  • Pricing strategies — Independent pricing rules applied in sequence
  • An optional rules engine — For non-technical teams to define pricing policies
  • Caching — To support the high read volume that pricing typically sees

This keeps the system predictable, testable, and easy to extend.

6.1.1 Strategy Pattern for Pricing Rules

A strategy-based design keeps pricing modular. Each strategy receives a PricingContext and returns a modified rate. You define as many strategies as needed and compose them into a pipeline.

public record PricingContext(
    decimal BaseRate,
    double OccupancyRate,
    DateOnly CheckIn,
    int LengthOfStay,
    string RoomType,
    string Channel);

public interface IPricingStrategy
{
    decimal Apply(PricingContext context, decimal currentRate);
}

Here, each strategy acts on the current rate and returns a revised value.

An occupancy-based rule might increase the rate when room-type availability approaches zero:

public class OccupancyBasedStrategy : IPricingStrategy
{
    private readonly decimal _maxMarkupPercentage;

    public OccupancyBasedStrategy(decimal maxMarkupPercentage = 0.30m)
    {
        _maxMarkupPercentage = maxMarkupPercentage;
    }

    public decimal Apply(PricingContext context, decimal currentRate)
    {
        if (context.OccupancyRate < 0.6) return currentRate;

        var factor = (decimal)(context.OccupancyRate - 0.6) / 0.4m;
        var markup = currentRate * _maxMarkupPercentage * factor;

        return currentRate + markup;
    }
}

A competitor-based rule uses external data, which might come from a pricing aggregator or the hotel’s own analytics service:

public interface ICompetitorRatesProvider
{
    decimal? GetAverageRate(string roomType, DateOnly checkIn, int lengthOfStay);
}

public class CompetitorParityStrategy : IPricingStrategy
{
    private readonly ICompetitorRatesProvider _provider;
    private readonly decimal _delta;

    public CompetitorParityStrategy(ICompetitorRatesProvider provider, decimal delta)
    {
        _provider = provider;
        _delta = delta;
    }

    public decimal Apply(PricingContext context, decimal currentRate)
    {
        var competitorAvg = _provider.GetAverageRate(
            context.RoomType,
            context.CheckIn,
            context.LengthOfStay);

        if (competitorAvg is null) return currentRate;

        var target = competitorAvg.Value + _delta;
        return Math.Max(currentRate, target);
    }
}

Finally, the pipeline orchestrates the strategies:

public class PricingPipeline
{
    private readonly IReadOnlyList<IPricingStrategy> _strategies;

    public PricingPipeline(IEnumerable<IPricingStrategy> strategies)
    {
        _strategies = strategies.ToList();
    }

    public decimal GetFinalRate(PricingContext context)
    {
        var rate = context.BaseRate;

        foreach (var strategy in _strategies)
            rate = strategy.Apply(context, rate);

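        // decimal.Round defaults to banker's rounding; prices usually round half away from zero.
        return decimal.Round(rate, 2, MidpointRounding.AwayFromZero);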
    }
}

This mirrors earlier architectural patterns—small, composable building blocks instead of one large “pricing monster service.”
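A worked example helps check the arithmetic. The sketch below restates the two rules above as plain functions (`PricingWalkthrough` is a hypothetical helper, not part of the pipeline) and assumes a base rate of 200, 80% occupancy, a competitor average of 240, and a delta of -5.

```csharp
using System;

public static class PricingWalkthrough
{
    // Same formula as OccupancyBasedStrategy: linear markup between 60% and 100% occupancy.
    public static decimal OccupancyMarkup(decimal rate, double occupancy, decimal maxMarkup = 0.30m)
    {
        if (occupancy < 0.6) return rate;

        var factor = (decimal)(occupancy - 0.6) / 0.4m;
        return rate + rate * maxMarkup * factor;
    }

    // Same rule as CompetitorParityStrategy: never price below competitor average + delta.
    public static decimal CompetitorFloor(decimal rate, decimal? competitorAvg, decimal delta) =>
        competitorAvg is null ? rate : Math.Max(rate, competitorAvg.Value + delta);
}
```

At 80% occupancy the factor is (0.8 - 0.6) / 0.4 = 0.5, so the markup is 200 × 0.30 × 0.5 = 30 and the rate becomes 230 after rounding. The competitor target is 240 - 5 = 235, which is higher than 230, so the pipeline ends at 235. Note that the order of strategies matters: each one sees the rate produced by the previous one.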

6.1.2 Using a Rules Engine for Complex Pricing Scenarios

As hotels grow, pricing logic becomes more complex. Revenue managers often need rules like:

  • “Increase rates by 20% on weekends when occupancy exceeds 80%.”
  • “Cap rates at 3× base price during major city events.”
  • “Reduce rates for OTA channels when direct channel occupancy is low.”

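Hard-coding these rules into the pricing service is possible, but it does not hold up when revenue managers request changes every week. A rules engine like NRules lets you express each policy as a small declarative rule class that is loaded and evaluated separately from the core services. Note that NRules rules are still authored in C#, so non-developers need tooling on top (or a developer in the loop) to change them.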

A simplified rule might look like:

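// Hypothetical mutable fact that carries the rate through the rules session.
public class RateCandidate
{
    public decimal Rate { get; set; }
    public bool WeekendMarkupApplied { get; set; }
}

public class HighDemandWeekendRule : NRules.Fluent.Dsl.Rule
{
    public override void Define()
    {
        PricingContext context = default!;
        RateCandidate candidate = default!;

        // Conditions are expression trees, so use == and || rather than "is ... or" patterns.
        When()
            .Match(() => context, c =>
                (c.CheckIn.DayOfWeek == DayOfWeek.Friday ||
                 c.CheckIn.DayOfWeek == DayOfWeek.Saturday) &&
                c.OccupancyRate > 0.8)
            .Match(() => candidate, r => !r.WeekendMarkupApplied);

        // Actions are expression trees too: call a helper instead of using a statement body.
        Then()
            .Do(ctx => ApplyMarkup(candidate))
            .Do(ctx => ctx.Update(candidate));
    }

    // The flag stops the rule from firing again after the update.
    private static void ApplyMarkup(RateCandidate candidate)
    {
        candidate.Rate *= 1.20m;
        candidate.WeekendMarkupApplied = true;
    }
}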

This keeps pricing policies out of the core booking services, so rules can change and ship without touching the booking workflow. The rules engine functions as another strategy in the pipeline, one driven by declarative rules instead of hand-written branching.

6.1.3 Caching Pricing with FusionCache

Pricing endpoints typically see heavy read traffic, especially during search. Rather than recomputing prices for the same date range hundreds of times per minute, caching stabilizes performance.

public class PricingService
{
    private readonly FusionCache _cache;
    private readonly PricingPipeline _pipeline;

    public PricingService(FusionCache cache, PricingPipeline pipeline)
    {
        _cache = cache;
        _pipeline = pipeline;
    }

    public async Task<decimal> GetRateAsync(PricingContext context, CancellationToken ct)
    {
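        // Format the date explicitly so the key does not depend on the server's culture settings.
        var key = $"price:{context.RoomType}:{context.CheckIn:yyyy-MM-dd}:{context.LengthOfStay}:{context.Channel}";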

        return await _cache.GetOrSetAsync(
            key,
            async _ =>
            {
                var baseRate = await LoadBaseRateAsync(context, ct);
                var ctx = context with { BaseRate = baseRate };
                return _pipeline.GetFinalRate(ctx);
            },
            options => options.SetDuration(TimeSpan.FromMinutes(2)));
    }
}

Short-lived caches work well because pricing changes frequently, and cache invalidation can be tied to events such as:

  • Inventory level changes
  • Competitor rate updates
  • Special event announcements

This keeps dynamic pricing responsive without overloading the system.

6.2 Statistical Overbooking Strategies

Overbooking is a reality in hotel operations. Hotels know a portion of guests cancel or don’t show up, especially on certain room types or during certain seasons. If the system always stops at physical capacity, the hotel leaves money on the table.

Done correctly, overbooking boosts revenue while keeping walk situations rare.

6.2.1 Understanding Cancellations and No-Shows

Overbooking is based on probability. Instead of assuming every guest will arrive, the system uses historical show-up rates to decide how many extra bookings are safe.

For example:

  • Deluxe King no-show rate: 8%
  • Peak season no-show rate: 3%
  • Weekday corporate no-show rate: 12%

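You can model the number of guests who show up as a binomial distribution: each of the n bookings is assumed to show up independently with probability p, where p is one minus the no-show rate. This computation happens offline, perhaps as part of nightly jobs, and produces a recommended overbooking threshold per room type and season.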

Example of an offline helper:

from math import comb

def prob_shows_at_most(capacity, bookings, show_prob):
    total = 0.0
    for k in range(0, capacity + 1):
        total += comb(bookings, k) * (show_prob ** k) * ((1 - show_prob) ** (bookings - k))
    return total

def recommended_overbooking(capacity, show_prob, max_walk_prob=0.02):
    max_extra = 0
    for extra in range(0, 20):
        bookings = capacity + extra
        prob_safe = prob_shows_at_most(capacity, bookings, show_prob)
        if 1 - prob_safe <= max_walk_prob:
            max_extra = extra
        else:
            break
    return max_extra

The output becomes an operational guideline stored in a configuration table.

6.2.2 Configurable Overbooking Factors per Room Type

Once the overbooking factor is known, you incorporate it into the availability calculation. Rather than changing the underlying physical inventory, you expand the bookable count.

public record OverbookingConfig(
    string RoomType,
    DateOnly From,
    DateOnly To,
    int ExtraCapacity);

And then:

public class OverbookingAwareAvailabilityService
{
    private readonly IInventoryReadRepository _inventory;
    private readonly IOverbookingConfigRepository _configRepository;

    public OverbookingAwareAvailabilityService(
        IInventoryReadRepository inventory,
        IOverbookingConfigRepository configRepository)
    {
        _inventory = inventory;
        _configRepository = configRepository;
    }

    public async Task<int> GetBookableUnitsAsync(string roomType, DateRange dateRange, CancellationToken ct)
    {
        var physicalAvailability = await _inventory.GetMinAvailabilityAsync(roomType, dateRange, ct);
        var extra = await _configRepository.GetExtraCapacityAsync(roomType, dateRange, ct);

        return physicalAvailability + extra;
    }
}

Nothing else changes: the Reservation Saga still performs holds, payments, confirmations, and compensations exactly as before. The system simply operates with a wider capacity window.

6.2.3 Graceful Walks and Operational Handling

Even with careful modeling, rare situations will occur where guests exceed capacity. A scalable hotel system should help operations decide who to walk and provide clarity about compensation and alternatives.

A simple rule-based selector:

public class WalkDecisionService
{
    public IEnumerable<BookingCandidate> SelectGuestsToWalk(
        IEnumerable<BookingCandidate> candidates,
        int excessCount)
    {
        return candidates
            .OrderBy(c => c.IsDirect)            // Non-direct first, so direct guests are kept
            .ThenBy(c => c.AverageDailyRate)     // Walk lowest ADR first
            .Take(excessCount);
    }
}

public record BookingCandidate(
    Guid BookingId,
    bool IsDirect,
    decimal AverageDailyRate);

Situations where walks occur should also generate domain events like GuestWalked, feeding analytics and future overbooking models. This closes the feedback loop and improves future forecasts.


7 High-Performance Reads: CQRS and Projections

By this point, we have a solid booking workflow, strong concurrency controls, and reliable synchronization with OTAs. The next challenge is handling read traffic, especially search. In most hotel systems, search and browsing traffic is much higher than actual bookings. Thousands of users may be exploring dates, room types, and prices for every booking that gets confirmed.

If every “Show available rooms” request hits your transactional database, you’ll either overload it or twist the schema into something that’s good for reads but terrible for correctness. CQRS (Command Query Responsibility Segregation) plus projections let you keep the write side focused on correctness and the read side focused on speed.

7.1 Separating Search from Inventory

The key idea is simple: Use one model for updating inventory and reservations, and a different model for searching and browsing.

7.1.1 Why the Transactional DB Cannot Handle Search Loads

Your Inventory and Reservation services are optimized for:

  • Strict consistency on room-type availability.
  • Short, targeted queries (e.g., “What’s the availability for Deluxe King on 2025-03-01?”).
  • Reliable writes and concurrency control (sagas, OCC, locks).

Search is a completely different workload. A typical user request might say:

“Show me hotels in Delhi on these dates, sorted by price, filtered by free Wi-Fi, rating > 4, and refundable.”

This kind of query:

  • Touches many hotels at once.
  • Combines multiple filters and sorts.
  • Requires pagination and quick responses.

Trying to serve this from the transactional schema leads to:

  • Many JOINs across hotel, room-type, rate, and inventory tables.
  • Complex indexes that still don’t cover all combinations.
  • Increased lock contention and slower writes, especially during peak load.

To avoid that, you move search to a separate subsystem and treat the transactional database as a source of truth that feeds projections.

7.1.2 Flattening Data for Read Optimization

Search engines (like Elasticsearch or OpenSearch) prefer denormalized documents. Instead of spreading hotel data across many tables, you store it in a single document:

{
  "hotelId": "H123",
  "name": "City Center Hotel",
  "location": { "lat": 28.6139, "lon": 77.2090 },
  "rooms": [
    {
      "roomType": "DeluxeKing",
      "minRate": 4500,
      "maxRate": 9000,
      "maxOccupancy": 2
    }
  ],
  "amenities": ["wifi", "parking", "pool"],
  "rating": 4.3,
  "city": "Delhi",
  "country": "IN"
}

This structure lets the search engine:

  • Filter by city, rating, amenities, and price ranges.
  • Sort by rate or rating.
  • Combine keyword search (“City Center”) with structured filters.

Availability can be:

  • Included as simplified “availability buckets” (e.g., “has rooms for some of these dates”), or
  • Fetched separately at quote time from a fast availability cache (as we did in previous sections).

The goal is not to mirror your entire transactional model but to store just enough to answer search queries quickly.

7.2 Implementing the Search Subsystem

To implement search, you project data from your core services (Inventory, Pricing, Hotel metadata) into a dedicated index. This follows the same event-driven mindset we used for Sagas and channel management.

7.2.1 Indexing Hotels and Room Types into Elasticsearch/OpenSearch

Using a .NET Elasticsearch client (the examples here use NEST's IElasticClient), you can maintain a HotelSearchDocument for each hotel.

public class HotelSearchDocument
{
    public string HotelId { get; set; } = default!;
    public string Name { get; set; } = default!;
    public double Rating { get; set; }
    public string City { get; set; } = default!;
    public string Country { get; set; } = default!;
    public GeoLocation Location { get; set; } = default!;
    public List<string> Amenities { get; set; } = new();
    public List<RoomTypeSearchInfo> Rooms { get; set; } = new();
}

public class RoomTypeSearchInfo
{
    public string RoomType { get; set; } = default!;
    public decimal MinRate { get; set; }
    public decimal MaxRate { get; set; }
    public int MaxOccupancy { get; set; }
}

Indexer example:

public class HotelSearchIndexer
{
    private readonly IElasticClient _elasticClient;

    public HotelSearchIndexer(IElasticClient elasticClient)
    {
        _elasticClient = elasticClient;
    }

    public async Task IndexHotelAsync(HotelSearchDocument doc, CancellationToken ct)
    {
        var response = await _elasticClient.IndexAsync(doc, i => i
            .Index("hotels")
            .Id(doc.HotelId), ct);

        if (!response.IsValid)
        {
            // Log and handle indexing error
        }
    }
}

Domain events like HotelCreated, RoomTypeCreated, RatePlanChanged, and HotelMetadataUpdated feed this index. Whenever something important changes, a projection updates the relevant hotel document.

7.2.2 Handling Stale Availability with a “Book Now” Validation Step

Search results are eventually consistent. Even with fast updates, there will always be a small delay between a room being sold and the index reflecting that.

To avoid promising rooms that are already gone, you introduce a “quote and validate” step when the user clicks “Book now” on a search result:

  1. User searches → results served from Elasticsearch.

  2. User selects a specific room type and date range.

  3. The frontend calls a backend endpoint that:

    • Checks current availability via the Inventory or Availability service.
    • Recomputes price using the pricing engine.
    • Returns a short-lived quote or a “sold out” response.

Example:

public async Task<BookingQuote> GetQuoteAsync(QuoteRequest request, CancellationToken ct)
{
    var availability = await _availabilityService
        .GetBookableUnitsAsync(request.RoomType, request.DateRange, ct);

    if (availability <= 0)
        throw new SoldOutException();

    var occupancyRate = await _occupancyService.GetOccupancyRateAsync(request.DateRange, ct);

    var pricingContext = new PricingContext(
        BaseRate: 0, // will be set inside
        OccupancyRate: occupancyRate,
        CheckIn: request.DateRange.From,
        LengthOfStay: request.DateRange.Nights,
        RoomType: request.RoomType,
        Channel: request.Channel);

    var rate = await _pricingService.GetRateAsync(pricingContext, ct);

    return new BookingQuote(request.RoomType, request.DateRange, rate);
}

This approach keeps:

  • Search fast (served from index + cache).
  • Bookings accurate (validated against real-time availability and pricing).

It also fits nicely into the Saga approach from earlier sections: the quote leads into a hold and payment workflow.
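The "short-lived" part of the quote can be made explicit by carrying an expiry with it. This is a minimal sketch; `TimedQuote`, its fields, and the idea of checking it at the start of the hold step are assumptions for illustration, not the article's `BookingQuote` type.

```csharp
using System;

// Hypothetical shape: a quote that knows when it stops being valid.
public record TimedQuote(string RoomType, decimal Rate, DateTimeOffset IssuedAt, TimeSpan Ttl)
{
    public DateTimeOffset ExpiresAt => IssuedAt + Ttl;

    // The hold step should refuse quotes that have outlived their TTL
    // and send the guest back through the quote endpoint instead.
    public bool IsValidAt(DateTimeOffset now) => now < ExpiresAt;
}
```

Passing the current time in as a parameter, rather than reading the clock inside the record, keeps the check deterministic and easy to test.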

7.2.3 Geo-Spatial Queries for “Hotels Near Me”

For mobile and location-driven experiences, users often ask for “hotels near me.” Elasticsearch/OpenSearch make this straightforward with geo-distance queries.

public async Task<IReadOnlyCollection<HotelSearchDocument>> SearchNearbyAsync(
    double lat, double lon, double radiusKm, CancellationToken ct)
{
    var response = await _elasticClient.SearchAsync<HotelSearchDocument>(s => s
        .Index("hotels")
        .Query(q => q
            .GeoDistance(g => g
                .Field(f => f.Location)
                .Distance(FormattableString.Invariant($"{radiusKm}km")) // invariant: "2.5km", never "2,5km"
                .Location(lat, lon)
            )
        )
        .Size(50), ct);

    return response.Documents;
}

You can combine this with other filters:

  • Price range
  • Rating threshold
  • Amenities (parking, breakfast, Wi-Fi)

All of this happens in the search engine, keeping the transactional database free to focus on bookings and inventory updates.

7.3 Multi-Level Caching Strategies

Even with Elasticsearch in place, caching is still important. Some read patterns don’t need a search engine at all. Others benefit from Redis or in-memory caches layered on top.

Think in three layers:

  1. In-memory cache — For small, rarely-changing data (hotel metadata, amenities).
  2. Distributed cache (Redis) — For data that changes often but is heavily reused (availability calendars).
  3. Search index — For full-text and complex filtering queries.

7.3.1 Distributed Caching for Availability Calendars

Availability per room type over a month is a small, structured dataset and gets read a lot. Redis fits this pattern very well.

public class AvailabilityCalendarCache
{
    private readonly IDatabase _redis;

    public AvailabilityCalendarCache(IConnectionMultiplexer mux)
    {
        _redis = mux.GetDatabase();
    }

    public async Task<AvailabilityCalendar?> GetAsync(string hotelId, string roomType, int year, int month)
    {
        var key = $"avail:{hotelId}:{roomType}:{year}-{month}";
        var json = await _redis.StringGetAsync(key);
        return json.IsNullOrEmpty ? null : JsonSerializer.Deserialize<AvailabilityCalendar>(json!);
    }

    public async Task SetAsync(string hotelId, string roomType, int year, int month, AvailabilityCalendar calendar)
    {
        var key = $"avail:{hotelId}:{roomType}:{year}-{month}";
        var json = JsonSerializer.Serialize(calendar);
        await _redis.StringSetAsync(key, json, TimeSpan.FromMinutes(10));
    }
}

Search and availability endpoints can hit this cache instead of querying the Inventory table for every request.

7.3.2 In-Memory Caching for Static Content

Static hotel data—names, photos, amenities, policies—rarely changes. You don’t need Redis for this. Simple IMemoryCache in each API instance is enough.

public class HotelMetadataService
{
    private readonly IMemoryCache _cache;
    private readonly IHotelMetadataRepository _repo;

    public HotelMetadataService(IMemoryCache cache, IHotelMetadataRepository repo)
    {
        _cache = cache;
        _repo = repo;
    }

    public Task<HotelMetadata> GetAsync(string hotelId)
    {
        return _cache.GetOrCreateAsync(
            $"hotel-meta:{hotelId}",
            async entry =>
            {
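                entry.SlidingExpiration = TimeSpan.FromHours(1);
                // Sliding expiration alone never evicts a hot entry; cap its lifetime (illustrative value).
                entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(6);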
                return await _repo.GetAsync(hotelId);
            })!;
    }
}

This keeps metadata calls cheap and frees up your databases for more important work.

7.3.3 Event-Driven Cache Invalidation

As always, the hard part with caching is keeping it fresh. The event-driven design used earlier helps here too. When inventory changes, the Inventory service publishes an InventoryChanged event. Consumers react to that event and update or invalidate relevant cache keys.

public class InventoryChangedConsumer : IConsumer<InventoryChanged>
{
    private readonly IConnectionMultiplexer _mux;

    public InventoryChangedConsumer(IConnectionMultiplexer mux)
    {
        _mux = mux;
    }

    public async Task Consume(ConsumeContext<InventoryChanged> context)
    {
        var db = _mux.GetDatabase();
        var hotelId = context.Message.HotelId;
        var roomType = context.Message.RoomType;

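        // Assumes YearMonth uses the same "{year}-{month}" format as the key written by AvailabilityCalendarCache.
        // In practice, you might track exact keys or use tags.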
        var key = $"avail:{hotelId}:{roomType}:{context.Message.YearMonth}";
        await db.KeyDeleteAsync(key);
    }
}

In a production system, you’d likely:

  • Maintain an index of keys per room type, or
  • Use a tagging or key-prefix scheme plus Lua scripts to delete groups of keys.

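The core idea stays the same: inventory events drive cache updates, keeping read performance high. Correctness does not depend on the cache, because the booking path still validates against real-time availability before confirming.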
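One detail worth handling when you track exact keys: a stay that crosses a month boundary touches two calendar keys, not one. Here is a minimal sketch, assuming the avail:{hotelId}:{roomType}:{year}-{month} key format from 7.3.1 and .NET 6 or later for DateOnly. The AvailabilityKeys helper is hypothetical and not part of the design described so far.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lists every calendar cache key touched by a stay.
public static class AvailabilityKeys
{
    // Same key format as AvailabilityCalendarCache in 7.3.1.
    public static string ForMonth(string hotelId, string roomType, int year, int month)
        => $"avail:{hotelId}:{roomType}:{year}-{month}";

    // checkOut is exclusive: the guest does not occupy the room on the check-out night.
    public static IReadOnlyList<string> ForStay(
        string hotelId, string roomType, DateOnly checkIn, DateOnly checkOut)
    {
        if (checkOut <= checkIn)
            throw new ArgumentException("checkOut must be after checkIn.", nameof(checkOut));

        var keys = new List<string>();
        var lastNight = checkOut.AddDays(-1);
        var cursor = new DateOnly(checkIn.Year, checkIn.Month, 1);

        // Walk month by month until we pass the last occupied night.
        while (cursor <= lastNight)
        {
            keys.Add(ForMonth(hotelId, roomType, cursor.Year, cursor.Month));
            cursor = cursor.AddMonths(1);
        }

        return keys;
    }
}
```

A consumer could then delete each returned key instead of a single one, so a 30 December to 2 January stay invalidates both the December and January calendars.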


8 Resilience and Observability in Production

Everything we’ve built so far—inventory safety, Sagas, pricing, search, and OTA sync—only works if the system behaves predictably under real-world stress. In production, OTAs time out, payment gateways misbehave, networks get flaky, and databases slow down. A scalable hotel booking system has to expect these problems and remain usable anyway.

That’s where resilience patterns and observability come together. Resilience keeps the system from falling over. Observability tells you why something is slow or failing so you can fix it before guests notice.

8.1 Chaos Engineering and Stress Testing

You don’t want the first real load test to be New Year’s Eve in production. Stress tests and controlled failure tests (chaos) help you verify that your concurrency controls, Sagas, caches, and circuit breakers behave as expected under realistic traffic patterns.

8.1.1 Simulating High-Concurrency Booking Spikes with k6

Tools like k6 allow you to simulate thousands of users hitting your APIs at once. For a hotel system, a useful scenario might be a promotional campaign where everyone is trying to book the last few Deluxe King rooms for a festival weekend.

The details of the tool don’t matter as much as how you structure the test:

  • Gradually ramp up traffic.
  • Hold at peak concurrency.
  • Then ramp down.

During the test, you’re not looking for 100% success. You’re checking that:

  • Successful bookings stay consistent (no double bookings).
  • Failures are clean (e.g., HTTP 409 for sold-out, not 500).
  • Latency stays within acceptable limits.

Example k6 script (k6 scripts are written in JavaScript):

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 100 }, // warm up
    { duration: '2m', target: 500 },  // ramp to peak
    { duration: '2m', target: 500 },  // hold at peak
    { duration: '30s', target: 0 },   // ramp down
  ],
};

export default function () {
  // k6 form-encodes plain objects, so serialize explicitly.
  // This assumes the endpoint expects a JSON body.
  const payload = JSON.stringify({
    roomType: 'DeluxeKing',
    checkIn: '2025-12-24',
    checkOut: '2025-12-26',
    channel: 'Web',
  });
  const params = { headers: { 'Content-Type': 'application/json' } };

  const res = http.post('https://api.example.com/bookings/quote-and-reserve', payload, params);

  check(res, {
    'status is 200 or 409': (r) => r.status === 200 || r.status === 409,
  });

  sleep(1);
}

In this scenario, 200 means the booking (or quote) worked; 409 means “sold out” and is an expected outcome when inventory runs out. You want to confirm that no 5xx errors show up when inventory hits zero and that your Redlock/OCC/Saga logic holds under load.
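The “no double bookings” check can be automated after the run. A minimal sketch, assuming you can export confirmed bookings as (room type, check-in, check-out) rows and that you know the capacity per room type. ConfirmedBooking and OverbookingCheck are hypothetical names, and DateOnly requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical export row: one confirmed booking from the load-test run.
public sealed record ConfirmedBooking(string RoomType, DateOnly CheckIn, DateOnly CheckOut);

public static class OverbookingCheck
{
    // Returns every (room type, night) where confirmed bookings exceed capacity.
    // Room types missing from the capacity map are treated as capacity 0.
    public static IReadOnlyList<(string RoomType, DateOnly Night, int Booked)> FindViolations(
        IEnumerable<ConfirmedBooking> bookings,
        IReadOnlyDictionary<string, int> capacityByRoomType)
    {
        var bookedPerNight = new Dictionary<(string RoomType, DateOnly Night), int>();

        foreach (var b in bookings)
        {
            // Check-out is exclusive: a 24th-26th stay occupies the 24th and 25th.
            for (var night = b.CheckIn; night < b.CheckOut; night = night.AddDays(1))
            {
                var key = (b.RoomType, night);
                bookedPerNight[key] = bookedPerNight.GetValueOrDefault(key) + 1;
            }
        }

        return bookedPerNight
            .Where(kv => kv.Value > capacityByRoomType.GetValueOrDefault(kv.Key.RoomType))
            .Select(kv => (RoomType: kv.Key.RoomType, Night: kv.Key.Night, Booked: kv.Value))
            .OrderBy(v => v.Night)
            .ToList();
    }
}
```

An empty result means no night was oversold; anything else points at the exact room type and date where the concurrency controls let an extra booking through.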

8.1.2 Testing Saga Compensation by Injecting Payment Failures

You also need to know what happens when payment fails after inventory has been held. The Saga should:

  • Mark the booking as failed.
  • Trigger compensation to release the inventory hold.
  • Send any necessary notifications.

A clean way to test this is to inject failures via feature flags in non-production environments:

public class TestPaymentGateway : IPaymentGateway
{
    private readonly IFeatureFlags _flags;

    public TestPaymentGateway(IFeatureFlags flags)
    {
        _flags = flags;
    }

    public Task<PaymentResult> ChargeAsync(PaymentRequest request, CancellationToken ct)
    {
        // Fail when the global flag is on, or when this specific request asks for it.
        if (_flags.IsEnabled("ForcePaymentFailure") ||
            (request.Metadata.TryGetValue("Fail", out var failFlag) && failFlag == "true"))
        {
            return Task.FromResult(PaymentResult.Failed("Injected failure"));
        }

        // Normal flow or call to a real gateway stub
        return Task.FromResult(PaymentResult.Success("PAY-123"));
    }
}

You then:

  • Run a batch of test bookings.
  • Force payment failures.
  • Verify that the Saga releases reservation holds and that Inventory returns to expected values.

This proves your compensating actions do what they’re supposed to do in real failure conditions.
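The property being verified can be illustrated with a much smaller model. The following is a sketch only: InMemoryInventory and BookingFlow are hypothetical in-memory stand-ins for the Inventory service and the Saga, not the implementation described in earlier sections. The invariant is the same one the real test should assert: after any number of failed payments, availability is back where it started.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for the Inventory service.
public sealed class InMemoryInventory
{
    private readonly object _gate = new();
    private readonly HashSet<Guid> _holds = new();
    private int _available;

    public InMemoryInventory(int available) => _available = available;

    public int Available { get { lock (_gate) return _available; } }

    public bool TryHold(Guid bookingId)
    {
        lock (_gate)
        {
            if (_available == 0) return false;
            _available--;
            _holds.Add(bookingId);
            return true;
        }
    }

    // Idempotent: releasing an unknown or already-released hold is a no-op.
    public void Release(Guid bookingId)
    {
        lock (_gate)
        {
            if (_holds.Remove(bookingId)) _available++;
        }
    }
}

// Hypothetical stand-in for the Saga: hold, charge, compensate on failure.
public static class BookingFlow
{
    // Returns true when the booking is confirmed.
    public static async Task<bool> BookAsync(InMemoryInventory inventory, Func<Task<bool>> charge)
    {
        var bookingId = Guid.NewGuid();
        if (!inventory.TryHold(bookingId)) return false; // sold out

        bool paid;
        try { paid = await charge(); }
        catch { paid = false; } // treat gateway exceptions as failed payments

        if (!paid) inventory.Release(bookingId); // compensating action
        return paid;
    }
}
```

Making Release idempotent matters in the real system too, because a retried or duplicated compensation message must not return the same room to inventory twice.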

8.2 Observability with OpenTelemetry

With multiple services—Reservation, Inventory, Pricing, Search, Channel Manager—problems rarely show up in just one place. A guest might say, “The booking page was spinning forever,” but the root cause could be a slow payment gateway or a locked inventory row.

Observability (traces, metrics, logs) gives you the ability to follow a booking request across the whole system and see where time is spent and where things fail.

8.2.1 Distributed Tracing from Frontend to Database Lock

OpenTelemetry provides a standard way to collect traces across your .NET services. Traces let you see one booking request as a single flow: from API Gateway to Reservation Saga to Inventory locking and payment, and back.

Basic setup:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddSource("BookingSaga", "InventoryService")
            .AddOtlpExporter();
    });

Within critical operations—like acquiring the Redis lock or decrementing inventory—you enrich the trace:

// MyActivitySource must be an ActivitySource whose name is registered via AddSource(...) above.
using var activity = MyActivitySource.StartActivity("ReserveInventory");
activity?.SetTag("booking.id", bookingId);
activity?.SetTag("roomType", roomType);
activity?.SetTag("dateRange", $"{from:O}/{to:O}");

// Call inventory reservation
await _inventoryService.ReserveAsync(...);

Later, in your tracing UI, you can see:

  • How long each step took.
  • Where contention is high (e.g., for certain room types or dates).
  • Which calls are causing most latency during peak periods.

This is invaluable when debugging real production issues like “sometimes bookings take 10 seconds.”
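The enrichment snippet uses MyActivitySource without defining it. A minimal sketch of one way to declare it, assuming the name "InventoryService" so that it matches the AddSource(...) registration. The InventoryTracing class and its StartReserve method are hypothetical, and DateOnly requires .NET 6 or later.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper around the ActivitySource used by the Inventory service.
public static class InventoryTracing
{
    // The name must match one passed to AddSource(...); otherwise the spans are never exported.
    public static readonly ActivitySource Source = new("InventoryService");

    public static Activity? StartReserve(Guid bookingId, string roomType, DateOnly from, DateOnly to)
    {
        // StartActivity returns null when no listener is sampling this source,
        // which is why every call below uses the null-conditional operator.
        var activity = Source.StartActivity("ReserveInventory");
        activity?.SetTag("booking.id", bookingId);
        activity?.SetTag("roomType", roomType);
        activity?.SetTag("dateRange", $"{from:O}/{to:O}");
        return activity;
    }
}
```

A caller would wrap the reservation in `using var activity = InventoryTracing.StartReserve(...)`, so the span ends when the method scope exits, even if the reservation throws.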

8.2.2 Metrics That Actually Matter

You don’t need a massive list of metrics. You need a handful that map directly to how the hotel business and the platform behave.

Examples for this system:

  • InventoryLockContentionRate — How often lock acquisition is retried or fails. High numbers might signal over-aggressive locking or under-provisioned Redis.
  • BookingConversionRate — Quotes vs confirmed bookings. Drops may indicate payment issues, pricing problems, or UX issues.
  • ChannelSyncLatency — Time from internal inventory change to OTA acknowledgment. High values can mean OTA slowness or channel manager bottlenecks.

You can track these with OpenTelemetry Metrics:

var meter = new Meter("Hotel.Booking", "1.0.0");
var lockContentionCounter = meter.CreateCounter<long>("inventory_lock_contention");
var bookingConversionCounter = meter.CreateCounter<long>("booking_conversion_total");

public void RecordLockContention()
{
    lockContentionCounter.Add(1);
}

public void RecordBookingConversion(bool success)
{
    bookingConversionCounter.Add(1,
        new KeyValuePair<string, object?>("success", success));
}

In dashboards, you can then correlate spikes in contention or latency with events like deployments, promotions, or OTA outages.
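The counters above cover contention and conversion. ChannelSyncLatency is a duration, so a histogram is usually a better fit than a counter because it preserves the distribution (percentiles, not just totals). A minimal sketch using System.Diagnostics.Metrics; the ChannelSyncMetrics class, the instrument name, and the "ota" tag are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical metrics holder for the Channel Manager.
public sealed class ChannelSyncMetrics : IDisposable
{
    private readonly Meter _meter = new("Hotel.Booking", "1.0.0");
    private readonly Histogram<double> _syncLatencyMs;

    public ChannelSyncMetrics()
    {
        _syncLatencyMs = _meter.CreateHistogram<double>(
            "channel_sync_latency",
            unit: "ms",
            description: "Time from internal inventory change to OTA acknowledgment");
    }

    // changedAt: when the inventory change was committed.
    // acknowledgedAt: when the OTA confirmed it received the update.
    public void RecordSync(string otaName, DateTimeOffset changedAt, DateTimeOffset acknowledgedAt)
    {
        var elapsedMs = (acknowledgedAt - changedAt).TotalMilliseconds;
        _syncLatencyMs.Record(elapsedMs, new KeyValuePair<string, object?>("ota", otaName));
    }

    public void Dispose() => _meter.Dispose();
}
```

Tagging by OTA keeps one slow partner from hiding inside an average across all channels.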

8.2.3 Correlating Logs, Traces, and Metrics

Logs alone are noisy. Metrics alone tell you that something is wrong, but not why. Traces alone can be overwhelming. The real power comes from tying them together with correlation IDs and trace IDs.

A simple pattern is to enrich logs with the current TraceId:

public class TraceEnricher : ILogger
{
    private readonly ILogger _inner;

    public TraceEnricher(ILogger inner) => _inner = inner;

    public void Log<TState>(LogLevel logLevel, EventId eventId,
        TState state, Exception? exception, Func<TState, Exception?, string> formatter)
    {
        var traceId = Activity.Current?.TraceId.ToString() ?? "none";

        using (_inner.BeginScope(new Dictionary<string, object>
        {
            ["traceId"] = traceId
        }))
        {
            _inner.Log(logLevel, eventId, state, exception, formatter);
        }
    }

    // Other ILogger members delegate to _inner...
}

When something goes wrong, you can:

  • Start from a metric spike.
  • Jump into a specific trace.
  • Then open logs for that exact trace ID.

This turns “something is slow” into a concrete trail you can follow.

8.3 Circuit Breakers and Bulkheads

Even a well-designed system will eventually encounter misbehaving dependencies. Circuit breakers and bulkheads prevent those failures from dragging the entire platform down.

8.3.1 Protecting the Booking Flow When an OTA API Hangs

OTAs are important but not critical to booking a room right now. If an OTA API is slow or down, the Channel Manager should degrade gracefully:

  • Buffer updates.
  • Open a circuit breaker to avoid repeated failing calls.
  • Rely on reconciliation jobs later.

With Polly, a circuit breaker around OTA calls might look like this:

var otaCircuitBreaker = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
    .CircuitBreakerAsync(
        // Result-handling policies name this parameter handledEventsAllowedBeforeBreaking.
        handledEventsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromSeconds(30));

public async Task PushAvailabilityAsync(OtaUpdate update, CancellationToken ct)
{
    // Return the raw response so the breaker also counts non-success status codes.
    // While the circuit is open, ExecuteAsync throws BrokenCircuitException without calling the OTA.
    var response = await otaCircuitBreaker.ExecuteAsync(
        token => _client.PostAsJsonAsync("/availability", update, token), ct);

    response.EnsureSuccessStatusCode();
}

When the circuit is open:

  • The Channel Manager stops hammering the OTA.
  • Updates are queued for later.
  • Core booking operations (Inventory, Reservation) continue running normally.

This matches the principle from earlier: OTAs are consumers of your data, not gatekeepers of your booking flow.
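“Updates are queued for later” deserves one refinement: for availability, only the latest value per night matters, so the buffer can coalesce instead of growing without bound. A minimal sketch, assuming the caller catches Polly’s BrokenCircuitException and hands the update to a buffer like this one. OtaAvailability and PendingOtaUpdates are hypothetical types, and the sketch assumes updates are enqueued in the order the changes happened.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical payload: availability for one room type on one night.
public sealed record OtaAvailability(string HotelId, string RoomType, DateOnly Date, int Available);

// Hypothetical buffer that keeps only the newest pending update per night.
public sealed class PendingOtaUpdates
{
    private readonly ConcurrentDictionary<(string HotelId, string RoomType, DateOnly Date), OtaAvailability>
        _pending = new();

    public int Count => _pending.Count;

    // Latest write wins: an older pending value for the same night is overwritten.
    public void Enqueue(OtaAvailability update)
        => _pending[(update.HotelId, update.RoomType, update.Date)] = update;

    // Removes and returns everything currently buffered, for example once the circuit closes.
    public IReadOnlyList<OtaAvailability> Drain()
    {
        var drained = new List<OtaAvailability>();

        // Keys returns a snapshot, so concurrent Enqueue calls are safe during the loop.
        foreach (var key in _pending.Keys)
        {
            if (_pending.TryRemove(key, out var update))
                drained.Add(update);
        }

        return drained;
    }
}
```

After a 30-second break during a busy period, this sends one update per affected night rather than replaying every intermediate value. The reconciliation jobs remain the safety net for anything the buffer misses.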

8.3.2 Bulkheading the Booking Service

Bulkheads prevent one part of the system from exhausting shared resources. For example, reporting or analytics jobs should never consume all database connections and block live bookings.

You can enforce this with separate connection pools or concurrency limits:

var bookingBulkhead = Policy.BulkheadAsync(
    maxParallelization: 100,
    maxQueuingActions: 200);

public Task<TResult> ExecuteInBookingBulkheadAsync<TResult>(Func<Task<TResult>> action)
{
    return bookingBulkhead.ExecuteAsync(action);
}

You’d apply this bulkhead to booking-related operations. Reporting services might:

  • Use a read replica instead of the primary database.
  • Have their own connection pool.

This ensures that even if someone runs a heavy report or a long-running export, guests can still search and confirm bookings.
