Skip to content
The Apache Pulsar Advantage: Why Tencent Moved from Kafka - Multi-Tenancy, Geo-Replication, and Tiered Storage in Practice

The Apache Pulsar Advantage: Why Tencent Moved from Kafka - Multi-Tenancy, Geo-Replication, and Tiered Storage in Practice

1 Introduction: The Scale Ceiling and the Architectural Pivot

Streaming platforms behave very differently once they move past “large” and enter true enterprise scale. At modest volumes, most architectures work well enough with tuning and careful operations. But once throughput climbs into the tens or hundreds of billions of messages per day, the system starts exposing limits that can’t be fixed with configuration alone. Tencent reached this point while running core billing, advertising, and payment systems—workloads that require strict ordering, complete auditability, and effectively zero data loss.

Kafka had served Tencent well for years and was heavily optimized. Still, operational stability and cost efficiency became harder to maintain as scale increased. The decision to move to Apache Pulsar wasn’t driven by missing features or marginal performance gains. It was driven by architecture. Kafka’s monolithic log model had reached a ceiling, and continuing to push against it meant increasing operational risk. Pulsar offered a different foundation: separating compute from storage, breaking partitions into smaller units, and building multi-tenancy into the core of the system.

This section walks through the limits Tencent encountered, why those limits were structural rather than operational, and how Pulsar’s design provided a sustainable way forward.

1.1 The Evolution of Streaming Data at Scale

When teams talk about “high-scale” streaming, they often focus on peak throughput or daily ingestion numbers. Tencent’s workload went far beyond that. The system processed over 100 billion messages per day, continuously, across many independent business units. These were not disposable metrics or debug logs. They included billing records, ad delivery counts, payment reconciliation events, and other data that needed to be accurate, traceable, and durable.

At this level, scale stops being linear. Growth in one part of the organization affects everyone else sharing the same infrastructure. Three issues became increasingly visible.

First, data volume grows faster than expected. New products don’t just add new topics; they increase load across brokers, disks, and networks that are already busy. Kafka clusters that supported unrelated teams began to influence each other through uneven partition placement and broker hotspots.

Second, operational effort increases faster than throughput. Kafka’s partition-based model requires planning ahead. Partition counts must be chosen early and revisited regularly. Increasing them later often means repartitioning and moving data. At Tencent’s scale, this wasn’t a background maintenance task. Moving terabytes or petabytes of data required long maintenance windows, careful coordination, and real risk to production stability.

Third, storage growth becomes tightly coupled to cluster health. In Kafka, storage and compute live on the same brokers. Adding disk capacity often means adding brokers, even if CPU and network are already sufficient. Teams spent more time managing disk pressure, broker replacements, and uneven storage utilization than improving the platform itself.

Defining the Hyper-Scale Problem

Hyper-scale streaming is not just about handling more messages. It’s about handling several hard requirements at the same time:

  • Large consumer fan-out, where many applications read the same data without slowing producers.
  • Huge numbers of topic segments spread across thousands of partitions.
  • Independent scaling of reads and writes, so one does not constrain the other.
  • Strong durability guarantees, even while hardware fails and clusters evolve.

Kafka handled many of these well in isolation. But when all of them appeared together—high partition counts, long retention, strict ordering, and cross-region replication—the system became increasingly difficult to operate. At that point, more tuning produced diminishing returns.

The Limitations of the Monolithic Log Architecture

Kafka’s architecture tightly couples several concerns:

  • Partitions are owned by specific brokers
  • Storage lives with compute
  • Scaling topics often means moving physical data

This coupling creates predictable failure modes at large scale.

Rebalancing becomes expensive. When partitions move, their data moves with them. At petabyte scale, rebalancing is no longer routine—it becomes a planned production event with real risk.

Hotspots are hard to fix. If partitions concentrate on a subset of brokers, the only real fix is moving data. Load cannot be shifted dynamically without triggering rebalancing.

Storage scaling forces compute scaling. Even if brokers have spare CPU and network capacity, adding disk often requires adding more machines.

Replication multiplies the problem. Cross-cluster replication creates additional full copies of data, increasing both storage costs and operational complexity.

By the time Tencent crossed the hundred-billion-message mark, these issues formed a clear ceiling. Continuing with the same architecture meant increasing operational effort without improving reliability. Pulsar offered a fundamentally different approach.

1.2 The Migration Driver: Why Tencent Moved

Tencent’s engineers pointed to three primary reasons for moving away from Kafka: operational burden, lack of strong multi-tenancy, and durability guarantees for financial systems. Each mattered on its own. Together, they made the migration unavoidable.

Operational Toil: Partition Rebalancing and Broker Coupling

Kafka brokers are stateful. When a broker fails, is overloaded, or needs maintenance, partitions must be reassigned. For Tencent, that meant:

  • Rebalances involving tens of terabytes of data
  • Noticeable latency spikes during broker restarts
  • Careful scheduling of maintenance to avoid customer impact

Kafka’s early tiered storage options did not fully solve this, because partitions were still tied to brokers. Teams spent significant time planning rebalances, monitoring long-running data movement, and coordinating changes across teams. Pulsar’s stateless brokers removed most of this operational friction.

Resource Isolation: Strict Multi-Tenancy

Tencent runs many independent teams on shared infrastructure. Kafka provides limited isolation:

  • No built-in tenant concept
  • No namespace-level controls
  • Most limits apply at the cluster level

As a result, a load spike or misconfiguration from one team could affect others, including critical billing systems.

Pulsar was designed differently. It introduces clear boundaries:

  • Tenants represent organizational ownership
  • Namespaces group related applications
  • Topics represent individual streams

Each level supports quotas and isolation for throughput, storage, and dispatch. This aligned well with Tencent’s internal structure and made it possible to safely run many workloads on a single platform.

Consistency: Zero Data Loss for Billing Data

Billing and payment systems require stronger guarantees than “usually durable.” Kafka provides strong durability, but edge cases exist during broker crashes, ISR changes, or recovery from disk issues.

BookKeeper, Pulsar’s storage layer, writes data to multiple storage nodes and acknowledges success only after a quorum confirms persistence. This model matched Tencent’s requirements for financial-grade durability and reduced the risk of edge-case data loss.

1.3 Pulsar vs. Kafka: The High-Level Architectural Divergence

The difference between Kafka and Pulsar is not incremental. It’s architectural.

  • Kafka is partition-centric, built around large, broker-owned logs
  • Pulsar is segment-centric, built around small, distributed ledgers
  • Kafka couples storage and compute
  • Pulsar separates them

Partition-Centric vs. Segment-Centric

In Kafka, a partition is a long-lived log owned by a broker. Scaling usually means moving that log.

In Pulsar, a partition is a logical concept made up of many small segments. Each segment is a BookKeeper ledger distributed across multiple storage nodes. New segments can be created on different storage ensembles without touching old data. Scaling affects future writes, not historical data.

This removes the need for large-scale data movement and spreads I/O naturally across the cluster.

Compute–Storage Separation

Pulsar brokers:

  • Do not store long-term data
  • Can be added or removed easily
  • Focus on routing, replication, and access control

BookKeeper handles durability, replication, and storage growth independently. This separation allowed Tencent to scale storage without scaling brokers and to adjust compute capacity without touching stored data.


2 Architecture Deep Dive: Decoupling Compute from Storage

Once Tencent identified that Kafka’s limits were architectural, not operational, the next step was understanding what a different architecture needed to look like. The core insight behind Pulsar is simple: compute and storage do not scale in the same way. Message routing, protocol handling, and authorization scale with connections and throughput. Storage scales with retention, replication, and durability requirements. Treating both as a single unit eventually creates friction.

Kafka’s design ties these concerns together. Pulsar deliberately pulls them apart. That single decision explains most of the differences in behavior, operability, and long-term scalability discussed throughout this article.

2.1 The Two-Layer Architecture Explained

Pulsar is built from three main components that work together but scale independently:

  1. Brokers – the stateless compute layer
  2. Bookies (Apache BookKeeper) – the durable storage layer
  3. ZooKeeper – metadata coordination (with a move toward pluggable metadata services)

This separation allows each layer to grow, shrink, or fail without forcing changes in the others.

The Serving Layer (Brokers)

Brokers sit on the hot path between clients and storage. They are responsible for:

  • Managing producer and consumer connections
  • Dispatching messages to consumers
  • Enforcing authentication and authorization
  • Applying quotas and rate limits
  • Creating and rolling message segments

What brokers do not do is just as important: they do not own long-lived data. Apart from short-term caches, brokers remain stateless. If a broker fails or becomes overloaded, clients reconnect to another broker and continue. No data needs to be moved, reassigned, or rebuilt.

This design directly addresses one of Kafka’s biggest operational pain points. In Kafka, broker failures trigger partition leadership changes and sometimes rebalancing. In Pulsar, broker failure is closer to a load balancer failure—noticeable, but not disruptive.

The Storage Layer (Bookies)

Durable storage lives entirely in Apache BookKeeper. Instead of large, broker-owned logs, BookKeeper stores data in append-only ledgers distributed across many storage nodes, called bookies.

Several characteristics matter in practice:

  • Striped writes distribute each segment across multiple bookies, increasing throughput and resilience.
  • Small failure domains limit recovery work to individual ledgers instead of entire partitions.
  • Independent scaling allows operators to add bookies purely to increase storage capacity or write bandwidth.

At Tencent scale, this mattered because storage growth followed retention and compliance requirements, not traffic spikes. BookKeeper allowed storage to grow without dragging broker count—and operational complexity—along with it.

It is also common to share a single BookKeeper cluster across multiple Pulsar clusters. This pattern works especially well in large, multi-tenant environments and cloud deployments.

Coordination via ZooKeeper

ZooKeeper holds only metadata:

  • Topic and namespace definitions
  • Ledger metadata
  • Lightweight ownership and coordination data

Because this metadata is small and changes infrequently compared to message traffic, ZooKeeper remains stable even under heavy load. Pulsar continues to reduce dependency on ZooKeeper by supporting pluggable metadata stores, but the principle remains the same: coordination is kept lightweight and isolated from the data plane.

2.2 Understanding Segments vs. Partitions

One of the most important differences between Pulsar and Kafka lies in how they model data over time. Kafka partitions are long-lived, broker-owned entities. Pulsar treats partitions as logical containers made up of many smaller physical pieces called segments.

This distinction explains why Pulsar scales without rebalancing storms.

How Pulsar Breaks a Partition into Segments

A Pulsar partition consists of a sequence of BookKeeper ledgers:

  • Each ledger represents a segment of the stream
  • Each segment has its own storage ensemble
  • Segments are immutable once closed

When a segment reaches a configured size or time limit, the broker closes it and creates a new one. This happens continuously and transparently.

Unlike Kafka, where partitions grow indefinitely, Pulsar partitions are constantly rolling forward. Old data stays where it is. New data adapts to the current shape of the cluster.

Rolling Segments Enable Instant Scalability

This rolling behavior changes how scaling works.

In Kafka, rebalancing means moving existing data. In Pulsar, scaling affects only future writes. When new bookies are added:

  • New segments are written using new storage ensembles
  • Existing segments are left untouched
  • No data migration is required

This makes scaling predictable. The cost of adding capacity does not depend on how much data already exists. At Tencent’s scale, this removed one of the largest sources of operational risk.

Parallel Reads and Writes Across the Cluster

Segments are distributed across bookies, which enables parallelism by default:

  • Writes are striped across multiple storage nodes
  • Reads can come from different bookies depending on which segment is accessed
  • Hot partitions naturally spread out over time as new segments are created

This behavior reduces hotspots without operator intervention. Instead of trying to rebalance partitions manually, the system continuously balances itself as data flows through it.

2.3 Apache BookKeeper Internals for Architects

BookKeeper often looks unfamiliar to architects used to traditional log-based systems. It is not a distributed filesystem and not a classic log. Understanding a few core concepts helps clarify why it fits Pulsar’s needs so well.

Ledgers and Entries

A ledger is an append-only sequence of entries. An entry is the smallest atomic unit written to storage.

Important properties of ledgers:

  • They are immutable once closed
  • Their storage ensemble is fixed at creation
  • They are designed for fast sequential writes

This immutability simplifies recovery and replication. Instead of repairing large logs, BookKeeper focuses on ensuring each ledger is either complete or safely fenced off.

The Quorum Protocol (Qe, Qw, Qa)

BookKeeper uses quorum-based replication rather than leader–follower replication.

Key parameters:

  • Qe (Ensemble Size): how many bookies store each ledger
  • Qw (Write Quorum): how many bookies receive a write
  • Qa (Ack Quorum): how many acknowledgements are required

Example configuration:

Qe = 3
Qw = 3
Qa = 2

This means data is written to three bookies and considered durable once two acknowledge it. The system can tolerate one failure without losing availability. Architects can tune these values to balance durability, latency, and cost depending on workload requirements.

I/O Isolation: Journal vs. Ledger Storage

BookKeeper physically separates write and read paths:

  • Journal disks handle sequential writes for durability
  • Ledger disks serve reads and background scans

Kafka typically combines these on the same disks, which can cause read traffic to interfere with writes. BookKeeper’s separation ensures that heavy reads—such as backfills or reprocessing—do not slow down incoming writes.

For Tencent, this isolation was critical. Billing and payment systems require predictable write latency, even when downstream systems replay large volumes of historical data.


3 Native Multi-Tenancy: Building a Shared Platform

At Tencent’s scale, running one messaging cluster per team was never realistic. Hundreds of internal services needed streaming infrastructure, and many of them handled sensitive or mission-critical data. The platform had to be shared—but shared safely. This is where Kafka-based approaches started to strain. Multi-tenancy existed mostly as a convention, enforced by naming rules and operational discipline rather than by the system itself.

Pulsar approaches this differently. Multi-tenancy is not an add-on or a best practice. It is built into the core data model and enforced by the broker and storage layers. This makes it possible to run many independent teams on a single cluster without one team’s workload destabilizing another’s.

3.1 The Hierarchy: Tenants, Namespaces, and Topics

Pulsar organizes data using a clear, explicit hierarchy:

  1. Tenant – represents an organization or business unit
  2. Namespace – represents an application or service boundary
  3. Topic – represents a specific event stream

In a large enterprise, this often maps directly to how teams already think about ownership:

Tenant: finance
Namespace: billing-service
Topic: transaction-events

This structure matters because policies attach naturally at each level. Security, quotas, replication rules, and retention settings can be defined once at the namespace or tenant level instead of being repeated for every topic. Compared to flat topic naming schemes, this drastically reduces operational overhead as the number of teams grows.

Java Admin API: Managing Tenants Programmatically

In practice, tenants and namespaces are rarely created manually. Large organizations automate provisioning as part of onboarding workflows. Pulsar’s Admin API is designed for this.

PulsarAdmin admin = PulsarAdmin.builder()
    .serviceHttpUrl("https://pulsar.example.com")
    .authentication(
        AuthenticationFactory.token("ADMIN_TOKEN")
    )
    .build();

// Create Tenant
admin.tenants().createTenant(
    "finance",
    TenantInfo.builder()
        .allowedClusters(Set.of("us-east", "eu-west"))
        .roles(Set.of("finance-admin"))
        .build()
);

// Create Namespace
admin.namespaces().createNamespace("finance/billing-service");

// Create Partitioned Topic
admin.topics().createPartitionedTopic(
    "persistent://finance/billing-service/transaction-events",
    64
);

This API-driven model fits well with internal developer platforms. Teams request a namespace, automation applies defaults, and the platform enforces boundaries consistently. There is no need to deploy or tune a separate cluster per team.

3.2 Resource Isolation and Quotas

True multi-tenancy requires more than logical separation. It requires enforced limits. Pulsar allows operators to define resource controls at the namespace level, which becomes the primary unit of isolation.

Namespaces can limit:

  • Message publish rate
  • Dispatch rate to consumers
  • Storage usage
  • Bandwidth consumption
  • Producer concurrency

These limits are enforced by brokers at runtime, not just monitored after the fact.

Example: Setting Namespace Quotas

admin.namespaces().setNamespaceQuota(
    "finance/billing-service",
    new NamespaceQuota(
        500_000_000L, // storage in bytes
        20000,        // messages per second
        false         // throttle instead of blocking
    )
);

This ensures that a single service cannot exhaust broker resources or disk capacity. If the service exceeds its quota, Pulsar applies backpressure automatically. Other namespaces continue operating normally.

This directly addresses one of Kafka’s common pain points in shared environments: the “noisy neighbor” problem. In Pulsar, isolation is enforced by design rather than by operational vigilance.

Flow Control from the Java Client

Client-side flow control complements server-side quotas. Pulsar producers can apply backpressure locally to avoid overwhelming brokers during spikes.

PulsarClient client = PulsarClient.builder()
    .serviceUrl("pulsar://cluster.example.com:6650")
    .build();

Producer<byte[]> producer = client.newProducer()
    .topic("persistent://finance/billing-service/transaction-events")
    .accessMode(ProducerAccessMode.Shared)
    .blockIfQueueFull(true)
    .maxPendingMessages(500)
    .batchingEnabled(true)
    .create();

When combined with namespace quotas, this creates a closed feedback loop. Producers slow down naturally under pressure, brokers remain stable, and other tenants are unaffected—even during unexpected traffic bursts.

3.3 Security at Scale

Security becomes harder—not easier—when infrastructure is shared. Each tenant must be isolated not only in terms of performance, but also in terms of access. Pulsar treats security as a first-class concern rather than an external integration.

Out of the box, Pulsar supports:

  • OIDC and OAuth2 authentication
  • Role-based authorization (ACLs)
  • End-to-end message encryption

These features integrate directly with the tenant and namespace model.

Client Authentication Using OAuth2

Authentication auth = AuthenticationFactoryOAuth2.builder()
    .issuerUrl("https://auth.example.com")
    .credentialsUrl("/etc/pulsar/client.json")
    .audience("pulsar-cluster")
    .build();

PulsarClient client = PulsarClient.builder()
    .authentication(auth)
    .serviceUrl("pulsar+ssl://pulsar.example.com:6651")
    .build();

Authentication determines who the client is. Authorization then determines what that identity can access—down to individual namespaces or topics.

End-to-End Encryption

For sensitive data such as billing or user events, encryption must persist across the entire pipeline, including brokers and storage. Pulsar supports client-side encryption without requiring changes to the broker or BookKeeper.

Producer<byte[]> producer = client.newProducer()
    .topic("persistent://finance/billing-service/transaction-events")
    .addEncryptionKey("finance-key")
    .cryptoKeyReader(
        new RawFileKeyReader("/keys/pub.key", "/keys/priv.key")
    )
    .create();

Authorized consumers decrypt transparently:

Consumer<byte[]> consumer = client.newConsumer()
    .topic("persistent://finance/billing-service/transaction-events")
    .subscriptionName("billing-processor")
    .cryptoKeyReader(
        new RawFileKeyReader("/keys/pub.key", "/keys/priv.key")
    )
    .subscribe();

This model ensures that even operators with access to the cluster cannot read sensitive payloads. For Tencent’s billing and financial systems, this level of isolation was mandatory.


4 Infinite Retention: Implementing Tiered Storage

Once compute and storage are fully decoupled, a new question naturally follows: how long should a messaging system retain data? At Tencent, retention requirements were driven by billing audits, compliance rules, historical analysis, and incident investigation. In Kafka-based systems, long retention often means large broker disks, careful capacity planning, and frequent hardware expansion. Pulsar approaches the problem differently.

Tiered storage turns Pulsar from a short-lived messaging system into a long-term event backbone. Because brokers don’t own data and BookKeeper stores data in immutable segments, Pulsar can move older data out of the hot path without changing how applications interact with topics. Retention stops being constrained by local disk capacity and becomes a policy decision instead.

4.1 The “Infinite Stream” Concept

“Infinite stream” doesn’t mean data is kept forever without cost. It means retention is no longer limited by the size of local disks or the number of brokers in the cluster. Pulsar achieves this by splitting storage into tiers, each optimized for a different phase in a message’s lifecycle.

The idea is straightforward. Recent data—data that is actively produced and consumed—stays on BookKeeper, backed by fast disks and optimized for low latency. Older data—data that is rarely read but must be retained—moves to an object store, where storage is cheap, durable, and effectively unlimited. Pulsar manages this transition automatically based on time or size thresholds.

This changes how teams plan capacity. With Kafka, operators often size clusters for worst-case retention, which leads to over-provisioning even when most data is rarely accessed. Pulsar removes this pressure. Storage grows horizontally in object storage, while the hot tier stays small and efficient.

It also changes how developers think about data access. Topics no longer have an implicit “expiration date” tied to disk capacity. Teams can safely replay months of history for audits or backfills without keeping all of it on SSD.

4.2 Tiered Storage Architecture

Tiered storage in Pulsar is implemented through an offloader that runs inside the broker. The broker continuously evaluates closed ledgers and determines whether they meet offload criteria. When they do, the broker copies ledger data directly from BookKeeper to an object store. Producers and consumers continue operating normally during this process.

Because segments are immutable once closed, offloading is safe and predictable. There is no need to lock topics, pause traffic, or rebalance data. Old segments move out of the hot tier; new segments continue writing to BookKeeper.

Pulsar supports multiple object storage backends:

  • Amazon S3
  • Google Cloud Storage
  • Azure Blob Storage
  • HDFS for on-premises environments

All backends implement the same interface. From the broker’s point of view, offloading to S3 or HDFS follows the same flow. This makes it possible to adopt tiered storage without committing to a single cloud provider.

Internally, offloaded data is stored as immutable objects that mirror Pulsar’s ledger structure. This maps cleanly onto object storage semantics and avoids expensive rewrite or compaction cycles. Because offloading happens incrementally as segments roll over, the load is spread evenly over time instead of creating spikes.

4.3 Practical Implementation Guide

Enabling tiered storage starts with broker configuration. Operators choose an offload driver and define when ledgers should move to cold storage. A typical S3 configuration looks like this:

# broker.conf
managedLedgerOffloadDriver=aws-s3

s3ManagedLedgerOffloadBucket=pulsar-archive
s3ManagedLedgerOffloadRegion=us-east-1
s3ManagedLedgerOffloadServiceEndpoint=https://s3.us-east-1.amazonaws.com

# Offload thresholds
managedLedgerOffloadThresholdInBytes=1073741824   # 1 GB
managedLedgerOffloadThresholdInMillis=86400000    # 24 hours

Both thresholds can be enabled at the same time. High-volume topics usually hit the size limit first, keeping BookKeeper storage under control. Low-volume topics may hit the time limit instead, ensuring that even quiet streams eventually move to cold storage. This combination works well in shared clusters where traffic patterns vary widely across namespaces.

A common concern is how consumers behave when data has been offloaded. From the application’s perspective, nothing changes. When a consumer requests a message, the broker checks whether the ledger is still local. If it has been offloaded, the broker fetches the data from object storage, streams it to the consumer, and optionally caches it. The same consumer and reader APIs work without modification.

This transparency is important. Developers do not need to handle multiple storage tiers or write conditional logic for historical reads. The platform absorbs that complexity.

From a cost perspective, tiered storage usually pays for itself quickly. SSD-backed BookKeeper storage is optimized for performance, but it is expensive to scale for long retention. Object storage is orders of magnitude cheaper for cold data. For workloads dominated by real-time reads, the added latency of cold reads is rarely noticeable. For backfills or audits, the trade-off is usually acceptable.

At Tencent scale, tiered storage made long retention practical without turning storage management into a continuous operational project. It completed the picture started by compute–storage separation: hot paths stay fast, cold data stays cheap, and applications don’t have to care where the bytes live.


5 Global Availability: Geo-Replication in Production

Once a messaging platform becomes central to billing, advertising, and user-facing systems, locality matters. Data must be available close to where it is produced and consumed. At the same time, outages—whether caused by network issues, power failures, or regional incidents—cannot be allowed to interrupt core business workflows. This is where many streaming systems struggle. Replication is often bolted on after the fact, leading to fragile dual-write patterns or complex operational playbooks.

Pulsar approaches geo-replication as a first-class concern. Its segment-based storage model and namespace-level configuration make cross-region replication predictable and manageable. Instead of stitching together multiple systems, operators configure replication once, and the platform handles the rest.

5.1 Replication Models

Pulsar supports two replication models, each designed for different priorities: synchronous and asynchronous replication.

Synchronous replication relies on BookKeeper’s quorum protocol across regions. A write is acknowledged only after multiple regions confirm durability. This guarantees zero data loss, even in the face of regional failures. The trade-off is latency. Cross-region round-trip time becomes part of the write path, which makes this approach suitable only for workloads where correctness outweighs latency—typically financial settlement or ledger-style systems.

Most large-scale systems prefer asynchronous replication. In this model, producers write locally and return immediately. Dedicated replicators running at the broker layer stream data to remote clusters in the background. Producer latency stays low and predictable, while remote regions eventually receive the same data. This model fits the majority of real-time workloads, including ads, user activity streams, and analytics pipelines.

Asynchronous replication in Pulsar operates at the segment level. Replicators transfer completed segments efficiently, rather than pushing individual messages one by one. This keeps overhead low even for very high-volume topics. Because replication is configured at the namespace level, operators can choose exactly which applications require global replication and which do not.

An important operational benefit is resilience during outages. If a remote region becomes unavailable, producers continue writing locally. Replicators pause and resume automatically once connectivity is restored. There is no need for manual intervention or failover scripts to restart replication.

5.2 Active-Active Replication Patterns

In global systems, it is rarely enough to have a single “primary” region. Many applications need to both produce and consume data in multiple regions at the same time. Pulsar supports this through active-active replication.

In an active-active setup, every region can accept writes, and all regions receive replicated data. A common pattern is full-mesh replication across three or more regions, such as US-East, EU-West, and APAC. Each region runs independently but stays logically in sync with the others. This works well for global consumer platforms, where user activity needs to be visible everywhere with minimal delay.

Replication is configured declaratively. First, clusters are associated with a tenant. Then replication is enabled at the namespace level:

# Update tenant with allowed clusters
admin.tenants().updateTenant(
    "finance",
    TenantInfo.builder()
        .allowedClusters(Set.of("us-east", "eu-west", "apac"))
        .roles(Set.of("finance-admin"))
        .build()
)

# Enable geo-replication for a namespace
admin.namespaces().setNamespaceReplicationClusters(
    "finance/billing-service",
    Set.of("us-east", "eu-west", "apac")
)

Once enabled, messages produced in any region automatically replicate to the others. Pulsar embeds origin metadata in each message so replicators can detect where a message came from. This prevents replication loops, where messages bounce endlessly between regions—a common failure mode in custom replication setups.

Active-active replication does introduce a trade-off around ordering. Each region maintains ordering for messages produced locally, but there is no global total order across regions. For most applications, this is acceptable. Consumers typically care about ordering per key or per region, not across continents. When strict global ordering is required, synchronous replication or transactional pipelines are better suited.

5.3 Failover Scenarios and Disaster Recovery

Replication alone does not guarantee smooth failover. Applications must reconnect cleanly, consumers must resume from the correct position, and producers must continue writing without complex logic. Pulsar addresses this primarily through client behavior and replicated metadata.

The Pulsar client supports multiple service URLs, allowing applications to fail over automatically if a region becomes unavailable:

client = PulsarClient.builder() \
    .serviceUrl("pulsar://us-east.example.com:6650") \
    .alternatives([
        "pulsar://eu-west.example.com:6650",
        "pulsar://apac.example.com:6650"
    ]) \
    .build()

When the primary endpoint fails, the client retries and connects to the next available region. Connection management, retries, and backoff are handled internally. Application code does not need to special-case regional failures.

For consumers, replicated subscriptions are the key to clean recovery. Without them, each region tracks its own cursor position, making it difficult to resume processing elsewhere without duplicates or gaps. With replicated subscriptions, cursor updates are replicated along with data.

admin.namespaces().setReplicationClusters(
    "finance/billing-service",
    Set.of("us-east", "eu-west")
)

admin.namespaces().createSubscription(
    "finance/billing-service/transaction-events",
    "billing-subscription"
)

With this configuration, consumer progress stays synchronized across regions. If processing stops in one region and resumes in another, consumption continues from the correct position. This is critical for billing and financial pipelines, where reprocessing or skipping messages can have real business impact.

In practice, regional failovers are rarely symmetrical. One region may lag behind due to slower consumers or temporary network issues. Pulsar handles this by keeping dispatch queues independent per region. Faster regions are not slowed down by slower ones, and replication catches up when possible.

At Tencent scale, disaster recovery drills involve replaying massive backlogs and validating recovery times. Pulsar’s predictable replication and cursor management make these exercises repeatable and far less risky than custom multi-region solutions. Geo-replication becomes part of normal operations rather than a special-case feature reserved for emergencies.


6 Advanced Messaging: Ordering, Schemas, and Patterns

Once a messaging platform becomes a shared backbone rather than a single application queue, basic publish-and-consume is no longer enough. Teams need clear guarantees around message structure, ordering, and consistency. Without those guarantees, failures tend to show up late—often as corrupted downstream state or subtle data mismatches that are difficult to debug.

Pulsar’s earlier architectural choices—segmented storage, stateless brokers, and namespace isolation—make it possible to offer stronger messaging semantics without pushing complexity into application code. This section focuses on how developers use those capabilities in practice and why they matter in large, multi-team environments like Tencent’s.

6.1 The Schema Registry Advantage

Schema drift is one of the most common causes of production issues in event-driven systems. A team adds a field, another removes one, and a third changes a type. Everything compiles, messages still flow, but consumers quietly break. By the time someone notices, the data is already inconsistent.

Pulsar avoids this class of problems by making schemas part of the topic itself. Every message carries schema information, and producers and consumers are validated against it automatically. This means incompatible changes are rejected early—at publish time—rather than surfacing as runtime failures downstream.

Most enterprises standardize on Avro or Protobuf because they provide clear evolution rules. JSON and primitive schemas still exist for lightweight use cases, but structured schemas become essential as the number of producers grows. Pulsar stores a schema fingerprint with the topic and checks compatibility whenever a producer attempts to publish with a new schema version.

Schema evolution then becomes predictable. Adding optional fields works as expected. Removing required fields or changing types fails fast. Pulsar enforces these rules consistently so application code does not need to.

A typical Java example looks like this:

import org.apache.pulsar.client.api.Schema;

public class TransactionEvent {
    private String transactionId;
    private long amount;
    private String currency;

    public TransactionEvent() {}

    public TransactionEvent(String transactionId, long amount, String currency) {
        this.transactionId = transactionId;
        this.amount = amount;
        this.currency = currency;
    }

    // getters and setters omitted
}

// Producer with Avro schema
Producer<TransactionEvent> producer =
    client.newProducer(Schema.AVRO(TransactionEvent.class))
          .topic("persistent://finance/billing-service/transaction-events")
          .create();

Once the schema is registered, all producers and consumers interact through that contract. Teams can evolve data models independently, confident that incompatible changes will be caught immediately rather than after deployment.

6.2 Subscription Modes and Ordering Guarantees

Ordering requirements vary widely across applications. Some systems require strict, single-threaded processing. Others prioritize throughput and parallelism. Pulsar exposes these trade-offs explicitly through subscription types instead of forcing teams to redesign topics or partition layouts.

Exclusive and Failover subscriptions provide strict ordering by ensuring that only one consumer processes messages at any given time. Exclusive enforces a single active consumer, while Failover allows a standby to take over if the primary fails. These modes fit stateful processors, billing pipelines, or workflows where correctness matters more than throughput.

Shared subscriptions distribute messages across consumers in round-robin fashion. This maximizes throughput but offers no ordering guarantees. It works well for workloads like batch processing, enrichment, or stateless transformations where events are independent. Teams often combine Shared subscriptions with batching to improve efficiency.

Key-Shared subscriptions sit in between. They preserve ordering per key while still allowing horizontal scaling. Messages with the same key always go to the same consumer, but different keys are distributed across the group. This pattern works well for user-centric workloads, session processing, or sharded business logic.

A simple example shows how a consumer subscribes in Key-Shared mode:

Consumer<TransactionEvent> consumer =
    client.newConsumer(Schema.AVRO(TransactionEvent.class))
          .topic("persistent://finance/billing-service/transaction-events")
          .subscriptionName("billing-keyshared")
          .subscriptionType(SubscriptionType.Key_Shared)
          .subscribe();

Key-Shared avoids the common Kafka trade-off where teams must choose between ordering and scalability. Pulsar’s routing logic keeps keys balanced as consumers join or leave, preventing hotspots and uneven load.

6.3 Achieving Deduplication and Transactional Messaging

In distributed systems, retries are unavoidable. Network timeouts, broker failovers, and client restarts all lead to duplicate sends unless the platform accounts for them. Without support from the messaging layer, applications end up implementing custom idempotency logic, which is error-prone and difficult to maintain.

Pulsar supports producer-side deduplication by tracking sequence identifiers. When enabled, brokers detect and discard duplicate messages automatically. This is especially useful in systems with intermittent connectivity or aggressive retry policies. Developers can rely on the platform instead of reinventing deduplication in every service.

Transactional messaging addresses a related but more complex problem: atomicity across multiple writes. In financial systems, it is common to publish multiple related events—such as debit, credit, and audit records—that must either all appear or not appear at all. Pulsar supports transactions at the ledger layer, allowing multiple messages across topics to commit as a single unit.

A simplified example illustrates the flow:

Transaction txn = client.newTransaction()
    .withTransactionTimeout(5, TimeUnit.MINUTES)
    .build()
    .get();

// Publish messages as part of the transaction
producerA.newMessage(txn).value(eventA).sendAsync();
producerB.newMessage(txn).value(eventB).sendAsync();

// Commit atomically
txn.commit().get();

If any part of the transaction fails, the entire transaction is aborted. Consumers never see partial results. This significantly simplifies downstream processing logic and error handling.

Deduplication can also be enabled at the namespace level so that all producers benefit automatically:

admin.namespaces().setDeduplicationStatus(
    "finance/billing-service",
    true
);

Together, deduplication and transactions provide a practical definition of “effectively once” semantics. They reduce coordination between services, eliminate whole classes of failure scenarios, and allow teams to reason about data flow with confidence.


7 Operational Strategy: Migration and Observability

Once the architecture and feature set are in place, the remaining question is operational: how do you move to Pulsar without disrupting production, and how do you keep the system observable once it becomes shared infrastructure? At Tencent scale, migrations are rarely “big bang” events. Systems evolve gradually, and new platforms must coexist with old ones for a period of time. Pulsar’s design makes this kind of transition practical.

Observability also becomes non-negotiable. With many tenants, thousands of topics, and multiple regions, problems need to be visible early and diagnosed quickly. Pulsar provides the raw signals needed to operate at this level, but the strategy around how those signals are used matters just as much.

7.1 Migrating from Kafka: The “KoP” Bridge

Rewriting every Kafka application at once is usually unrealistic. Many services are stable, business-critical, or owned by teams that cannot change clients on short notice. Kafka-on-Pulsar (KoP) addresses this by implementing the Kafka protocol directly on Pulsar brokers.

With KoP enabled, existing Kafka producers and consumers continue to work without code changes. The Kafka API calls are translated at the broker layer into Pulsar operations. Data is stored using Pulsar’s segmented, replicated storage model, even though clients still think they are talking to Kafka.

This approach lets organizations adopt Pulsar underneath their workloads first. Teams immediately benefit from decoupled storage, tiered retention, and multi-tenancy without waiting for application rewrites. Over time, services that need advanced features—such as Key-Shared subscriptions, schema enforcement, or transactions—can migrate to native Pulsar clients on their own schedule.

Some organizations choose a dual-write approach during migration, writing to both Kafka and Pulsar in parallel. This provides a safety net and allows for controlled cutover. Others rely entirely on KoP to avoid maintaining two pipelines. The choice depends on risk tolerance and operational maturity. In practice, KoP often becomes the fastest path to production adoption because it minimizes change while delivering immediate architectural benefits.

7.2 Tuning the JVM for High Throughput

As discussed earlier, Pulsar brokers and bookies serve very different roles, and their JVM configurations should reflect that. Treating them as identical services leads to unstable performance and difficult-to-debug latency issues.

Brokers sit on the request path. They manage connections, dispatch messages, enforce quotas, and coordinate replication. For brokers, predictable latency matters more than raw throughput. Short GC pauses and stable heap behavior are the priority. G1GC with aggressive pause targets is a common choice.

Bookies, in contrast, are optimized for sustained write throughput. They handle append-heavy workloads and large volumes of replication traffic. Slightly longer GC pauses are acceptable if they allow higher overall throughput. Bookies also rely heavily on direct memory because Netty buffers are allocated off-heap.

Direct memory sizing is particularly important. Too little direct memory leads to allocation pressure and throttling. Too much starves the heap and hurts GC behavior. Operators typically size direct memory based on expected write rates and average entry size.

A typical broker configuration might look like this:

# Broker JVM options
-XX:+UseG1GC
-XX:MaxGCPauseMillis=50
-XX:InitiatingHeapOccupancyPercent=30
-XX:MaxDirectMemorySize=4g
-XX:+AlwaysPreTouch

And a bookie configuration:

# Bookie JVM options
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:MaxDirectMemorySize=8g
-XX:+AlwaysPreTouch
-XX:+PerfDisableSharedMem

These differences reflect operational reality. Brokers need to stay responsive under fluctuating load, while bookies need to absorb steady write pressure. Tuning them independently leads to more predictable cluster behavior.

7.3 Observability Stack

Once Pulsar becomes a shared platform, observability shifts from “nice to have” to “required for survival.” Operators need to understand not only whether the cluster is healthy, but also how individual tenants and namespaces are behaving.

Pulsar exposes a rich set of metrics that integrate cleanly with Prometheus. These metrics cover the broker layer, BookKeeper storage, and replication paths. A few metrics tend to matter early and often:

  • backlog_size shows whether consumers are keeping up or falling behind
  • storage_write_latency reveals pressure on bookies and disks
  • entry_size helps detect inefficient batching or unexpected payload growth

Prometheus scrapes metrics from brokers, bookies, and ZooKeeper endpoints. Dashboards typically include per-topic throughput, consumer lag by subscription, ledger creation rates, and geo-replication delays. When a problem occurs, these signals help narrow the issue quickly—whether it is a slow consumer, a storage bottleneck, or uneven load across brokers.

Pulsar Manager complements metrics with a visual control plane. It allows operators to browse tenants, namespaces, topics, and subscriptions, inspect backlogs, and verify offload or replication status. While large organizations rely heavily on automation for day-to-day operations, Pulsar Manager remains valuable during incident response and root-cause analysis.

Together, KoP-based migration, role-aware JVM tuning, and first-class observability reduce the operational risk of running Pulsar at scale. They turn what could be a complex migration into a controlled evolution, allowing teams to adopt Pulsar incrementally while maintaining confidence in production stability.


8 Conclusion: The Future of Unified Messaging

As streaming systems grow from individual pipelines into shared platforms, the expectations placed on messaging infrastructure change. It is no longer enough to move messages quickly. The system must scale across teams, regions, and years of retained data while remaining predictable to operate. Tencent reached the point where those requirements could no longer be met by extending a partition-centric log architecture.

Pulsar’s design addresses these pressures directly. By breaking large logs into small segments, separating compute from storage, and treating multi-tenancy as a first-class concern, Pulsar turns messaging into durable infrastructure rather than a fragile coordination layer. What looks like a collection of features—tiered storage, geo-replication, schema enforcement, transactions—works together because the underlying architecture supports them naturally.

8.1 Beyond Messaging: Pulsar Functions and SQL

Once messaging becomes reliable and long-lived, teams start asking for computation closer to the data. Pulsar Functions fill this gap by allowing lightweight processing to run inside the platform itself. Common use cases include validation, enrichment, routing, and simple aggregations. These functions execute as part of the messaging flow, eliminating the need to deploy and manage a separate stream-processing cluster for small tasks.

Developers write functions in Java or Python and deploy them using the same administrative tools they already use for topics and namespaces. Scaling is handled by the platform, and failure handling follows the same patterns as the rest of Pulsar. This encourages small, focused logic that complements—not replaces—larger processing frameworks.

Pulsar SQL extends the platform in a different direction. By integrating with Trino, Pulsar allows teams to query live and historical streams using familiar SQL. Analysts can inspect message schemas, explore retained data, and run ad-hoc queries without waiting for data to be exported into a warehouse. Because SQL access respects Pulsar’s security and retention policies, operators maintain control while expanding access.

Together, Functions and SQL turn Pulsar into more than a transport layer. They connect operational streams with analytical workflows and reduce the need to duplicate data across multiple systems.

8.2 Summary of the Tencent Advantage

Tencent’s experience shows what happens when streaming systems cross a certain scale threshold. At that point, operational complexity—not raw throughput—becomes the limiting factor. Partition rebalancing, broker coupling, storage growth, and cross-region coordination start consuming more effort than application development itself.

Pulsar’s advantage comes from addressing these problems at the architectural level. Segment-based storage removes the need for disruptive rebalancing. Compute–storage separation lets teams scale capacity where it is actually needed. Native multi-tenancy makes shared platforms safe and manageable. Tiered storage and geo-replication extend these benefits across time and geography without forcing new operational models.

For architects deciding between Kafka and Pulsar, the question is less about features and more about long-term fit. If workloads are small and isolated, Kafka may remain sufficient. But for organizations expecting rapid growth, long retention, global access, and many independent teams sharing the same platform, Pulsar offers a clearer path forward. It replaces a collection of workarounds with a system designed to scale cleanly, predictably, and sustainably.

In that sense, Pulsar is not just an alternative to Kafka. It represents the next step in how large organizations think about messaging—as durable, shared infrastructure rather than a series of tightly coupled logs.

Advertisement