Example: 9 nodes across us-east-1a/b/c. Set tolerated-AZ-failures T=1, so RF_min = 2T+1 = 3. RepliQ enforces per-AZ caps for replicas, so a full-AZ outage still leaves a quorum. Once coverage is healthy, it balances leaders.
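A minimal sketch of the arithmetic behind those defaults, assuming the usual quorum-majority rule (the function names here are illustrative, not RepliQ's API):

```python
def rf_min(tolerated_az_failures: int) -> int:
    # A quorum queue needs a majority of replicas alive. With at most
    # one replica per AZ, surviving T full-AZ outages requires
    # RF = 2*T + 1 (so T=1 gives the default RF of 3).
    return 2 * tolerated_az_failures + 1

def per_az_cap(rf: int) -> int:
    # Max replicas any single AZ may hold so that losing that whole AZ
    # still leaves a majority of the replica set.
    majority = rf // 2 + 1
    return rf - majority

print(rf_min(1))      # → 3 replicas for T=1
print(per_az_cap(3))  # → at most 1 replica per AZ
```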

Operate separate RabbitMQ clusters per site for locality. Link with federation or shovels. Run RepliQ inside each cluster to keep placement AZ-resilient locally.

Treat racks or rooms as AZ labels (r1, r2, r3). RepliQ spreads replicas across those labels and chooses the least-crowded node within each AZ.
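The "least-crowded node within each AZ" rule can be sketched as follows; the inputs (a node-to-AZ map and the current replica hosts) are hypothetical stand-ins for what RepliQ derives from the cluster:

```python
from collections import Counter

def place_replicas(nodes_by_az, existing_hosts):
    """Pick one node per AZ label (r1, r2, r3), preferring the node
    currently hosting the fewest replicas within each AZ."""
    load = Counter(existing_hosts)  # node -> current replica count
    placement = {}
    for az, nodes in nodes_by_az.items():
        # Tie-break on node name so the choice is deterministic.
        placement[az] = min(nodes, key=lambda n: (load[n], n))
    return placement

nodes = {"r1": ["n1", "n2"], "r2": ["n3", "n4"], "r3": ["n5"]}
existing = ["n1", "n1", "n3"]           # current replica hosts
print(place_replicas(nodes, existing))  # → {'r1': 'n2', 'r2': 'n4', 'r3': 'n5'}
```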

RepliQ listens for changes and queues up work. On a short timer it works through batches: decide the correct member set, add missing members, wait for them to sync, then remove extras. The default target is three copies, one per zone. Only after coverage is healthy does it move leaders. If the broker replies "not permitted," the item goes back on the queue and is retried. Zone names come from each node's environment: set the variable once per node and forget it.
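The loop above can be sketched roughly as below. The `broker` client and its method names are assumptions for illustration, not RepliQ's real interface:

```python
import queue
import time

def reconcile(item, broker):
    """One pass for one queue: grow to the target membership, wait for
    sync, then shrink. Growing before shrinking avoids a window with
    reduced redundancy."""
    target = set(broker.desired_members(item))   # one node per zone
    current = set(broker.members(item))
    for node in target - current:
        broker.add_member(item, node)            # add first (safe)
    broker.wait_until_synced(item)
    for node in current - target:
        broker.delete_member(item, node)         # trim extras last

def run(work: "queue.Queue", broker, interval=5.0):
    while True:
        time.sleep(interval)                     # short timer between batches
        batch = []
        while not work.empty():
            batch.append(work.get_nowait())
        for item in batch:
            try:
                reconcile(item, broker)
            except PermissionError:              # broker said "not permitted"
                work.put(item)                   # re-queue and retry later
```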

Processing pipeline

Creating two quorum queues

Two queues landed unevenly. RepliQ first adds the missing copy in the empty zone. You’ll briefly see four copies while it syncs.

Sync completes, then the extra copy is trimmed. We’re back to three copies, one per zone—no risk window.
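The add-then-trim sequence can be illustrated with hypothetical membership lists; note the count never drops below three at any step:

```python
def transition(members, add_node, remove_node, rf=3):
    """Grow before shrink: record each membership state and check that
    none of them falls below RF, so there is no reduced-redundancy window."""
    states = [list(members)]
    members = members + [add_node]                       # briefly four copies
    states.append(list(members))
    members = [m for m in members if m != remove_node]   # trim after sync
    states.append(list(members))
    assert all(len(s) >= rf for s in states)
    return states

for s in transition(["zoneA-1", "zoneA-2", "zoneB-1"], "zoneC-1", "zoneA-2"):
    print(len(s), s)   # sizes go 3 → 4 → 3
```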

Coverage is correct, so leader moves are enabled. One leader starts moving off the busy zone.

Move finishes. Leaders per zone are even. Copies didn’t change; clients keep flowing.
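A sketch of the leader-balancing decision, gated on healthy coverage as described above (the gating flag and input shape are assumptions):

```python
from collections import Counter

def leader_moves(leaders_by_zone, coverage_healthy):
    """Propose leader relocations from the busiest zone to the least
    loaded one, but only once replica coverage is healthy."""
    if not coverage_healthy:
        return []
    counts = Counter(leaders_by_zone)
    moves = []
    while max(counts.values()) - min(counts.values()) > 1:
        busy = max(counts, key=counts.get)
        idle = min(counts, key=counts.get)
        moves.append((busy, idle))
        counts[busy] -= 1
        counts[idle] += 1
    return moves

print(leader_moves({"a": 4, "b": 1, "c": 1}, coverage_healthy=True))
# → [('a', 'b'), ('a', 'c')]
```

Only leadership moves; the replica sets stay put, which is why clients keep flowing during rebalancing.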

What to watch

Use two simple charts: replicas per AZ and leaders per AZ. After changes, both converge quickly to flat lines. Keep label cardinality low.
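A trivial flatness check for those two charts, assuming per-AZ counts are already collected (e.g. from a metrics scrape):

```python
from collections import Counter

def is_flat(counts, tolerance=1):
    """Healthy steady state: per-AZ counts differ by at most `tolerance`."""
    return max(counts.values()) - min(counts.values()) <= tolerance

replicas_per_az = Counter({"a": 3, "b": 3, "c": 3})
leaders_per_az = Counter({"a": 2, "b": 1, "c": 1})
print(is_flat(replicas_per_az), is_flat(leaders_per_az))  # → True True
```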

Scope

Quorum queues only. Defaults target RF=3 with one-per-AZ coverage; you can raise tolerated-AZ-failures if your policy requires it. The AZ count is unlimited by design.
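Those defaults might look like the following; the setting names are illustrative, not RepliQ's actual configuration keys:

```python
# Hypothetical RepliQ defaults; key names are illustrative only.
DEFAULTS = {
    "queue_types": ["quorum"],    # quorum queues only
    "tolerated_az_failures": 1,   # T; implies RF = 2*T + 1 = 3
    "coverage": "one-per-az",     # spread replicas across zones
    "max_azs": None,              # unlimited AZ count by design
}

def replication_factor(cfg):
    return 2 * cfg["tolerated_az_failures"] + 1

print(replication_factor(DEFAULTS))  # → 3
```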

Thomas Bhatia - RabbitMQ Consultant, Seventh State
