RabbitMQ Mirrored Queues Gotchas

RabbitMQ Classic Mirrored Queues (also known as HA queues) are a feature of RabbitMQ that allows the replication of queue contents across multiple nodes within a RabbitMQ cluster. This replication ensures that if the node hosting the leader queue fails, one of the followers can take over, providing high availability.

Although we now have a better solution with Quorum Queues, and RabbitMQ Classic Mirrored Queues are deprecated, many systems still have them in use for various reasons. Thus, we would like to explore some unexpected behaviours of HA queues to help us better understand their usage and management.

In order to maintain a clear flow, we have placed the detailed RabbitMQ cluster setup in a separate section at the end of this blog post. If you are interested in testing the scenarios presented, feel free to check it out.

In brief, we have a cluster of 2 RabbitMQ 3.12.0 nodes running on Docker with the following specifications:

  • rmq1:
    AMQP: amqp://localhost:5672
    Management UI: http://localhost:15672
  • rmq2:
    AMQP: amqp://localhost:5673
    Management UI: http://localhost:15673

The default username and password for logging in to the Management UI are guest/guest.

We will also use RabbitMQ PerfTest 2.21.0, which is a testing tool for RabbitMQ.

HA queues are enabled by setting up a policy. This can be done via the Management UI or the command line, as follows:
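The original command was not captured in this version of the post; a typical rabbitmqctl invocation matching the policy described below would be:

```shell
# Set the "ha-all" policy on all queues; run on any node in the cluster.
rabbitmqctl set_policy ha-all ".*" \
  '{"ha-mode":"all","ha-sync-mode":"automatic","ha-sync-batch-size":2}'
```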

The command sets

  • A policy named "ha-all" on all queues and exchanges (the ".*" is a regular expression that matches all names).
  • The parameters of the policy: In this case, it sets the "ha-mode" to all, which means that all nodes in the cluster will keep a copy of the messages. The "ha-sync-mode" is set to automatic, which means that the leader will automatically synchronise data to all followers. The "ha-sync-batch-size" is set to 2, which means that synchronisation will happen in batches of 2 messages.

With docker:
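Assuming the containers are named rmq1 and rmq2 as in our setup, the same policy can be applied from the host:

```shell
# Apply the same "ha-all" policy from the host via the rmq1 container.
docker exec rmq1 rabbitmqctl set_policy ha-all ".*" \
  '{"ha-mode":"all","ha-sync-mode":"automatic","ha-sync-batch-size":2}'
```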

If we take a closer look at our policy, it specifies the queue synchronisation as automatic:

"ha-sync-mode": "automatic"

With automatic synchronisation, a newly created follower automatically synchronises messages from its leader. Sounds convenient!

But watch out! Queue synchronisation is a blocking process, meaning that all queue operations are temporarily stopped. In simpler terms, messages cannot be published (“routed”, to be exact) to or consumed from that queue. The queue appears to “freeze” until the synchronisation finishes.

Let’s have a look at an example in which we publish 1M messages to an HA queue, then attach a slow consumer to it and restart one node.

We are going to have 10 producers, each sending 100,000 messages, resulting in a total of 1,000,000 messages. First, publish the messages by:
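The exact PerfTest command was not captured here; a likely invocation looks like the following (the queue name "ha-test" is our assumption, and the jar path depends on where you downloaded PerfTest):

```shell
# 10 producers, 100,000 messages each, no consumers.
# Queue name "ha-test" is an assumption for this walkthrough.
java -jar perf-test.jar \
  --uri amqp://localhost:5672 \
  --producers 10 --consumers 0 \
  --pmessages 100000 \
  --queue ha-test
```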

Check the queue is filled up at: http://localhost:15672/#/queues

When it’s ready, we can start the slow consumer which will process 10 msgs/s with a prefetch count of 10:
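A PerfTest invocation along these lines would do it (again assuming the queue name "ha-test"):

```shell
# One consumer limited to 10 msgs/s with a prefetch (QoS) of 10; no producers.
java -jar perf-test.jar \
  --uri amqp://localhost:5672 \
  --producers 0 --consumers 1 \
  --consumer-rate 10 --qos 10 \
  --queue ha-test
```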

Then restart rmq2 by:
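Assuming the container is named rmq2:

```shell
docker restart rmq2
```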

Now, let’s observe the consumer log. We can see that there was an interruption in consuming messages lasting around 10 seconds.

Take a look at the RabbitMQ log by:
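The synchronisation messages are logged on the node hosting the queue leader, which in this scenario we assume is rmq1:

```shell
docker logs rmq1
```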

We can see the following:

RabbitMQ started synchronising the newly “restarted” follower from 16:16:08 to 16:16:24 which was nearly 16 seconds. This was around the time that the queue looked like it was freezing from the consumer’s perspective.

This is a simple demonstration to help us understand the automatic synchronisation mechanism and how it would affect the queue operation.

In production, of course, we should not set the "ha-sync-batch-size" that low.

This setting determines the number of messages to be synchronised at a time, and it defaults to 4096. Had we not lowered this value, the queue synchronisation would not have taken that long. However, we need to consider that in a live system there could be hundreds, thousands, or even millions of mirrored queues with large numbers of messages. This could lead to high traffic and prolonged synchronisation, causing queues to remain blocked for extended periods.

Automatic synchronisation is still our recommended setting for HA queues. Queue blocking during synchronisation is also something to be aware of, so that we can keep active control over our system and consumer development. A key to a happy Rabbit is to keep queues short.

Quorum queues, on the other hand, can synchronise only the changes in the queue state, called the delta, across the nodes in the RabbitMQ cluster. This synchronisation happens asynchronously, improving the efficiency and reliability of data replication without impacting queue availability.

In RabbitMQ, the auto-delete property is a feature that allows queues to be automatically removed when their last consumer disconnects.

Mirroring an auto-delete queue can lead to an unexpected behaviour that actually breaks the auto-delete feature.

First, let’s remove our old queue by:
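Assuming the queue from the previous test was named "ha-test", it can be removed with:

```shell
# "ha-test" is our assumed queue name from the earlier test.
docker exec rmq1 rabbitmqctl delete_queue ha-test
```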

Now we start a new producer which connects to rmq2 and publishes messages to an auto-delete queue by:
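A sketch of the PerfTest command, with the queue name "ad-test" as our assumption:

```shell
# One producer on rmq2 (host port 5673), publishing at a modest rate
# to an auto-delete queue. Queue name "ad-test" is an assumption.
java -jar perf-test.jar \
  --uri amqp://localhost:5673 \
  --producers 1 --consumers 0 \
  --queue ad-test --auto-delete true \
  --rate 10
```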

Check the queue is running at: http://localhost:15672/#/queues

Now, we start a consumer which connects to rmq2 and consumes messages from the above queue.
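Again a sketch, matching the assumed queue name from the producer step:

```shell
# One consumer, also on rmq2, for the same auto-delete queue.
java -jar perf-test.jar \
  --uri amqp://localhost:5673 \
  --producers 0 --consumers 1 \
  --queue ad-test --auto-delete true
```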

Then, it’s time to have some network disruption.
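One way to simulate a partition between the nodes, assuming the Docker network is named rmqnet (depending on your layout, you may need a second, client-facing network so that both Management UIs stay reachable):

```shell
# Cut rmq1 off from the inter-node network; clients stay attached to rmq2.
docker network disconnect rmqnet rmq1
```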

Take a look at the UI of both RabbitMQ nodes; we can see the following.

On rmq1, http://localhost:15672 shows that rmq2 is disconnected. No queue is left.

On rmq2, http://localhost:15673 shows that rmq1 is disconnected. Our queue is still running.

The reason is that when the network disruption occurred, from the perspective of rmq1, the consumer disconnected from the queue and the queue was deleted. However, from the perspective of rmq2, the consumer was still there and the queue remained active.

But remember, the queue leader was initially located on rmq1, so the queue remaining on rmq2 is actually a follower that has been promoted. We can see this promotion in the log of rmq2.

The consequence is that messages which could not be replicated before the network disconnection occurred were lost (unless publisher confirms were in use).

If we switch our producer to rmq2, we can see the message flow return to normal.

In general, we do not recommend mirroring non-durable/exclusive/auto-delete queues due to the nature of HA queues. HA queues aim to provide high availability while these queue types (non-durable/exclusive/auto-delete) usually aim to handle temporary tasks.

In RabbitMQ, quorum queues are designed with a focus on data safety and therefore, they do not support certain features such as non-durable, exclusive, or auto-delete queues. This is a significant departure from classic mirrored queues, and it’s done to ensure the safety of all messages. All quorum queues are durable, meaning messages survive broker restarts. Exclusive and auto-delete queues are only supported for non-mirrored classic queues. This design choice enhances reliability in RabbitMQ systems.

There is a feature which allows consumers to be notified when a queue leader fails over. It is enabled by setting the x-cancel-on-ha-failover consumer argument to true.

Consumers will then receive a basic.cancel command from RabbitMQ, notifying them that the leader has failed. As a result, the queue will not be deleted, and consumption can resume once the new leader is promoted.

Before running the following test, let’s restart the cluster with:
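With the compose-based setup described at the end of this post, a plain restart (which preserves the policy and any remaining queues) is enough:

```shell
docker compose restart
```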

As usual, start a producer on rmq1:
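A sketch, again assuming the queue name "ha-test":

```shell
# One producer on rmq1, publishing at a steady 10 msgs/s.
java -jar perf-test.jar \
  --uri amqp://localhost:5672 \
  --producers 1 --consumers 0 \
  --queue ha-test --rate 10
```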

Next, connect our new consumer to rmq2:
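A sketch of the consumer command. We assume here that your PerfTest version supports the --consumer-args option; if it does not, set the x-cancel-on-ha-failover argument in your client’s basic.consume call instead:

```shell
# Consumer on rmq2 that asks to be cancelled on leader failover.
# --consumer-args availability depends on your PerfTest version.
java -jar perf-test.jar \
  --uri amqp://localhost:5673 \
  --producers 0 --consumers 1 \
  --queue ha-test \
  --consumer-args x-cancel-on-ha-failover=true
```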

When the messages start flowing, disconnect rmq1 as we did in the previous example:
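Assuming the same rmqnet network name as before:

```shell
docker network disconnect rmqnet rmq1
```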

And move our producer to rmq2:
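The same producer sketch as before, now pointed at rmq2:

```shell
java -jar perf-test.jar \
  --uri amqp://localhost:5673 \
  --producers 1 --consumers 0 \
  --queue ha-test --rate 10
```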


Observing our consumer log, we can see that the consumer stopped for a few seconds. Then, it received a cancel command from the broker. After that, it consumed the messages again from the “switched” producer.


In simple terms, the "x-cancel-on-ha-failover" feature provides additional information about a failover occurrence, allowing for appropriate actions to be taken, such as automatic subscription to the new leader.

The “x-cancel-on-ha-failover” is specific to classic mirrored queues. Quorum queues do not support this feature.

Publisher confirms are a feature of RabbitMQ designed to ensure reliable publishing. When publisher confirms are enabled on a channel, messages that the client publishes are confirmed asynchronously by the broker. This means that the broker has accepted the messages on the server side. In a cluster (HA), it confirms that the message is accepted by all the queue followers.

But it comes with a trade-off: if RabbitMQ is under high load or the network is slow, it will take longer to receive a publisher confirm from RabbitMQ.

The following test can help us measure the publisher confirm latency.

Let’s restart the cluster with:
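As before, with the compose-based setup:

```shell
docker compose restart
```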

Now start a producer with publisher confirm enabled for every message:
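A sketch using PerfTest’s --confirm option (queue name "ha-test" is, again, our assumption):

```shell
# --confirm 1 allows at most one outstanding unconfirmed publish,
# i.e. the producer waits for a confirm after every message.
java -jar perf-test.jar \
  --uri amqp://localhost:5672 \
  --producers 1 --consumers 0 \
  --queue ha-test --confirm 1
```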

Start a consumer so that messages do not overflow RabbitMQ:
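For example:

```shell
# A plain consumer draining the same queue.
java -jar perf-test.jar \
  --uri amqp://localhost:5672 \
  --producers 0 --consumers 1 \
  --queue ha-test
```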

Since the default net tick timeout is 60s, we can create a network disruption under this value and observe the publisher confirm latency.
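One way to create such a disruption, assuming the rmqnet network name:

```shell
# Disconnect rmq2 for ~20s (below the 60s net tick timeout), then reconnect.
docker network disconnect rmqnet rmq2
sleep 20
docker network connect rmqnet rmq2
```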

From the producer’s output, we can see a delay of more than 10 seconds while the producer was waiting for the confirm.

If we increase the network disconnection to ~30s, the producer even stopped, since it had been waiting for the confirm for too long.

Still, we highly recommend publisher confirms for message reliability. The confirmation window does not need to be one; it can vary depending on the producer’s capabilities. Thorough testing is needed to find the optimal publisher-confirm window for your system.

When using quorum queues in a three-node setup, the system can tolerate network issues affecting a single node without causing any noticeable delay from the publisher’s perspective. This is because quorum queues use a distributed consensus algorithm to ensure that a majority of nodes (in this case, two out of three) agree on the state of the queue before confirming the receipt of a message to the publisher.

So, if one node experiences a network problem, the other two nodes can still reach a consensus and continue processing messages as usual. This results in no visible delay for the publisher, as their messages are still being accepted and processed by the remaining nodes. This makes RabbitMQ’s three-node setup with quorum queues a highly resilient choice for systems where maintaining a steady flow of messages is critical, even in the face of network issues.

While RabbitMQ Classic Mirrored Queues provide high availability, they have limitations and are now deprecated in favour of Quorum Queues. Understanding their behaviour and proper configuration can help manage existing systems effectively. Transitioning to Quorum Queues should be the long-term goal for better performance and reliability.

If using mirrored queues is still necessary for your system, we recommend implementing the following settings for mirrored queues:

  1. Use pause_minority cluster partition setting with an odd number of nodes in the cluster.
  2. Only cluster durable, non-exclusive, non-auto-delete queues.
  3. Mirror to all nodes.
  4. Use automatic synchronisation to minimise the chance of ending up with unsynchronised followers.
  5. Set the HA promotion to "always", so that a follower is promoted even if there are no synchronised followers.
  6. Prepare to handle lost messages in the application logic.

We recommend the following mirroring policy to cover the above:
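The exact policy from the original post was not captured here; a policy along these lines matches the recommendations ("ha-promote-on-shutdown" is our reading of recommendation 5). Note that pause_minority from recommendation 1 is not a policy key; it is set in rabbitmq.conf as `cluster_partition_handling = pause_minority`.

```shell
docker exec rmq1 rabbitmqctl set_policy ha-all ".*" \
  '{"ha-mode":"all","ha-sync-mode":"automatic","ha-promote-on-shutdown":"always"}'
```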

Quorum queues are now the recommended solution for high availability. They are resilient to network partitions and use the Raft protocol for leader election and message distribution, ensuring better performance and consistency.

Here are some key differences between quorum queues and mirrored queues:

Reliability: Quorum queues are more reliable and predictable. New followers are replicated asynchronously in the background, causing no unavailability of the queue.

Maintenance: Quorum queues require less maintenance. They are designed to be safer and provide simpler, well-defined failure-handling semantics.

Limitations: Quorum queues have some limitations and differences in behaviour compared to mirrored queues. For instance, quorum queues do not support exclusive queues and message priority, and they handle network partitions differently.

Mirrored queues were deprecated in RabbitMQ version 3.9, with a formal announcement posted on August 21, 2021. They will be removed entirely in version 4.0.

RabbitMQ Deprecation Announcements for 4.0 | RabbitMQ

Quorum queues are a superior replacement for mirrored queues. They are safer, achieve higher throughput, and are more reliable and predictable. However, they are not 100% feature-compatible with classic mirrored queues, though close.

For more in-depth information on quorum queues, please visit

RabbitMQ Quorum Queues Explained – what you need to know (seventhstate.io)

Below is the Docker compose file for the RabbitMQ cluster setup used in this blog.
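The compose file itself was not captured in this version of the post; a minimal two-node sketch that matches the ports listed earlier looks like the following (the image tag, cookie value, and network name are our assumptions):

```yaml
version: "3.8"
services:
  rmq1:
    image: rabbitmq:3.12.0-management
    container_name: rmq1
    hostname: rmq1
    environment:
      RABBITMQ_ERLANG_COOKIE: "secret-cookie"   # must match on both nodes
    ports:
      - "5672:5672"
      - "15672:15672"
    networks:
      - rmqnet
  rmq2:
    image: rabbitmq:3.12.0-management
    container_name: rmq2
    hostname: rmq2
    environment:
      RABBITMQ_ERLANG_COOKIE: "secret-cookie"
    ports:
      - "5673:5672"
      - "15673:15672"
    networks:
      - rmqnet
networks:
  rmqnet:
    name: rmqnet
```

With a sketch like this, once both containers are up, rmq2 still needs to join the cluster, for example with: `docker exec rmq2 bash -c "rabbitmqctl stop_app && rabbitmqctl join_cluster rabbit@rmq1 && rabbitmqctl start_app"`.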

To start the cluster, use:
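From the directory containing the compose file (use `docker-compose` if you are on the standalone v1 binary):

```shell
docker compose up -d
```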

To stop the cluster, use:
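```shell
docker compose down
```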

To restart the cluster, use:
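```shell
docker compose restart
```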

Lia Anh Nguyen



Migrating from mirrored queues to quorum queues can seem like a daunting task. If you’d like some guidance talk to us about our RabbitMQ consultancy services.

