RabbitMQ Monitoring Tools and the Key Metrics to Monitor
Peak performance and reliability depend on both a well-configured system and continuous RabbitMQ monitoring. As the message broker connecting your services, RabbitMQ is a crucial component of the overall system, and it requires dedicated monitoring so that issues in the communication chain are noticed and resolved before they cause a complete outage.
In this post, we describe the key metrics to measure, how RabbitMQ supports the Prometheus and Grafana infrastructure, how our dashboards go beyond default monitoring and how you can utilise the detailed visibility to enhance your monitoring setup.
The Two Important RabbitMQ Monitoring Tools
RabbitMQ 4.0 removes performance metrics from the RabbitMQ Management UI and HTTP API, making a Prometheus-based setup the standard approach to observability. Metrics are exposed through the RabbitMQ Prometheus plugin, and Prometheus scrapes and stores these metrics for monitoring and alerting purposes.
Grafana complements this by visualising the metrics collected by Prometheus. It does not monitor RabbitMQ directly, but provides the dashboards and queries that operators use to understand cluster health, throughput, resource utilisation and workload imbalance.
Together, the two tools form the de facto monitoring stack for RabbitMQ in production environments.
The “Official” Dashboards to Monitor RabbitMQ
The official dashboards provided by the RabbitMQ developers are available from GitHub and the Grafana marketplace. They give a global view of your RabbitMQ cluster. The main one is the “RabbitMQ Overview”, which includes details of the main cluster runtime metrics:
- Cluster health, including memory and disk space metrics
- Publisher and consumer numbers and throughput
- Aggregated message throughput metrics
- Aggregated message counts
- Connection, channel and queue churn statistics
Several others provide insight into the BEAM (Erlang) virtual machine's memory allocators, the Raft subsystem, Erlang cluster distribution traffic and RabbitMQ Stream-related statistics.
These are an important part of any RabbitMQ monitoring solution and will indicate when a cluster is healthy or when RabbitMQ experiences problems.
So what’s missing?
Although they lead you to the issues, the official dashboards lack the details to pinpoint the components (connections, channels or queues) that are responsible for or experiencing issues.
Luckily, the information needed is provided by RabbitMQ itself, and our additional dashboards fill the crucial, but missing, visualisation.
But first, what do you need to see?
Key RabbitMQ Metrics to Monitor
RabbitMQ performance problems usually surface in one of four areas: queues, connections, channels or the underlying cluster resources. Effective monitoring requires focusing on the metrics that reflect message flow, consumer health and broker capacity, rather than relying solely on high-level status indicators.
Queue Metrics
Queues are where messages accumulate, so they must be monitored continuously, even when the system appears idle.
Critical metrics include:
- Total message count and unacknowledged messages: Growth indicates slow or stalled consumers. A backlog on a single queue can affect the entire RabbitMQ cluster.
- Publish, deliver and acknowledgement rates: Tracking these per second reveals bottlenecks in throughput across the AMQP pipeline.
- Consumer count: Too few consumers lead to queue growth; too many increase scheduling overhead.
Queue metrics are particularly important on high-traffic systems such as those observed using Datadog, as queue latency and backlog are often early indicators of downstream application failures.
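As a rough illustration, a backlog alert of this kind can be sketched in a few lines. This is a minimal sketch, assuming queue depths are sampled once per scrape interval; the sample data and thresholds are illustrative, not RabbitMQ defaults:

```python
# Sketch: detect a growing queue backlog from periodic depth samples.
# The sample depths below are illustrative, not taken from a real broker.

def backlog_growing(samples, min_growth=0):
    """Return True if queue depth rose between every consecutive sample."""
    return all(b - a > min_growth for a, b in zip(samples, samples[1:]))

# Depth of one queue, sampled every scrape interval (e.g. 15s):
healthy = [120, 80, 40, 0, 10]       # draining: consumers keep up
stalled = [100, 250, 400, 560, 700]  # monotonic growth: consumers stalled

print(backlog_growing(healthy))  # False
print(backlog_growing(stalled))  # True
```

In practice you would express the same condition as a PromQL alert over the queue depth metric rather than in application code, but the logic is the same: sustained growth, not a momentary spike, is the signal.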
Connection Metrics
Poorly behaving applications frequently manifest through unstable or noisy connections rather than queue behaviour.
Monitor:
- Connection churn: Repeated open/close events usually mean buggy clients or aggressive timeouts.
- Incoming/outgoing data rates: Show whether a connection is saturating network or CPU resources on a specific RabbitMQ node.
- Connection termination point: Helps identify which node or RabbitMQ instance is acting as a hotspot.
These metrics can be correlated with the connection identifiers that appear in the logs and in the output of the management plugin and command-line tools.
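Churn is derived from counters, so it is computed as the delta between two scrapes. A minimal sketch, assuming two snapshots of opened/closed connection counters taken one scrape interval apart (the counter names here are abbreviated for the example, not the exact exported metric names):

```python
# Sketch: estimate connection churn from two scrapes of the
# opened/closed connection counters.

def churn_per_minute(prev, curr, interval_s):
    """Churn = connection opens + closes per minute between two scrapes."""
    opened = curr["opened"] - prev["opened"]
    closed = curr["closed"] - prev["closed"]
    return (opened + closed) * 60 / interval_s

prev = {"opened": 1000, "closed": 990}
curr = {"opened": 1240, "closed": 1230}  # scraped 15 seconds later

print(churn_per_minute(prev, curr, 15))  # 1920.0 -> suspiciously high
```

A long-lived application should open a handful of connections and keep them; a churn rate in the hundreds or thousands per minute almost always points at clients that connect per message.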
Channel and Exchange Metrics
Channels exist within connections and directly handle message publishing and consumption. For high-throughput systems, channel behaviour is often more important than connection counts.
Look for:
- Unconfirmed and unacknowledged messages: Reveal publisher backpressure or slow consumers.
- Consumer prefetch and global prefetch: Misconfiguration can exhaust RAM and trigger memory alarms.
- Uncommitted messages or acknowledgements: Indicate stalled transactions.
- Routing metrics on an exchange: Useful when diagnosing uneven message distribution.
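To see why prefetch misconfiguration matters for RAM, consider a rough upper bound on the memory held by unacknowledged deliveries on one channel. This is a back-of-the-envelope sketch; the message size and prefetch values are assumptions for illustration:

```python
# Sketch: upper bound on in-flight (unacknowledged) messages for a channel,
# given per-consumer prefetch and an optional channel-wide (global) prefetch.

def max_inflight(consumers, per_consumer_prefetch, global_prefetch=None):
    """In-flight messages are capped per consumer and, optionally, per channel."""
    total = consumers * per_consumer_prefetch
    if global_prefetch is not None:
        total = min(total, global_prefetch)
    return total

# 10 consumers, prefetch 500 each, 100 KiB average message size (assumed):
inflight = max_inflight(consumers=10, per_consumer_prefetch=500)
print(inflight * 100 * 1024 / 2**20, "MiB")  # 488.28125 MiB held in flight
```

Half a gigabyte of in-flight messages from a single channel is exactly the kind of pressure that trips memory alarms, which is why prefetch should be sized against message size and consumer speed rather than set arbitrarily high.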
Cluster and Server Resource Metrics
At the node level, RabbitMQ protects itself aggressively. Monitoring system resource exhaustion is essential, as RabbitMQ will throttle or block publishers to prevent data loss.
Important metrics:
- Memory, disk and file descriptor usage: Exhaustion of file descriptors prevents new connections, channels or queues from being created.
- Disk and memory alarms: A raised alarm means publishers are blocked.
- Queue, connection and channel counts: Excessive object counts increase scheduling overhead across the RabbitMQ server.
When these metrics are exposed through Prometheus, make sure the required metric families are enabled in the scrape configuration to maintain long-term observability. The dashboards below visualise and correlate them across clusters, connections, channels and queues.
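A resource headroom check of this kind is what you would typically express as a PromQL alert; the sketch below shows the same comparison in Python. The 80% threshold and the sample gauge values are illustrative choices, not RabbitMQ defaults:

```python
# Sketch: check file descriptor and memory headroom from gauge values,
# mirroring the kind of threshold you would express in a PromQL alert rule.

def headroom_ok(used, limit, max_ratio=0.8):
    """True while resource usage stays below the alerting threshold."""
    return used / limit < max_ratio

fd_used, fd_limit = 900, 1048576      # open file descriptors vs OS limit
mem_used, mem_limit = 3.2e9, 4.0e9    # bytes used vs memory high watermark

print(headroom_ok(fd_used, fd_limit))    # True: plenty of descriptors left
print(headroom_ok(mem_used, mem_limit))  # False: at the 80% threshold
```

Alerting on headroom, rather than waiting for RabbitMQ's own alarms to fire, gives operators time to react before publishers are blocked.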
RabbitMQ Monitoring Dashboards by Seventh State
We created five additional Grafana dashboards to provide a detailed view of what is happening in your RabbitMQ clusters. This helps pinpoint the source of issues and performance degradation while increasing the visibility of connection, channel and queue behaviour.
Clusters
The Clusters Dashboard provides a view of all RabbitMQ clusters within your environment. It exposes metrics that offer a high-level overview, critical for managing multiple clusters effectively.

The main metrics shown are the following for each cluster:
- Total message count: The number of messages in the queue, including both queued and in-flight (unacknowledged) messages.
- Unacknowledged message count: Messages consumed but not yet acknowledged by consumers.
- Queue count: Total number of queues in the cluster.
- Channel count: Total number of channels in the cluster.
- Connection count: Total number of connections in the cluster.
- Memory, disk and file handle internal RabbitMQ alarms: The column displays a warning if there are any active internal alarms in the cluster.
Connections Overview
This dashboard focuses on RabbitMQ connections, presenting key metrics like connection throughput.

The main metrics shown are for each connection:
- Node: The cluster member at which the connection is terminated.
- Connection Pid: The identifier of the connection; it also appears in the logs.
- Incoming and outgoing data rates: The number of bytes sent or received by the connection per second.
Channels Overview
The Channels Overview Dashboard offers insights into RabbitMQ channels, including global counts, message rates per channel, and channel utilisation. Channels are crucial for message throughput, making this view invaluable for tuning and capacity planning.

The main metrics shown are for each channel:
- Consumers: The number of queue subscriptions created on the channel.
- Unacknowledged message count: Messages in flight, not yet acknowledged by the consumer application.
- Unconfirmed message count: Messages in flight from the publishing application, not yet confirmed by RabbitMQ.
- Consumer prefetch: Maximum number of messages in flight for each consumer subscription.
- Global prefetch: Maximum number of messages in flight for the channel.
- Uncommitted messages: Messages that have been published within a transaction but not yet committed.
- Uncommitted acks: Messages that have been acknowledged within a transaction but not yet committed by the consumer.
Queues Overview
This dashboard gives a detailed view of RabbitMQ queues, displaying real-time metrics like message backlog, publish and delivery rates. It’s designed to provide an overview of queue health and ensure messages are being processed.

The main metrics shown are for each queue:
- Number of messages in the queue: Total number of messages in the queue, including unacknowledged messages. For best performance, queues should be empty.
- Number of consumers: The total number of consumers subscribed to the queue.
- Incoming and outgoing message and acknowledgement rates: Rate of new messages inserted into and removed from the queue.
- Polling rates: Rate of client applications polling the queue for messages.
Queues Details
This dashboard gives a detailed view of a single RabbitMQ queue, displaying historical and real-time metrics like message backlog, publish and delivery rates. It’s designed to provide an overview of queue health and ensure messages are being processed.

The main historical metrics shown are for a queue:
- Incoming message rate: Rate of new messages inserted into the queue.
- Acknowledgement rate: The frequency at which consumer acknowledgements are received, allowing messages to be removed from the queue.
- Unacknowledged messages: The count of messages delivered to consumers that are pending acknowledgement before they can be removed from the queue.
- Publisher count: Number of channels publishing to this queue.
- Consumer count: The total number of consumers subscribed to the queue.
- Queue length: Total number of messages in the queue, including unacknowledged messages. For best performance, queues should be empty.
- Message polling rates: Rate of client applications polling the queue for messages. The metrics shown are the following:
  - Queue polling with acknowledgement: In this mode, the queue requires an acknowledgement from the client to confirm message processing before removing the message from the queue.
  - Queue polling without acknowledgement: Messages are immediately removed from the queue once served, with no need for a processing acknowledgement from the client.
  - Polling on an empty queue: This metric tracks the count of poll events when the queue is empty, indicating unnecessary polling activity.
Configuring RabbitMQ
To make these metrics available, you need RabbitMQ 3.12 or later. On older versions, some metrics may not show up.
The Prometheus plugin must be enabled by running the following command on all nodes:
```shell
rabbitmq-plugins enable rabbitmq_prometheus
```
You can verify that the plugin is enabled by opening the RabbitMQ host on port 15692, for example http://localhost:15692/metrics.
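The endpoint returns metrics in the Prometheus text exposition format. As a minimal sketch of what a scraped line looks like and how it decomposes into a name, labels and a value (the sample line below is illustrative output, with a made-up vhost, queue and value):

```python
# Sketch: parse one line of the Prometheus text exposition format that the
# /metrics endpoint returns. The sample line is illustrative, not real output.
import re

SAMPLE = 'rabbitmq_queue_messages{vhost="/",queue="orders"} 42'

def parse_sample(line):
    """Split a metric line into (name, labels dict, value)."""
    m = re.match(r'(\w+)\{(.*)\}\s+(\S+)', line)
    name, labelstr, value = m.groups()
    labels = dict(re.findall(r'(\w+)="([^"]*)"', labelstr))
    return name, labels, float(value)

name, labels, value = parse_sample(SAMPLE)
print(name, labels["queue"], value)  # rabbitmq_queue_messages orders 42.0
```

You normally never parse this yourself, since Prometheus does the scraping, but it is useful to recognise the format when checking the endpoint by hand in a browser or with curl.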
Configuring Prometheus
These dashboards rely on the detailed endpoint of the built-in RabbitMQ Prometheus plugin. You can find more information on the detailed endpoint in the RabbitMQ documentation.
The dashboards rely on the following metric families:
- queue_coarse_metrics
- queue_consumer_count
- channel_queue_metrics
- channel_queue_exchange_metrics
- channel_metrics
- connection_metrics
- connection_coarse_metrics
- channel_exchange_metrics
You must collect these in Prometheus for all metrics to be visible on the dashboards. An example Prometheus scrape configuration is below (note that the plugin listens on port 15692):

```yaml
scrape_configs:
  - job_name: 'rabbitmq-details-exporter'
    metrics_path: /metrics/detailed
    scrape_interval: 15s
    params:
      family:
        - queue_coarse_metrics
        - queue_consumer_count
        - channel_queue_metrics
        - channel_queue_exchange_metrics
        - channel_metrics
        - connection_metrics
        - connection_coarse_metrics
        - channel_exchange_metrics
    static_configs:
      - targets:
          - 'node1:15692'
          - 'node2:15692'
          - 'node3:15692'
    metric_relabel_configs:
      # Copy the queue_vhost label into a plain vhost label.
      - source_labels: ['queue_vhost']
        regex: '(.*)'
        action: replace
        target_label: 'vhost'
        replacement: '$1'
```
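To see why the regex in a relabel rule matters: Prometheus anchors relabel regexes (they must match the whole source label value) and substitutes capture groups into the replacement. A tiny Python sketch of the `replace` action's semantics (the `relabel` helper here is hypothetical, written only to mimic Prometheus behaviour, not a Prometheus API):

```python
# Sketch: mimic a metric_relabel_configs 'replace' action on one label value.
# Prometheus relabel regexes are fully anchored, so a non-matching regex
# makes the rule a no-op and the target label is left untouched.
import re

def relabel(value, regex, replacement):
    """Return the new target-label value, or None if the rule is a no-op."""
    m = re.fullmatch(regex, value)
    return replacement.replace("$1", m.group(1)) if m else None

print(relabel("/production", r"(.*)", "$1"))                   # /production
print(relabel("/production", r'queue_vhost="(.*)"', "$1"))     # None (no-op)
```

The source label value that the regex sees is the bare value (for example `/production`), never the `name="value"` pair as it appears in the exposition text, which is an easy mistake to make when writing relabel rules.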
How to get it?
Seventh State’s RabbitMQ dashboards are available at the Grafana Marketplace:

Get in touch…
Should you have any questions, feedback or enquiries, feel free to contact us at contact@seventhstate.io.
Gabor Olah
Technical Lead | Seventh State




