Understanding RabbitMQ Topic Exchanges

In RabbitMQ (and AMQP in general), an exchange is the routing entity to which connecting applications publish messages. There are four exchange types, and amongst them the Topic Exchange offers the most flexible routing mechanism. Queues are bound to a Topic Exchange with a Binding Key, and routing is a pattern-matching process: a message is routed to a queue when the message’s Routing Key matches the Binding Key pattern of one of that queue’s bindings.

A Routing Key is made up of words separated by dots, e.g.: floor_1.bedroom.temperature. A Binding Key for a Topic Exchange looks similar, however it may also contain two special characters, namely the asterisk * and the hash #. These are wildcards, allowing us to create bindings in a smarter way. An asterisk * matches exactly one word and a hash # matches zero or more words. Here are example patterns matching messages from:

  • all devices in the bedroom on the first floor: floor_1.bedroom.*,
  • all devices on the first floor: floor_1.#.
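As a quick illustration of how such bindings are created from an application, here is a minimal sketch using the Erlang AMQP client. The exchange name iot_events, the exclusive queue and the payload are made up for this example:

```erlang
%% Minimal sketch: declare a Topic Exchange, bind a queue with a wildcard
%% Binding Key and publish a message whose Routing Key matches it.
-module(topic_demo).
-export([demo/0]).
-include_lib("amqp_client/include/amqp_client.hrl").

demo() ->
    {ok, Conn} = amqp_connection:start(#amqp_params_network{host = "localhost"}),
    {ok, Ch}   = amqp_connection:open_channel(Conn),
    #'exchange.declare_ok'{} =
        amqp_channel:call(Ch, #'exchange.declare'{exchange = <<"iot_events">>,
                                                  type     = <<"topic">>}),
    #'queue.declare_ok'{queue = Q} =
        amqp_channel:call(Ch, #'queue.declare'{exclusive = true}),
    %% floor_1.# matches every device on the first floor.
    #'queue.bind_ok'{} =
        amqp_channel:call(Ch, #'queue.bind'{queue       = Q,
                                            exchange    = <<"iot_events">>,
                                            routing_key = <<"floor_1.#">>}),
    %% This Routing Key matches the binding above, so the message
    %% is routed to the queue Q.
    amqp_channel:cast(Ch, #'basic.publish'{exchange    = <<"iot_events">>,
                                           routing_key = <<"floor_1.bedroom.temperature">>},
                      #amqp_msg{payload = <<"21.5">>}),
    amqp_connection:close(Conn).
```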

It’s clear that using a Topic Exchange allows for much simpler and more specific Routing. Now let’s take a look at how a Topic Exchange actually works.

Understanding the trie data structure is key to understanding how a Topic Exchange works. A trie is a tree that holds ordered data and is typically used for storing string values. Each node in the tree represents a prefix of a string and holds links to its child nodes, which share that prefix. A child node is addressed by the character that follows the parent node’s prefix.

This data structure allows us to search for a specific string in time that depends on the string’s length rather than on the number of stored entries. The characters of the string are used to traverse the tree during the search. In a Topic Exchange, the trie stores Binding Keys: a Binding Key is split on dots and the resulting words are used as the labels leading to the next nodes. Every time a new binding is added to a Topic Exchange, the trie associated with it is updated, and each time a new message has to be routed, the trie is queried to find the message’s destinations.
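For example, splitting a Binding Key into the words used as edge labels looks like this in the Erlang shell:

```erlang
1> binary:split(<<"floor_1.bedroom.temperature">>, <<".">>, [global]).
[<<"floor_1">>,<<"bedroom">>,<<"temperature">>]
```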

A Topic Exchange trie is implemented on top of Mnesia, the built-in distributed database of Erlang. Nodes and edges of a trie are stored in rabbit_topic_trie_node and rabbit_topic_trie_edge tables respectively.

In this case, the trie_node and trie_edge records serve as the primary keys identifying rows in these tables. Both nodes and edges are tied to one particular Topic Exchange by the exchange_name field inside the primary key. Nodes are also used to identify bindings; the bindings themselves are not stored in the nodes table, so they have to be looked up by node_id in the rabbit_topic_trie_binding table. Edges store information about the connection between a parent and a child node, along with the part of the Binding Key (a word) that is used to traverse the tree. Traversing the tree therefore requires a sequence of Mnesia queries.
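To give a feel for how these keys are composed, here is a simplified sketch of the record shapes, approximating the definitions in rabbitmq-server’s rabbit.hrl (field names and details may differ between versions; treat this as an illustration, not the authoritative source):

```erlang
%% Approximate shapes, for illustration only.

%% Row in rabbit_topic_trie_node; the key is the trie_node record.
-record(trie_node,       {exchange_name, node_id}).
-record(topic_trie_node, {trie_node, edge_count, binding_count}).

%% Row in rabbit_topic_trie_edge; the key carries the parent node id and
%% the word labelling the edge, the value is the child node id.
-record(trie_edge,       {exchange_name, node_id, word}).
-record(topic_trie_edge, {trie_edge, node_id}).

%% Row in rabbit_topic_trie_binding; its presence marks that the node
%% has a destination (e.g. a queue) bound to it.
-record(trie_binding,       {exchange_name, node_id, destination}).
-record(topic_trie_binding, {trie_binding, value}).
```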

An Exchange type is created by implementing the rabbit_exchange_type behaviour. In the context of tries in a Topic Exchange, there are two interesting operations, namely add_binding/3 and route/2: the former implements adding a new binding to the structure, and the latter is used to determine the targets for routing.

The arguments needed to create a binding are:

  • – source Exchange
  • – Binding Key
  • – destination

Every trie starts with a root node, which represents the empty Binding Key. This makes sense, as an empty string is a prefix of any string. The first operation is pretty straightforward: the Binding Key is split on dots and stored in a list. For example, the key “a.b.c” will be transformed to [“a”, “b”, “c”]. Let’s call this list Words; it will be used for traversing the data structure. Then the tree is traversed down recursively, starting with the root as the current node (a simplified in-memory sketch of this procedure follows the list below):

  1. Repeat until the Words list is empty:
     1.1. Take the head of the Words list and query Mnesia for a child edge matching it.
     1.2. If a matching node is found, make it the new current node and go back to 1.1 with the rest of the Words list. Otherwise, go to 2.
  2. Create child nodes for the remaining words of the Words list.
  3. When the Words list is exhausted, create a rabbit_topic_trie_binding entry for the current node. It signals that there is a binding associated with that node.
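To make the procedure concrete, here is a minimal in-memory sketch of the insertion in plain Erlang. It models the trie as nested maps instead of Mnesia tables, so the module name topic_trie and the node layout are purely illustrative, not RabbitMQ’s actual implementation:

```erlang
-module(topic_trie).
-export([new/0, add_binding/3]).

%% A node is a map: #{edges => #{Word => ChildNode}, bindings => [Destination]}.
new() ->
    #{edges => #{}, bindings => []}.

%% Split the Binding Key on dots and walk/extend the trie,
%% attaching Destination to the node reached by the last word.
add_binding(Trie, BindingKey, Destination) ->
    Words = binary:split(BindingKey, <<".">>, [global]),
    insert(Trie, Words, Destination).

%% Words exhausted: record the binding on the current node.
insert(Node = #{bindings := Bs}, [], Destination) ->
    Node#{bindings := [Destination | Bs]};
%% Otherwise follow (or create) the edge labelled with the next word.
insert(Node = #{edges := Edges}, [Word | Rest], Destination) ->
    Child = maps:get(Word, Edges, new()),
    Node#{edges := Edges#{Word => insert(Child, Rest, Destination)}}.
```

Adding floor_1.bedroom.air_quality to a trie that already holds floor_1.bedroom.temperature reuses the existing floor_1 and bedroom nodes and only creates the air_quality node, exactly as in the example that follows.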

Here is an example binding operation. Let’s assume there is a Topic Exchange with two existing bindings: floor_1.bedroom.temperature and floor_1.#. Our example trie structure would look like this:

Let’s add a new binding with the Binding Key floor_1.bedroom.air_quality. First we split it on dots: [floor_1, bedroom, air_quality]. Edges for floor_1 and bedroom already exist, but air_quality is missing, so the remaining part of the key, [air_quality], is used to create a new node. Finally, a new binding is associated with the newly created node and we have a structure that looks like this:

As you can see, to insert the new node, three read queries were executed to look up the edges between consecutive pairs of nodes: {root, floor_1}, {floor_1, bedroom} and {bedroom, air_quality}. However, the last edge was not present, so two write operations were executed: the first updates the edge_count of the bedroom node and the second inserts the new edge. At this point the trie structure is ready and the final node can be created, which requires another two write operations:

  • – One to create the actual node, which corresponds to the given Binding Key,
  • – The second to create the entry in rabbit_topic_trie_binding, which binds the node to the destination for messages.

The Mnesia tables used here are of the ordered_set type, which means they are implemented with a binary search tree. Thus, both read and write operations have complexity O(log(n)), where n is the size of the table. It can be observed that the first phase of traversing the trie requires:

  1. a read operation when a node exists,
  2. a write operation when a node does not exist.

The final phase of inserting the actual binding requires two extra operations. In the worst-case scenario, where none of the nodes exist and all of them need to be created, the complexity is O(2n*log(m) + 2*log(k)), where n is the length of the Binding Key, m is the number of nodes in the table and k is the number of actual bindings. Both m and k are global, so query efficiency depends on the total number of nodes and bindings across all Exchanges, not just the given one. For simplicity, the number of edges is assumed to be equal to the number of nodes, because in this structure (number of edges) = (number of nodes – 1).
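As a rough worked example (the numbers are invented purely for illustration, and logarithms are taken base 2): inserting a Binding Key of n = 3 words into a table with m = 10,000 nodes and k = 1,000 bindings costs, in the worst case, on the order of

$$ 2 \cdot 3 \cdot \log_2 10\,000 \;+\; 2 \cdot \log_2 1\,000 \;\approx\; 80 + 20 = 100 $$

comparisons inside the ordered_set lookups.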

Routing happens when a new message arrives at a Topic Exchange. The trie structure needs to be queried using the Routing Key associated with the message. However, traversing the trie is not straightforward, as the wildcards * and # need to be taken into account.

As with the binding operation, the routing operation begins by splitting the Routing Key on dots. Again, let’s call the result the Words list, with [Word | RestW] = Words. The process starts at the root node. The algorithm then discovers bindings by recursively exploring the tree in three ways (a sketch of this lookup follows the list):

  • – Look for a # edge among the child nodes. If such a node is found, it is treated as a new root and new scans are started with every suffix of Words, e.g. if Words is [a,b,c], then searching starts again with [a,b,c], [b,c], [c] and [].
  • – Look for a * edge among the child nodes. If such a node is found, continue with the found node as the root and RestW as the Words list.
  • – Look for Word among the child nodes. If such a node is found, continue with the found node as the root and RestW as the Words list.
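Extending the in-memory topic_trie sketch from the binding section (add route/2 to its export list), the three-way exploration, including the trailing-# check described in the next paragraph, could be sketched like this. Again, this is an illustration of the algorithm, not RabbitMQ’s code:

```erlang
%% Collect destinations for a Routing Key from a trie built with add_binding/3.
route(Trie, RoutingKey) ->
    Words = binary:split(RoutingKey, <<".">>, [global]),
    lists:usort(walk(Trie, Words)).

walk(Node = #{edges := Edges}, Words) ->
    %% 1. A # child matches zero or more words: restart with every suffix of
    %%    Words (this also covers the final trailing-# check when Words is []).
    HashMatches =
        case maps:find(<<"#">>, Edges) of
            {ok, HashChild} -> lists:append([walk(HashChild, S) || S <- suffixes(Words)]);
            error           -> []
        end,
    %% 2. A * child consumes exactly one word.
    StarMatches =
        case {Words, maps:find(<<"*">>, Edges)} of
            {[_ | RestW], {ok, StarChild}} -> walk(StarChild, RestW);
            _                              -> []
        end,
    %% 3. An exact-word child consumes that word.
    WordMatches =
        case Words of
            [Word | Rest] ->
                case maps:find(Word, Edges) of
                    {ok, Child} -> walk(Child, Rest);
                    error       -> []
                end;
            [] ->
                []
        end,
    %% When Words is exhausted, the current node's own bindings are results.
    Here = case Words of [] -> maps:get(bindings, Node); _ -> [] end,
    Here ++ HashMatches ++ StarMatches ++ WordMatches.

%% All suffixes of a list, including the empty one.
suffixes([])          -> [[]];
suffixes([_ | T] = L) -> [L | suffixes(T)].
```

The lists:usort/1 at the end removes duplicate destinations, which can occur when several # paths reach the same binding.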

Exploration of a branch finishes when its Words list is exhausted. The last step in the process is then to look for an extra child node connected through the hash # wildcard; this has to be done because the # wildcard stands for “zero or more” words.

So here is an example of the search algorithm. Let the Topic Exchange have the following bindings: floor_1.*.air_quality, floor_1.bedroom.air_quality and floor_1.bathroom.temperature. Now, let’s examine the routing of a message published with the Routing Key floor_1.bedroom.air_quality, which matches the first two of these bindings. Here is the trie representation, where the current node is marked in blue and the number on a node represents the number of bindings associated with it.

The first step is to find out if there is a hash # child node; in the example above it is not present. Then the asterisk * child node is queried, but it is not present either. Finally, the algorithm finds a node matching the head of the Words list, floor_1:

Now, the algorithm will consider the blue node as a new root and start again. Again there is no hash child, but an asterisk is found, so the head of the Words list is consumed and the algorithm moves down:

Here there is only one option available – air_quality:

The Words list is exhausted, so the current node is a result. There is one extra step – the hash child node has to be queried again, because it also accepts the empty list. However, it is not found, so only the current blue node is considered a result. Let’s mark the found node in green and go back to the previous node:

That node was reached via an asterisk, but one step remains: we have to check whether there is also a bedroom child node. In this case there is one:

There is one word left and the child node is present:

The Words list is empty again, so the current node is also a result:

The final step is to query for the bindings associated with the nodes we found. According to the numbers on the found nodes, there are two bindings, and they are the final result of the route/2 function.

Now, let’s consider another example, this time with hash bindings. There are three bindings present: #.air_quality, floor_1.# and floor_1.bedroom.air_quality.#. Again the floor_1.bedroom.air_quality Routing Key will be used:

In this example a hash node is found right away. This causes the algorithm to go down to that node with every remaining suffix of the Routing Key parts:

Let’s emphasise this again: the current node was reached via a # edge, so the algorithm visits the current blue node four times, with different Words lists (presented in the figure). One of these Words lists is the empty one, [], so this node is also appended to the results. There is no floor_1 or bedroom edge going out of this node, but there is an air_quality one, so the algorithm goes to the leaf node using the third Words list. Then:

The current Words list is empty, so this node is also a result of the search. There are no hash child nodes, so the current branch is finished. The algorithm will go back to the root node:

The only option for the algorithm to go down is the head of the Words list:

Again there is a hash child node, so it needs to be visited with all tails of the Words list. Three times in this case:

One of the Words lists is empty, so the current blue node is appended to the result list. As there are no child nodes, the algorithm goes back:

Now the algorithm will go down two times consuming all remaining words. Let’s jump directly to it:

The Words list is empty, so the current node is also part of the result. However, there is also a hash # child node. According to the algorithm, if any node is considered a result, its hash child node is matched as well. So finally there are 5 nodes found:

The final step is to find the bindings associated with the found nodes. The final result of the route/2 function is all three bindings.

In terms of complexity, the routing operation is hard to estimate precisely. The algorithm performs three queries for each visited node, and # nodes duplicate query paths, since the whole algorithm is restarted with every suffix of Words. Still, all operations depend on two factors – the length of the Words list and the total number of nodes in the table.

So let’s assume the worst case, where the bindings are #, #.#, #.#.#, …, up to k hashes. Each level is then entered with every possible suffix of Words, and some nodes are visited many times with exactly the same Words. The first node is visited n times, the second sum(1, n) times, the third sum(1, sum(1, n)) times, and so on. We can rewrite this as:
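(The original figure with the recurrence is not reproduced here; written out from the description above, the visit counts correspond roughly to

$$ k_1 = n, \qquad k_{i+1} = \sum_{j=1}^{k_i} j = \frac{k_i\,(k_i + 1)}{2}. $$
)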

The total number of operations is k_1 + k_2 + … + k_k. When this recursive equation is unwrapped, every new level contains twice as many multiplications as the previous one, so level k contains 2^k multiplications. This term dominates the complexity, so we can bound it by O(2^k * n * log(m)), where k is the maximum trie depth, n is the length of the Words list and m is the total number of nodes.

However, the above example is extreme: bindings like #.#.# make no sense, as two subsequent hash # parts in a key are redundant. In practice the average complexity should be close to O(n*log(m)). The overhead introduced by a single hash node should not be significant, because in that case traversing the trie with the different Words lists stops right after the hash edge.

This section will cover the performance of a Topic Exchange. The experiments will demonstrate the characteristics of a Topic Exchange under different conditions.

Synthetic tests

Two experiments will be presented, in order to experimentally confirm or reject the assumptions made in the previous Routing Operation section:

  • First, the relation between the Routing Key length and the routing operation time: the set of bindings is fixed and the Routing Key length is varied. A linear dependency is expected.
  • Secondly, the relation between the number of bindings and the routing operation time: the Routing Key length is fixed and the number of bindings is varied. A logarithmic dependency is expected.

Both experiments are performed under the following conditions:

  • – Tests are made on a single RabbitMQ node.
  • – Routing time is measured as the evaluation time of rabbit_exchange_type_topic:route/2.
  • – Measurements are made 50 times and the averaged results are presented in the figures.
  • – The Binding Keys are random, with the constraint that no Binding Key contains two subsequent hashes.
  • – Each Binding Key part has a 70% chance of being a word, a 20% chance of being an asterisk and a 10% chance of being a hash (a sketch of such a generator follows this list).
  • – The Routing Keys are created from existing Binding Keys – for example, a Routing Key of length n is created from an existing Binding Key of length n, with any hashes or asterisks replaced by random strings. This ensures that the operation must traverse at least n levels of the trie.
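A minimal sketch of such a Binding Key generator is shown below. Only the 70/20/10 split and the no-two-hashes rule come from the conditions above; the word pool, the module name and the key length handling are made up for the example:

```erlang
-module(binding_gen).
-export([random_binding_key/1]).

%% Generate a random Binding Key of Len parts: each part is a word with
%% probability 0.7, an asterisk with 0.2 and a hash with 0.1, and a hash
%% is never placed directly after another hash.
random_binding_key(Len) ->
    Parts = random_parts(Len, false),
    iolist_to_binary(lists:join(<<".">>, Parts)).

random_parts(0, _PrevWasHash) ->
    [];
random_parts(N, PrevWasHash) ->
    R = rand:uniform(10),
    Part = if
               R =< 1, not PrevWasHash -> <<"#">>;  %% ~10% hash (skipped right after a hash)
               R =< 3                  -> <<"*">>;  %% ~20% asterisk
               true                    -> random_word()
           end,
    [Part | random_parts(N - 1, Part =:= <<"#">>)].

%% A made-up pool of words, just for the example.
random_word() ->
    Words = [<<"floor_1">>, <<"bedroom">>, <<"bathroom">>,
             <<"temperature">>, <<"air_quality">>, <<"humidity">>],
    lists:nth(rand:uniform(length(Words)), Words).
```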

The figure above presents the results of the three experiments. Now, let’s slightly modify the conditions to visualise the impact of a hash # key on the trie structure: only one binding is added, consisting of just two subsequent hashes, #.#. The performance then looks like this:

The red curve bends, as we expected. When there are more # bindings on the query path, the relation between the Routing Key length and query time is no longer linear. This effect can also be observed in the 10k bindings series – the green curve also bends slightly. This can be explained in the same way – there are more Binding Keys starting with a #, which increases the query time for all queries.

Let’s check it in the RabbitMQ Management UI:

Here we have roughly 50 bindings like the ones above. If we replace them, we see a more linear relation and get a better overview of the performance impact of the hash #, as seen below:

Again, the time to find relevant routes has improved. Now, let’s examine the way the number of bindings impacts query time. As we explained in the previous section, a logarithmic relation is expected:

This example also follows the expected behaviour. All the bindings are stored in a single Mnesia table, and each node query has its own cost: the more entries in the table, the longer the query takes. As the table is of the ordered_set type, the query time grows logarithmically, which is what is actually observed.

Summing up, the experiments align with the theory we started with. The expectations about the impact of Routing Key length and the number of bindings on the routing operation time were confirmed. The large impact of the hash # wildcard was also confirmed, and its scale demonstrated.

The two previous examples measured the time of a single query. While this is still valuable, it does not necessarily reflect a real-world use case. The test is synthetic and focuses on a single query – but is the Topic Exchange overhead also observable when the overall performance of RabbitMQ is taken into account?

This section will present a performance evaluation of the RabbitMQ integration with MongooseIM. MongooseIM is Erlang Solutions’ highly scalable instant messaging server. The RabbitMQ component in MongooseIM simply reports each user’s activity, which may be:

  • – User became online/offline
  • – User sent/received message

Only sent/received message events will be discussed in this case. The Routing Key of a message activity event follows the simple pattern <username>.{sent, received}.

In order to evaluate the performance of the component, a load test was designed. Simulated XMPP clients connected to the MongooseIM server and exchanged messages with each other. Each message generated events, which were published to RabbitMQ, and a number of AMQP clients connected to RabbitMQ to consume the generated events.

This is the outline of the experiment’s architecture:

For the purpose of this post, only results directly connected to Topic Exchange performance are covered. Let’s define the key performance indicator as the Time To Delivery: the amount of time between a message being sent by an XMPP user and the corresponding event being received by a consumer of a RabbitMQ queue. This value is presented in the figures that follow.

Test conditions were as follows:

  • – 90k XMPP users
  • – 10k AMQP consumers
  • – ~4.8k payloads/s from MongooseIM to RabbitMQ
  • – Payload size ~120B
  • – topic exchange with Binding Keys like user_N.*
  • – 20k bindings in total

In this case, the performance is presented in the following graph. It shows the 95th – 99th percentiles of the Time To Delivery, as well as the maximum observed Time To Delivery in a given time window.

The second test had similar conditions. The only difference was the Exchange type (and, consequently, the number of bindings):

  • – 90k XMPP users
  • – 10k AMQP consumers
  • – ~4.8k payloads/s from MongooseIM to RabbitMQ
  • – Payload size ~120B
  • – direct exchange with Binding Keys like user_N.chat_msg_sent, user_N.chat_msg_recv
  • – 40k bindings in total

Under those conditions the performance was better, as illustrated in the following figure.

While the previous section showed the performance characteristics of a Topic Exchange in isolation, these examples provide an overview at a bigger scale. Both tests had identical characteristics apart from the Exchange type being direct or topic (and, consequently, the number of bindings), yet the difference in performance is significantly in favor of the Direct Exchange. Switching to it effectively decreased the Time To Delivery, which is the efficiency measure in the presented tests.

Now you’ve seen the basics of Topic Exchange internals: a brief overview of the implementation, the theoretical overhead introduced by traversing the trie structure, as well as some performance evaluation. As observed, a Topic Exchange is not the fastest routing method and there are many factors that may influence its performance. However, it is not true that Topic Exchanges are slow – they are generally fast in typical RabbitMQ usage, and the test conditions here were deliberately specific. If there are only a few bindings or the trie is shallow, the Topic Exchange overhead is usually negligible. Still, it is important to understand the underlying mechanism, as the example with MongooseIM’s RabbitMQ component showed – switching Exchange types resulted in a significant improvement in performance.

Learn more about our work with RabbitMQ.
