RabbitMQ Introducing Khepri
RabbitMQ has used Mnesia, Erlang’s built-in distributed database, since its inception. This reliance on Mnesia has brought many limitations and issues, especially around mirroring and high availability. Data safety and high availability are important for many RabbitMQ users today, and migrating to Khepri will let them rely on RabbitMQ for even more use cases.
The newly introduced Khepri-based metadata database aims to work around these limitations and provide stability for RabbitMQ and its replicated features.
Khepri: A New Dawn
Khepri is built on top of the same underlying technology as quorum queues, which are now a tried and trusted solution for providing data safety for queues in RabbitMQ. However, to make RabbitMQ even more stable, the data about which queues, exchanges, users, etc. exist in the system needs to be stored in a highly reliable, consistent manner as well. (The queue and stream contents are stored separately.)
Khepri vs. Mnesia: A Comparative Analysis
Mnesia was built to handle configuration metadata, but its primary goal was to replicate more-or-less static data. It is a leaderless database and relies on user configuration to designate which copy of the information is the most up to date.
In distributed systems this can lead to stale reads and inconsistent databases, both of which have been an issue for the way RabbitMQ uses Mnesia. Because no single node is in charge, data can be committed on one node and then lost if the copy on a different node is later chosen as the authoritative one.
Committing a transaction in Mnesia requires all participants to commit as well, which leads to scalability issues. Mnesia uses a two-phase commit procedure, in which the participants need to lock the rows or tables involved in the transaction. Under high load this can lead to a lot of contention and transaction restarts, which lower the throughput of the database.
Khepri aims to work around all this by using the Raft protocol to replicate changes to peers, and by allowing a transaction to commit once a majority of peers have accepted it. Because Khepri is coordinated by the leader, every transaction is serialised. This serialisation makes transaction handling more efficient and less resource-intensive than in systems where every node must agree on each transaction.
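The majority rule can be illustrated with a small model. The sketch below is a deliberately simplified picture of leader-coordinated, majority-based commits, not Khepri's or Raft's actual implementation: there are no terms, elections or retries, and the `Leader`, `Follower` and `quorum_size` names are invented for this example.

```python
from dataclasses import dataclass, field


def quorum_size(cluster_size: int) -> int:
    """Smallest number of members that forms a majority."""
    return cluster_size // 2 + 1


@dataclass
class Follower:
    name: str
    reachable: bool = True
    log: list = field(default_factory=list)

    def append(self, entry) -> bool:
        """Store the entry and acknowledge it, unless unreachable."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


@dataclass
class Leader:
    followers: list
    log: list = field(default_factory=list)
    committed: list = field(default_factory=list)

    def write(self, entry) -> bool:
        """Handle one write at a time; commit once a majority holds it."""
        self.log.append(entry)
        acks = 1  # the leader's own copy counts towards the majority
        for follower in self.followers:
            if follower.append(entry):
                acks += 1
        if acks >= quorum_size(len(self.followers) + 1):
            self.committed.append(entry)
            return True
        return False


followers = [Follower("b"), Follower("c", reachable=False)]
leader = Leader(followers)
print(leader.write(("put", "queue-1")))  # True: leader + "b" are 2 of 3
followers[0].reachable = False
print(leader.write(("put", "queue-2")))  # False: only 1 of 3 holds the entry
```

In this model a three-node cluster keeps accepting writes with one node down, but not with two, which matches the "majority of nodes must be available" trade-off.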
| | Mnesia | Khepri |
| Requires configuration | No | No |
| Partition tolerance | Restarts on failure | Does not need restart |
| Write consistency | Connected peers | Majority only |
| Read consistency | Connected peers | Majority only |
| Availability | At least one node is running | Majority of nodes must be available |
| Transaction Efficiency | Low | High |
| Split brain | Can happen with partitioned peer groups | Partitioned peers become read only or stop serving requests |
| Large datasets | No | No |
Split brain issues are common with any application that uses Mnesia. RabbitMQ works around this with its automated partition handling mechanisms, but these are not bulletproof and can lead to data corruption in certain error scenarios and settings. Khepri resolves this by depending on the Raft protocol, which ensures consistency across all available peers, while unavailable peers stop serving requests.
Neither Mnesia nor Khepri supports very large datasets, but both are sufficient for RabbitMQ’s needs, because RabbitMQ does not store messages in the metadata store. The data is kept in memory, so there is a natural limit on how large a dataset can be. Additionally, Khepri needs to update the in-memory structures, which can trigger garbage collection and lead to increased latency if the data is very large.
Khepri’s Architecture Explained
Khepri is a tree-based key-value store. If we want to store a binding for the queue `test-queue`, exchange `test-exchange` and routing key `test-routing-key` in vhost `my-vhost`, we’d insert the following into the database:
khepri:put("my-vhost/test-exchange/test-queue/test-routing-key", BindingData).
This is because in RabbitMQ these four values together make the binding unique.
In memory, the data is stored in a tree format, where each node can host some payload, though typically only the leaf nodes contain data.
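To make the tree layout concrete, here is a minimal sketch of a path-addressed tree store. It only models the idea described above and does not reflect Khepri's internal data structures; `TreeStore`, `TreeNode` and their methods are hypothetical names chosen for this example.

```python
class TreeNode:
    """One node of the tree: named children plus an optional payload."""

    def __init__(self):
        self.children = {}
        self.payload = None


class TreeStore:
    def __init__(self):
        self.root = TreeNode()

    @staticmethod
    def _split(path):
        # "a/b/c" -> ["a", "b", "c"]; empty components are ignored
        return [part for part in path.split("/") if part]

    def put(self, path, payload):
        """Create any missing intermediate nodes, then set the payload."""
        node = self.root
        for part in self._split(path):
            node = node.children.setdefault(part, TreeNode())
        node.payload = payload

    def get(self, path):
        """Return the payload stored at path, or None if nothing is there."""
        node = self.root
        for part in self._split(path):
            node = node.children.get(part)
            if node is None:
                return None
        return node.payload

    def list_children(self, path):
        """Return the sorted names of the direct children of path."""
        node = self.root
        for part in self._split(path):
            node = node.children.get(part)
            if node is None:
                return []
        return sorted(node.children)


store = TreeStore()
store.put("my-vhost/test-exchange/test-queue/test-routing-key", {"args": {}})
print(store.get("my-vhost/test-exchange/test-queue/test-routing-key"))
print(store.get("my-vhost/test-exchange"))  # None: intermediate node, no payload
print(store.list_children("my-vhost/test-exchange"))  # ['test-queue']
```

Because the four values form the path, storing the same binding twice overwrites the same leaf rather than creating a duplicate.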
For users of RabbitMQ, there will not be many visible changes in how RabbitMQ works in most scenarios. Khepri will provide the durability and replication of the metadata of the system instead of Mnesia.
The new database of RabbitMQ works very similarly to how a queue behaves today. There is a single leader for the cluster, which handles all consistent data operations and takes care of notifying the replicas about changes. Every replica in the system writes the data to disk as well. When data is updated in Khepri, it automatically populates local, in-memory cache tables, which can be accessed concurrently, so the database leader does not become a bottleneck for reads.
Similarly to quorum queues, a write is considered committed when the majority of replicas accept it. This means that, unlike Mnesia, not all writes will be visible on all nodes at the same time. This is not an issue for most applications, but in certain cases, for example high queue churn, this may lead to inconsistent message delivery.
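The visibility gap can be shown with a toy model. This is an illustration only, assuming each replica serves reads from a local cache that is updated when it applies a replicated entry; the `Replica` class and `replicate` function are invented for this example and are not part of RabbitMQ or Khepri.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.cache = {}    # local in-memory view used for reads
        self.pending = []  # replicated entries not yet applied locally

    def receive(self, key, value):
        self.pending.append((key, value))

    def apply_pending(self):
        for key, value in self.pending:
            self.cache[key] = value
        self.pending.clear()

    def local_read(self, key):
        return self.cache.get(key)


def replicate(replicas, key, value, fast):
    """Send a write to every replica; only those named in `fast` apply it
    immediately. Returns True when a majority has applied the write."""
    applied = 0
    for replica in replicas:
        replica.receive(key, value)
        if replica.name in fast:
            replica.apply_pending()
            applied += 1
    return applied >= len(replicas) // 2 + 1


nodes = [Replica("a"), Replica("b"), Replica("c")]
committed = replicate(nodes, "queue/orders", "declared", fast={"a", "b"})
print(committed)                            # True: 2 of 3 applied the write
print(nodes[2].local_read("queue/orders"))  # None: "c" has not caught up yet
nodes[2].apply_pending()
print(nodes[2].local_read("queue/orders"))  # declared
```

A client that declares a queue through one node and immediately uses it through a lagging node could hit the middle case, which is why workloads with high queue churn deserve extra testing.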
Early Benchmarks and Beta Access
From early benchmarks we can observe that RabbitMQ performs much better under high load or high latency situations when Khepri is in use.
RabbitMQ was under load in these benchmarks. When RabbitMQ is not loaded, the difference between Mnesia and Khepri is smaller; as load or latency increases, the difference gets larger.
The difference in performance is mostly because Khepri does not lock rows or tables but serialises all transactions instead. For RabbitMQ’s access patterns, this behaves much better.
Khepri will replace Mnesia in RabbitMQ 4.0, but you can try the beta right now in RabbitMQ 3.13.x by enabling the feature flag.
The feature flag can be enabled in the Management Interface under Admin / Feature Flags, or from the command line by running `rabbitmqctl enable_feature_flag khepri_db`. Before you can enable it, you must remove any Classic Queue Mirroring policies.
We do not recommend enabling this feature in production, as there is no way back to Mnesia once the data is converted.
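The precondition about mirroring policies can be checked before the flag is enabled. The sketch below assumes the policies are already available as a list of dictionaries with `vhost`, `name` and `definition` keys, as in the management API's policy listing, and that mirroring policies are those whose definition contains an `ha-` key such as `ha-mode`. The helper names are hypothetical, and the code only builds the command from the text above without running it.

```python
def mirroring_policies(policies):
    """Return (vhost, name) for every policy whose definition has an ha-* key."""
    found = []
    for policy in policies:
        definition = policy.get("definition", {})
        if any(key.startswith("ha-") for key in definition):
            found.append((policy.get("vhost"), policy.get("name")))
    return found


def enable_khepri_command(policies):
    """Build the enable command, refusing if mirroring policies remain."""
    blocking = mirroring_policies(policies)
    if blocking:
        raise ValueError(f"remove mirroring policies first: {blocking}")
    return ["rabbitmqctl", "enable_feature_flag", "khepri_db"]


policies = [
    {"vhost": "/", "name": "ha-all", "definition": {"ha-mode": "all"}},
    {"vhost": "/", "name": "ttl", "definition": {"message-ttl": 60000}},
]
print(mirroring_policies(policies))  # [('/', 'ha-all')]
print(enable_khepri_command(policies[1:]))
```

The returned list can be passed to `subprocess.run` on a test node once the check passes.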
Transition from Mnesia to Khepri
Users of RabbitMQ will need to do a careful review before upgrading to RabbitMQ 4.0, where Khepri will be automatically enabled and installations will auto-convert from Mnesia to Khepri.
Most of the changes caused by this database migration are not user-facing, and client applications do not need to change anything. However, RabbitMQ 4.0 will introduce several other breaking changes that users need to be aware of, such as the full removal of Classic Queue Mirroring, the removal of transient and non-exclusive Classic Queues, and metric delivery changes.
More information about the breaking changes can be found here.
Khepri in Action
The main benefit of Khepri will be that users of RabbitMQ no longer need to worry about network partitions and their effect on Mnesia. This potentially opens the way for RabbitMQ to be deployed in environments where it was not recommended until now, such as those where the network is not very stable, for example multi-data-centre installations.
We anticipate that Khepri will enable the deployment of a greater number of nodes than is currently possible, with many more internal entities, such as bindings.
Conclusion
Replacing Mnesia with Khepri in RabbitMQ is a very big leap towards better performance and better stability. It will allow users to deploy RabbitMQ in many more configurations. However, because of its quorum-based consistency model, it will also require design changes in many systems.

If you need help with any aspect of your RabbitMQ, or you’d simply like a Health Check to see how you’re performing, reach out to me or any of our seasoned engineers.
Lajos Gerecs
RabbitMQ Consultant, Seventh State
