RabbitMQ Introducing Khepri

RabbitMQ has used Mnesia, Erlang’s built-in distributed database, since its inception. This reliance on Mnesia brought many limitations and issues, especially around mirroring and high availability. Data safety and high availability matter to many RabbitMQ users today, and migrating to Khepri will let them rely on RabbitMQ for even more use cases.

The newly introduced Khepri-based metadata database aims to work around these limitations and provide stability for RabbitMQ and its replicated features.

Khepri: A New Dawn

Khepri is built on top of the same underlying technology as quorum queues, which are now a tried and trusted solution to provide data safety for queues in RabbitMQ. However, to make RabbitMQ even more stable, the data about which queues, exchanges, users, etc. exist in the system need to be stored in a highly reliable, consistent manner as well. (The queue and stream contents are stored separately.)

Khepri vs. Mnesia: A Comparative Analysis

Mnesia was built to handle configuration metadata, but its primary goal was to replicate more-or-less static data. It is a leaderless database and relies on user configuration to designate the most up-to-date version of the information.

In distributed systems this can lead to stale reads and inconsistent databases, both of which have been a problem for the way RabbitMQ uses Mnesia. Because Mnesia is leaderless, data that was committed on one node can later be lost when another node’s copy is chosen as the authoritative one.

Committing a transaction in Mnesia requires all participants to commit as well, which leads to scalability issues. Mnesia uses a two-phase commit procedure in which the participants must lock the rows or tables involved in the transaction. Under high load this can cause heavy contention and transaction restarts, which lower the throughput of the database.

Khepri aims to work around all this by using the Raft protocol to replicate changes to peers, so that a transaction can be committed once a majority of peers have accepted it. Because Khepri is coordinated by a single leader, every transaction is serialised. This serialisation makes transaction handling more efficient and less resource-intensive than in systems where every node must agree on each transaction.
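The difference between the two commit rules can be made concrete with a little arithmetic. The sketch below is illustrative Python, not Khepri or Raft library code, and all function names are hypothetical. It compares how many acknowledgements a commit needs when every participant must agree with how many it needs when a Raft-style majority is enough.

```python
def acks_needed_all(cluster_size: int) -> int:
    """All-participants rule: every peer must acknowledge the commit."""
    return cluster_size


def acks_needed_majority(cluster_size: int) -> int:
    """Raft-style rule: strictly more than half of the peers must accept."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Peers that may be unreachable while a majority can still commit."""
    return cluster_size - acks_needed_majority(cluster_size)


if __name__ == "__main__":
    # Print: cluster size, acks under each rule, failures a majority tolerates.
    for n in (3, 5, 7):
        print(n, acks_needed_all(n), acks_needed_majority(n), tolerated_failures(n))
```

With five peers, a majority commit needs three acknowledgements and still succeeds with two peers unreachable, whereas a rule that waits for every participant is only as fast as its slowest one.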

	Mnesia	Khepri
Requires configuration	No	No
Partition tolerance	Restarts on failure	Does not need restart
Write consistency	Connected peers	Majority only
Read consistency	Connected peers	Majority only
Availability	At least one node is running	Majority of nodes must be available
Transaction Efficiency	Low	High
Split brain	Can happen with partitioned peer groups	Partitioned peers become read only or stop serving requests
Large datasets	No	No

Split brain issues are common with any application that uses Mnesia. RabbitMQ works around this with its automated partition handling mechanisms, but these are not bulletproof and can lead to data corruption in certain error scenarios and settings. Khepri resolves this by depending on the Raft protocol, which ensures consistency across all available peers, while peers that are cut off from the majority stop serving requests.
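A majority rule prevents split brain because two disjoint groups of peers cannot both contain more than half of the cluster. A minimal sketch of that check, in Python for illustration only (the function names are hypothetical and this is not how Khepri itself is implemented):

```python
from typing import List


def can_commit(partition_size: int, cluster_size: int) -> bool:
    """A group of peers may commit writes only if it is a strict majority."""
    return partition_size > cluster_size // 2


def writable_partitions(partitions: List[int]) -> List[bool]:
    """For each partition size, report whether that side may still commit."""
    cluster_size = sum(partitions)
    return [can_commit(size, cluster_size) for size in partitions]


if __name__ == "__main__":
    print(writable_partitions([3, 2]))  # five nodes split 3 / 2
    print(writable_partitions([2, 2]))  # four nodes split 2 / 2
```

In a five-node cluster split 3 / 2, only the three-node side can keep committing. In a four-node cluster split 2 / 2, neither side can, which is why an odd number of nodes is the usual choice for majority-based systems.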

Neither Mnesia nor Khepri supports very large datasets, but both are sufficient for RabbitMQ’s needs, because RabbitMQ does not store messages in the metadata store. The data is kept in memory, so there is a natural limit on how large a dataset can be. In addition, Khepri needs to update its in-memory structures, which can trigger garbage collection and lead to increased latency if the dataset is very large.

Khepri’s Architecture Explained

Khepri is a tree-based key-value store. To store a binding for the queue `test-queue`, exchange `test-exchange` and routing key `test-routing-key` in vhost `my-vhost`, we would insert the following into the database:

khepri:put("my-vhost/test-exchange/test-queue/test-routing-key", BindingData).

This path is used because, in RabbitMQ, these four values together identify a binding uniquely.

In memory, the data is stored in a tree format, where each node can host some payload, though typically only the leaf nodes contain data.
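To illustrate the idea of a tree where any node may carry a payload, here is a small sketch in Python. It is a toy model under stated assumptions, not Khepri’s Erlang API: the `TreeNode` and `TreeStore` classes, their methods and the sample payload are all hypothetical, and paths are simply split on `/`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TreeNode:
    payload: Optional[Any] = None  # most intermediate nodes carry no data
    children: Dict[str, "TreeNode"] = field(default_factory=dict)


class TreeStore:
    """Toy tree-based key-value store addressed by slash-separated paths."""

    def __init__(self) -> None:
        self.root = TreeNode()

    def put(self, path: str, payload: Any) -> None:
        # Walk down the tree, creating missing intermediate nodes on the way.
        node = self.root
        for part in path.strip("/").split("/"):
            node = node.children.setdefault(part, TreeNode())
        node.payload = payload

    def _find(self, path: str) -> Optional[TreeNode]:
        node: Optional[TreeNode] = self.root
        for part in path.strip("/").split("/"):
            node = node.children.get(part)
            if node is None:
                return None
        return node

    def get(self, path: str) -> Optional[Any]:
        node = self._find(path)
        return None if node is None else node.payload

    def list_children(self, path: str) -> List[str]:
        node = self._find(path)
        return [] if node is None else sorted(node.children)


if __name__ == "__main__":
    store = TreeStore()
    binding_path = "my-vhost/test-exchange/test-queue/test-routing-key"
    store.put(binding_path, {"arguments": {}})
    print(store.get(binding_path))
    print(store.get("my-vhost/test-exchange"))            # no payload here
    print(store.list_children("my-vhost/test-exchange"))  # child node names
```

Only the leaf at the end of the binding path holds data; the vhost, exchange and queue levels exist purely as intermediate nodes, which makes it cheap to list everything under a given vhost or exchange.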

For users of RabbitMQ, there will not be many visible changes in how RabbitMQ works in most scenarios. Khepri will provide the durability and replication of the metadata of the system instead of Mnesia.

RabbitMQ’s new database works very much like a queue does today. There is a single leader for the cluster, which handles all consistent data operations and notifies the replicas about changes. Every replica in the system also writes the data to disk. When data is updated in Khepri, local in-memory cache tables are populated automatically; these can be accessed concurrently, so the database leader does not become a bottleneck for reads.

As with quorum queues, a write is considered committed once a majority of replicas have accepted it. This means that, unlike with Mnesia, not all writes will be visible on all nodes at the same time. This is not an issue for most applications, but in certain cases, for example under high queue churn, it may lead to inconsistent message delivery.
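The visibility gap can be shown with a deliberately simplified model. The Python sketch below is not Raft and not Khepri code: replicas are plain dictionaries, the node names and the `replicated_write` and `catch_up` helpers are hypothetical, and catching up is modelled as copying entries rather than replaying a log.

```python
from typing import Dict, Set

Store = Dict[str, str]


def replicated_write(replicas: Dict[str, Store], reachable: Set[str],
                     key: str, value: str) -> bool:
    """Commit a write if the reachable replicas form a majority.

    Replicas outside `reachable` keep their old data until they catch up.
    """
    acks = [name for name in replicas if name in reachable]
    if len(acks) <= len(replicas) // 2:
        return False  # no majority, so nothing is committed
    for name in acks:
        replicas[name][key] = value
    return True


def catch_up(replicas: Dict[str, Store], lagging: str, source: str) -> None:
    """Simplification: copy the up-to-date entries to the lagging replica."""
    replicas[lagging].update(replicas[source])


if __name__ == "__main__":
    replicas = {"rabbit1": {}, "rabbit2": {}, "rabbit3": {}}
    committed = replicated_write(replicas, {"rabbit1", "rabbit2"},
                                 "queue/orders", "declared")
    print(committed)                               # committed by 2 of 3
    print(replicas["rabbit3"].get("queue/orders"))  # not visible here yet
    catch_up(replicas, "rabbit3", "rabbit1")
    print(replicas["rabbit3"].get("queue/orders"))  # visible after catch-up
```

The write is committed as soon as two of the three replicas hold it, yet a client reading from the third node in that window would not see the new queue. Applications that declare a queue on one node and immediately use it through another are the ones most likely to notice this.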

Early Benchmarks and Beta Access

Early benchmarks show that RabbitMQ performs much better under high load or high latency when Khepri is in use.

These results were measured with RabbitMQ under load. When RabbitMQ is not loaded, the difference between Mnesia and Khepri is smaller; as load or latency increases, the difference grows.

The difference in performance is mostly because Khepri does not lock rows or tables but serialises all transactions. For RabbitMQ access patterns, this behaves much better.

Khepri will replace Mnesia in RabbitMQ 4.0, but you can try the beta right now in RabbitMQ 3.13.x by enabling the feature flag.

The feature flag can be enabled in the Management Interface under Admin / Feature Flags, or from the command line by running `rabbitmqctl enable_feature_flag khepri_db`. Before you can enable it, you must remove any Classic Queue Mirroring policies.

We do not recommend enabling this feature in production, as there is no way back to Mnesia once the data is converted.

Transition from Mnesia to Khepri

Users of RabbitMQ will need to do a careful review before upgrading to RabbitMQ 4.0, where Khepri will be automatically enabled and installations will auto-convert from Mnesia to Khepri.

Most of the changes due to this database migration are not user-facing, and client applications do not need to change anything. However, RabbitMQ 4.0 will introduce several other breaking changes that users need to be aware of, such as the full removal of Classic Queue Mirroring, the removal of transient and non-exclusive Classic Queues, and metric delivery changes.

More information about the breaking changes can be found here.

Khepri in Action

The main benefit of Khepri will be that users of RabbitMQ no longer need to worry about network partitions and their effect on Mnesia. This potentially opens the way for RabbitMQ to be deployed in environments where it has not been recommended until now, such as those with less stable networks, for example multi-data-centre installations.

We anticipate that Khepri will enable the deployment of a greater number of nodes than is currently possible, with many more internal entities, such as bindings.

Conclusion

Replacing Mnesia with Khepri in RabbitMQ is a very big leap towards better performance and better stability. It will allow users to deploy RabbitMQ in many more configurations. However, because of its quorum-based consistency model, it will also require design changes in many systems.

Lajos Gerecs | RabbitMQ Consultant, Seventh State
