RabbitMQ: Quorum Queues
RabbitMQ 3.8 introduced several new features focused on reliability, operations, and observability. The main highlights are:
- Quorum Queues
- Feature Flags
- Prometheus and Grafana Monitoring Support
- OAuth 2.0 Support
- Single Active Consumers
This blog will cover Quorum Queues. If you wish to read more about what is new in RabbitMQ 3.8, please refer here.
What are Quorum Queues?
A quorum is the minimum number of voting members who must be present to conduct business in the name of the group. Quorum queues are built on this notion: a majority of the replica nodes must agree on the state of the queue and its contents. Quorum queues are durable, replicated FIFO queues based on the RAFT consensus algorithm.
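To make the idea of a majority concrete, here is a minimal sketch of the arithmetic. The function names are made up for illustration and are not part of any RabbitMQ API.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("a queue needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replicas - quorum_size(replicas)


# Print replicas, quorum size, and tolerated failures for a few cluster sizes.
for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that going from 3 to 4 replicas does not let the queue survive more failures, which is why odd replica counts are the usual choice.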
RAFT Consensus Algorithm
Consensus is a fundamental problem in fault-tolerant distributed systems. It involves multiple servers agreeing on the same information. A typical consensus algorithm makes progress as long as a majority of the servers are available.
RAFT is a distributed consensus algorithm that solves the problem of getting multiple servers to agree on a shared state even in the face of failures. In RAFT, each node is in one of three states at any time: Leader, Candidate, or Follower. RAFT works by electing a leader in the cluster.
The leader is responsible for accepting client requests and managing the replication of the log to the other servers. Data flows in one direction only: from the leader to the followers. A follower only responds to requests from the leader or from candidates. Any client request sent to a follower is redirected to the leader. A candidate asks for votes to become the next leader when the current leader becomes unavailable.
The RAFT consensus algorithm works broadly in two stages:
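The three roles and the moves between them can be sketched as a small lookup table. This is only an illustration: the event names are labels invented for this example, not identifiers from RabbitMQ or from the RAFT paper.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# (current role, event) -> next role. Event names are hypothetical labels.
TRANSITIONS = {
    (Role.FOLLOWER, "election_timeout"): Role.CANDIDATE,   # no heartbeat seen
    (Role.CANDIDATE, "won_majority"): Role.LEADER,         # enough votes
    (Role.CANDIDATE, "election_timeout"): Role.CANDIDATE,  # split vote, retry
    (Role.CANDIDATE, "saw_leader"): Role.FOLLOWER,         # someone else won
    (Role.LEADER, "saw_higher_term"): Role.FOLLOWER,       # stale leader steps down
}


def next_role(role: Role, event: str) -> Role:
    """Return the role after an event; unknown events leave the role unchanged."""
    return TRANSITIONS.get((role, event), role)


# Walk one node through a timeout, a won election, and a step-down.
role = Role.FOLLOWER
for event in ("election_timeout", "won_majority", "saw_higher_term"):
    role = next_role(role, event)
    print(event, "->", role.value)
```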
- Leader Election: A new leader must be elected when the existing one fails. An election starts when a follower times out while waiting for a heartbeat from the leader. The follower then becomes a candidate, votes for itself, and asks the other nodes to vote for it. A candidate that receives votes from a majority of the nodes becomes the new leader.
- Log Replication: The leader keeps the followers' logs in sync with its own by replicating every log entry to them. A log entry is composed of:
  - Command: The operation specified by the client to execute
  - Index: The position of the entry in the log
  - Term Number: The election term in which the leader created the entry, used to detect inconsistencies between logs
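The two stages above can be sketched in a few lines. This is a simplified illustration, not RabbitMQ's implementation: the names `LogEntry`, `has_majority`, and `committed_index` are hypothetical, and real RAFT adds further rules (for example, a leader only commits entries from its own term by counting replicas).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LogEntry:
    index: int    # position of the entry in the log
    term: int     # election term in which the leader created the entry
    command: str  # client command to apply once the entry is committed


def has_majority(count: int, cluster_size: int) -> bool:
    """True if `count` nodes are a strict majority of the cluster."""
    return count > cluster_size // 2


def committed_index(match_index: Sequence[int]) -> int:
    """Highest log index stored on a majority of nodes.

    `match_index` holds, for every node including the leader, the highest
    index known to be stored on that node.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[len(ranked) // 2]


# Leader election: a candidate in a 5-node cluster needs at least 3 votes.
print(has_majority(2, 5), has_majority(3, 5))

# Log replication: the leader holds entries up to index 5, followers lag.
entry = LogEntry(index=5, term=2, command="publish order-42")
print(entry.index <= committed_index([5, 5, 4, 2, 1]))
```

In the last line the entry at index 5 is stored on only two of five nodes, so it is not yet safe to treat it as committed; index 4 is the highest index held by a majority.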
To understand the detailed working of the RAFT consensus algorithm, please refer to:
- GeeksforGeeks: Very technical perspective
- FreeCodeCamp: Nicely written academic paper summary
- PyGotham: Video resource
Quorum Queues: An Alternative to Mirrored Queues
Mirrored queues in RabbitMQ are a solution for High Availability (HA). There is a queue master and there are multiple mirrors (replicas on other nodes). Any message written to the queue master is replicated to the mirrors. In the event of the loss of a broker, a mirror is promoted to master and the queue stays available without loss.
What is the problem with Mirrored Queues?
One of the main problems with mirrored queues is blocking synchronization. When a lost mirror comes back online, it has to synchronize with the queue master. There are two possible ways to do that:
- Forced Synchronization: During the sync, the queue is unavailable. No new messages are processed or accepted until the sync is complete.
- Automatic Synchronization: The mirror is in sync again once all the messages that the queue master received during the mirror's downtime have been consumed.
Also, when a mirror node goes down, the messages it held are lost: a mirror that rejoins discards its data and starts empty. As mentioned in the documentation:
They are also never durable (even if declared as such).
Quorum queues completely avoid this issue: they do not throw away any data, and because they are built on the RAFT consensus algorithm, their synchronization is non-blocking.
When to use Quorum Queues
Quorum queues are purpose-built. They suit systems that put fault tolerance and data safety ahead of low latency and advanced queue features. For example, a high-frequency order management system can benefit from quorum queues, because losing no orders and persisting data are its key requirements. Quorum queues might not be a good fit for instant messaging systems, where low latency matters more. Read up more for yourself on the RabbitMQ official page or the CloudAMQP Blog.
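If you decide a quorum queue fits your system, declaring one is mostly a matter of setting the `x-queue-type` argument to `quorum` on a durable queue. Below is a minimal sketch assuming the third-party `pika` client and a broker on `localhost`; the queue name `orders` and the helper `declare_quorum_queue` are made up for this example.

```python
def declare_quorum_queue(channel, name: str):
    """Declare a durable quorum queue on an already-open AMQP channel."""
    return channel.queue_declare(
        queue=name,
        durable=True,  # quorum queues must be declared durable
        arguments={"x-queue-type": "quorum"},
    )


def main() -> None:
    # Assumes `pip install pika` and a RabbitMQ 3.8+ broker on localhost.
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    try:
        declare_quorum_queue(connection.channel(), "orders")
    finally:
        connection.close()
```

Calling `main()` against a running cluster creates the queue. The queue type is fixed at declaration time, so an existing classic queue with the same name would have to be deleted and redeclared.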
Any suggestions or thoughts, let me know:
Insta + Twitter + LinkedIn + Medium | Shivam Aggarwal