The Raft Consensus Algorithm


Replicated State Machines

A service replicated across multiple machines:

  • All instances produce the same behavior
  • Fault-tolerant: Available for service as long as a majority of the servers are up

Raft Basics

Leader based:

  • Pick one server to be a leader
  • It accepts client requests, manages replication
  • Pick a new leader if the current one fails

Server states:

  • Leader
  • Follower: totally passive
  • Candidate: used for electing new leaders

Time divided into terms:

  • Term numbers increase monotonically
  • Each term starts with an election, in which one or more candidates attempt to become leader
  • The winning candidate (if any) serves as leader for the rest of the term
  • Terms allow detection of stale information:
    • Each server stores the current term
    • Terms are checked on every request
  • Different servers may observe term transitions at different times

Servers communicate with a request-response protocol (remote procedure calls, or RPCs). There are two request types:

  • RequestVote
  • AppendEntries (also serves as heartbeat)
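The arguments of the two RPCs can be sketched as plain data types. This is a hedged sketch, not a full implementation: the field names follow the Raft paper, but the transport layer and the reply types are omitted.

```python
from dataclasses import dataclass, field

# Sketch of the two RPC request types; field names as in the Raft paper.
@dataclass
class RequestVote:
    term: int             # candidate's term
    candidate_id: str
    last_log_index: int   # used for the up-to-date check during elections
    last_log_term: int

@dataclass
class AppendEntries:
    term: int             # leader's term
    leader_id: str
    prev_log_index: int   # consistency check: the entry preceding the new ones
    prev_log_term: int
    entries: list = field(default_factory=list)  # empty for heartbeats
    leader_commit: int = 0                       # leader's commit index
```

An AppendEntries request with an empty `entries` list is exactly the heartbeat mentioned above.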

Leader Election

  • All servers start as followers
  • No heartbeat (AppendEntries) before the election timeout? Start an election:
    • Increment current term
    • Vote for self
    • Send RequestVote requests to all other servers
  • Election outcomes:
    • Receive votes from a majority of the cluster: become leader, start sending heartbeats to prevent additional elections
    • Receive AppendEntries from some other server in the new term: revert to follower
    • Timeout: start a new election
  • Each server votes for at most one candidate in a given term
  • Election Safety: at most one server can be elected leader in a given term
  • Availability: randomized election timeouts reduce split votes
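The vote-once-per-term rule and the randomized timeout can be sketched as follows. This is a minimal sketch assuming a dict-based server state; it deliberately omits the log up-to-date check, which is covered under Safety below.

```python
import random

# Minimal follower state: current term plus the vote granted in that term.
def make_server():
    return {"current_term": 0, "voted_for": None}

def handle_request_vote(server, term, candidate_id):
    """Return (term, vote_granted): at most one vote per term."""
    if term < server["current_term"]:
        return server["current_term"], False      # stale candidate
    if term > server["current_term"]:             # newer term: reset the vote
        server["current_term"] = term
        server["voted_for"] = None
    if server["voted_for"] in (None, candidate_id):
        server["voted_for"] = candidate_id        # grant (or re-grant) the vote
        return server["current_term"], True
    return server["current_term"], False          # already voted for someone else

def election_timeout(base_ms=150, spread_ms=150):
    # Randomization spreads out candidacies, reducing split votes.
    return base_ms + random.uniform(0, spread_ms)
```

The 150–300 ms range here is illustrative (it appears in common Raft descriptions); the essential property is only that timeouts differ across servers.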

Log Replication

  • Handled by the leader
  • When a client request arrives:
    • Append the command to the local log
    • Replicate it to the other servers using AppendEntries requests
    • Committed: the entry has been replicated by the leader to a majority of servers
    • Once committed, apply the entry to the local state machine
    • Return the result to the client
    • Notify the other servers of committed entries in future AppendEntries requests
  • Log entries: index, term, command
  • Logs can become inconsistent after leader crashes
  • Raft maintains a high level of coherency between logs (Log Matching Property): if entries in different logs have the same term and index, then
    • They also have the same command
    • The two logs are identical up through that entry
  • The AppendEntries consistency check preserves these properties
  • The leader forces the other logs to match its own:
    • nextIndex for each follower (initialized to the leader's log length)
    • If AppendEntries fails, reduce nextIndex for that follower and retry
    • If a follower receives conflicting entries but the consistency check passes, it removes all conflicting entries
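The follower-side consistency check and conflict removal can be sketched as below. This assumes a log represented as a list of (term, command) pairs, 0-indexed here rather than 1-indexed as in the Raft paper; it is a sketch of the rule, not a complete handler.

```python
def append_entries(log, prev_log_index, prev_log_term, entries):
    """Apply the AppendEntries rules; return False if the check fails
    (the leader should then decrement nextIndex and retry)."""
    # Consistency check: the follower must hold the entry preceding the new ones.
    if prev_log_index >= 0:
        if prev_log_index >= len(log) or log[prev_log_index][0] != prev_log_term:
            return False
    insert_at = prev_log_index + 1
    for i, entry in enumerate(entries):
        pos = insert_at + i
        if pos < len(log) and log[pos][0] != entry[0]:
            del log[pos:]              # conflict: truncate the follower's suffix
        if pos >= len(log):
            log.append(entry)          # append entries not already present
    return True
```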


Safety

  • Must ensure that the leader for a new term always holds all of the log entries committed in previous terms (Leader Completeness Property)
  • Step 1: a restriction on elections: don't vote for a candidate unless the candidate's log is at least as up-to-date as yours
    • Compare the indexes and terms of the last log entries
  • Step 2: be very careful about when an entry is considered committed
    • As soon as it is replicated on a majority? Unsafe!
    • Committed => an entry from the current term is replicated on a majority
    • All preceding entries are then also committed, because of the Log Matching Property
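The "at least as up-to-date" comparison in Step 1 can be written out directly. A minimal sketch, again using a log of (term, command) pairs; an empty log is treated as last term 0, last index -1.

```python
def candidate_log_ok(candidate_last_term, candidate_last_index, my_log):
    """Voter-side check: is the candidate's log at least as up-to-date as mine?"""
    my_last_term = my_log[-1][0] if my_log else 0
    my_last_index = len(my_log) - 1
    # The log whose last entry has the later term wins;
    # if the terms are equal, the longer log wins.
    if candidate_last_term != my_last_term:
        return candidate_last_term > my_last_term
    return candidate_last_index >= my_last_index
```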

Persistent Storage

  • Each server stores the following in persistent storage (e.g., disk or flash):
    • Current term
    • Most recent vote granted
    • Log
  • These must be recovered from persistent storage after a crash
  • If a server loses its persistent storage, it can no longer participate in the cluster
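One way to persist this state is sketched below. The JSON file, the write-then-rename pattern, and the `fsync` call are implementation choices of this sketch, not something the notes prescribe; the point is only that the state must reach stable storage before the server responds to an RPC.

```python
import json
import os
import tempfile

def save_state(path, current_term, voted_for, log):
    """Write (current_term, voted_for, log) durably and atomically."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"current_term": current_term,
                   "voted_for": voted_for,
                   "log": log}, f)
        f.flush()
        os.fsync(f.fileno())      # force the data to stable storage
    os.replace(tmp, path)         # atomic rename: old or new state, never half

def load_state(path):
    """Recover the persisted state after a crash."""
    with open(path) as f:
        return json.load(f)
```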

Implementing Raft

  • Every tiny detail matters.

Client Interactions

  • Clients interact only with the leader:
    • Initially, a client can send a request to any server
    • If that server is not the leader, it rejects the request and returns information about the most recent leader
    • The client retries until it reaches the leader
  • If the leader crashes while executing a client request, the client retries (with a new randomly chosen server) until the request succeeds
    • This can result in multiple executions of a command: not consistent!
  • Goal: linearizability: the system behaves as if each operation is executed exactly once, atomically, sometime between the sending of the request and the receipt of the response
  • Solution:
    • The client generates a unique serial number for each request (client id, request number)
    • When retrying, the client reuses the original serial number
    • Servers track the latest serial number processed for each client, plus the associated response (must be stored persistently)
    • When the leader detects a duplicate, it sends the old response without re-executing the request
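The duplicate-detection scheme can be sketched as a per-client session table. A minimal in-memory sketch; as the notes say, a real implementation would persist this table along with the rest of the server's durable state.

```python
class Sessions:
    """Leader-side table: client_id -> (last serial number, cached response)."""

    def __init__(self):
        self.table = {}

    def apply(self, client_id, serial, command, state_machine):
        last = self.table.get(client_id)
        if last is not None and last[0] == serial:
            return last[1]                    # duplicate: replay the old response
        response = state_machine(command)     # execute the command once
        self.table[client_id] = (serial, response)
        return response
```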

Other Issues

  • Cluster membership
  • Log compaction