Consensus and RAFT’s Log replication
Problems that are related to the Consensus problem:
- The ordering of messages (Make sure that all servers in the group receive the same updates in the same order as each other→Totally ordered multicast)
- The up/down status of a suspected failed process (To keep servers' own local lists where they know about each other, and when anyone leaves or fails, everyone is updated simultaneously→Group membership/Failure detection)
- Who the leader is (Elect a leader among the group of servers and let everyone in the group know about it→Leader Election)
- Who has access to the critical resource (To ensure mutually exclusive access to a critical resource like a file→Mutual Exclusion)
What is Consensus?
So, we have N processes and each process P has input variable $x_p$: initially either 0 or 1 ****and ****output variable $y_p$: initially undecided (can be changed only once)
Consensus problem: Design a protocol that
- all non-faulty processes set their output variables to 0
- Or all non-faulty processes set their output variables to 1
- There is at least one initial state that leads to each outcomes 1 and 2 above
Every process contributes a value. The goal is to have all processes decide same (some)
value and decision once made can’t be changed.
Consensus protocol must have all these three properties:
- Termination (Progress). All non-faulty processes eventually decide on a value
- Agreement. All processes that decide do so on the same value.
- Validity. The value that has been decided must have proposed by some process.
Many problems in distributed systems are equivalent to consensus, such as Perfect Failure Detection, Leader election (select exactly one leader, and every alive
process knows about it) or Agreement (harder than consensus). So consensus is a very important problem.